[OmniOS-discuss] multithreaded gzip (or equivalent) and moving some files while preserving file trees

Valrhona valrhona at gmail.com
Sat Aug 17 23:05:40 UTC 2013


> Lzop uses a completely different compression format.  Its default
> compression is a bit less compression than 'gzip -3' but it is much more CPU
> efficient so it is able to achieve "wire speed" level compression rates on
> modern CPUs, and without relying on threading. There are some other
> compressors which are even faster than lzop but their compression formats
> are often not stable.
>
> A problem with most threaded compressors is that more effective compression
> algorithms (e.g. lzma as used by 'xz' and 'lzip') require very large data
> chunk sizes so that the compression algorithm works effectively.  This means
> that the input data size needs to be large (hundreds of MB or even
> gigabytes) in order for the multi-thread chunk size to be large enough.  Zfs
> send streams or tar/cpio archives of large directory trees are likely to be
> large enough but many/most ordinary files don't qualify.  As a result, using
> threading may result in much less compression.

Thanks for the insights. I basically had a structural question: most
compression I know of analyzes the whole file, and then
algorithmically looks for redundancies and patterns that can be
compressed. For streaming data, obviously how much of a chunk to
analyze makes a huge difference. I think probably the most common is
video data, but the nice thing about that is that, roughly speaking,
most frames are similar to adjacent ones, so the window in time that a
video compressor must look at can actually be pretty small, and therefore
efficient. For a ZFS stream, this seems impractical, especially for
terabyte-sized chunks. There isn't any reason to believe that the
compressor will be able to find similar regions, depending on file
layout. So it's not clear to me how efficient, from an architectural
standpoint, threading is, depending on how it is implemented.
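
To make the chunk-size concern concrete, here is a minimal Python sketch,
not a description of how any real threaded compressor is implemented. It
assumes an artificial input (one random block repeated four times, so the
only redundancy is long-range), uses the standard-library lzma module as a
stand-in for xz, and compresses either the whole input at once or
independent chunks on a thread pool. BLOCK, make_data, compress_whole and
compress_chunked are names made up for the illustration.

```python
import lzma
import random
from concurrent.futures import ThreadPoolExecutor

BLOCK = 256 * 1024  # hypothetical size of one incompressible "file"


def make_data(copies=4, seed=0):
    """Return one random block repeated `copies` times.

    The block itself is incompressible, so the only redundancy is
    between copies that sit BLOCK bytes apart.
    """
    rng = random.Random(seed)
    block = rng.getrandbits(8 * BLOCK).to_bytes(BLOCK, "little")
    return block * copies


def compress_whole(data):
    """Compressed size when one compressor sees the entire input."""
    return len(lzma.compress(data))


def compress_chunked(data, chunk_size, workers=4):
    """Total compressed size when each chunk is compressed independently."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(len(c) for c in pool.map(lzma.compress, chunks))


data = make_data()
whole = compress_whole(data)
chunked = compress_chunked(data, BLOCK)
print(f"input {len(data)}  whole {whole}  chunked {chunked}")
```

With this deliberately worst-case input, the single-stream run can refer
back to the earlier copies while each independent chunk cannot, so the
chunked total stays close to the input size. Real data will fall somewhere
in between, depending on how far apart the redundant regions are.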

Do you know, from an architectural standpoint, how LZOP differs from
the others that have been mentioned, such that there would be a fundamental
reason to prefer one format over the other for ZFS stream data?
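
One architectural property I do know varies between formats is how far
back the compressor can look for a match. The sketch below is only an
illustration of that one property, using Python's standard zlib and lzma
modules (lzop is not in the standard library, so it is not shown). It
relies on the fact that deflate's window is at most 32 KiB, and assumes an
artificial input where the only repeat is 64 KiB back. BLOCK and
window_demo are hypothetical names.

```python
import lzma
import random
import zlib

BLOCK = 64 * 1024  # twice deflate's maximum 32 KiB window


def window_demo(seed=0):
    """Compress a random block followed by an exact copy of itself.

    Returns (input_size, deflate_size, lzma_size).
    """
    rng = random.Random(seed)
    block = rng.getrandbits(8 * BLOCK).to_bytes(BLOCK, "little")
    data = block + block  # the copy starts 64 KiB after the original
    deflate_size = len(zlib.compress(data, 9))
    lzma_size = len(lzma.compress(data))
    return len(data), deflate_size, lzma_size


input_size, deflate_size, lzma_size = window_demo()
print(f"input {input_size}  deflate {deflate_size}  lzma {lzma_size}")
```

Here deflate cannot see the first copy by the time it reaches the second,
so its output stays about as large as the input, while lzma's much larger
dictionary lets it encode the second copy as references to the first.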

Thanks!

Peter

