Re: DEFLATE block size algorithms
- From: Mark Adler <madler@xxxxxxxxxxxxxxxxxx>
- Date: Tue, 26 Feb 2008 09:19:29 -0800 (PST)
On Feb 26, 5:38 am, jdgrii8...@xxxxxxxxx wrote:
> What method is currently used in zlib and 7-Zip to determine the size
> of each data block in the DEFLATE stream?
zlib simply uses a fixed number of symbols per block, set by the
memLevel parameter. The default (memLevel 8) is 16K symbols. (A symbol
is either a literal or a length/distance pair.) The current version of
zlib does not attempt to adjust the block size as a function of the data.
Note that gzip is different, and did/does attempt to intelligently
vary the block size, but the algorithm used turned out to have little
benefit, and so zlib went to the fixed block size approach.
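
As a rough illustration of the memLevel relationship described above, here is a minimal Python sketch. The helper names symbols_per_block and deflate_with_mem_level are made up for this example. The formula mirrors how zlib's deflate.c sizes its symbol buffer, so treat the result as an approximate upper bound on symbols per block for the zlib version you actually link against, not a guarantee.

```python
import zlib


def symbols_per_block(mem_level):
    """Approximate symbol-buffer size for a given memLevel.

    Assumption: this mirrors zlib's internal sizing, 1 << (memLevel + 6),
    which gives 16384 (16K) for the default memLevel of 8.
    """
    return 1 << (mem_level + 6)


def deflate_with_mem_level(data, mem_level):
    """Compress data as a zlib stream using the requested memLevel (1..9)."""
    compressor = zlib.compressobj(
        zlib.Z_DEFAULT_COMPRESSION,  # compression level
        zlib.DEFLATED,               # method
        zlib.MAX_WBITS,              # window size, zlib wrapper
        mem_level,                   # memLevel: controls the symbol buffer
    )
    return compressor.compress(data) + compressor.flush()
```

Compressing the same input with memLevel 1 and memLevel 9 and comparing the output sizes is a quick way to see how much the block size matters for a particular kind of data.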
In general, if the statistics of the data are relatively static, larger
blocks are better since the code description overhead is amortized
over more symbols. If the statistics of the data are more dynamic,
smaller blocks are better, since the codes can adapt more readily to
the changing nature of the data. Obviously there is a balance to be
struck there that is highly data dependent.
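
The balance can be made concrete with a toy cost model. Everything here is an idealization for illustration: H stands for an assumed flat per-block cost of describing the codes, entropy stands in for real Huffman code lengths, and the symbol counts are invented.

```latex
% Estimated cost of a block B with n_B symbols and per-symbol entropy h_B:
\[
  C(B) \approx H + n_B \, h_B
\]
% Static data: two halves of 4000 symbols each, same two-symbol
% distribution, h = 1 bit per symbol in each half and in the whole.
\[
  C_{\text{split}} = 2\,(H + 4000) = 2H + 8000, \qquad
  C_{\text{merged}} = H + 8000
\]
% Merging saves exactly H bits.
%
% Dynamic data: the halves use disjoint two-symbol alphabets, so each
% half still has h = 1, but the merged block has h = 2.
\[
  C_{\text{split}} = 2H + 8000, \qquad
  C_{\text{merged}} = H + 8000 \cdot 2 = H + 16000
\]
% Splitting wins whenever H < 8000 bits.
```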
If you happen to know that the data to compress is about to change in
character, you can tell zlib to end the current block and start a new
one by flushing (Z_SYNC_FLUSH).
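
A minimal sketch of forcing a block boundary this way, using Python's standard zlib module. The function name compress_with_boundary and the two-part input are assumptions made for the example.

```python
import zlib


def compress_with_boundary(first, second):
    """Compress two pieces of data, forcing a block boundary between them.

    Returns (head, tail). head ends at the boundary created by
    Z_SYNC_FLUSH; head + tail together form one valid zlib stream.
    """
    compressor = zlib.compressobj()
    # Z_SYNC_FLUSH ends the current block and appends an empty stored
    # block, so the output so far is byte aligned.
    head = compressor.compress(first) + compressor.flush(zlib.Z_SYNC_FLUSH)
    # Data after the flush starts in a fresh block with its own codes.
    tail = compressor.compress(second) + compressor.flush()
    return head, tail
```

The empty stored block costs a few bytes per flush, so this only pays off when the two pieces really do have different statistics.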
Your approach of combining small adjacent blocks into larger ones when
beneficial is a good one, but will be a little time-consuming,
especially if you're only using Huffman coding, in which case most of
the time will be spent fiddling with blocks. It would be interesting
to see the gain achievable and the cost in execution time for image
data.
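
One way such a merge pass might look, as a hedged sketch and not the original poster's actual method: a greedy left-to-right pass that joins a block to its predecessor whenever an estimated cost drops. The cost estimate (an assumed flat HEADER_BITS plus the entropy of the block's symbols) and the names block_cost and merge_blocks are all invented for illustration. A real encoder would use actual Huffman code lengths and the real size of the code description.

```python
import math
from collections import Counter

HEADER_BITS = 400  # assumed flat per-block overhead; not a zlib constant


def block_cost(counts, header_bits=HEADER_BITS):
    """Estimated bits for one block: header plus entropy-coded symbols."""
    total = sum(counts.values())
    if total == 0:
        return header_bits
    return header_bits - sum(
        c * math.log2(c / total) for c in counts.values()
    )


def merge_blocks(blocks, header_bits=HEADER_BITS):
    """Greedily merge each block into its predecessor when that is cheaper."""
    merged = []  # list of (data, symbol counts) pairs
    for block in blocks:
        counts = Counter(block)
        if merged:
            prev_data, prev_counts = merged[-1]
            joint = prev_counts + counts
            separate = (block_cost(prev_counts, header_bits)
                        + block_cost(counts, header_bits))
            if block_cost(joint, header_bits) < separate:
                merged[-1] = (prev_data + block, joint)
                continue
        merged.append((block, counts))
    return [data for data, _ in merged]
```

Each candidate merge recomputes a cost over the joined histogram, which hints at where the execution time goes when the blocks are small and numerous.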
Mark