Re: LZMA parameters
- From: Malcolm Taylor <me@xxxxxx>
- Date: Thu, 17 Nov 2005 16:06:14 +1300
Hi Nicolas,
-a{N} Compression mode [0, 2]: 0 is fast, 2 compresses better. I can't find out whether it has anything to do with the number of passes, which I doubt.
Don't know, and am curious myself :)
-fb{N} Fast bytes [5, 255] or maybe [5, 272]: According to Igor Pavlov himself on the SF forum, "Match finder checks only this number of bytes". I don't think this has anything to do with windowing, but maybe with the upper limit of the size of a match. The bigger the better, whatever the file is, according to my tests. It also slows down the compression but I don't care about that.
This one is hard to explain... To my knowledge (please correct me if I am wrong), this refers to the optimal parsing algorithm. The algorithm tries many different combinations of matches to find the best one. If a match longer than the fb value is found, it is not optimised any further and is simply used as-is.
This speeds up corner cases such as pic (the Calgary corpus bitmap, which is full of very long matches).
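If you want to experiment with this from a script, here is a minimal sketch using Python's standard lzma module (liblzma), not the 7-Zip/LZMA SDK encoder discussed here. I am assuming that liblzma's "nice_len" option plays the role of -fb; the helper name and the sample data are made up for illustration.

```python
import lzma

def compress_with_nice_len(data: bytes, nice_len: int) -> bytes:
    """Compress to .xz with an LZMA2 filter, overriding only nice_len."""
    filters = [{
        "id": lzma.FILTER_LZMA2,
        "preset": 6,           # take every other option from preset 6
        "nice_len": nice_len,  # assumed counterpart of -fb
    }]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

# Hypothetical test data: highly repetitive text with long matches.
sample = b"the quick brown fox jumps over the lazy dog. " * 2000

small = compress_with_nice_len(sample, 8)
large = compress_with_nice_len(sample, 273)

# Both settings must decode back to the same input.
assert lzma.decompress(small) == sample
assert lzma.decompress(large) == sample
print(len(small), len(large))
```

Timing the two calls on your own files should show the speed side of the tradeoff; the size difference depends heavily on the data.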
-lc{N} Literal Context bits [0, 8]: Defaulted to 3, it is documented that "sometimes lc=4 gives gain for big files". I'm still wondering what that is and what a bigger value would mean.
The literal coder takes lc bits of context (the high bits of the previous byte), so there are 2^(lc) contexts. The more contexts there are, the better the statistics, but also the slower it adapts. A tradeoff, which is why 3 or 4 is recommended.
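To make the lc context concrete, here is a small sketch of how the context index can be derived from the previous byte. The function name is mine, and this covers only the lc part of the literal model.

```python
def literal_context(prev_byte: int, lc: int) -> int:
    """Return the literal context: the top `lc` bits of the previous byte."""
    if not 0 <= lc <= 8:
        raise ValueError("lc must be in [0, 8]")
    if not 0 <= prev_byte <= 255:
        raise ValueError("prev_byte must be a byte value")
    return prev_byte >> (8 - lc)

# With lc=3 there are 2**3 = 8 contexts, so bytes that share their
# top three bits also share statistics.
for lc in (0, 3, 4, 8):
    print(lc, 1 << lc, literal_context(0x65, lc))
```

With lc=8 every previous byte gets its own context, which is the "better statistics, slower adaptation" end of the tradeoff.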
-lp{N} Literal Pos bits [0, 4]: Looks like it's intended for adjusting to periodical data with a 2^N byte period. Seems clear that, for example, 32-bit aligned data should take advantage of "-lp2". By default set to 0, it's said that any other value should be used with "-lc0". I'm still wondering why.
This allows you to add some bits from the position to the literal context. Again, it will reduce the adaptation speed of the literal model, and so most of the time will hurt compression.
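A sketch of how the position bits and the previous-byte bits can be combined into one literal context index, as I understand the format. The function name is mine. It also hints at why -lc0 is suggested alongside -lp: the number of separate models is 2^(lc+lp), so using both dilutes the statistics quickly.

```python
def literal_state(pos: int, prev_byte: int, lc: int, lp: int) -> int:
    """Combine the low `lp` position bits with the top `lc` bits of prev_byte."""
    pos_bits = pos & ((1 << lp) - 1)        # low lp bits of the position
    prev_bits = prev_byte >> (8 - lc)       # high lc bits of the previous byte
    return (pos_bits << lc) + prev_bits

# Number of separate literal models for a few settings.
for lc, lp in ((3, 0), (0, 2), (3, 2)):
    print(f"lc={lc} lp={lp}: {1 << (lc + lp)} literal models")

# With lc=0, lp=2 the model depends only on the position modulo 4.
print([literal_state(pos, 0xFF, 0, 2) for pos in range(8)])
```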
-pb{N} Pos Bits [0, 4]: Said to be "intended for periodical data when period is equal 2^N". What about "literal" position bits then? Do "-pb" and "-lp" always have to be the same?
IIRC this refers to the match flag encoding for MRU matches, allowing them to have a few bits of the position as context. Again, only useful in very periodic files.
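A rough sketch of the position context that pb selects, assuming it is simply the low pb bits of the current position (the function name is mine). Which flags actually use it is a detail of the coder that I am not trying to reproduce here.

```python
def pos_state(pos: int, pb: int) -> int:
    """Return the low `pb` bits of the position, used as extra context."""
    return pos & ((1 << pb) - 1)

# With pb=2 the context cycles with period 4, so data made of
# 4-byte records sees the same context at the same field offset.
print([pos_state(pos, 2) for pos in range(8)])
```

With pb=0 the position is ignored entirely, which is the sensible choice for data with no fixed period.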
-mf{MF_ID} Match Finder: The match finder algorithm seems to be the next most important setting after the dictionary size. When dealing with small files, binary trees seem to be enough. Patricia trees maybe are better
IIRC, in theory, a binary tree or a patricia tree should be perfect (i.e. find all possible matches). However, in practice, the implementations probably limit the search depth in order to avoid worst-case performance (e.g. pic), and so the various match finders will perform differently in different situations.
A general rule though:
Pick a hash table if using the fastest, unoptimised parsing mode (perhaps this is -a0??), as this will help you get a faster result.
Otherwise pick bt4 or similar as the binary tree tends to give the best speed results over a wide range of files.
Hash tables are useless for optimal parsing, because they do not provide many options to the optimiser. They also have bad worst-case behaviour when matching at every byte.
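The rule of thumb above can be tried out with Python's standard lzma module, which exposes liblzma's "mode" and "mf" options. This is only a sketch: liblzma is a different encoder from the one discussed here, the pairing of its MODE_FAST/MODE_NORMAL with the -a values is my assumption, and the helper name and test data are made up.

```python
import lzma

# Fast, unoptimised parsing paired with a hash-chain match finder.
FAST_HASH = {"id": lzma.FILTER_LZMA2, "preset": 6,
             "mode": lzma.MODE_FAST, "mf": lzma.MF_HC4}

# Optimising parser paired with the bt4 binary-tree match finder.
NORMAL_BT4 = {"id": lzma.FILTER_LZMA2, "preset": 6,
              "mode": lzma.MODE_NORMAL, "mf": lzma.MF_BT4}

def packed_size(data: bytes, lzma_filter: dict) -> int:
    """Compress with one filter, verify the round trip, return the size."""
    packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=[lzma_filter])
    if lzma.decompress(packed) != data:
        raise RuntimeError("round trip failed")
    return len(packed)

# Hypothetical test data mixing a byte ramp with short repeats.
data = bytes(range(256)) * 64 + b"abcabcabd" * 500

print("fast + hc4  :", packed_size(data, FAST_HASH))
print("normal + bt4:", packed_size(data, NORMAL_BT4))
```

Wrap the calls in time.perf_counter() measurements on your own files to see how the speed side compares.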
Malcolm.