Re: LZ77 optimal test set available? Best parsing algorithm?
- From: "cr88192" <cr88192@xxxxxxxxxxxxxxxxxx>
- Date: Sat, 26 Aug 2006 13:28:24 +1000
"Ignorant" <shunya@xxxxxxxx> wrote in message
news:1156513816.367235.294770@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Nothing like a lz77 with lazy search for token parsing of input data ,
and a range coder
for representing the tokens from lz77. The huffman codec [ a simple
variant of arithmetic coding] is faster for a static collection of
tokens .[ As in the zip algorithm].
I.i know of languages without bit shift operators. I dont know of a
saleable processor
without bit shifting operations.
In theory there is no single best algorithm applicable over unknown
information content of data to be compressed.
Mahesh Naik
well, LZ77+Huffman, though fairly fast (for decompression) is not
necissarily as fast for compression.
the difficulty wrt compressor speed is largely "how fast can you look up
strings", which matters more with larger window sizes (and attempting to
pull off better compression).
now, there are a few possible ways I can come up with to try to optimize the
compressor:
test, adjust, and retest (what I have been doing already);
gathering statistics computationally, but adjusting and testing myself (can
be done either dynamically or as part of the development process);
using a genetic algorithm (likely by far the most involved, and probably
overkill, but may be able to adjust the settings a lot more effectively than
I could, and also give me an excuse to fiddle with basic forms of this
approach).
as far as eliminating bitwise operations. usually bytewise LZ77 is an
option. some of my older (bytewise) coders had usually used part of the
codespace for literals, and part for matches. there was no really good
option though, as problems came up, and there were drawbacks to various
solutions.
eg:
0x00..0xBF can be used for raw bytes, 0xC0 can be an end-of-stream marker,
0xC1 an escape code, and oxC2..FF for runs.
0x00..0xBF can be literal: good for text, not as good for other kinds of
data.
a rotating list or similar: better for more general data, but not as fast.
....
in some coders, where I expected the data to be more or less exclusively
text, I used 0x00..0x7F for literals, and 0x80..0xFF for matches.
even all this may require shifts and masks.
also possible may be:
0x00..0x7F: raw literal
0x80..0xBF: escapes
0xC0..0xFF: matches
0x80..0xBF actually takes 2 bytes, but encodes an escaped byte followed by a
non-escaped byte. 14 bits are used for data, the first coded byte having an
implied MSB of 1, and the next an implied MSB of 0.
so, a single escaped byte does not imply any additional space overhead, but
2 does.
0xC0 could be an end-of-stream marker, and 0xC1 escapes a single byte, and
0xC2 encodes 2 escaped bytes (saving 1 byte in this case).
0xC3..0xFF imply runs, and this is followed by an offset.
additionally, one can be more conservative, and use a special escape code
for runs themselves (runs are more expensive, but there is less risk of
overhead in the case of poorly compressible data).
eg: 0x00..0x7F, and 0x82..0xFF are as is.
0x80=end of stream
0x81=literal escape code (if next byte is 0 or 1), otherwise, it escapes a
run.
otherwise, here we could have a shorter run (MSB=0):
16 bit word, with the high 5 bits giving the length-3+2, and the low 10 an
offset (this one costs 3 bytes);
if the MSB=1, the next 8 bits give a length and the next 15 an offset (this
one costs 4).
this implies an implicit/definitional high-low ordering.
an LZSS like bit-packing approach is possible, but is imo an exra hassle to
implement (and I hadn't generally gotten it to be as fast as my approaches,
which had often been based on "masking and jumping").
....
.
- References:
- LZ77 optimal test set available? Best parsing algorithm?
- From: Jim Leonard
- Re: LZ77 optimal test set available? Best parsing algorithm?
- From: Pasi Ojala
- Re: LZ77 optimal test set available? Best parsing algorithm?
- From: cr88192
- Re: LZ77 optimal test set available? Best parsing algorithm?
- From: Ignorant
- LZ77 optimal test set available? Best parsing algorithm?
- Prev by Date: Re: LZ77 optimal test set available? Best parsing algorithm?
- Next by Date: Re: LZ77 optimal test set available? Best parsing algorithm?
- Previous by thread: Re: LZ77 optimal test set available? Best parsing algorithm?
- Next by thread: Re: LZ77 optimal test set available? Best parsing algorithm?
- Index(es):
Relevant Pages
|
|