Re: minor annoyance (Re: large dictionaries)
- From: "cr88192" <cr88192@xxxxxxxxxxxxxxxxxx>
- Date: Tue, 15 Aug 2006 15:37:25 +1000
"moogie" <budgetanime@xxxxxxxxxxxxxx> wrote in message
news:1155604551.801239.239510@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Don't let the lack of comments discourage you... I would comment, however
> I do not have the experience or skill set to give intelligent replies, I
> am afraid.
yeah.
just figured though, maybe someone would have interesting comments.
I have found it is lame though. my algo (with current settings) only
sometimes gets better compression ratios than bzip2. it seems to depend a
lot on the file in question. english text is not so good, but a large
tarball of sourcecode is better it seems.
actually, wrt ratios, I suspect english text may demand more emphasis on
matches in the 3-6 byte range than on longer ones. likewise, since context
is much less important there, bzip2's context loss is no big deal, and gzip's
small context isn't much of a problem (basically, gzip and bzip2 get fairly
similar output sizes, and mine falls between them, but takes the longest of
the three...).
the bitstream itself at least holds some promise, being itself only a
minorly modified version of deflate (and thus being naturally more generic
than my ability to write a good implementation of it...).
the only strong advantage I can imagine right now (vs, eg, bzip2) is that of
flexibility and at least partial backwards compatibility with deflate...
conformance may be a little looser than deflate:
conformance will demand at least complete support of deflate and deflate64;
an implementation would also be required to handle anything "reasonable".
eg:
a decoder may reject input if it has unreasonable values, such as some huge
window size (like 1GB or something...), unrecognized block types, ...
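The rejection rule above could be sketched roughly as follows. This is only an illustration: the names `bfz_window_ok`, `bfz_block_type_ok` and the 64 MB default cap are assumptions, not part of the actual decoder. The only fixed points taken from the text are that deflate (32kB window) and deflate64 (64kB window) must always be accepted, and that something like a 1GB window may be refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest window every conforming decoder must handle: deflate64's 64kB. */
#define BFZ_REQUIRED_WINDOW (64u * 1024u)

/* Hypothetical default cap; the post only says "some huge window size"
   may be rejected, so 64 MB is an arbitrary illustrative choice. */
#define BFZ_DEFAULT_MAX_WINDOW (64u * 1024u * 1024u)

/* Deflate's 2-bit block type: 0 = stored, 1 = fixed Huffman,
   2 = dynamic Huffman, 3 = reserved (treated here as unrecognized). */
static bool bfz_block_type_ok(unsigned btype)
{
    return btype <= 2;
}

/* Accept a window size only if it is nonzero and within the decoder's cap. */
static bool bfz_window_ok(uint32_t window_bytes, uint32_t max_window)
{
    /* The cap itself must never exclude plain deflate or deflate64. */
    assert(max_window >= BFZ_REQUIRED_WINDOW);
    return window_bytes != 0 && window_bytes <= max_window;
}
```

A decoder would call these checks while parsing stream and block headers, and bail out with an error instead of trying to allocate the window.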
a few tweaks to the string lookup for english text:
ok, only minor improvement for a larger speed drop.
probably going to make these settings command line tunable...
maybe I will make basic combinations of settings available with a -1 to -9
scheme...
-0: uncompressed blocks
-1: fixed code blocks (32kB window)
-2: dynamic blocks (32kB window, tries for good ratios though)
-3: deflate64 style
-4: 256kB window, shallow searches
-5: 256kB window, deeper searches
-6: 1M window, shallow
-7: 1M window, deep
-8: 4M window, shallow
-9: 4M window, deep
options could be provided for finer control though, or possibly several
sets of modes...
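As a rough sketch, the -0 to -9 scheme above could be held in a small lookup table. The struct layout and all `bfz_` names are hypothetical. The list does not say which block type -4 to -9 use (dynamic is assumed here), nor how deep -1 to -3 search (shallow is assumed for -1, deep for -2 and -3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum bfz_block_mode { BFZ_STORED, BFZ_FIXED, BFZ_DYNAMIC };

struct bfz_preset {
    enum bfz_block_mode mode;
    uint32_t window;  /* window size in bytes; 0 for uncompressed blocks */
    int deep;         /* 0 = shallow search, 1 = deep search */
};

static const struct bfz_preset bfz_presets[10] = {
    /* -0 */ { BFZ_STORED,  0,          0 },
    /* -1 */ { BFZ_FIXED,   32u << 10,  0 },  /* depth is an assumption */
    /* -2 */ { BFZ_DYNAMIC, 32u << 10,  1 },  /* "tries for good ratios" */
    /* -3 */ { BFZ_DYNAMIC, 64u << 10,  1 },  /* deflate64 style, depth assumed */
    /* -4 */ { BFZ_DYNAMIC, 256u << 10, 0 },
    /* -5 */ { BFZ_DYNAMIC, 256u << 10, 1 },
    /* -6 */ { BFZ_DYNAMIC, 1u << 20,   0 },
    /* -7 */ { BFZ_DYNAMIC, 1u << 20,   1 },
    /* -8 */ { BFZ_DYNAMIC, 4u << 20,   0 },
    /* -9 */ { BFZ_DYNAMIC, 4u << 20,   1 },
};

/* Look up a preset by numeric level (0..9). */
static const struct bfz_preset *bfz_preset_for_level(int level)
{
    assert(level >= 0 && level <= 9);
    return &bfz_presets[level];
}

/* Parse a command line argument of the form "-0".."-9".
   Returns NULL for anything else. */
static const struct bfz_preset *bfz_preset_from_arg(const char *arg)
{
    if (arg == NULL || arg[0] != '-' ||
        arg[1] < '0' || arg[1] > '9' || arg[2] != '\0')
        return NULL;
    return bfz_preset_for_level(arg[1] - '0');
}
```

Finer-grained options would then just overwrite individual fields of a copied preset after the lookup.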
actually, at this rate I may as well make it an actual tool (dumping the
compressed data to a file, ...).
then again, if I dump to a file, I would need to come up with a header
structure (rather than just dumping a raw stream).
this could probably just be a glorified magic, and maybe a few other fields,
eg:
{
    byte magic[4];  //always "BFZ\0"; BFZ meaning 'BigFlate-Zip'.
    byte flag;
    byte cm;        //method, 1
    byte pad[2];
    u32 crc32;      //input checksum
    u32 isize;      //uncompressed size
}
or such...
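A minimal sketch of writing and reading such a 16-byte header byte by byte, so that the on-disk layout does not depend on compiler padding. The byte order is not given above; little-endian is assumed here (as in gzip). The `bfz_` names are hypothetical too, and rejecting any method other than 1 is only a guess based on the "method, 1" comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BFZ_HEADER_SIZE 16  /* 4 magic + 1 flag + 1 cm + 2 pad + 4 crc32 + 4 isize */

struct bfz_header {
    uint8_t  flag;
    uint8_t  cm;     /* method, 1 */
    uint32_t crc32;  /* input checksum */
    uint32_t isize;  /* uncompressed size */
};

/* Store a 32-bit value little-endian (assumed byte order). */
static void bfz_put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

static uint32_t bfz_get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serialize the header into a 16-byte buffer. */
static void bfz_header_write(uint8_t out[BFZ_HEADER_SIZE],
                             const struct bfz_header *h)
{
    assert(out != NULL && h != NULL);
    memcpy(out, "BFZ\0", 4);   /* magic */
    out[4] = h->flag;
    out[5] = h->cm;
    out[6] = 0;                /* pad */
    out[7] = 0;
    bfz_put_u32le(out + 8, h->crc32);
    bfz_put_u32le(out + 12, h->isize);
}

/* Parse a header; returns 0 on success, -1 on bad magic or unknown method. */
static int bfz_header_read(const uint8_t in[BFZ_HEADER_SIZE],
                           struct bfz_header *h)
{
    assert(in != NULL && h != NULL);
    if (memcmp(in, "BFZ\0", 4) != 0)
        return -1;
    if (in[5] != 1)            /* only method 1 is described */
        return -1;
    h->flag  = in[4];
    h->cm    = in[5];
    h->crc32 = bfz_get_u32le(in + 8);
    h->isize = bfz_get_u32le(in + 12);
    return 0;
}
```

The raw compressed stream would simply follow these 16 bytes in the file.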
or, I could use the gzip format, but would then produce files not decodable
by gzip, so that would be lame (and this is an experimental tool
anyways...).
me thinking:
encoding/decoding data in something like the gzip format would require me to
write a stream-able implementation (the current encoder/decoder doesn't deal
with streams, only buffers).
the most half-assed approach though would be to implement a streamable
encoder/decoder that uses callbacks to simulate tweaked out file-io (vs. the
approach used in zlib involving contexts and buffers, which, though more
flexible, would require more effort).
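The callback idea could look something like the bit reader below, which pulls input through a read callback instead of requiring the whole buffer up front. This is a sketch under assumptions: every name (`bfz_read_fn`, `bfz_bitreader`, `bfz_memsrc`) is made up for illustration, and the in-memory source only stands in for a real file callback. Bits are taken LSB first, as in deflate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Callback: copy up to cap bytes into buf, return the count (0 = end of input). */
typedef size_t (*bfz_read_fn)(void *user, uint8_t *buf, size_t cap);

struct bfz_bitreader {
    bfz_read_fn read;
    void *user;
    uint8_t buf[4096];
    size_t pos, len;
    uint32_t bits;  /* bit accumulator, filled LSB first */
    int nbits;      /* number of valid bits in the accumulator */
};

static void bfz_br_init(struct bfz_bitreader *br, bfz_read_fn read, void *user)
{
    br->read = read;
    br->user = user;
    br->pos = br->len = 0;
    br->bits = 0;
    br->nbits = 0;
}

/* Next input byte, refilling through the callback; -1 at end of input. */
static int bfz_br_byte(struct bfz_bitreader *br)
{
    if (br->pos == br->len) {
        br->len = br->read(br->user, br->buf, sizeof br->buf);
        br->pos = 0;
        if (br->len == 0)
            return -1;
    }
    return br->buf[br->pos++];
}

/* Read n bits (0..16), LSB first; returns -1 if the input runs out. */
static int32_t bfz_br_bits(struct bfz_bitreader *br, int n)
{
    int32_t v;
    assert(n >= 0 && n <= 16);
    while (br->nbits < n) {
        int c = bfz_br_byte(br);
        if (c < 0)
            return -1;
        br->bits |= (uint32_t)c << br->nbits;
        br->nbits += 8;
    }
    v = (int32_t)(br->bits & ((1u << n) - 1u));
    br->bits >>= n;
    br->nbits -= n;
    return v;
}

/* Example source: serves bytes from memory, standing in for file I/O. */
struct bfz_memsrc { const uint8_t *data; size_t size, pos; };

static size_t bfz_memsrc_read(void *user, uint8_t *buf, size_t cap)
{
    struct bfz_memsrc *m = (struct bfz_memsrc *)user;
    size_t n = m->size - m->pos;
    if (n > cap)
        n = cap;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}
```

A file-backed source would be the same shape with `fread` inside the callback. The encoder side would mirror this with a write callback and a bit accumulator that flushes whole bytes.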
no big deal though...
or something...