Re: misc idea: Mini-LZ




"LR" <sfsd@xxxxxxxx> wrote in message news:46e7f829$0$6273$edfadb0f@xxxxxxxxxxxxxxxxxxxxxxx
however, a recent idea that I have spec'ed out is something I am calling 'Mini-LZ'.
the point, well, is that it is much more readily usable in a copy-paste manner than deflate.

Nice and simple, thanks :)

What license are you publishing this code under? Public Domain?


probably...
not that it was any real effort on my part, it was only slow because 4 classes are giving me too much homework...

for right now, I am limited to activities that can be completed without using up much of my free time...


be warned though that this code is completely untested thus far...


To make it thread safe, move static byte *chain[65536], *hash[4096]; to public scope and make it non-static.


moving them to the toplevel will not help.
making this thread safe would either require them to be on the stack (not good for arrays of this size), dynamically allocated (yes, good old malloc), or protected with mutexes.
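
a minimal sketch of the malloc route, assuming the two tables are simply bundled into a struct that each thread (or each call) allocates for itself. the names MiniLZ_Context, MiniLZ_NewContext, MiniLZ_ResetContext and MiniLZ_FreeContext are made up for illustration and are not in the posted code:

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned char byte;

/* Hypothetical per-thread state; the sizes mirror the two arrays quoted above. */
typedef struct {
    byte *chain[65536];
    byte *hash[4096];
} MiniLZ_Context;

/* Clear both tables, e.g. before compressing a new buffer. */
void MiniLZ_ResetContext(MiniLZ_Context *ctx)
{
    size_t i;

    assert(ctx != NULL);
    /* Set the pointers explicitly rather than relying on all-bits-zero. */
    for (i = 0; i < 65536; i++) ctx->chain[i] = NULL;
    for (i = 0; i < 4096; i++)  ctx->hash[i] = NULL;
}

/* Allocate one context on the heap; returns NULL if malloc fails. */
MiniLZ_Context *MiniLZ_NewContext(void)
{
    MiniLZ_Context *ctx = malloc(sizeof *ctx);

    if (ctx != NULL)
        MiniLZ_ResetContext(ctx);
    return ctx;
}

void MiniLZ_FreeContext(MiniLZ_Context *ctx)
{
    free(ctx);
}
```

the encoder would then take a MiniLZ_Context pointer as an extra argument instead of touching statics, so two threads never share a table. on a 64-bit target the struct is roughly half a megabyte, which is why the stack is not a good home for it.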

now, in my case, I rarely use threads, so I didn't think of this at the time...

note that my style also assumes tweaking the code for each use case, which is something that LZ77 or LZSS based algos allow a little better than, say, LZ78 or LZW, LZP, ... which tend to be a little less flexible (LZW is restrictive on how one matches strings in the dictionary, and LZP assumes a particular hash function and means of handling hash chains).

note, however, that an LZP style coder can be much simpler and smaller than an LZSS style one, but is much less general...
in the past I have used specialized LZP style coders for tasks such as compressing strings within certain specialized file formats, ...
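
to give an idea of how small an LZP style coder can get, here is a rough sketch. everything about it is an assumption made for illustration (the 2-byte context hash, the 4096-entry table, the byte layout of one length byte followed by one literal, and the names lzp_hash, lzp_encode, lzp_decode); it is not the format of any coder mentioned above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char byte;

#define LZP_HASH_SIZE 4096
#define LZP_MAX_LEN   255

/* Hash the two bytes before position i into 12 bits. */
static unsigned lzp_hash(const byte *buf, size_t i)
{
    return (((unsigned)buf[i - 2] << 4) ^ buf[i - 1]) & (LZP_HASH_SIZE - 1);
}

/* Encode n bytes from in[]; out[] must hold at least 2*n bytes.
   Returns the number of bytes written. */
size_t lzp_encode(const byte *in, size_t n, byte *out)
{
    size_t table[LZP_HASH_SIZE];   /* context hash -> last position, 0 = empty */
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    memset(table, 0, sizeof table);
    while (i < n) {
        if (i >= 2) {
            unsigned h = lzp_hash(in, i);
            size_t p = table[h], len = 0;
            table[h] = i;
            if (p) {   /* a prediction exists: emit how far it holds */
                while (i + len < n && len < LZP_MAX_LEN &&
                       in[p + len] == in[i + len])
                    len++;
                out[o++] = (byte)len;
                i += len;
                if (i >= n)
                    break;
            }
        }
        out[o++] = in[i++];   /* literal */
    }
    return o;
}

/* Decode exactly n output bytes; returns the number of input bytes consumed. */
size_t lzp_decode(const byte *in, byte *out, size_t n)
{
    size_t table[LZP_HASH_SIZE];
    size_t i = 0, k = 0;

    assert(in != NULL && out != NULL);
    memset(table, 0, sizeof table);
    while (i < n) {
        if (i >= 2) {
            unsigned h = lzp_hash(out, i);
            size_t p = table[h], len;
            table[h] = i;
            if (p) {
                len = in[k++];
                while (len--)          /* byte-wise copy, overlap is fine */
                    out[i++] = out[p++];
                if (i >= n)
                    break;
            }
        }
        out[i++] = in[k++];
    }
    return k;
}
```

note there is no match search and no offset in the output: the context picks the single candidate position, so only a length is coded. that is where the simplicity comes from, and also the loss of generality.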


well, it can also be noted that the compressor would not be terribly efficient (higher efficiency necessarily demanding a little more code...).

it may be worthwhile to pay the costs and have a separate function for string lookups...
(this function could be static to lessen its overall impact).
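
a sketch of what such a lookup function might look like, assuming a 3-byte hash, the same table sizes as the quoted arrays, and a bounded walk along the hash chain. the names LzState, lz_hash, lz_insert, lz_lookup and the limits MIN_MATCH, MAX_MATCH, MAX_STEPS are hypothetical and not taken from the posted code:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;

#define HASH_SIZE   4096
#define WINDOW_SIZE 65536
#define MIN_MATCH   3
#define MAX_MATCH   255   /* assumed length limit */
#define MAX_STEPS   16    /* assumed cap on chain walking */

typedef struct {
    const byte *hash[HASH_SIZE];     /* newest position per hash value */
    const byte *chain[WINDOW_SIZE];  /* previous position with the same hash */
    const byte *base;                /* start of the input buffer */
} LzState;

/* Hash three bytes into 12 bits. */
static unsigned lz_hash(const byte *s)
{
    return ((s[0] * 251u + s[1]) * 251u + s[2]) & (HASH_SIZE - 1);
}

/* Record position s; at least 3 bytes must be readable at s. */
static void lz_insert(LzState *st, const byte *s)
{
    unsigned h = lz_hash(s);

    st->chain[(size_t)(s - st->base) & (WINDOW_SIZE - 1)] = st->hash[h];
    st->hash[h] = s;
}

/* Find the longest earlier match for s (call before inserting s itself).
   Returns the match length, or 0 if nothing of MIN_MATCH or more was found;
   *match is only meaningful when the result is nonzero. */
static int lz_lookup(const LzState *st, const byte *s, const byte *end,
                     const byte **match)
{
    const byte *c;
    int best = 0, steps = MAX_STEPS;

    assert(st != NULL && match != NULL && s <= end);
    if (end - s < MIN_MATCH)
        return 0;
    c = st->hash[lz_hash(s)];
    /* Stop at the window edge: older chain slots may have been reused. */
    while (c != NULL && steps-- > 0 && (s - c) < WINDOW_SIZE) {
        int n = 0;
        while (s + n < end && n < MAX_MATCH && c[n] == s[n])
            n++;
        if (n > best) {
            best = n;
            *match = c;
        }
        c = st->chain[(size_t)(c - st->base) & (WINDOW_SIZE - 1)];
    }
    return best >= MIN_MATCH ? best : 0;
}
```

raising MAX_STEPS trades speed for better matches, which is the kind of per-use-case tweaking mentioned earlier.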

note that this format gains a little more generality (can handle binary data, UTF-8, ...), at the cost of being slightly less efficient for ASCII. a slightly more efficient and simpler algo could be possible if specialized for ASCII (non-rotating valuespace subdivision), or it may not matter...

or such...


Lasse

