Re: huffman tree save to file




<xxxxyz@xxxxxx> wrote in message
news:1131369984.016163.268610@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Hi,
>
> Could you suggest different ways for saving huffman tree in the
> compressed file header?
>
well, as noted by the last few responses, there are multiple ways.

storing the code lengths. this is what deflate does (more so, it huffman
compresses the code lengths as well).
pros: smaller tree size.
cons: harder to implement in my experience, and if your file is of
non-trivial size, imo, the huffman tree size doesn't matter that much.

storing the huffman tree in a more raw form.
this is simpler to implement, essentially, as one just loads and saves the
tree...
it is my oppinion that, typically, the data is large enough that the table
size doesn't matter much.

for example, assuming 9 bits per code (typically needed, eg, if you want to
use part of the codespace for algo related uses). with 1 bit as a node/leaf
indicator, for 256 symbols you need 2815 bits (or about 352 bytes), 512
symbols would need about 5631 bits (or 704 bytes).

if your entire compressed file is only a few kB, then, yes, this is a
problem. if your compressed file is, eg, like 100kB or more, imo, it doesn't
matter so much.


then again, one can also use arithmetic coding.
a paq-style coder is fairly easy to get working and can be decently fast
(ok, but significantly slower than huffman if some optimizations are used).
advantages are that it gets better compression, and is in my experience a
lot simpler to work with. messing with the huffman algo to write different
compressors is imo a pita (endlessly needing to jerk off the coding at the
bit-level, ...).

as a cost, it trudges along at a few MB/s, wheras with huffman one can get
decode speeds close to the 100MB/s mark on my computer (deflate actually
manages to go faster, but I suspect this is because lz77 offers slight "make
it faster" properties).


recently as a misc effort I made another coder that was able to pull off
decode speeds exceeding those of huffman (around 150-170MB/s on my
computer), but, as a cost, the compression ratio was rather poor.

> Thanks.
>


.



Relevant Pages

  • Re: Suggest a lossless highly efficient RLE library
    ... A while back when i first started getting interested in compression I ... these statistics created a huffman tree. ... It can be found with source code here: ...
    (comp.compression)
  • Next set of vetting
    ... Harrington's Compression Method, ... The First Fundamental is the Huffman. ... Pascals triangle has applications in many many different fields. ... possible outcomes of that many bits, ...
    (comp.compression)
  • Re: Next set of vetting
    ... design a model for a data source that fits to that. ... In typical compression the usage ... You forget here that the input of the huffman must be "out of balance" ... you will find that the result is in balance. ...
    (comp.compression)
  • Re: Huffman versus arithmetic compress for fixed static compression
    ... Huffman performance. ... but the arithmetic coder (still overlooking possible implementation ... huffman for this kind of file compression. ... token for decompression. ...
    (comp.compression)
  • Re: Sorry for the irritation
    ...  Yet the message got no response. ... Yet there's plenty of time here to debate Perpetual Compression! ... is much more flexible than Huffman, but why not use Huffman, ... arithmetic coding here for optimal results. ...
    (comp.compression)