Re: Compression in block-oriented data



Any pointers as to what else I should be doing / looking at?

BWT may be good, although I am not sure how well it works in general on
floating-point data.
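
One way to settle that question is simply to measure. The following is a rough sketch in Python, using the standard bz2 module (bzip2 is BWT-based) next to zlib on packed doubles. The sine-wave samples are made-up test data, not anything from the actual application, so the numbers only mean something once real data is substituted.

```python
import bz2
import math
import struct
import zlib


def pack_doubles(xs):
    """Pack a list of floats as little-endian IEEE 754 doubles."""
    return struct.pack(f"<{len(xs)}d", *xs)


# Assumption: synthetic, smooth data; replace with real samples.
samples = [math.sin(i / 100.0) for i in range(10000)]
raw = pack_doubles(samples)

sizes = {
    "raw": len(raw),
    "zlib": len(zlib.compress(raw, 9)),  # deflate (LZ77 + Huffman)
    "bz2": len(bz2.compress(raw, 9)),    # BWT-based
}
print(sizes)
```

Smooth and noisy floating-point data tend to behave very differently, so it is worth running this on a sample of each kind.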

If the key payloads are stored in blocks of some sort (for example, B-Tree
leaves), it may be worthwhile to consider compressing the blocks.

Key payloads will likely be stored in streams on top of blocks. There
will likely be two types of streams - byte-oriented, and record-
oriented.

Record-oriented streams will hold fixed-size records. Byte-oriented
streams will hold arbitrary-length strings of bytes.

The streams will be addressed (pointed to) by the key structure, but
will not actually be contained in the B+-Tree itself.
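
To make the layout described above concrete, here is a minimal sketch in Python of what a key's pointer to its stream might look like. Everything in it (the StreamRef name, the first_block / length / record_size fields, the 20-byte packed form) is hypothetical and only for illustration; it is not the actual on-disk format.

```python
import struct
from dataclasses import dataclass

# Hypothetical packed form: first_block (u64), length (u64), record_size (u32).
_REF = struct.Struct("<QQI")


@dataclass(frozen=True)
class StreamRef:
    """What a key in the B+-Tree might store instead of the payload itself."""
    first_block: int      # block where the stream starts
    length: int           # payload length in bytes
    record_size: int = 0  # 0 = byte-oriented, >0 = fixed-size records

    def record_count(self):
        if self.record_size == 0:
            raise ValueError("byte-oriented stream has no record count")
        return self.length // self.record_size

    def pack(self):
        return _REF.pack(self.first_block, self.length, self.record_size)

    @classmethod
    def unpack(cls, data):
        return cls(*_REF.unpack(data))


ref = StreamRef(first_block=42, length=4096, record_size=16)
print(ref.record_count(), len(ref.pack()))
```

The point is only that the tree holds a small fixed-size reference, so the tree can be walked without reading any payload blocks.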

In any case, deflate/zlib may still be a worthwhile option, just applied to
larger blocks rather than to individual key values. A few kB is usually good,
and up to 32 or 64 kB should work well; much more than this will not
really improve compression, since deflate's window is limited to 32 kB.
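
As a rough illustration of the per-block versus per-value point, here is a sketch in Python using the standard zlib module. The length-prefixed framing and the compress_block / decompress_block helpers are invented for this example, and the sample values are synthetic.

```python
import zlib


def compress_block(values, level=6):
    """Concatenate length-prefixed values and deflate them as one block."""
    parts = []
    for v in values:
        parts.append(len(v).to_bytes(4, "little"))  # hypothetical framing
        parts.append(v)
    return zlib.compress(b"".join(parts), level)


def decompress_block(blob):
    """Inverse of compress_block: inflate, then split on the length prefixes."""
    raw = zlib.decompress(blob)
    values, pos = [], 0
    while pos < len(raw):
        n = int.from_bytes(raw[pos:pos + 4], "little")
        pos += 4
        values.append(raw[pos:pos + n])
        pos += n
    return values


# Assumption: small, similar-looking values, as key payloads often are.
values = [f"key-{i:06d}:payload".encode() for i in range(1000)]
per_value = sum(len(zlib.compress(v)) for v in values)
as_block = len(compress_block(values))
print(per_value, as_block)
```

Compressing each small value separately pays the zlib header and checksum overhead every time and gives the compressor no history to work with, which is why grouping values into blocks tends to do much better.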


We're not sure exactly how large the records / streams will be. I think
it would be acceptable for compression to perform poorly if the caller of
the library we're building turns compression on for data that
shouldn't be compressed. (The caller will be us.)

However, I am not sure of the specifics (I have usually implemented trees
where the payload is stored in the leaves themselves, so I am not really
sure how you have structured your trees...).


We have structured our trees to be separate from the data streams
because our callers will need to be able to 'navigate' the tree
without looking at the data. For this reason, putting the tree
'nodes' all in close proximity to each other was more important than
having the data 'local' to the tree nodes. You can think of our
'database' as a set of keys with a single BLOB in each one. Our keys
are stored in the trees, the BLOBs are stored elsewhere in the file.