Re: Effective Lossy Compression of Bitstream



Arsène Lupin wrote:
| Hi, Thomas.
|
| Simplifying, I have a bitstring of ones and zeros, which the
| probability of a one is about 30%.
| I want to compress lossy this bitstring, and the process can erase
| some 1's, but it cannot add any ones.
|
|> IOW, you want to generate a sequence such that the conditional
|> entropy of the generated sequence given the original is maximal, and
|> the entropy of the generated sequence is bounded by a threshold?
|
| IOW, yes. I wanted a function to "transform" the original sequence, in
| a way that the new sequence could be better compressed, while
| retaining as much reliability (as I measure by the relation of the
| number of ones in the original bitstring and the compressed one) as
| possible.
|
| As an example, if the most ones follow a exponential function (ex:
| x^2), then I could transmit the function to recreate the sequence. Of
| course maybe not all '1's of the original could be recovered, but if
| the size to transmit the function is small, then the gain would
| justify it's use.
|
| Really sorry for not making myself much clear, to me it seemed an easy
| problem.
|
| Arsène.

Okay, now to get to the core of your desire.
How many bits per chunk of data stream? Is it variable in size or not
counted as important?

You state that the odds of "1's" appearing is 30% and no "1's" can be
added (per individual data chunk or for the meta-bit location of your data
stream?).
You also want parity on the number of "1's" in the uncompressed and
compressed streams.

To me, it sounds basically like you want an error-correction routine
with an emphasis on bit limiting per chunk.
The downside is that most error-correcting codes ADD DATA to the stream.
My basic suggestion is to think your problem through a bit more.

I could go on about some useful probability coding schemes, but without
more information about what you consider "acceptable noise loss", I would
just be wasting my time. If there are specific bits in your data chunks you
can throw away, then do it (while counting the "1's") and generate a random
data (30% "1's") set to replace it upon inaccurate data set recreation with
a limitation of using only the number of "1" bits that equal your prestored
counted amount.

There are ways to measure "noise density" to gauge an even higher
probability of recreating an imitation of your original data set, but that
also requires a USENET page or two of information to go over the basics.
Heck, I could even toss out some pseudo-random coding techniques to give you
an accurate high-parity match of your original data stream, but without
knowing what you can toss, why you are coding it (polling data looks like a
high match for your data set as a guess), and what you consider "noise" in a
more accurate manner then there is simply no point in throwing time and
energy at solving your problems for my own unpaid amusement.

Basic Error-Correcting Code info
http://en.wikipedia.org/wiki/Error_detection_and_correction

If your data stream is fairly consistent, then difference coding between
nearly identical chunks would be optimal.
If your data stream is fairly random, then there is a potential bonus to
code by relative locational ID's (if the data chunk is identical to the one
that is back 16 chunks ago, then just point to it).
If your data has a tendency to be identical for one bit location, then
it might be more optimal to code each bit location in your chunks as an
individual stream which is then Hauffman or Aritmitic-Coded using standard
compression routines.


.



Relevant Pages

  • Re: search for hex characters in a binary file and remove them
    ... I need to look for a sequence of hex characters in a binary file and ... The file could be slurped into a buffer then checked with a regular expression ... or it could be read in a chunk at a time, checked, then the chunk rolled out ... of the buffer minus the width of the sequence plus 18 bytes. ...
    (comp.lang.perl.misc)
  • Re: How to compress data into fixed chunks?
    ... I am trying to compress data into fixed chunks with chunk size equal ... to 1MB using deflate algorithm, ...
    (comp.compression)
  • Re: Partitioned decompression
    ... clients in a P2P way, so, clients requests chunks of this file to the ... this chunk is requested by other peer, ... and compress them for transmission. ...
    (comp.compression)
  • Re: Attempting to parse free-form ANSI text.
    ... What I want to do is have all ANSI sequences _removed_ from the output, ... control sequence, I want to basically turn it into a call to wxWidgets' ... But looking at your program with a candid mind I'd say that it is written to process a chunk of data in memory. ...
    (comp.lang.python)
  • deflate fastest in 32K chunks?
    ... I have noticed that, to compress a block of 1MB data, it is faster to ... do it in 32KB chunks than in a 1MB chunk. ... deflateInit(), deflateand deflateEndfunctions to ... main concern (as long as I still get decent compression ratio). ...
    (comp.compression)