Re: Pacbyte data compression patent



Matt Mahoney wrote:


The other (less obvious) flaw is that the scheme requires each of the
256 possible byte values to occur at least once in the 300 byte
sequence. The number of such sequences is (300!*256^44)/44! out of a
possible 256^300 sequences, or one in 2.806 * 10^56. So applying one
of 64K transforms is unlikely to put the input in the right form. You
would need 187.5 bits to encode enough transforms to guarantee the
required form for all possible inputs. When you add this plus the 44
bytes, you end up with exactly the same size as you started with, 300
bytes.

Yes, that was my intuition also. And even if it were possible
to guarantee getting the needed number of unique entries
after 2^16 attempts it would be a glacial process to do this
for each sequence of bytes.


The "obvious" flaw is that any random or recursive compression scheme
violates the counting argument. You can't compress all n-bit strings
because there are 2^n of them, but only 2^n - 1 possible compressed
codes shorter than n bits. Yet it is amazing how many people there are
who can't understand this simple proof. Invariably, they have a weak
understanding of mathematics. Just like the perpetual motion designers
of the last century who didn't think conservation of energy applied to

That's an excellent, succinct description of why no lossless
compression algorithm can be guaranteed to compress all strings.
I'd heard this before, but your second sentence sums it up well.
It's so obvious that I feel stoopid for not realizing it :(

Cheers, Eric

.



Relevant Pages

  • Re: What is the shortest bit sequence covering all 256 byte values?
    ... > How do you envision compression using such a bit sequence? ... compression is and what are the data patterns behind the electronic chips ... multiple bit 'window' over a bit sequence and as RLE? ... >) scheme which uses this kind of approach, ...
    (comp.compression)
  • =?iso-8859-1?q?Re:_Kolmorgorov_Complexity_and_Kim_=D8yhus?=
    ... >>Sure a protein string could be generated from by organic Turing machine ... >>For example, a proteins sequence, like a lactase sequence, could indeed ... > proper compression. ... >>The notion of randomness is dependent upon chaos. ...
    (talk.origins)
  • Re: =?iso-8859-1?q?Re:_Kolmorgorov_Complexity_and_Kim_=D8yhus?=
    ... There is no compression possible, ... compressed expression is shorter than the expressed string), ... makes no sense to ask whether the DNA-base sequence is shorter than the ... > language system code needed to compress and decompress the sequence. ...
    (talk.origins)
  • Re: 100 megaton bombs atop Saturn V rockets
    ... >>since it could be made to work without compression. ... Super" the scheme for a self-sustaining combustion wave in uncompressed ... > Probably not, a) because fusion fuel is pretty light, and b) because ... cryotank of liquid deuterium, like the Saturn upper stage or the Shuttle ...
    (sci.space.history)
  • Re: 100 megaton bombs atop Saturn V rockets
    ... >>since it could be made to work without compression. ... Super" the scheme for a self-sustaining combustion wave in uncompressed ... > Probably not, a) because fusion fuel is pretty light, and b) because ... cryotank of liquid deuterium, like the Saturn upper stage or the Shuttle ...
    (sci.space.policy)