Re: Pacbyte data compression patent
- From: "Matt Mahoney" <matmahoney@xxxxxxxxx>
- Date: 5 Sep 2006 06:30:40 -0700
boo@xxxxxxxxxxxxxxxx wrote:
Matt Mahoney wrote:
This is yet another patent on random data compression.
It claims to work like this: divide the input into 300 byte segments.
Assume that each of the 256 possible byte values occurs at least once.
Record the positions of the first occurence of each value. This takes
ceil(lg(300 choose 256)) = 178 bits. Also, record the order in which
these values occur. This takes ceil(lg(256!)) = 1684 bits for a total
of 1862 bits, which is less than the 2400 original bits.
I looked at the patent. First I should say I have very little
experience looking at patents. I found the purposely obtuse
description annoying. The patent is now nearly two
years old and since there's yet to be an implementation
that I'm aware of I've got to assume it's flawed.
At first glance it seems like an interesting idea. Purposely
'massaging' the data being compressed in hopes being
able to use a compact descriptor.
I don't see how it could work though. The desciptor layout
does indeed consume less bits than what it replaces, but
I don't see how it could be used to reconstruct the orginal
300 bytes. What about the information for the redundant
40 bytes?
It is possible that the 300 input bytes do not have all 256 different
values. In that case, apply one of 64K transforms by adding (mod 256)
a random 300 byte mask from a library until this requirement is met.
This adds 16 bits to record which mask was chosen, for a total 1878
bits. If none of these masks results in 256 unique values, then try
variations of this method on segments of 152 7-bit sequences or 77
6-bit sequences.
Assuming this works the process may be very slow. And I
don't see how this would gurantee getting the needed # of
unique values. The patent also claims a range of compression
ratios ("savings") depending on the number of bytes in the string.
Apparently 300 bytes was about optimal when using 8-bit (byte)
quantities, but it could vary up to 377 (?) bytes with a greatly
reduced compression ratio (more bits consumed by decsciption
index). The patent also states that higher compression ratios can
be obtained using greater than 8-bit quanities (9, 10... whatever).
Since this method works on arbitrary data, it can be applied
recursively (or so the patent claims).
I don't see how this could be done recursively.
The flaw in this method seems pretty obvious, but it has apparently
escaped the notice of patent examiners during the 2 years it has been
on file so far.
What is the flaw if you don't mind me asking?
Eric B
One obvious flaw, as you already pointed out, is that there is no way
to reconstruct the 44 non-unique bytes.
The other (less obvious) flaw is that the scheme requires each of the
256 possible byte values to occur at least once in the 300 byte
sequence. The number of such sequences is (300!*256^44)/44! out of a
possible 256^300 sequences, or one in 2.806 * 10^56. So applying one
of 64K transforms is unlikely to put the input in the right form. You
would need 187.5 bits to encode enough transforms to guarantee the
required form for all possible inputs. When you add this plus the 44
bytes, you end up with exactly the same size as you started with, 300
bytes.
The "obvious" flaw is that any random or recursive compression scheme
violates the counting argument. You can't compress all n-bit strings
because there are 2^n of them, but only 2^n - 1 possible compressed
codes shorter than n bits. Yet it is amazing how many people there are
who can't understand this simple proof. Invariably, they have a weak
understanding of mathematics. Just like the perpetual motion designers
of the last century who didn't think conservation of energy applied to
them too, they build incredibly complicated mechanisms, making it
harder and harder for themselves and others to find the inevitable
error buried in their logic. People will spend years of effort in this
fruitless approach, forming companies, hiring programmers, working 16
hour days, filing patents, etc. and the result is always the same.
Such a waste...
-- Matt Mahoney
.
- Follow-Ups:
- Re: Pacbyte data compression patent
- From: boo
- Re: Pacbyte data compression patent
- From: jacko
- Re: Pacbyte data compression patent
- References:
- Pacbyte data compression patent
- From: Sportman
- Re: Pacbyte data compression patent
- From: boo
- Pacbyte data compression patent
- Prev by Date: partly OT: hypothetical video streaming
- Next by Date: Re: FAQ Maintenance?
- Previous by thread: Re: Pacbyte data compression patent
- Next by thread: Re: Pacbyte data compression patent
- Index(es):
Relevant Pages
|
|