Re: Random Data compression by value question.
- From: "Matt Mahoney" <matmahoney@xxxxxxxxx>
- Date: 18 Sep 2005 17:26:06 -0700
Ernst wrote:
> I've been away.
>
>
> Has anyone succeeded in compressing Million digits yet?
Not yet as far as I know. I haven't worked on it since I previously
posted on the subject. I am aware of at least 50 bits of redundancy,
in that when the digits are arranged on 20,000 punched cards with 50
digits each, the sums of the digits in each of the 50 columns are even.
This is a consequence of the method used to remove the biases from the
hardware generator by adding adjacent cards modulo 10.
There is also a very weak correlation between adjacent cards, and
between cards with one intervening card (but not 2 or more). I believe
the data would be compressible if it were possible to recover the
original biased data, which would be possible by guessing one card,
then alternately adding and subtracting the other cards. This would be
easy to do because each of the digits on the guessed card could be
solved independently.
Unfortunately, I believe (since it isn't documented) the cards were
shuffled before being published, so this method won't work. If the
original order was maintained, then when you alternately add and
subtract cards the 50 sums should all be 0 mod 10, but they are not.
If the shuffle were random, then compression would not be possible even
if the permutation could be discovered, because the entropy of the
permutation (log 20000!) would almost certainly exceed the redundancy
in the hardware bias. However, I don't believe the shuffle was
thorough because if it was there would not be any correlation to the
first and second adjacent cards. This correlation means that many
cards which were originally adjacent were moved at most one card apart,
which typically happens when you shuffle a deck of cards only once or
twice.
This correlation by itself is not very useful for compression because
it is extremely weak and only provides a few bits of entropy.
-- Matt Mahoney
.
- Follow-Ups:
- Re: Random Data compression by value question.
- From: Ernst
- Re: Random Data compression by value question.
- References:
- Random Data compression by value question.
- From: Ernst
- Random Data compression by value question.
- Prev by Date: Random Data compression by value question.
- Next by Date: Highest Data Compression ratio
- Previous by thread: Random Data compression by value question.
- Next by thread: Re: Random Data compression by value question.
- Index(es):
Relevant Pages
|
|