Re: Random Data compression by value question.



Ernst wrote:
> I've been away.
>
>
> Has anyone succeeded in compressing Million digits yet?

Not yet as far as I know. I haven't worked on it since I previously
posted on the subject. I am aware of at least 50 bits of redundancy,
in that when the digits are arranged on 20,000 punched cards with 50
digits each, the sums of the digits in each of the 50 columns are even.
This is a consequence of the method used to remove the biases from the
hardware generator by adding adjacent cards modulo 10.

There is also a very weak correlation between adjacent cards, and
between cards with one intervening card (but not 2 or more). I believe
the data would be compressible if it were possible to recover the
original biased data, which would be possible by guessing one card,
then alternately adding and subtracting the other cards. This would be
easy to do because each of the digits on the guessed card could be
solved independently.

Unfortunately, I believe (since it isn't documented) the cards were
shuffled before being published, so this method won't work. If the
original order was maintained, then when you alternately add and
subtract cards the 50 sums should all be 0 mod 10, but they are not.

If the shuffle were random, then compression would not be possible even
if the permutation could be discovered, because the entropy of the
permutation (log 20000!) would almost certainly exceed the redundancy
in the hardware bias. However, I don't believe the shuffle was
thorough because if it was there would not be any correlation to the
first and second adjacent cards. This correlation means that many
cards which were originally adjacent were moved at most one card apart,
which typically happens when you shuffle a deck of cards only once or
twice.

This correlation by itself is not very useful for compression because
it is extremely weak and only provides a few bits of entropy.

-- Matt Mahoney

.



Relevant Pages

  • Re: Random Data compression by value question.
    ... >> Has anyone succeeded in compressing Million digits yet? ... > hardware generator by adding adjacent cards modulo 10. ... > If the shuffle were random, then compression would not be possible even ... I don't believe the shuffle was ...
    (comp.compression)
  • Re: Random Data compression by value question.
    ... >> digits each, the sums of the digits in each of the 50 columns are even. ... >> hardware generator by adding adjacent cards modulo 10. ... >> If the shuffle were random, then compression would not be possible even ... example, fpaq0, a simple order 0 coder, compresses the decimal file to ...
    (comp.compression)
  • Re: Enigma 1449 - All in order
    ... I wrote six non-zero consecutive digits ... on separate cards and gave two cards ... solutions work for the first option. ...
    (rec.puzzles)
  • Re: your assistance is requested
    ... with AES 256 would look nicely random, ... compressing the heck out of it, the same trick is going to be a lot ... Briefly, they generated 1 million random digits on 20,000 punched ... subtracting adjacent cards modulo 10, creating 20,000 new cards. ...
    (comp.compression)
  • Re: Either there is a God of infinite intelligence
    ... creation event and plenty that contradicts it. ... >The DNA code refers to mathematics and probability regarding evolution. ... out the cards in their new order, ... the odds of even one shuffle ending up being the same as ...
    (talk.origins)