Re: Theoretical Limits for Compression Algorithms and Random Sequences



Arsène Lupin <detentor@xxxxxxxxx> writes:
Good night, people!

It's proven by math that you can't find the 'optimal size' for a given
file, in a sense that an 'optimal size' is the file that can't be
further compressed. That file is it's Kolmogorov Complexity (also such
'optimal file' would be a random sequence).

I'd not call them 'optimal', as in the vast majority of cases
such files would be larger than the original. The optimal
thing to do universally is to just leave all files alone,
without even the option (which requires 1 extra bit) of being
able to compress them.

It's also pretty widespread that the counting argument proves that
there isn't a compression algorithm that provides compression for any
file.

Better to say "for all files", as "for any file" might invite
people to select one single file for you and say "but I can
compress that one".

The question is:

If our algorithm just compresses a few range of the all possible
files, can't we use our algorithm to 'fit' the range where the random
sequences lies? With that in mind, our algorithm would be able to
compress the theoric "uncompressed" sequences?

The "random" sequence lies over the majority of the input space.
All compression functions only make a tiny fraction of the input
space smaller.
(a) You can't perform any reversible mapping that would fit vast
majority inside tiny fraction.
(b) You'd still need to remember what that reversible mapping
was, which makes the tiny fraction that you can use vastly
smaller.

You forget - compression functions don't on the whole compress.

Compression functions are expansion functions which have a small
but useful set of failure cases.


When the discussion about random sequences arises, we still can't
prove that a given sequence is random or not. But is known some
properties of random sequences:

There is no such thing as a "random sequence". There are sequences
from random sources.

1 - Random sequences have the equal frequencies of it's elements (same
entropy, thus).

False. I just pulled 2 bits from a perfect entropy source, and they
were "0" and "0".

2 - Random sequences of a given alphabet has all elements of the
alphabet.

False. See above.

3 - Any subpart of a random sequence is random (if you take a sub-
portion of a random sequence, that sequence must continue being
random).

False. Let subpart({x_i}) = { x_i : x_i=0 }

Are those 3 statements above right? If they are right, then random
sequences can be compressed easily (I can easily build an algorithm
for doing so). That leds us to investigate which one of them is wrong.
What one would be?

Every one.

Phil
--
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration
.



Relevant Pages

  • Re: Chex Wat: Pi is "random" and "not predictable"?
    ... The digits of pi are not a sequence in the sense that knowing the ... without the proper reference algorithm. ... What is your prediction as to what will come next? ... This compression algorithm can then be used to predict ...
    (talk.origins)
  • Re: New Archiver called Scifer
    ... The BETA version algorithm is more a demonstration of bufferless ... automatically detect more patterns and would increase the compression ... Scifer algorithm assumes does not depend on the complexity of the data ... In the sequence you have given, ...
    (comp.compression)
  • Re: Theoretical Limits for Compression Algorithms and Random Sequences
    ... the counting argument proves only that compression is ... 'optimal file' would be a random sequence). ... there isn't a compression algorithm that provides compression for any ... prove that a given sequence is random or not. ...
    (comp.compression)
  • =?iso-8859-1?q?Re:_Kolmorgorov_Complexity_and_Kim_=D8yhus?=
    ... >>Sure a protein string could be generated from by organic Turing machine ... >>For example, a proteins sequence, like a lactase sequence, could indeed ... > proper compression. ... >>The notion of randomness is dependent upon chaos. ...
    (talk.origins)
  • Re: Chex Wat: Pi is "random" and "not predictable"?
    ... their output sequence. ... The fact is that an algorithm can also be used to ... first was about the probability of finding an apparent match. ... They may be matched within e at an infinite ...
    (talk.origins)