Re: Theoretical Limits for Compression Algorithms and Random Sequences
- From: Phil Carmody <thefatphil_demunged@xxxxxxxxxxx>
- Date: 17 Oct 2007 05:22:07 +0300
Arsène Lupin <detentor@xxxxxxxxx> writes:
Good night, people!
It's proven by math that you can't find the 'optimal size' for a given
file, in a sense that an 'optimal size' is the file that can't be
further compressed. That file is it's Kolmogorov Complexity (also such
'optimal file' would be a random sequence).
I'd not call them 'optimal', as in the vast majority of cases
such files would be larger than the original. The optimal
thing to do universally is to just leave all files alone,
without even the option (which requires 1 extra bit) of being
able to compress them.
It's also pretty widespread that the counting argument proves that
there isn't a compression algorithm that provides compression for any
file.
Better to say "for all files", as "for any file" might invite
people to select one single file for you and say "but I can
compress that one".
The question is:
If our algorithm just compresses a few range of the all possible
files, can't we use our algorithm to 'fit' the range where the random
sequences lies? With that in mind, our algorithm would be able to
compress the theoric "uncompressed" sequences?
The "random" sequence lies over the majority of the input space.
All compression functions only make a tiny fraction of the input
space smaller.
(a) You can't perform any reversible mapping that would fit vast
majority inside tiny fraction.
(b) You'd still need to remember what that reversible mapping
was, which makes the tiny fraction that you can use vastly
smaller.
You forget - compression functions don't on the whole compress.
Compression functions are expansion functions which have a small
but useful set of failure cases.
When the discussion about random sequences arises, we still can't
prove that a given sequence is random or not. But is known some
properties of random sequences:
There is no such thing as a "random sequence". There are sequences
from random sources.
1 - Random sequences have the equal frequencies of it's elements (same
entropy, thus).
False. I just pulled 2 bits from a perfect entropy source, and they
were "0" and "0".
2 - Random sequences of a given alphabet has all elements of the
alphabet.
False. See above.
3 - Any subpart of a random sequence is random (if you take a sub-
portion of a random sequence, that sequence must continue being
random).
False. Let subpart({x_i}) = { x_i : x_i=0 }
Are those 3 statements above right? If they are right, then random
sequences can be compressed easily (I can easily build an algorithm
for doing so). That leds us to investigate which one of them is wrong.
What one would be?
Every one.
Phil
--
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration
.
- References:
- Theoretical Limits for Compression Algorithms and Random Sequences
- From: Arsène Lupin
- Theoretical Limits for Compression Algorithms and Random Sequences
- Prev by Date: Re: Theoretical Limits for Compression Algorithms and Random Sequences
- Next by Date: Re: Theoretical Limits for Compression Algorithms and Random Sequences
- Previous by thread: Re: Theoretical Limits for Compression Algorithms and Random Sequences
- Next by thread: Re: Theoretical Limits for Compression Algorithms and Random Sequences
- Index(es):
Relevant Pages
|