Re: Probability, compressible sequences: Backgrounds
- From: a.denis1@xxxxxxxxx
- Date: Sun, 9 Mar 2008 15:55:35 -0700 (PDT)
On 7 Mar, 09:05, Thomas Richter <t...@xxxxxxxxxxxxxxxxx> wrote:
Very good explanation!
Thank you Thomas.
But let me try to explain my different point of view.
Hi folks,
since there was enough noise in this group once again, I afraid much
noise is caused simply because people do not use adequate definitions
for the subjects to be discussed here, and it's time to clean up the
mess. Folks, before you discuss, please get some background, really,
even if it only helps to reduce the noise level here.
Yes, it can be better but to obtain this result you wrote down 100
lines!
A "specific sequence" cannot be random. And as such, as long as you know
your sequence, you can compress it. Actually, to zero bit, as you know
what you're talking about, so there's no point in transmitting it in
first place.
In the compressed sequence you need to insert also the decompressor
otherwise it is trivial.
How can you compress a specific sequence wthin the decompressor to
zero?
"Entropy" and "compressibility" are statements that are not
defined on *specific* sequences, or on "subsequences of specific
sequences", but only on *random processes*. A *process* is something
different than a *sequence*.
I think it is possible ( not only possible but I think it is the only
way ) to speak about "Entropy" and "compressibility" about sequence
indipendently from processes.
By that I mean that, in order to define the terms, I need to have a
device (probably a coin) that generates symbols (head or tail) by an
external trigger (me, dropping the coin). I can measure the relative
frequencies of the symbols, and for a well defined random process, I
must ensure that these frequencies converge to probabilities as the
number of experiments grows to infinity. *This is nothing you can
prove*. It is an assumption you have to make about your device to have
the theory make some sense.
Second, compressibility does not speak about "compressing sequence A or
B". It speaks about "compressing sequences on average generated by a
random device as above". That means, operationally, you run a long
sequence of experiments, using the same device, understanding that it
doesn't change its statistics, and feed the output of each experiment
(each of which generating a lot of symobls) into your compression
algorithm. The *average* (expected) output length of your compression
algorithm, divided by the number of input symbols, is the
"compressibility" - *OF THE RANDOM PROCESS*. Note that I'm not talking
about a specific sequence, this doesn't make sense.
The reason why "compressors work", or rather, the philosophy behind
them, is to model (or rather "misunderstand") the input sequence "as if"
it had been generated by a random process, and build a compressor that
works well *on this* random process. Not on any *specific* sequence.
Second misunderstanding: "RAD" or "random process" does not imply that
the probabilities of all symbols of the random process are equal. That
is, in fact, the boring case as it is easy to show that you cannot
compress this (i.e. for any compression device, the *average* output
length is equal to the input size, taking the number of experiments and
the size of the sequence to infinity). Furthermore, in no way the
probability of the next symbol need to be independent of the symbols
before (i.e. there is no need for the process to be memory-less). This
is again the "boring case". Compressors work because they model the
input source by a random process that *does* have memory. And this is a
good model for text, language, programs, images... "Good" as in "it does
work". We "do not have" a random device that generates symbols *exactly*
as in text/speech/images... etc. But "it's close enough for the purpose
of compression to assume it" - that is, "a model".
Going back to the original statement: It would help a lot if, before you
would start discussing, you would start from a proper definition of
words. I'm not saying that the above one is the "only" or "correct" one,
but a lot of noise here could have been avoided if you would have stated
clearly what you're talking about.
Thanks for reading,
Thomas
I try to reassume your point of view let me know if I misunderstand.
You say that speaking about random sequence is nonsense, you need to
define the process and if the process is random the sequence produced
by the process will be random indipendently from the compressibility
of its serquences.
How can you define a random process?
A coin is a random process? Why?
You use a random process as hypothesis and then using this hypothesis
you define the sequence as random .
You can say ok this sequence is the result of a coin so we know this
is a random process and the sequences will be random but you need to
know that the process is random and to know a process is random you
need to analize its sequences ( yes there are other ways but ...) !
We know the coin flipping is a random process becouse in the past we
analize its sequences and we understand is it about impossible to
predict its result and the sequence is incompressbile.
For this reason I go in the opposite direction I claim that the only
way to define a random process is to identify its sequences as
random , not compressible.
Using the coin as device you can create a random process becouse this
process produce not compressible sequences.
Now you can try to flip a coin and you can give me a sequence
absolutely compressible and claim that it is random.
Using the compressibility principle I must say that process is not
random and this is a mistake but if you know only that sequence the
best claim possible is that the process is not random !
To say that the sequence is random you need to identify its process as
known random process and then you can give the correct answer , but
you need to know a lot of data , more than the amount of the sequence.
For these reason I say we can speak about random sequence, becouse the
only way to identify a process as random is to watch at its sequences.
For these reason I say we don't need to know the process to speak
about random.
And for these reason I claim
Random data is incompressible by definition!
Thank you, Denis.
.
- Follow-Ups:
- Re: Probability, compressible sequences: Backgrounds
- From: Thomas Richter
- Re: Probability, compressible sequences: Backgrounds
- References:
- Probability, compressible sequences: Backgrounds
- From: Thomas Richter
- Probability, compressible sequences: Backgrounds
- Prev by Date: Re: Compression using Random Number Generators?
- Next by Date: Re: Probability, compressible sequences: Backgrounds
- Previous by thread: Re: Probability, compressible sequences: Backgrounds
- Next by thread: Re: Probability, compressible sequences: Backgrounds
- Index(es):
Relevant Pages
|
Loading