Re: Entropy
- From: "Matt Mahoney" <matmahoney@xxxxxxxxx>
- Date: 24 Apr 2006 16:38:25 -0700
cerelaz wrote:
It's not homework. I've finished my exams. This is part of my thesis.
I probably made some mistakes in defining the problem, so I'm trying again.
to me, this seems like the question was just paraphrased from a textbook or
something.
No, but books that explain Hk entropy (informational entropy) are welcome.
I'll try to explain my problem better.
I'd like to find (if it exists) a relation between Hk and an "incremental
Hi" (that I define). For Hk I build a k-order modeler and I compress
(with a 0-order compressor like Huffman or arithmetic coding) the j-th char
of the text (T[j]) considering only the previous k chars T[j-k, j-1] (for
fixed k, e.g. 4-5 as in PPM).
An i-order modeler is computed by full knowledge of the text by computing
the conditional probabilities of every symbol in the text.
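As a rough illustration of the static, full-knowledge k-order model described above, here is a minimal Python sketch of the empirical Hk in bits per symbol. The function name empirical_entropy is hypothetical, and skipping the first k symbols (which lack a full context) and normalizing by n-k is an assumption of this sketch; other conventions exist.

```python
import math
from collections import Counter, defaultdict

def empirical_entropy(text, k):
    """Empirical k-th order entropy of `text` in bits per symbol.

    The context of text[j] is the k symbols before it. The first k
    symbols have no full context and are skipped (an assumption of
    this sketch), so the result is normalized by len(text) - k.
    """
    # counts[context][symbol] = how often `symbol` follows `context`
    counts = defaultdict(Counter)
    for j in range(k, len(text)):
        counts[text[j - k:j]][text[j]] += 1

    total_bits = 0.0
    for ctx_counts in counts.values():
        n_ctx = sum(ctx_counts.values())
        for c in ctx_counts.values():
            # each of the c occurrences costs -log2(c / n_ctx) bits
            total_bits -= c * math.log2(c / n_ctx)

    n = len(text) - k
    return total_bits / n if n > 0 else 0.0
```

Calling empirical_entropy(T, 0) gives the usual order-0 entropy; raising k conditions each symbol on a longer context.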
Now, I split the text into non-overlapping blocks of length s (with s>k)
and I compress each block with a variable-order modeler. I use a 0-order
modeler for the first char of each block, 1-order for the second,
2-order for the third, ..., an (s-1)-order modeler for the s-th char of
each block. Hence, I use a lower i-order modeler (where i<k) instead of
the k-order one for k*(n/s) symbols, but I use a greater j-order modeler
(where j>k) for (s-k-1)*(n/s) symbols.
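The two schemes can be compared empirically with a small Python sketch that sums ideal code lengths, -log2 p(symbol | context), under static models gathered from the whole text. All function names here are hypothetical. The sketch ignores the cost of storing the models (as requested for the Huffman tables), and the fixed-order total simply skips the first k symbols, which is an assumption that slightly favors it.

```python
import math
from collections import Counter, defaultdict

def order_model(text, order):
    """Static counts of (context of length `order`) -> next symbol,
    gathered over the whole text."""
    counts = defaultdict(Counter)
    for j in range(order, len(text)):
        counts[text[j - order:j]][text[j]] += 1
    return counts

def code_length_bits(counts, context, symbol):
    """Ideal code length -log2 p(symbol | context) under a static model."""
    ctx = counts[context]
    return -math.log2(ctx[symbol] / sum(ctx.values()))

def fixed_order_bits(text, k):
    """Total ideal bits with a fixed order-k model (first k symbols skipped)."""
    counts = order_model(text, k)
    return sum(code_length_bits(counts, text[j - k:j], text[j])
               for j in range(k, len(text)))

def block_scheme_bits(text, s):
    """Total ideal bits when symbol i (0-based) of each length-s block is
    coded with an order-i model whose context is the block prefix."""
    models = [order_model(text, i) for i in range(s)]
    total = 0.0
    for start in range(0, len(text), s):
        block = text[start:start + s]
        for i, symbol in enumerate(block):
            total += code_length_bits(models[i], block[:i], symbol)
    return total
```

Running fixed_order_bits(T, k) and block_scheme_bits(T, s) on a few real texts for several values of k and s shows how the comparison varies with the data. Note that the block scheme needs s separate models, so memory grows quickly with s.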
I don't want to take into consideration the space required for huffman
tables.
How can I find a relation between these schemes? Probably an expert eye
could help me by saying that a relation does not exist (i.e. it depends on
the particular text or division) or by suggesting another way to look at
the problem. For example, if k=0 the second scheme is better for all values
of s.
Thank you again
I'm sorry if I'm too prolix or dull :-)
Generally when you split the input into blocks, the compression will be
worse. However there is no exact relationship. For example, if the
statistics of the blocks differ greatly (e.g. text and images) then
compressing them separately will often improve compression overall.
The compression loss is mainly due to discarding the model and
retraining it for each block, rather than using a low order model for
the first few characters.
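A quick way to see the retraining cost is to compress the same data once as a whole and once as independent blocks. The Python sketch below uses zlib purely as a stand-in: it is an LZ77 plus Huffman coder, not a context-model compressor like PPM, but it likewise starts each independent block with no learned state. The helper name whole_vs_blocks is hypothetical.

```python
import zlib

def whole_vs_blocks(data, block_size):
    """Return (bytes when compressed whole, total bytes when each
    block of `block_size` bytes is compressed independently)."""
    whole = len(zlib.compress(data, 9))
    blocks = sum(len(zlib.compress(data[i:i + block_size], 9))
                 for i in range(0, len(data), block_size))
    return whole, blocks

if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog. " * 200
    whole, blocks = whole_vs_blocks(sample, 500)
    print("whole:", whole, "bytes; blocks:", blocks, "bytes")
```

On data with uniform statistics the block total is normally larger. On mixed data (e.g. text followed by image bytes) the gap can shrink or reverse with compressors that adapt poorly to changing statistics.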
-- Matt Mahoney