Re: Entropy




"cerelaz" <Cerealz@xxxxxxxxxxxx> wrote in message
news:pan.2006.04.24.07.47.58.50192@xxxxxxxxxxxxxxx
It's not a homework. I've finished my exams. This is a part of my thesis.
Probably I've done some mistakes to define the problem. I'm trying again.

ok.

to me, this seems like the question was just paraphrased from a textbook
or
something.

No, but books that explain Hk entropy (informational entropy) are welcome.

I'll try to explain better my problem.
I'd like to find (if it exists) a relation between Hk and "an incremental
Hi" (that I define). For Hk I compute an k-order modeler and I compress
(with 0-order compresor like huffman or arith) the j-th char of the text
(T[j]) considering only previus k chars T[j-k-1, j-1]. (for k fixed i.e.
4-5 like ppm)
An i-order modeler is computed by full knowledge of the text by computing
the conditional probabilities of every symbol in the text.


guess:
so by "Hk entropy" you mean "an order-k probability for a stream of
symbols". again, in the original post, there was no real definition as to
what was meant by "Hk" (nor any comment relating to 'k' meaning 'order', nor
even saying wether you were using single or multi-character symbol names).


as for higher order symbol sortering:

did this once before, but at the time the algo was rather slow.
dunno if this kind of algo can actually be made all that fast though
(without somewhat inflating memory requirements).

it was an algo consisting mostly of:
I have a big area of memory, showing the current symbol orders for every
possible context (initialized to 0..255 in all contexts);
for coding each symbol, its position is found in the ranking for that
context, the index is output, and the symbol is moved to the front of the
list;
the context is adjusted and the next symbol is processed.

problem: for order-2, it takes 16MB of memory. higher orders are possible,
either via using more memory, a hash code, or both (hash codes resulting in
occasional overlaping contexts, but keeping memory usage "sane").

in my case, the algo was rather slow. thinking now, a slightly faster algo
is possible (based on retaining only a reverse mapping for encode, and a
forward mapping for decode).


next problem: how is higher-order modelling sufficiently more effective
than, say, dictionary teqniques? dictionary teqniques are usally much
faster, much lighter on memory, and still fairly straightforwards to
implement. it often comes down a lot to minor differences in compression
ratios, for often very big differences in speed and memory use.


Now, I split the text into no overlapping blocks of length s (with s>k)
and I compress each block with variable length-order modeler. I use a 0
order modeler for first char of each blocks, 1-order for the second,
2-order for the 3-th, ..., s-1-th order modeler for the s-th char of each
block. Hence, I use a lower i-order modeler for (k-1)*(n/s) symbols
instead of k-order (where i<k) but I use a greater j-order for
(s-k-1)*(n/s) symbols.
I don't want to take into consideration the space required for huffman
tables.

variable order: shouldn't be necessary. if the context starts with some sane
value, then effectively the lower-order contexts are simulated as a
by-product.

for order-0, using a kind of filtering, the huffman tables should be no big
deal. above order-0, however, and the huffman tables will be huge.

I suspect you mean that you are taking the input stream of symbols,
performing some manner of transform (eg: possibly some kind of symbol
sorter), then running the results though huffman. ok...

How can I find a relation between this schemes? Probably an expert eye
could help me by saying that a relation does not exist (i.e. it depends on
a particular text or division) or by saying an other way to see the
problem. For example, if k=0 the second scheme is better for all value of
s.

dunno, I am confused here...

Thank you again

I'm sorry if I'm too prolix or dull :-)


.



Relevant Pages

  • Re: 2.6, 3.0, and truly independent intepreters
    ... have an additionalfieldthat would point back to their parent context ... one memory buffer to another memory buffer may be given two proxies as ... if a target context is idle you can enter it (acquiring its ... C and it's a simple object (no pointers). ...
    (comp.lang.python)
  • Re: Quagga as border router
    ... I'm going to reply this first response in full context, ... already have my router config pretty well done, on a flash memory card, ... substantial enough Cisco router can house the entire v4 route table via ... Quagga, but I really didn't hear any 'technical' reason as to the ...
    (freebsd-net)
  • Re: [PATCH 1/7] async: Asynchronous function calls to speed up kernel boot
    ... be called within the caller's context. ... complex because the caller wants to do a more complex thing. ... BUT it does not solve the caller not getting memory, ... the scheduled function is not allowed to make the metadata ...
    (Linux-Kernel)
  • Re: A reply to Marvin Minsky.
    ... It makes no sense either to talk about a hypothetical AI that behaves ... But in the context of AI, isn't it obvious where the typical functional ... or drop skin cells all over the floor. ... have similar visual and hearing discrimination, memory performance, operant ...
    (comp.ai.philosophy)
  • Re: Single-bit I/O
    ... > memory at the bit level, other than using 8-bit masks? ... > if I was implementing a compression algorithm like Huffman, ...
    (microsoft.public.vc.language)