Re: Need a bit of information about Compression




"Nishu" <naresh.attri@xxxxxxxxx> wrote in message
news:1144667277.765377.196120@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

cr88192 wrote:

"Nishu" <naresh.attri@xxxxxxxxx> wrote in message
news:1144416807.604959.239040@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
unless we encode the EOF character (end of file) how can we
practically
determine its end or decode it or store it?

typical approaches:
either encode a full length, or a length mod some constant.

Yes, full Length coding, arithmetic coding which is certainly better
option.
this approach is often used by arithmetic coders as well. when the length
is
reached, we know the data is done.


The question is"when the length is reached !!"
In arithmetic coders, we can take care of whole string at a time and
can convert it into a single number which lies between a certain range.
hence there cant be any problem about whether all of bits in last byte
are filled or not, but this doesnt hold for huffman coding.

the last byte doesn't matter.

you keep track of decoded length.

if you are decoding, say, 1024 symbols, then you stop as soon as you decode
those symbols, no problem, whatever is left is ignored.


But would u explain me how length mode CONST coding works?

mod const usually involves some manner of guess, eg, one decodes until
the
end of the compressed data is reached. one knows that some amount of the
trailing data is garbage. we then know that the true end of the data (mod
const) is less than the current end (mod the same const) (this may not
hold
per se, if it doesn't it means that the decoded data wrapped around), one
can then back up the requisite amount, and have the decoded data (at the
correct length).


One need to know which data is garbage and which is not !!

because you know where the end "should" be.

Problem is
to find out the proper end of the bit stream...

no, it is not.
you don't actually need to know this in this case.


and this can be done if we know how many '/', '1' and '0' are there.
but not otherwise. It seems as if we need extra counter to find out
them and then we know how many total no. of actual coded bits are
present.

storing coded bits is one way (possible with 3 bits, if one knows the
location of the last byte).

likewise, one can store how large it should be when decoded, and maybe use
the mod approach for this.


also common is to know from the data how long it is.
less common is to set up some special condition, where the eof is
implied
by
an otherwise impossible occurance (less likely in this case).

I didnt understand this. Isnt that "111" would be the special case..
that last case. and hence forth we know the eof and so will stop the
decoding.

yeah, but with only 3 symbols, having a code dedicated to eof is very
expensive, better to look for a cheaper special case (an encoder
impossibility or somesuch...).


yes, coded EOF is certainly expensive, but, then theres not a good
alternative suggested here right now. I'm finding myself helpless in
following Mr. Davids point of view. If u understand that, plz explain
me.

well, in my case I don't personally care all that much about bijective
coding...

easy enough to do things the good ol-fashioned way, and unless major
additional compression or speed gains pop up, I am not all that interested
personally.

and what special character or impossibilty??? we are dealing in bits!!!
so we need bits to assign special case and in case u dont encode eof, u
simply cant put it as it is in the bit stream. right? we dont have any
exception to point out eof.


note, we transform some input to some output. we define impossibility in
terms of the rules of the transform. if the data located can't possibly be
generated by the encoder (say, a run of the same symbol that is too long and
not handled by RLE), it could be used as an eof marker, with no additional
costs.

a simple case would be, eg, defining an RLE scheme where any run >=2 of the
last symbol is replaced with a run, so 3 occurances of the same symbol in a
row is impossible.

AAAABBBCAABBBB
A#3B#2CAAB#3AAA

in this case, the rules don't allow the AAA on the end, so decoding the AAA
would flag that an eof has been reached (likewise, to save space one could
do this with the highest probability symbol, rather than needing an extra,
low probability symbol).

Just that we can make a counter to count all the char and then encode
them.
so say, then we know, '/' == ten, '1' == 4, '2' == 2. then we have 1 x
10 + 2 x4 + 2 x2 = 22 bits. and hence we found the no. of coded bits.
Its a one alternative i can think of.

no need to store the bit counts. this wont really work with AC anyways...
most of my approaches would work with both AC and huffman coding.

say, for example, we know that all runs of 0 greater than or equal to
5
will
be encoded as an rle run. if we then encode, say, 8 consecutive 0's,
then
we
know that the eof is reached (some fudging would be involved with this
one,
a final 1 bit could be used, eg, to properly align the eof).

Yes, after doing Huffman, RLE too can be envoled for even better
compression. And in that if we mention the impossible case as the eof,
then yes, problem can be solved. and we can ignore eof while doing
huffman. Its certainly better option as you pointed out.

But still, if we just opt for huffman encoding, i'm not able find out
how can we determine the eof ?

this depends...

the eof symbol approach works, if the codespace is large enough to
justify
it. with, eg, >50 symbols, the eof symbol doesn't really matter, with 3
it
does.

Yes, eof is a good choice in case of large set of char and when u dont
know the length of the stream. And in this case, we dont know the
length of the stream.

so, we can encode it. that is what I was getting at.
if one, up front, is like: this stream is 31690 symbols, ok, we can decode
it.

likewise, if we state that the length mod 256 is 202, it can still be
decoded, as once a known end for the coded stream is reached, we can use the
mod to find the end of the decoded stream.

Plz help me, if i'm not able to follow you.

Regards,
Nishu



.



Relevant Pages

  • Re: Need a bit of information about Compression
    ... determine its end or decode it or store it? ... either encode a full length, or a length mod some constant. ... Yes, full Length coding, arithmetic coding which is certainly better ... less common is to set up some special condition, where the eof is implied ...
    (comp.compression)
  • Re: Need a bit of information about Compression
    ... determine its end or decode it or store it? ... either encode a full length, or a length mod some constant. ... less common is to set up some special condition, where the eof is implied ... the eof symbol approach works, if the codespace is large enough to justify ...
    (comp.compression)
  • Re: Need a bit of information about Compression
    ... either encode a full length, or a length mod some constant. ... Yes, full Length coding, arithmetic coding which is certainly better ... less common is to set up some special condition, where the eof is implied by ... Yes, after doing Huffman, RLE too can be envoled for even better ...
    (comp.compression)
  • Re: Sending floats over a client-server in Smalltalk
    ... The trick is knowing what to decode them ... Then encode the number in the remaining bytes. ... ByteString>>floatAt: byteIndex ... I could then take a string ...
    (comp.lang.smalltalk)
  • Re: How to flip a coin over e-mail?
    ... >between these outcomes. ... You encode the room descriptions with your key A, coding each entry ... there's one entry left on the list which you can decode to ...
    (sci.math)