Re: how to estimate compressibility ?
- From: Ulrich Lauther <ulrich.lauther@xxxxxxxxxxx>
- Date: Fri, 16 Jun 2006 20:17:41 +0000 (UTC)
Matt Mahoney <matmahoney@xxxxxxxxx> wrote:
: Ulrich Lauther wrote:
: > I have the following problem:
: > An algorithm generates a stream of characters that later on will be
: > compressed. In each step, I can select the character to generate from a set
: > of candidate characters using the last character (or possibly the last few
: > characters) generated to guide my decision.
: > Most - but of course not all - characters generated are zero.
: >
: > I assign a "cost" to each selection, which should reflect compressibility
: > of the generated sequence.
: > Currently, I assign a cost of 0 for a zero following a zero, a cost of 1
: > for a zero following a non-zero, and a cost of 2 in all other cases.
: >
: > Any better ideas?
: >
: > BTW, for compression I use a sequence of Burroughs-Wheeler transformation,
: > Move-to-front, Runlength-encoding, and Huffman-coding.
: I assume you are at the BWT+MTF+RLE stage, and now you want to assign
: costs to select Huffman codes? Your cost should be log 1/p, where p is
: the probability of the next symbol. (Cost will be proportional to your
: code length). You can estimate p by counting character occurrences.
: It will vary depending on the input. If your input is random data then
: BWT+MTF will have a uniform distribution without any long runs. If
: your input is highly compressible then BWT+MTF will have long runs of
: zeros and mostly small values otherwise.
: -- Matt Mahoney
No, I am way before that stage - and I could use any other compression
method.
As I said, I generate a data stream where I have a (limited) choice for
each character and I want to choose in a way that the resulting stream
becomes as compressible a possible.
Yes, I could use log 1/p as cost, but that whould take a very global look
at the problem and would not catch local properties of the data stream.
--
-lauther
[nosave]
----------------------------------------------------------------------------
Ulrich Lauther ph: +49 89 636 48834 fx: ... 636 42284
Siemens CT SE 6 Internet: Ulrich.Lauther@xxxxxxxxxxx
.
- Follow-Ups:
- Re: how to estimate compressibility ?
- From: Matt Mahoney
- Re: how to estimate compressibility ?
- References:
- how to estimate compressibility ?
- From: Ulrich Lauther
- Re: how to estimate compressibility ?
- From: Matt Mahoney
- how to estimate compressibility ?
- Prev by Date: Re: 20 Years of JPEG Celebrated With Software Launch
- Next by Date: Re: how to estimate compressibility ?
- Previous by thread: Re: how to estimate compressibility ?
- Next by thread: Re: how to estimate compressibility ?
- Index(es):
Relevant Pages
|
|