Re: answers please...
- From: Thomas Richter <thor@xxxxxxxxxxxxxxxxx>
- Date: Thu, 19 Jun 2008 23:05:52 +0200
jules Gilbert schrieb:
Okay, as I have said I have a method that does produce a non-uniform
number of output states, given a uniform number of input states.
(Here, non-uniform means a bell curve.)
Oh, if that is your problem or claim, this is easily done. If you have two
independent random sources, A and B, then A+B will always be a random
source with a non-uniform distribution. Actually, if you perform this step
a couple of times, you will arrive at a random source with a Gaussian
distribution. This is not a miracle, this is just the central limit
theorem. Proving this requires a bit of Fourier analysis and knowledge about
stable distributions, but once that is understood, it's about a page
(for the currently understood situation of having symmetric sources
with bounded variance, otherwise the proof is more technical.)
I have been testing a 1M 'file' of RAD, where my 'file' is a segment
of right-most bits from a well known/respected PSRNG and am producing
bell-curve output. (These runs have been made many millions of
times, and I have on occasion substituted other RNG's -- I don't think
I am 'seeing' an attribute of the RNG.) I offered to provide this
information to one of the mathematicians who post here, but the
private conversation was not helpful -- so once again, I am coming to
the group for assistance.
Well, here's my assistance: Google for "central limit theorem".
Also, I have also done this with actual files, including MN's 1M file
(well, 415k,) and produce non-uniform output. And, when I assume 1-
bit for the most frequent output state times the number of times that
symbol occurs, plus 2-bits for each of the next two most commonly
occurring symbols, again times the number of times each of those
symbols occur, etc... I get a size result that is smaller than the
input file. (Yes, that's right.)
If you think about this closer, you'll soon find that, for example for
the above simple system, A+B will have a range that is larger than
the input range, which will eat up your gain you gained by the non-uniform
distribution. Similar pitfalls will exist for related algorithms.
And I have a couple of dumb questions...
1) Why is Shannon-Fano a bad choice? (I understand that arithmetic
coding can be better,) but why is this not as good as Huffman?
It generates codes "top down" rather than "bottom up" - in some configurations,
it can be seen that the codes generated by it are non-optimal in the sense
that codes are possible that generate shorter average bit-length. For Huffman,
however, one can show that reaches the theoretically possible limit. Again,
a proof would be too long for this news group, but if you want to see an
example where Shannon-Fano does not create optimal codes, the Wikipedia carries
one: {0.35, 0.17, 0.17, 0.16, 0.15} - it should be sufficient to build the
corresponding Shannon-Fano code for this set, compare it with the code created
by the Huffman algorithm, and compute the average code-length for both. You'll
then see that the size of the Huffman code is smaller.
2) Wrt to my data, I have a number of different implementation
models, (the system supports three parameters, not all of the
cartesian product of these models actually work wrt compression,) but
among those that do, many produce fewer than 256 symbols. Has anyone
written a standard emitter library that can be used for
experimentation? (I've only been looking for such a library for about
ten years.)
Encoding an alphabet with a number of symbols not exactly a power of two is a
nice candidate for arithmetic coding (even if the probabilities are all
equal).
I have decided that my emitter will be built around S-F, it's simple
and is not protected by someone else's patents.
And, if I may ask, what's wrong with Huffman?
Besides, I think the relevant patents for arithmetic coding should be run
out by now.
So long,
Thomas
.
- Follow-Ups:
- Re: answers please...
- From: jules Gilbert
- Re: answers please...
- References:
- answers please...
- From: jules Gilbert
- answers please...
- Prev by Date: Re: answers please...
- Next by Date: Re: The PERFECT EQUATION = LOSSLESS COMPRESSION!!!! TRUE!!!!
- Previous by thread: Re: answers please...
- Next by thread: Re: answers please...
- Index(es):
Relevant Pages
|