Re: a little (very little,) data, extracted from my front-end feeding a conventional compressor
- From: Thomas Richter <thor@xxxxxxxxxxxxxxxxx>
- Date: Wed, 09 May 2007 12:01:42 +0200
jules.stocks@xxxxxxxxx wrote:
No Thomas;
You didn't read my post very closely -- I am getting a bell curve from
ONE random data source, the client data itself -- no other information
is applied
(I do XOR past client input but I think that qualifies as one RAD
source.)
Jules, that is understood. It doesn't matter. What you see here is effectively an outcome of the central limit theorem of probability theory.
The theorem (and its generalization, Levy's stable distributions) holds
under very mild conditions, and especially if your source is "reasonably" i.i.d. and has finite variance.
No extra information is required, no control tables, no headers,
nothing. The process is one byte for one byte.
I claimed nothing else.
That should not produce the output I am obtaining -- you know this,
I'm sure -- and in fact the output I am getting is from the
'translation' process.
Look, as soon as you sort random data (are you doing this or not?), you apply a transformation that modifies the data such that Levy applies. You find all the ingredients (but not Levy's theorem), for example, in Donald Knuth's "The Art of Computer Programming", Vol. II, Section 3.4.1. In my 3rd edition, this is section B of 3.4.1 on page 121. Read there.
So, unless you spend all your time saying nasty things, you probably
could help; Think about this; I read a byte of RAD and emit a
different byte, where the new byte can be converted back to the input
byte but unlike the input byte, which exhibited a flat distribution,
the new replacement bytes exhibit a bell curve and are generally
numerically lower in value (the output values clash, producing the
distribution.)
I can also do that. For example, apply the Haar filter on the RAD data
a couple of times. Voila! Gaussian statistics! It doesn't require any *specific* manipulation; general linear algebra is sufficient. And it is fully reversible.
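To make that concrete, here is a minimal sketch, not a definitive implementation. It assumes the unnormalized integer form of the Haar step (pairwise sums and differences), bytes read as signed values so the data is roughly centred, and Python's pseudo-random generator as a stand-in for the RAD source. The names haar_pass, haar_pass_inverse and fraction_within_one_sigma are made up for the illustration.

```python
import random

def haar_pass(x):
    """Unnormalized Haar step: all pairwise sums, then all pairwise differences."""
    sums = [a + b for a, b in zip(x[0::2], x[1::2])]
    diffs = [a - b for a, b in zip(x[0::2], x[1::2])]
    return sums + diffs

def haar_pass_inverse(y):
    """Exact integer inverse: a = (s + d) / 2, b = (s - d) / 2."""
    half = len(y) // 2
    x = []
    for s, d in zip(y[:half], y[half:]):
        x += [(s + d) // 2, (s - d) // 2]  # s + d and s - d are always even
    return x

def fraction_within_one_sigma(x):
    """Share of samples within one standard deviation of the mean."""
    n = len(x)
    mean = sum(x) / n
    sigma = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    return sum(abs(v - mean) <= sigma for v in x) / n

rng = random.Random(2007)  # fixed seed so the run is repeatable
# Assumption: signed pseudo-random bytes stand in for the "RAD" input.
data = [rng.randrange(256) - 128 for _ in range(1 << 14)]

filtered = data
for _ in range(4):  # "a couple of times"
    filtered = haar_pass(filtered)

restored = filtered
for _ in range(4):
    restored = haar_pass_inverse(restored)

print(fraction_within_one_sigma(data))      # flat input: about 0.58
print(fraction_within_one_sigma(filtered))  # bell-shaped output: about 0.68
print(restored == data)                     # True, the filter is reversible
```

A flat distribution puts roughly 58% of its samples within one standard deviation of the mean, a Gaussian roughly 68%. The filtered data lands near the latter, and the inverse recovers the input exactly.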
Perhaps I will be proven mistaken but I just don't think you can
manage this -- I think you are much better at nay-saying than you are
at helping.
So, again -- prove me wrong. How?
I don't prove you wrong. All I say is that, by the above theorem, it is pretty obvious that you should get Gaussian shapes from a very large family of manipulations. However, you *cannot* gain compression by that.
I have a process that converts a flat RAD distribution into a bell-
curved distribution, without adding any extra information. I need to
re-introduce instances of patterns, (ie., to mimic the kinds of
'patterns' that documents possess,) as this is necessary to achieve
real compression;
I have one as well. It's called "Haar filter". In fact, any other halfway decent linear combination of samples would have done that. Even
invertible ones do that, like the Haar filter. There's nothing mysterious about that. If you then later on compute the entropy of the
filtered and the original data: voila, no difference. This doesn't come as a surprise, it is a theorem.
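The entropy statement can be checked exactly for a single Haar step. The sketch below is an illustration under stated assumptions: the input is taken to be independent, uniformly distributed bytes, the step is the unnormalized map (a, b) -> (a + b, a - b), and entropy_bits is a helper name made up for this example. It enumerates every byte pair instead of sampling, so no random source is involved.

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(counts):
    """Shannon entropy in bits of the distribution given by a histogram."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Assumption: every pair of input bytes is equally likely (flat, independent source).
pairs_in = list(product(range(256), repeat=2))

# One unnormalized Haar step on each pair; this map is invertible.
pairs_out = [(a + b, a - b) for a, b in pairs_in]

h_in_joint = entropy_bits(Counter(pairs_in))    # entropy of an input pair
h_out_joint = entropy_bits(Counter(pairs_out))  # entropy of an output pair
h_sum_alone = entropy_bits(Counter(s for s, _ in pairs_out))  # histogram of sums only

print(h_in_joint, h_out_joint)  # both 16.0 bits per pair
print(h_sum_alone)              # more than the 8 bits of a single input byte
```

The pair entropy is the same before and after, as it must be for an invertible map. The histogram of the sums alone is peaked (triangular after one step), yet its entropy comes out above 8 bits, so the bell shape by itself buys nothing.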
So long,
Thomas