Re: a little (very little,) data, extracted from my front-end feeding a conventional compressor
- From: jules.stocks@xxxxxxxxx
- Date: 3 May 2007 19:47:28 -0700
No Thomas;
You didn't read my post very closely -- I am getting a bell curve from
ONE random data source, the client data itself -- no other information
is applied.
(I do XOR against past client input, but I think that still qualifies
as one RAD source.)
No extra information is required, no control tables, no headers,
nothing. The process is one byte for one byte.
That should not produce the output I am obtaining -- you know this,
I'm sure -- and in fact the output I am getting is from the
'translation' process.
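A minimal sketch of a one-byte-for-one-byte XOR stage, for readers who want to reproduce the setup. The post does not give the actual translation, so this assumes "XOR past client input" means XOR-ing each byte with the previous input byte; `translate` and `restore` are hypothetical names. Under that assumption the stage is invertible, and on uniformly random input the output counts stay close to flat:

```python
import os
from collections import Counter

def translate(data: bytes) -> bytes:
    """Assumed stand-in for the front-end: XOR each byte with the previous input byte."""
    out = bytearray()
    prev = 0  # assumed starting value before any input has been seen
    for b in data:
        out.append(b ^ prev)
        prev = b
    return bytes(out)

def restore(coded: bytes) -> bytes:
    """Inverse of translate: recover each input byte from the coded byte and the previous recovered byte."""
    out = bytearray()
    prev = 0
    for c in coded:
        b = c ^ prev
        out.append(b)
        prev = b
    return bytes(out)

data = os.urandom(1 << 16)           # stand-in for a flat random source
coded = translate(data)
assert restore(coded) == data        # one byte in, one byte out, fully reversible

counts = Counter(coded)
print(min(counts.values()), max(counts.values()))  # spread of per-value counts
```

If a real front-end shows a strongly peaked histogram where this sketch does not, the difference between the two is the place to look.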
So, unless you spend all your time saying nasty things, you could
probably help. Think about this: I read a byte of RAD and emit a
different byte, where the new byte can be converted back to the input
byte. Unlike the input bytes, which exhibit a flat distribution, the
replacement bytes exhibit a bell curve and are generally numerically
lower in value (the output values clash, producing the
distribution).
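One way to put a number on what a bell-shaped byte histogram would mean for a compressor is its order-0 Shannon entropy in bits per byte. The sketch below is only an illustration: `entropy_bits` is a hypothetical helper, and the triangular `bell` histogram is an assumed stand-in for the measured curve, not data from the process described here.

```python
import math

def entropy_bits(hist):
    """Order-0 Shannon entropy, in bits per symbol, of a list of counts."""
    total = sum(hist)
    return -sum(c / total * math.log2(c / total) for c in hist if c)

flat = [1] * 256                                   # every byte value equally likely
bell = [min(v + 1, 256 - v) for v in range(256)]   # assumed triangular stand-in for a bell shape

print(entropy_bits(flat))   # 8.0 for the flat case
print(entropy_bits(bell))   # strictly below 8 for any non-flat histogram
```

A flat histogram over 256 values gives exactly 8 bits per byte; anything peaked gives less, which is the margin an order-0 coder could in principle exploit.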
Perhaps I will be proven mistaken but I just don't think you can
manage this -- I think you are much better at nay-saying than you are
at helping.
So, again -- prove me wrong. How?
I have a process that converts a flat RAD distribution into a
bell-curved distribution, without adding any extra information. I need
to re-introduce instances of patterns (i.e., to mimic the kinds of
'patterns' that documents possess), as this is necessary to achieve
real compression.
On May 3, 8:23 am, Thomas Richter <t...@xxxxxxxxxxxxxxxxxxxxxxxxxx>
wrote:
jules.sto...@xxxxxxxxx wrote:
sort <INPUT | uniq -c | sort -r >OUTPUT
This simple unix script produces a histogram sufficient to show the
distribution resulting from my process. Whereas the input is random,
presumably characterized by a flat histogram, the histogram of my
output is normally a nearly perfect bell curve.
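Note that `sort | uniq -c` tallies identical lines, not bytes, so for binary output a per-byte count may be closer to what is wanted. A minimal sketch, with `byte_histogram` and `print_histogram` as hypothetical helper names and a short literal standing in for the real file contents:

```python
from collections import Counter

def byte_histogram(data: bytes):
    """Return a 256-entry list: how often each byte value occurs in data."""
    counts = Counter(data)
    return [counts.get(v, 0) for v in range(256)]

def print_histogram(hist, width=60):
    """Print one bar per byte value that actually occurs, scaled to the largest count."""
    peak = max(hist) or 1
    for value, count in enumerate(hist):
        if count:
            print(f"{value:3d} {count:8d} {'#' * (count * width // peak)}")

# Demo on a short in-memory sample; for a real run, read the file in "rb" mode instead.
print_histogram(byte_histogram(b"abracadabra"))
```

Printing in byte-value order (rather than sorted by count) shows the shape of the distribution directly.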
Congrats. You just rediscovered stable distributions. In fact, for
any (halfway reasonable) statistical distribution, you'll always
get a bell-curve by looking at the statistics of the output. This is
because the distribution of, say, the max of two random variables
is the convolution of the two input distributions. By folding over
and over again, Levy's theorem on stable distributions tells us that
you arrive at a bell curve quickly.
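The folding can be tried directly. The sketch below uses the textbook case, the sum of independent variables, whose distribution is the convolution of the input distributions; the fair six-sided die and the name `convolve` are assumptions for illustration only.

```python
def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q (lists of probabilities)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

uniform = [1 / 6] * 6        # flat distribution: one fair die with faces 0..5
dist = uniform
for _ in range(3):           # fold three more times: sum of four dice
    dist = convolve(dist, uniform)

# Crude text plot: flat input, visibly bell-shaped after a few folds.
for total, prob in enumerate(dist):
    print(f"{total:2d} {'#' * round(prob * 400)}")
```

Two folds already give a triangle; each further fold rounds it closer to a bell.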
(So tell me how to take advantage of this attribute, because all data
produced by my process can be characterized this way.)
No surprise. You cannot take advantage of that to compress random
data, because the permutation that sorted your data is lost, and it
is the missing piece of information. BWT for example is a scheme that
takes advantage of it (but of course not on random data), so you
probably want to do some reading.
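The point about the lost permutation can be checked with a small experiment. This is a sketch under assumptions: `zlib` stands in for "a conventional compressor", 4096 random bytes stand in for the input, and `arrangement_bits` is a hypothetical helper that counts the bits needed to say which ordering of the same bytes was the original.

```python
import math
import os
import zlib
from collections import Counter

def arrangement_bits(data: bytes) -> float:
    """log2 of the number of distinct orderings of the bytes in data."""
    counts = Counter(data)
    log_ways = math.lgamma(len(data) + 1) - sum(math.lgamma(c + 1) for c in counts.values())
    return log_ways / math.log(2)

data = os.urandom(4096)
sorted_data = bytes(sorted(data))

print(len(zlib.compress(data, 9)))          # random input: no gain
print(len(zlib.compress(sorted_data, 9)))   # sorted input: much smaller
print(arrangement_bits(data) / 8)           # bytes needed to undo the sort
```

The sorted copy compresses well, but restoring the original order costs about as much as the data saved, so the total does not shrink.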
In other words, I can convert data from a previously compressed file
into data that exhibits a nice bell curve.
No surprise.
Asking for help: I *really really seriously* encourage you to do some
reading first.
So long,
Thomas