Re: your assistance is requested
- From: Thomas Richter <thor@xxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: 30 Apr 2007 19:33:17 GMT
jules.stocks@xxxxxxxxx wrote:
This program process's each input byte separately, that is without
respect to any other input or output bytes. (ie., compression applies
to an input stream, whether that stream is 1-byte long or 1-terabyte
in length.)
Does this mean that your random source model is i.i.d? Aren't you
sure that you could improve your performance by a higher order
model?
At present I simulate input from an actual RAD source by AND'ing 255 &
random(); Where the latter is the FreeBSD int function that returns a
32-bit random value.
So, question #1: Is this source reasonable? (One friend sez I should
shift right before using the value resulting from the call.)
NO! The typical rand() implementation (modular congruence) will just
return a permutation of the numbers 0..255 from this, repeated over
and over again. Namely, once you hit the same number again, you get
exactly the same sequence starting from that number.
If this is your random source, you would just need to encode
this permutation (and not the sequence).
You get better random generators by using the high-order bits.
Question #2: Is their a body of publicly available code on the 'net
which would permit me to measure the quality of the input resulting
from this method.
I'm not aware of what you might find on the net, but I would look
into the random generator chapter of Donald Knuth's "The Art of
Programming", Vol. 2 "Numerical and Seminumerical Algorithms". You
find a lot on properties of random generators there, and especially
on the modular congruence generator that is almost traditionally the
algorithm behind rand(). And no, it is *not* reasonable for your
purpose. It is not random at all.
I am very reluctant to use file-based RAD because, in the past, I have
used such sources and every time found that my program would process
files of that type (say GZIP or BZIP,) preferentially over other types
of input. It's important to me that the program exhibit reasonably
flat usage characteristics.
Then rand() isn't at all reasonably flat either.
This method is fast, and requires only a small amount of memory to
function. I have a process running 24X7 now, reading bytes from:
#define RADSRC (255 & random())
and being compressed, typically being translated into equivalent bytes
containing (typically,) two bits less than the original inut value.
One can actually compute how far one can get, provided the seeding
value is unknown and the modulo add add-value is unknown. All you need
to encode is the permutation rand() & 255 returns. If you know the
generator, all you need to encode is the seed. If you know that as well,
you don't need to compress.
(I hate to be obvious, but someone will complain if I don't add that
yes -- the compressed bytes are easily restorable to their original
value.)
Not if the input is truely random - you cannot get compression then
for *all* sequences. Its a theorem.
So, two questions:
a) Is RADSRC adaquate?
No.
b) where can I find good measuring tools?
See the literature reference.
So long,
Thomas
.
- References:
- your assistance is requested
- From: jules . stocks
- your assistance is requested
- Prev by Date: your assistance is requested
- Next by Date: Re: your assistance is requested
- Previous by thread: your assistance is requested
- Next by thread: Re: your assistance is requested
- Index(es):
Relevant Pages
|
|