Re: your assistance is requested



jules.stocks@xxxxxxxxx wrote:

This program process's each input byte separately, that is without
respect to any other input or output bytes. (ie., compression applies
to an input stream, whether that stream is 1-byte long or 1-terabyte
in length.)

Does this mean that your random source model is i.i.d? Aren't you
sure that you could improve your performance by a higher order
model?

At present I simulate input from an actual RAD source by AND'ing 255 &
random(); Where the latter is the FreeBSD int function that returns a
32-bit random value.

So, question #1: Is this source reasonable? (One friend sez I should
shift right before using the value resulting from the call.)

NO! The typical rand() implementation (modular congruence) will just
return a permutation of the numbers 0..255 from this, repeated over
and over again. Namely, once you hit the same number again, you get
exactly the same sequence starting from that number.

If this is your random source, you would just need to encode
this permutation (and not the sequence).

You get better random generators by using the high-order bits.

Question #2: Is their a body of publicly available code on the 'net
which would permit me to measure the quality of the input resulting
from this method.

I'm not aware of what you might find on the net, but I would look
into the random generator chapter of Donald Knuth's "The Art of
Programming", Vol. 2 "Numerical and Seminumerical Algorithms". You
find a lot on properties of random generators there, and especially
on the modular congruence generator that is almost traditionally the
algorithm behind rand(). And no, it is *not* reasonable for your
purpose. It is not random at all.

I am very reluctant to use file-based RAD because, in the past, I have
used such sources and every time found that my program would process
files of that type (say GZIP or BZIP,) preferentially over other types
of input. It's important to me that the program exhibit reasonably
flat usage characteristics.

Then rand() isn't at all reasonably flat either.

This method is fast, and requires only a small amount of memory to
function. I have a process running 24X7 now, reading bytes from:

#define RADSRC (255 & random())

and being compressed, typically being translated into equivalent bytes
containing (typically,) two bits less than the original inut value.

One can actually compute how far one can get, provided the seeding
value is unknown and the modulo add add-value is unknown. All you need
to encode is the permutation rand() & 255 returns. If you know the
generator, all you need to encode is the seed. If you know that as well,
you don't need to compress.

(I hate to be obvious, but someone will complain if I don't add that
yes -- the compressed bytes are easily restorable to their original
value.)

Not if the input is truely random - you cannot get compression then
for *all* sequences. Its a theorem.

So, two questions:

a) Is RADSRC adaquate?

No.

b) where can I find good measuring tools?

See the literature reference.

So long,
Thomas
.



Relevant Pages

  • Re: Repeated compression of previously compressed data is not impossible, Ive done it.
    ... > There are more inventors who have invented compressors/decompressors ... > 1,000 or 1,000,000 times compression better then the competition? ... compressing all files including files generated by random generators." ... particular has used as an excuse over and over and over again, ...
    (comp.compression)
  • Re: Fermats Last Theorem
    ... This has approximately as much to do with "geometry" known from the school as a wheel has with a spaceship. ... I think you still do not see the fundamental principle of compression. ... The problem is to find a good description of the data as output from a random source. ... Then, make a two-dimensional plot, plot x_n over x_(i.e. the y-coordinate of a point is given by the current digit, the x-coordinate ...
    (comp.compression)
  • Re: Fermats Last Theorem
    ... If you have a random source, ... the noon temperature of today *usually* (that ... in the geometry of the plots. ... What has this to do with compression? ...
    (comp.compression)
  • Re: Entropy
    ... The setting is the one I explained above: In Shannon's picture, a *communication* is the transmission of a message generated by a random source, and what Shannon's theory tells you is that there is a lower limit how much such a source can be compressed, where the bound depends on the nature of the source. ... This bound is then shown to be the entropy. ... In the Kolmogorov case, a message is an individual thing, but an infinitely long string for which you seek the shortest computer algorithm that re-generates the string. ... Note that this is something *different* than seeking the compression "on average". ...
    (comp.compression)
  • Re: Hillary Clinton and Barack Obama Discuss Space Policy
    ... As has been repeatedly explained to you, what Rand did was pose the ... fallacious tactics that you were engaging in. ... If you have a piece of information thast requires more ... compression. ...
    (sci.space.policy)