Re: compression type



On Jul 25, 5:05 am, Thomas Richter <t...@xxxxxxxxxxxxxxxxx> wrote:
mcjason wrote:



Second point above. Please state what "random" means. You haven't done
so yet. Please do your homework - it's really about helping you, not
about annoying you. Nobody can do that for you, you must learn it yourself.

data where the trend tends to be few repeat occurrences of a length of
data, where it's usually not a worthwhile tradeoff to say one
occurrence of what repeats, for there to be a token, for how tokens
have a limited way of being said for what else is said. Because in
random data the allocation space for a token is usually too exhausted
for there to be a worthwhile way of saying what a token is for what else
is said, for how a repeat occurrence of a length of data can be said
once with a token otherwise.

Not a very reasonable definition, but for the time being, let's take
this. According to this definition, the following string

12345678910111213141516171819202122232425262728293031323334353637383940...

is random (nothing repeats, provably), though still a ten-year-old can
see its construction algorithm.
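
To make "construction algorithm" concrete, here is a minimal sketch that produces a string of this kind by writing 1, 2, 3, ... one after another. The function name counting_string is made up for illustration; the only point is that a few lines of code describe an arbitrarily long prefix.

```python
from itertools import count

def counting_string(n_chars):
    """Concatenate 1, 2, 3, ... and return the first n_chars characters.

    Hypothetical helper, not something from the thread.
    """
    pieces = []
    total = 0
    for k in count(1):
        s = str(k)
        pieces.append(s)
        total += len(s)
        if total >= n_chars:
            break
    return "".join(pieces)[:n_chars]

if __name__ == "__main__":
    print(counting_string(71) + "...")
```

However long the output gets, the description (this program plus the number n_chars) stays tiny, which is why calling such a string "random" is not very useful.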

Hint: You seem to believe that "random" is an attribute that you can
apply to a sequence you can point at. "Random" is the property of a
process, not of a specific string in particular. Depending on the
process, the string

1111111111111111111111111111111111111111111111111111111111111....

is as likely as the above.

I understand perfectly why this can be seen as a problem when it comes
to compressing with the technique of saying what repeats once with a
token for other occurrences.

I'm not saying this. *You* say this.

It's intuitive to think of this the way
the problem is well described. But I can't find anywhere the say so of
random being hard to compress isn't connected with the idea of only
working the way that repeat occurrences are made fewer, with tokens
taking a naming allocation.

It's very limited to think that's the only way to compress, I gave A
PERFECT analogy of how this is VERY WRONG.

*Sigh* You gave a non-working example. What makes you believe that I
think in "patterns"? I don't. My field is *image compression*, and you
can compress images even though there are no patterns; the algorithms
used there do not look for matched patterns. Hence, please do not try to
tell me what I do and do not know - I think it's time for you to
deepen your research.

it's to say this proves how random is compressible, take it whatever
way you want I know it's right.

Using a definition of "random" that makes sense (your definition
doesn't, I wouldn't call either of the strings random), you cannot
compress random strings.
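
A quick way to see this for yourself, as a rough sketch rather than a proof: take bytes from a random source and bytes that are all the same, and feed both to a general-purpose compressor. I'm assuming Python's zlib and os.urandom here as stand-ins for "a compressor" and "a random process"; the helper name compressed_size is made up.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

n = 100_000
random_bytes = os.urandom(n)   # output of a (pseudo)random process
ones = b"1" * n                # the all-ones string

print("random source:", n, "->", compressed_size(random_bytes))
print("all ones:     ", n, "->", compressed_size(ones))
```

The random input typically comes out slightly larger than it went in, while the all-ones input shrinks to a tiny fraction. One compressor failing is not a proof, of course; the counting argument is what rules out every compressor.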

say for every length of data there can be a shape, a shape where it's
a shape different for every way the data is different.
given perfect math it would be a shape the same size as the data,
because of that making a different shape for every way data is
different.

That's a "data model"; the question is "is this data model reasonable
to compress data?" And the answer is: For every model one can construct
data that cannot be successfully modeled by it (IOW, cannot be
compressed, using an optimal entropy coding algorithm on the output of
the model). In your case, the model would be to draw shapes or curves or
spheres. As long as you don't give better arguments as to why you believe
the model you have is good, and which type of data it is good for,
this is a lost attempt.

What you don't seem to realize is that while it is fairly true that more
complex models can describe more complex data, these models *also*
require more modeling parameters you somehow have to encode as part of
the message. It is a trade-off between simplicity of the model against
the size of the model parameters. Choosing a simple pattern repetition
model (as in LZ77) leaves only few model parameters (length and offset),
but it is only sufficient to match patterns exactly (from the past) and
not to describe sequences with a more complicated construction algorithm
(as the one I gave above). You can surely introduce models that do that
better, but then you also need more parameters.
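
For illustration, a toy pattern-repetition coder in the spirit of LZ77 might look like the sketch below. This is a simplified teaching version, not the algorithm as used in any real tool: the function names, the (offset, length, next_char) token layout and the window size are my assumptions.

```python
def lz77_encode(data, window=255):
    """Greedy toy LZ77: emit (offset, length, next_char) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # A match may overlap the current position, but the last
            # character is always held back to serve as the literal.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(tokens):
    """Rebuild the text by copying `length` chars from `offset` back."""
    buf = []
    for off, length, ch in tokens:
        for _ in range(length):
            buf.append(buf[-off])   # one at a time, so overlaps work
        buf.append(ch)
    return "".join(buf)

if __name__ == "__main__":
    print(lz77_encode("a" * 50))            # 2 tokens
    print(len(lz77_encode("1234567890")))   # 10 tokens, nothing to match
```

The only model parameters are the offset and the length, which is why exact repeats are cheap to describe and anything else falls through as literals.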

In the end, you'll never have an algorithm that "perfectly compresses
everything" because even though your model is then very complete, it is
so complicated that you need to transmit too much data just to describe
it. You *cannot* win this game, it's a logical constraint about maps
between finite sets, a very elementary one.
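
The constraint can be checked by plain counting. A small sketch (function names are mine, and it assumes a lossless coder, i.e. one where no two inputs share an output): there are 2^n bit strings of length n but only 2^n - 1 strings that are strictly shorter, so at least one input cannot get shorter.

```python
def strings_of_length(n: int) -> int:
    """Number of distinct bit strings of exactly n bits."""
    return 2 ** n

def strings_shorter_than(n: int) -> int:
    """Number of distinct bit strings of length 0 .. n-1."""
    return sum(2 ** k for k in range(n))

def max_fraction_saving(n: int, k: int) -> float:
    """Upper bound on the fraction of n-bit inputs that any lossless
    coder can shrink by at least k bits."""
    targets = strings_shorter_than(n - k + 1)   # outputs of length <= n-k
    return targets / strings_of_length(n)

if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, strings_of_length(n), strings_shorter_than(n))
    print(max_fraction_saving(16, 8))   # below 1/128
```

So saving even one byte is possible for fewer than 1 in 128 of all inputs of a given length, whatever the model is.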

now say for two lengths of data, a shape for each.

now.. this might be a little harder to believe is right.

I'm not arguing at this level - you don't seem to understand.

given a shape, and another shape, there is math to say the shape but
made different, to the other shape, where the math to say one shape
different to the other shape is smaller than the other shape. So
instead of saying two shapes, say one shape and the math to make the
shape different as the other shape.

All very well, but you still need data to describe this "different", and
you'll soon find out (once you dare to try to implement it) that
the overall byte budget required to describe this "different" is higher
than the byte budget you save by using this model, at least for *most* data.

If you don't believe this, I urge you to implement your idea in an
algorithm and observe this yourself. Depending on the data set, the most
successful models are simple.

given a perfect idea of how this would work, shouldn't it be that the
math has a 50% rightful claim of being smaller than the other shape,
and a 50% rightful claim of being bigger than the other shape?
Shouldn't it though just to think of the most ideal condition there
should be?

doesn't that make sense when there could be some math smaller to say
one shape made to be changed is another shape, smaller than the other
shape? and some math bigger than the other shape? shouldn't the idea
round off as a 50/50 of smaller and bigger than the other shape? to
say a shape changed is another shape.

It all makes sense to say so, but your algorithm also has to say so,
namely has to communicate this to the decoder. And *that* is where your
problem is.

Again, if you don't believe me, construct this algorithm and you'll see
yourself.

So long,
        Thomas

I have an easy time believing one thing....

say for all there is to compress... put it in a geometry area.
now say it's just that.

now the file is just that, and 1 token to say that's what expands, is
just the block there in the geometry area.

so nothing different about the size really.

now.. instead of one block, this instead...

find every instance of BBBB, and separate the block.

so in
"sdfjl44tn98324jbBBBB098wutjk0982kjaerjtjkbBBBBsejh2348095bb23ybyBBBB2hi2u553vb23bnjfngBBBB"

now say one BB block and the blocks before and after each BB

now the geometry area is with that

now one curved line as the token to draw that pattern.

so lost is every occurrence of BBBB except one, so 6 bytes lost.
gained is what it took to say more blocks, and a curved line that
might be slightly bigger but not much?

so the tradeoff of finding a data block of _ANY SIZE_ that has
occurrences of BB, like in any size this can happen once in a while.

no pigeonhole concept here because tokens aren't mixed with data, it's
the geometry area and curves outside it as all there is to expect.

to say separate blocks there might as well be the simplest way....
say one block after another, but make it so one block after another is
at a location starting different like it is to say a spiral starting
at the center, but one that a curve
can always find its way through easily maybe?

see how this proves random is compressible?

because in random data any size it's good to see BBBB once in a while,
but it's only a curve slightly more complicated and saying blocks like
before and after each BBBB... but for what there is to say about size
being bigger, it's to say a separate block and a curve slightly more
complicated for each time BBBB is found?

it's like.. easy to see maybe?