Re: compression type
- From: Thomas Richter <thor@xxxxxxxxxxxxxxxxx>
- Date: Fri, 25 Jul 2008 11:05:42 +0200
mcjason wrote:
Second point above. Please state what "random" means. You haven't done
so yet. Please do your homework - it's really about helping you, not
about annoying you. Nobody can do that for you, you must learn it yourself.
data where the trend tends to be few repeat occurrences of a length of
data, where it's usually not a worthwhile tradeoff to say one
occurrence of what repeats, for there to be a token, for how tokens
have a limited way of being said for what else is said. Because in
random data the allocation space for a token is usually too exhausted
for there to be a worthwhile way of saying what a token is for what else
is said, for how a repeat occurrence of a length of data can be said
once with a token otherwise.
Not a very reasonable definition, but for the time being, let's take
this. According to this definition, the following string
12345678910111213141516171819202122232425262728293031323334353637383940...
is random (nothing repeats, provably), though a ten-year-old can still
see its construction algorithm.
Hint: You seem to believe that "random" is an attribute that you can
apply to a sequence you can point at. "Random" is the property of a
process, not of a specific string in particular. Depending on the
process, the string
1111111111111111111111111111111111111111111111111111111111111....
is as likely as the above.
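To make the point about the construction algorithm concrete, here is a minimal sketch (not part of the original post). It assumes Python's zlib as a stand-in for a "say what repeats once" compressor; the helper name counting_digits is made up for illustration. The generator stays a handful of lines no matter how long the output gets, whereas a pattern-matching compressor has to work on the digits themselves.

```python
import zlib
from itertools import count, islice

def counting_digits(n_chars):
    """Return the first n_chars characters of 1, 2, 3, ... written back to back."""
    def digits():
        for k in count(1):
            yield from str(k)   # emit the decimal digits of k one at a time
    return "".join(islice(digits(), n_chars))

s = counting_digits(100_000)
packed = zlib.compress(s.encode("ascii"), 9)

print(s[:30])                 # start of the sequence
print(len(s), len(packed))    # raw size vs. zlib size, in bytes
```

Comparing the printed sizes with the size of the few lines of source above shows how much a description of the process can beat a description of the string.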
I understand perfectly why this can be seen as a problem when it comes
to compressing with the technique of saying what repeats once with a
token for other occurrences.
I'm not saying this. *You* say this.
It's intuitive to think of this the way
the problem is well described. But I can't find anywhere the say so of
random being hard to compress isn't connected with the idea of only
working the way that repeat occurrences are made fewer, with tokens
taking a naming allocation.
It's very limited to think that's the only way to compress, I gave A
PERFECT analogy of how this is VERY WRONG.
*Sigh* You gave a non-working example. What makes you believe that I
think in "patterns"? I don't. My field is *image compression*: images
can be compressed even though they contain no repeated patterns, and the
algorithms used there do not look for matched patterns. Hence, please do
not try to tell me what I do and do not know - I think it's time for you
to deepen your research.
it's to say this proves how random is compressible, take it whatever
way you want I know it's right.
Using a definition of "random" that makes sense (your definition
doesn't, I wouldn't call either of the strings random), you cannot
compress random strings.
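A quick way to observe this, sketched here under two stated assumptions: a seeded pseudo-random generator stands in for a uniform random process, and zlib stands in for a general-purpose compressor. (Strictly speaking the generator's output has a short description, namely the seed plus the program, but a pattern-based compressor cannot exploit that.)

```python
import random
import zlib

rng = random.Random(2008)   # fixed seed only so that the run is repeatable
data = bytes(rng.getrandbits(8) for _ in range(100_000))

packed = zlib.compress(data, 9)

# On input like this the "compressed" output is expected to be no
# smaller than the input: the container overhead is all you get.
print(len(data), len(packed))
```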
say for every length of data there can be a shape, a shape where it's
a shape different for every way the data is different.
given perfect math it would be a shape the same size as the data,
because of that making a different shape for every way data is
different.
That's a "data model"; the question is: "Is this data model reasonable
for compressing data?" And the answer is: For every model one can
construct data that cannot be successfully modeled by it (IOW, that
cannot be compressed, even using an optimal entropy coding algorithm on
the output of the model). In your case, the model would be to draw
shapes or curves or spheres. As long as you don't give better arguments
for why you believe your model is good, and for which type of data it is
good, this is a lost attempt.
What you don't seem to realize is that while it is fairly true that more
complex models can describe more complex data, these models *also*
require more modeling parameters you somehow have to encode as part of
the message. It is a trade-off between simplicity of the model against
the size of the model parameters. Choosing a simple pattern repetition
model (as in LZ77) leaves only few model parameters (length and offset),
but it is only sufficient to match patterns exactly (from the past) and
not to describe sequences with a more complicated construction algorithm
(as the one I gave above). You can surely introduce models that do that
better, but then you also need more parameters.
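The two model parameters mentioned above (length and offset) can be seen in a toy pattern-repetition coder. This is only a sketch in the spirit of LZ77, not the real algorithm or any library's API; the function names, the window size and the minimum match length are assumptions chosen for illustration.

```python
def lz77_encode(data, window=255, min_len=3):
    """Greedy toy coder: emit ('lit', ch) or ('copy', offset, length) tokens."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # A match may run past position i (overlapping copy).
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_len:
            tokens.append(("copy", best_off, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def lz77_decode(tokens):
    """Rebuild the string from the token list produced above."""
    out = []
    for t in tokens:
        if t[0] == "lit":
            out.append(t[1])
        else:
            _, off, length = t
            for _ in range(length):
                out.append(out[-off])   # copy from 'off' characters back
    return "".join(out)

ones = "1" * 64
counting = "".join(str(k) for k in range(1, 41))

print(len(lz77_encode(ones)), "tokens for the run of ones")
print(len(lz77_encode(counting)), "tokens for the counting string")
```

The run of ones collapses into one literal plus one copy, while the counting string, whose construction rule is just as simple, falls mostly into literals: exact matches from the past are all this model can express.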
In the end, you'll never have an algorithm that "perfectly compresses
everything" because even though your model is then very complete, it is
so complicated that you need to transmit too much data just to describe
it. You *cannot* win this game, it's a logical constraint about maps
between finite sets, a very elementary one.
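The constraint about maps between finite sets is plain counting, sketched below for bit strings (the function names are made up for this illustration). There are 2^n strings of exactly n bits but only 2^n - 1 strings that are shorter, so no lossless coder can shorten every n-bit input; and at most a fraction below 2^(1-k) of them can be shortened by k bits or more.

```python
from fractions import Fraction

def strings_of_length(n):
    """Number of distinct bit strings of exactly n bits."""
    return 2 ** n

def strings_shorter_than(n):
    """Number of distinct bit strings of length 0 .. n-1."""
    return sum(2 ** k for k in range(n))

def max_fraction_saving(n, k):
    """Upper bound on the share of n-bit inputs that any lossless
    coder can map to outputs of at most n - k bits."""
    return Fraction(strings_shorter_than(n - k + 1), strings_of_length(n))

for n in (1, 8, 16):
    print(n, strings_of_length(n), strings_shorter_than(n))

print(max_fraction_saving(16, 8))   # share of 16-bit inputs that can save 8 bits
```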
now say for two lengths of data, a shape for each.
now.. this might be a little harder to believe is right.
I'm not arguing at this level - you don't seem to understand.
given a shape, and another shape, there is math to say the shape but
made different, to the other shape, where the math to say one shape
different to the other shape is smaller than the other shape. So
instead of saying two shapes, say one shape and the math to make the
shape different as the other shape.
All very well, but you still need data to describe this "different", and
you'll soon find out (once you dare to implement it) that
the overall byte budget required to describe this "different" is higher
than the byte budget you save by using this model, at least for *most* data.
If you don't believe this, I urge you to implement your idea in an
algorithm and observe this yourself. Depending on the data set, the most
successful models are simple.
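A small experiment along these lines, as a sketch only: the "math to make one shape into the other" is modeled here as a byte-wise XOR delta, which is an assumption made for illustration and not the poster's actual proposal, and zlib is used to measure how many bytes the delta costs.

```python
import random
import zlib

def xor_delta(a, b):
    """Byte-wise difference; applying it to a again gives back b."""
    return bytes(x ^ y for x, y in zip(a, b))

rng = random.Random(7)   # fixed seed only so that the run is repeatable
n = 50_000
a = bytes(rng.getrandbits(8) for _ in range(n))

# Case 1: the second block has nothing to do with the first.
b_unrelated = bytes(rng.getrandbits(8) for _ in range(n))

# Case 2: the second block is the first with 100 bytes changed.
tmp = bytearray(a)
for pos in rng.sample(range(n), 100):
    tmp[pos] ^= 0xFF
b_similar = bytes(tmp)

cost_unrelated = len(zlib.compress(xor_delta(a, b_unrelated), 9))
cost_similar = len(zlib.compress(xor_delta(a, b_similar), 9))

print(n, cost_unrelated, cost_similar)
```

The delta is cheap only when the two blocks really are related; for unrelated blocks, describing the difference costs as much as sending the second block outright.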
given a perfect idea of how this would work, shouldn't it be that the
math has a 50% rightful claim of being smaller than the other shape,
and a 50% rightful claim of being bigger than the other shape?
Shouldn't it though just to think of the most ideal condition there
should be?
doesn't that make sense when there could be some math smaller to say
one shape made to be changed is another shape, smaller than the other
shape? and some math bigger than the other shape? shouldn't the idea
round off as a 50/50 of smaller and bigger than the other shape? to
say a shape changed is another shape.
It all makes sense to say so, but your algorithm also has to say so,
namely has to communicate this to the decoder. And *that* is where your
problem is.
Again, if you don't believe me, construct this algorithm and you'll see
yourself.
So long,
Thomas