Re: Just for fun...
- From: "Trans" <transfire@xxxxxxxxx>
- Date: Thu, 29 Mar 2007 09:45:50 +0900
Sorry about the delayed response. I just have too much one my mind...
On Mar 23, 8:18 am, Mauricio Fernandez <m...@xxxxxxx> wrote:
Ah I see, you were surprised at deflate being so bad... I was surprised at
your surprise :)
Truly surprised. Makes me wonder why other formats like 7z aren't more
widely used. Is decompression speed really that much more important
than size?
Regarding your remark on the choice of the universal machine and some choices
not being "general compression", you could add that even if descriptive
complexity relative to two different universal machines differs at most by a
constant term, the latter does matter when we're approximating the algorithmic
complexity in practical scenarios...
I'm sorry. Would you care to put that in layman's terms? I did not
have the good fortunate of academic education in the field (oh, if
only!), so I cannot readily address your statement (assuming of course
you are using academic terms in good faith, and notjustbeing
intentionally obscure).
Sorry, I assumed you were familiar with Kolmogorov complexity and tried to
remove some redundancy [1] from my message based on that premise ;-) (you said
you had been thinking deeply about this so it wasn't too unreasonable :)
[although this is all redundant ultimately and could be replaced by a pointer
into e.g. Cover&Thomas' Elements of information theory; BTW IIRC there's a new
edition in the works?]
I will have to look for that book. Thanks.
What you hinted at in your prev. message is known as the
Kolmogorov(-Chaitin)/descriptive complexity, corresponding to the length of
the minimal description of a string using a program in a given Turing
machine/programming language. It is therefore relative to the universal
machine/language you choose, but if you pick two different ones the
corresponding complexities for a given string can differ at most by a constant
amount, the length of the string that emulates one machine on the other.
In theory, you often ignore such constant factors (just choose a string long
enough :), but in the example you gave, the machine included a full copy of
the KJV bible, allowing it to replicate it with a single instruction... and as
you said the description of the machine itself made all the difference and it
only worked for that particular string, making it of little interest. There's
a sort of trade-off between the complexity of the machine and that of the
programs you feed into it and both parts matter in practice: a machine able to
compress any English text to 1/1000th of its original size wouldn't help
residential users if it included/required a (mildly compressed?) copy of
Google's archive.
Thanks. Makes perfect sense and now I know what to call it. Which is
always good. :)
I probably wasn't 100% intellectually honest in my prev. message, I apologize
for that. I wasn't as much giving new information as trying to drag this
thread into a more information-theoretic OT, hoping some terminology would
trigger interesting replies.
Sounds like you know more about it than anyone else here. So I'm not
sure if I can hold your interest at all. But I might entertain you a
bit with a related story.
When I first learned about the concept of compression, I did some
experimentation. Now, at this point, I had no idea how it was done, or
anything about Huffman coding, or anything like that. But I generally
learn best by experimenting, and that usually means thought
experiments which, aside, also accounts a bit for my lack of "book
smarts". But in any case, I thought about how one might compress data
by algebraic means. I imagined the data as one giant number, and then
considered formula to rapidly approach that number. Exponents of
course grew rapidly, But I figured I needed to grow even faster, so I
"invented" the operation subsequent to "power". I called it, for no
apparent reason: "milk". So
3 milk 3 = 3 ** (3 ** 3)
And I then postulated the next operation as well, which I called
"cheese".
3 cheese 3 = 3 milk (3 milk 3)
Needles to say, 3 cheese 3 is a huge number. I took the final step and
abstracted the set of all such operation starting with addition (O0),
to multiplication (O1), to power (O2), milk (O3), etc. And so I
figured any large number could be represented as a vector Xi, such
that:
Number = Sum(i..0) Oi(Xi+1, Xi)
At that point the problem became trying to salve for this equation
with a given number. I must have filled up every chalk board at my
school working on that. My fellow students thought I had lost my mind!
And I did to as I found myself trying to perform Integration on this
beast! That in fact turned out to be impossible.
Pretty nuts, eh? Of course, I later realized, after solving the
equation manually a number of times, the solutions would be no more
compressed than the original figure. My original idea of "rapid
growth" was flawed. And so it was that I got my first hint of what I
now learn is called "Kolmogorov(-Chaitin)/descriptive complexity".
[1] I'm sensitive to redundancy (as an EE, I've also learned a bit about
it...;). This why I rarely post to ruby-talk anymore; there's very little
surprise in the questions/issues being discussed, and the responses to
new-but-actually-old questions I could give would hardly convey new
information. If everybody killed his messages when they don't pass the
"entropy criterion" the way I do, this ML would have a small fraction of its
current traffic; I cannot ask people to only post things of interest to me
(why would anybody consent to, and how would they know anyway), but I can try
to only post messages that would have interested me. For instance, I was going
to reply "inverse BWT!" to the OP when nobody had responded yet, but I
anticipated somebody would do eventually, so I killed my msg. I don't respond
to questions which are likely to be answered by somebody else (you can call it
proactive global redundancy control by local self-censoring ;).
A nice consequence of this is that it also filters out most unreasonable
content (would I be interested in some ad hominem argumentation? No, so I
should not post any either. The temptation to do so is strong, as some people,
including well-known/knowledgeable posters, use emotionally loaded language
and sometimes defend their positions vehemently.)
sorry again
No need for apologies. No doubt you know that I am far from exemplary
in these affairs, but I believe (or at least hope) that I have gotten
better with time. But with you, well, it almost sounds like you are
playing a game of Prisoner's Dilemma with your posts. And in that
case, I wonder who's winning? ;)
Thanks for the details, btw!
T.
.
- Follow-Ups:
- Re: Just for fun...
- From: Christian Neukirchen
- Re: Just for fun...
- From: Phillip Gawlowski
- Re: Just for fun...
- References:
- Re: Just for fun...
- From: Trans
- Re: Just for fun...
- From: Clifford Heath
- Re: Just for fun...
- From: Rick DeNatale
- Re: Just for fun...
- From: Trans
- Re: Just for fun...
- From: Clifford Heath
- Re: Just for fun...
- From: Trans
- Re: Just for fun...
- From: Clifford Heath
- Re: Just for fun...
- From: Trans
- Re: Just for fun...
- From: Mauricio Fernandez
- Re: Just for fun...
- From: Trans
- Re: Just for fun...
- From: Mauricio Fernandez
- Re: Just for fun...
- Prev by Date: Re: Object/Relational Mapping is the Vietnam of Computer Sci
- Next by Date: Re: code too hashy, I think
- Previous by thread: Re: Just for fun...
- Next by thread: Re: Just for fun...
- Index(es):
Relevant Pages
|
Loading