Re: Re: Kolmorgorov Complexity and Kim Øyhus



> > In order to produce a 100 amino acid long string of poly-methionine a
> > ribsome would need a 300 nucleotide long
> > ATGATGATGATGATGATGATGATGATG....... There is no compression possible,
> > even for repetitive sequence, as long as you make your reference
> > language the genetic code.
> Actually, there is. Smaller protein sequences can be stuck together
> after they are formed from smaller genetic sequences of DNA.

The standard genetic code doesn't provide any such mechanism.
Previously you (or somebody) claimed that Kolmorgorov Complexity K(s)
could be defined, and I (or somebody) pointed out that K9s) presumes
that you've already specified the expression/decoding engine which is
part of the defintion of K(s), and you (or somebody) then said to use
the standard genetic code, which is nonsense in the first place because
K(s) is defined in terms of a mapping from a set of strings into the
same set of strings (so that we can ask the question whether the
compressed expression is shorter than the expressed string), but the
genetic code maps from DNA-base sequences to amino-acid sequences so it
makes no sense to ask whether the DNA-base sequence is shorter than the
amino-acid sequence. But ignoring that, you now claim a mechanism for
sticking together shorter genetic sequences to make long ones, whereby
you can express 100 copies of methionine by some means shorter than 100
copies of some triple that codes for methionine. I'm quite sure the
standard genetic code can't specify that, so you need to tell us what
*other* coding system you are actually using to define K(s) for
DNA-base sequences.

> I'm mostly asking the question about the smallest package needed to
> send a specific sequence from point A to point B - to include the
> language system code needed to compress and decompress the sequence.

That question makes no sense until you first specify what universe of
packages you are talking about (what is a "package"???), how you
measure the size of packages as a number or otherwise compare different
packages along a totally-ordered scale so that it makes sense to talk
about smallest of some set of them, what you mean by "language system
code" (is that written using yet another language you haven't yet
specified), etc. There's an awful lot of undefined stuff here, making
it meaningless to discuss any consequences of such an undefined system.

Let me guess: You envision an encoder which converts a string to a
compressed string, a decoder which converts a compressed string back to
the original string, each of which is specified within some general
framework whereby you first transmit a preface telling what detailed
code you will be using then at the end of the preface you transmit the
actual encoding of the original string followed by the encoding of the
STOP signal to terminate this particular encoding unit. But the
efficiency of such compression system is dependent on the general
framework. If you use a Huffman code, the expression of specific code
isn't very long, but the compressed string isn't much shorter than the
original string, so even that short Huffman preface might be longer
than the reduction you actually got, so adding preface plus
encoded-message plus STOP-message you have more than you started with.
If you use a more complicated system, you might get better compression
of the actual message, but the preface might be immense, totally
overwealming the compression you got. But in any case you have to state
in advance what general framework you are using whereby the preface
expresses the exact coding system for a single message. K(s) depends on
that a priori choice of general framework.

Method 1 (what we were originally talking about):
Fixed coding system: s -enc-> encoded(s) -dec-> s
K(s) = length(encoded(s))

Method 2 (what you are now talking about):
Parameterized coding system: s -opt-> specialCode -enc-> prefix -dec-> specialCode
specialCode: s -enc-> specialCoded(s) -dec-> s
K(s) = length(prefix(s)) + length(specialCoded(s))

> If you make the Turing machine more complicated than the code you are
> sending, ...

That sequence of words is meaningless until after you've told us what
you mean by "more complicated". You're on the verge of a circular
definition.

> What is the smallest code needed to exactly reproduce a given
> sequence

That question has no meaning until after you tell us what coding system
you are talking about. The smallest code to express the entire works of
Shakespeare is only a single character, if the coding system is
appropriately set up, for example:
1 = entire works of shakespeare
0 anything else = anything else
That could even be a single bit to represent the entire works of
Shakespeare.

> to include the code for the language system?

That's not possible. You get infinite regress if you try.
Include coded message, begs question what code was used.
Include code for language system used to code the message, begs
question what meta-language used to express the code for language
system.
Include description of meta-language, begs question what
meta-meta-language used to describe meta-language.
Include specification of meta-meta-language, begs question what meta(3)
langauge is used to express that specification of meta-meta-language.
Include specification of meta(3) language, begs question what meta(4)
language is used to expresss that specification of meta(3) language.

But if you stop at some point, you have to just assume the receiver
already knows a priori what complicated meta(4) language you're using
and how many levels of meta depth is intended.

> In other words, if you wanted to send a particularly long sequence to
> a galaxy far far away, but it would cost $100,000 per character, to
> include the characters needed for the language system to reproduce the
> compressed sequence, what would be the smallest overall code needed to
> do this?

You have no reason to believe the aliens would be able to guess what
coding system you're using, so any attempt is futile.

All actual attempts have sent *extremely* redundant messages with
rather trivial mathematical content, such as product of two primes as
size of message to uniquely specify two-dimensional raster, and rather
simpleminded 2-d artwork that shows something highly expected such as
diagram of big star with small and medium-sized planets lined up from
it.

> It seems that codes that have significant internal symmetry or
> repetitive sequences are usually the easiest to significantly compress
> and decompress with very simple/short instructions.

But you have to specify what machine your instructions run in before
you've said anything meaningful. For example, if you use a Turing
machine, you need to specify the exact program of that machine, and how
the machine takes input and produces output, and how you know when it's
finished generating output so you can say what it was exactly. For
example, if the way your machine works is:
(1) you write a blank tape except the encoded string is in one
contiguous segment of that tape and doesn't contain any blank
characters,
(2) you set the read/write head at the leftmost character of that
encoded string,
(3) you start the machine,
(4) you wait for it to HALT, which it is guaranteed to do eventually,
(5) you verify that the tape is completely blank except for one
contiguous non-blank segment positionned from the read/write head
towards the right,
(6) the decoded string is that non-blank segment you've verified there.
Then you need to say all that, and specify the exact program of
5-tuples (seeChar, prevState, writeChar, moveDir, newState).
If you just wave your hands and say "Turing machine" as your definition
of K(s) we have no idea what you're talking about, no way to know
whether 101010101010 is compressible or not.

Perhaps as an exercise you should actually write a program for a Turing
machine that can de-compress to yield any repeating or mirror-repeating
pattern in the way you imagine it might happen. Then we can state which
particular strings are compressible per that particular machine you've
fully specified.

But my personal advice is to forget about Turing machines in this
context. Use a higher-level language wherein it's easier to express
string-conversion algorithms. Java is popular lately.
..

.



Relevant Pages

  • =?iso-8859-1?q?Re:_Kolmorgorov_Complexity_and_Kim_=D8yhus?=
    ... >>of the string. ... These are different forms of potential compression. ... >>functional system may not be able to sustain such sequence compression ... Chaos theory is about DETERMINISTIC systems which amplify small ...
    (talk.origins)
  • =?iso-8859-1?q?Re:_Kolmorgorov_Complexity_and_Kim_=D8yhus?=
    ... >>> erm, if your string is minimum, then it is compressed for the reference ... The sequence of an otherwise functional string may be ... >> compressed without regard to its function, but this compression will ... When Sean says that his definition of functional complexity includes ...
    (talk.origins)
  • Re: Anyone wanna help with a compression routine (new type)
    ... Pi is an infinite pseudorandom string. ... we know: For every language a part of at least 1 ... The problem is an important one as the compression of an algorithm ... translation by having a language pair. ...
    (sci.math.research)
  • Complexity; was: SQL
    ... >> Consider a language ... The entropy of each character in the string is H ... > sequence to "withstand" a test? ...
    (comp.object)
  • Re: Effective Lossy Compression of Bitstream
    ... we are in the lossless data compression domain of a redundant source, ... given a string consisting of 1's and 0's, ... the sequence can lose some ...
    (comp.compression)