Predicting the Future and Kolmogorov Complexity



On Mar 12, 7:05 pm, "R. Baldwin" <res0k...@xxxxxxxxxxxxxxxxxxxx>
wrote:

Again, this sort of setup does not help you predict the future
additions to the sequence or pattern. KCC, as far as I understand it
so far, has at least something to do with being able to successfully
predict the future or as yet unknown additions to a particular
finite sequence or pattern based on what has come before.

No, KCC is not about being able to predict additions to a sequence or
pattern. Kolmogorov Complexity has a very strict definition that you seem
immune to understanding. Nevertheless, I will try to get it across to you
one more time.

Given a Universal Turing Machine, the Kolmogorov Complexity of a string is
the length of the shortest program on that UTM that produces the string.

That's it. The length. Period. Nothing about predicting the sequence.

Consider the basic reason why those like Martin-Lof, Claus-Peter
Schnorr, Kolmogorov and Chaitin started thinking about the notion of
"randomness". It seems to me that it all started off on the question
of being able to predict the future nature of an as yet unknown
portion of a sequence or pattern based on what was already known.
This problem is described in terms of gambling - of not being able to
make money betting on a truly random sequence while being able to do
so when betting on a non-random sequence.

KCC does in fact play a significant role in this concept. Those like
Wallace and Dowe (see following excepts from their interesting 1999
paper) show the connection between KCC and sequence prediction:

C.S. Wallace and D.L. Dowe, Minimum Message Length and Kolmogorov
Complexity, The Computer Journal, Vol. 42, No. 4, 1999.

http://citeseer.ist.psu.edu/cache/papers/cs/13959/http:zSzzSzwww3.oup.co.ukzSzcomputer_journalzSzhdbzSzVolume_42zSzIssue_04zSzpdfzSz420270.pdf/wallace99minimum.pdf

___________

Suppose a string S has been observed, and we are interested in the
relative probabilities of two alternative future events, representable
respectively by S1 and S2. The ratio of their probabilities is
approximately

Q(S:S1)/Q(S:S2).

As each of the probabilities is this ratio is bounded above by
PT(S:Si) x Cq,t (for i = 1 or 2), the ratio may be approximated by

PT (S:S1)/PT(S:S2).

If PK/T () is used, the approximation will be crude at best, since for
all S, PK/T(S) is by definition a negative power of two. For either
AC-based probability, the possibility exists that P(S) greatly exceeds
Q(S) for some strings which happen to have a computationally simple
form (e.g. 010101 . . .), so no useful hard bounds can be placed on
the approximation error. However, for a wide class of computable
distributions over strings, there is a high probability approaching
one for very long strings that the two-part input form described above
will be among the shortest inputs producing S, and hence a high
probability that the approximation is quite close.

It is this universal ability of the complexity-based probabilities to
approximate the probability ratios of effectively computable
distributions which justifies their use in prediction. . . .

Informally, a random source of binary digits provides an infinite
stream of digits such that knowledge of the first N digits provides no
information about the next digit and which produces ones and zeros
with equal frequency. A property of an efficient (Shannon) code is
that if a sequence of events is randomly drawn from the probability
distribution for which the code is efficient, and the code words for
these events concatenated, then the resulting stream of digits comes
from a random source. Note that this statement is about the randomness
of a source or process for producing binary digits, not about the
randomness of any finite string. The randomness of a finite string is
usually assessed by asking whether any standard statistical test would
cause us to reject the hypothesis that the string came from a random
source. If the string indeed came from a random source such as an
efficient binary coding of an event sequence, the test(s) will of
course sometimes result in our falsely rejecting the randomness
hypothesis in just those cases where the source happens by chance to
give a string exhibiting some pattern.

Complexity theory provides a different definition of randomness
directly applicable to finite strings. A string S is random w.r.t a
UTM T if KT (S) > or = #S - c, where c is a small constant chosen to
impose a 'significance' requirement on the concept of non-randomness,
larger values of c requiring a lower complexity before a string is
accepted as non-random. It has been shown that this definition is not
in conflict with the ordinary notion of randomness. Another way of
stating this definition is that S is random w.r.t. T if the code
defined by T provides no coding of S which is significantly shorter
than S. That is, S is not compressible by T. . . .

If S comes from a non-random source, its early digits give some hint
as to the values of later digits. In this case, T may be given an
input comprising a fixed-length program to computer the probability
(in the usual sense) of the Nth digit's being one as a function of the
earlier digits and then to decode a (Shannon) optimal code for the Nth
digit based on this probability. Whenever the probability differs
from one half, the optimal code will be able to code the next digit
with less than one bit on average. Provided S in long enough, this
input will be shorter than S, and so S will be declared non-random
w.r.t. T.

Similar remarks apply to the two-part MML code. It too is redundant,
since the data S may be encoded using the assertion of any usable
hypothesis H for which f(S|H) > 0. Hence compressibility is a relevant
concept, and MML chooses H to maximize compression . If the S is
incompressible, it is regarded as random, i.e., having no pattern
expressible within the set of possible hypotheses. . . .

We have argued that, as applied to the inference of models or theories
form data, there is no essential difference between Kolmogorov
complexity and minimum message length approaches. They differ only in
the choice of reference Turing machines and in the attention given to
this choice.

The universality of UTM (T) ensures that for all strings S, the
difference between the complexities of S with respect to the UTMs T1
and T2 is bounded above by a constant independent of S, namely the
length of the longer of the programs required to make T1 and imitate
T2 and to make T2 and imitate T1. . .

As with Kolmogorov complexity, Solomonoff probabilities are defined
with respect to some particular UTM, but universality guarantees that
different choices of UTM will affect the odds between future events
only by a factor with bounds independent of S, E1 and E2. . .

We now seek a string I = H: A where the first part H specifies an
hypothesis about the source S (normally selected from a limited set of
possible hypotheses) and the second part A is an encoding of the data
using a code which would be optimally efficient were the stated
hypothesis true. By 'optimally efficient' we mean a code such as a
Huffman or arithmetic code which minimizes the expected length of the
coded string.

The aim in this stream is to find the hypothesis H which leads to the
shortest such string I, which may be regarded as the shortest message
encoding the data given in S. For this reason, the technique is termed
minimum message length (MML) or minimum description length (MDL)
inference. The motivation for this aim is that the MML hypothesis can
be shown to capture all of the information in S which is relevant to
the selection of an hypothesis and this method of inference
automatically embraces selection of an hypothesis of appropriate
complexity as well as leading to good estimates of any free parameters
of the hypothesis. If no I exists with #I < #S, then there is no
acceptable hypothesis for the data, at least within the set considered
possible, and the data is concluded to be random noise. . .

Solomonoff explicitly relates complexity and probability. . . The
algorithmic complexity KT (S) may also be used to define an
unnormalized probability. This probability does not correspond to the
probability of an easily-defined event and is always less than the
Solomonoff probability PT (S). However, it may reasonably be
considered as a conservative estimate of PT (S), since it will often
be only a little smaller."

_____________


Consider also the following excepts concerning the ideas of those like
Martin-Lof dealing with the concept of randomness and prediction of
future sequences based on the past - of making money gambling on what
will come next:

"Martin-Löf's original definition of a random sequence was in terms of
constructive null covers; he defined a sequence to be random if it is
not contained in any such cover. Leonid Levin and Claus-Peter Schnorr
proved a characterization in terms of Kolmogorov complexity: a
sequence is random if there is a uniform bound on the compressibility
of its initial segments. Schnorr gave a third equivalent definition in
terms of martingales (a type of betting strategy). . .

The martingale characterization conveys the intuition that no
effective procedure should be able to make money betting against a
random sequence. A martingale d is a betting strategy. d reads a
finite string w and bets money on the next bit. It bets some fraction
of its money that the next bit will be 0, and then remainder of its
money that the next bit will be 1. d doubles the money it placed on
the bit that actually occurred, and it loses the rest. d(w) is the
amount of money it has after seeing the string w. Since the bet placed
after seeing the string w can be calculated from the values d(w),
d(w0), and d(w1), calculating the amount of money it has is equivalent
to calculating the bet. The martingale characterization says that no
betting strategy implementable by any computer (even in the weak sense
of constructive strategies, which are not necessarily computable) can
make money betting on a random sequence."

http://en.wikipedia.org/wiki/Algorithmically_random_sequence

Also, consider the following passage:

________

"When we toss a coin a number of times we are accustomed to thinking
of the resulting sequence of heads and tails as a random sequence.
Thus we would consider any sequence that is the result of tossing a
coin a sequence of times to be random. This approach makes the
sequence HHHHH no less random than the sequence HTHHTH. However, most
people would consider the second sequence a random sequence but not
the first.
This leads some to believe that people do not understand what random
means. However, it led the founder of modern probability Kolmogorov,
and others (Chaitin, Solomonov, and Martin Lof) to try to say what it
should mean to say that a specific sequence is random.
Martin Lof took the approach that a sequence of heads and tails should
be considered random if it would pass a set of statistical tests for
randomness such as: the proportion of heads should be near 1/2, there
should not be too many or too few runs of heads or tails etc. That is
a sequence is random if it is a typical sequence in the sense that it
would not be rejected by standard tests of randomness. Kolmogorov,
Chaitin and Solomonov took an apparently different approach. They say
a sequence of heads and tails is random if it is "complex", meaning
that the shortest computer program you can write to produce the
sequence is about as long as the sequence itself. The Martin Lof
approach was shown to be equivalent to the Komogorov approach, and
Beltrami restricts himself to the Kolmogorov approach.
Here is a connection between complexity and entropy. Entropy as
defined by Shannon, is a measure of the uncertainly in a chance
experiment. It is defined as -sum p(i)log(p(i) where the sum is over
all possible outcomes i of the chance experiment and the log is to the
base 2. For a single toss of a biased coin with probability p for
heads and q for tails the entropy is -plogp -qlogq = -log(pq). This
entropy is maximum when p = 1/2 when it has the value 1. For a fair
coin tossed n times, the entropy is n and, for a biased coin tossed n
times, the entropy is less than n.
Shannon was interested in the problem of encoding sequences for more
efficient transmission. He showed that the expected length of the
encoded sequence could be at most the entropy, and there is an
encoding that achieves this. Writing a computer program to produce
sequences of H's and T's of length n is a way to encode a sequence.
Thus Shannon's coding theorem is consistent with the fact that most
sequences produced by tossing a fair coin are random but this is not
true of the tosses of a biased coin."
http://www.dartmouth.edu/~chance/chance_news/recent_news/chance_news_8.08.html

< snip >

"Martin-Löf randomness has since been shown to admit many equivalent
characterizations -- in terms of compression, randomness tests, and
gambling -- that bear little outward resemblance to the original
definition, but each of which satisfy our intuitive notion of
properties that random sequences ought to have: random sequences
should be incompressible, they should pass statistical tests for
randomness, and it should be difficult to make money betting on
them. . .

The martingale characterization conveys the intuition that no
effective procedure should be able to make money betting against a
random sequence. A martingale d is a betting strategy. d reads a
finite string w and bets money on the next bit. It bets some fraction
of its money that the next bit will be 0, and then remainder of its
money that the next bit will be 1. d doubles the money it placed on
the bit that actually occurred, and it loses the rest. d(w) is the
amount of money it has after seeing the string w. Since the bet placed
after seeing the string w can be calculated from the values d(w),
d(w0), and d(w1), calculating the amount of money it has is equivalent
to calculating the bet. The martingale characterization says that no
betting strategy implementable by any computer (even in the weak sense
of constructive strategies, which are not necessarily computable) can
make money betting on a random sequence."

http://en.wikipedia.org/wiki/Algorithmically_random_sequence

So? What point are you trying to make? Martin-Löf randomness is not exactly
the same thing as Kolmogorov randomness. Finite numbers can be random as
defined by Kolmogorov-Chatin-Solomonoff.

Hint: in the article you cite (which is primarily about Martin-Löf) we have:

"No random sequence is decidable, computably enumerable, or
co-computably-enumerable." and "Every random sequence is normal."
Yet, finite strings cannot be normal, and they are decidable.

Just because a finite string cannot be normal doesn't mean that it
cannot be algorithmically random.

"Finite-length strings are allowed to be algorithmically random by
definition. Any finite-length string can be taken as a rational number
in some base, and rational numbers cannot be normal in any base. . .
Algorithmically random strings are either computable and
deterministic, or uncomputable."

http://www.talkorigins.org/faqs/information/algorithmic.html

< snip >

Sean Pitman
www.DetectingDesign.com


.


Quantcast