Re: Complex Specified Information - Pitman Formula



On Jul 25, 7:07 pm, "R. Baldwin" <res0k...@xxxxxxxxxxxxxxxxxxxx>
wrote:
"Seanpit" <sean...@xxxxxxxxx> wrote in message

news:1185376351.482385.23180@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

On Jul 24, 8:20 pm, "R. Baldwin" <res0k...@xxxxxxxxxxxxxxxxxxxx>
wrote:

No, because the reference strings are chosen independent of the test
strings. Therefore a significant match between a reference and a test
string is good evidence of non-random production.

That is not necessarily true. It is good evidence that the test string was
not produced by a stationary random process with uniform distribution, which
is a much more restricted case.

If a perfect match happened to be to a reference string that had no
regular character repeats, like pi, this would be good evidence the
test string was not produced by a random process with uniform or non-
uniform distribution.

I note that you've corrected this to "no regular character repeats".

I would accept that, happening upon the digits of pi, one probably has an
artificial pattern.

I would not accept that the lack of regular character repeats in general
implies what you say it does. Computable transcendental numbers are a
special case. Most real numbers are algorithmically random, as defined by
Chaitin, and there are no finite algorithms to compute their digits. These
are the numbers that lack regular character repeats.

Although most real numbers are algorithmically random, finding a match
to one would be just as significant as finding a match to pi - to the
same number of digits. A match in either case to an independently
established reference string would indicate some form of non-random
bias.
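
To put a rough number on "significant": the sketch below assumes the restricted case named earlier in the thread, a stationary source that emits each of k symbols independently and with uniform probability. The function names are illustrative only, and the figures say nothing about sources that violate that assumption.

```python
from fractions import Fraction
import math


def chance_match_probability(n_symbols: int, alphabet_size: int = 10) -> Fraction:
    """Probability that an i.i.d. uniform source reproduces one fixed
    reference string of length n_symbols (assumption: uniform, independent)."""
    return Fraction(1, alphabet_size ** n_symbols)


def match_surprisal_bits(n_symbols: int, alphabet_size: int = 10) -> float:
    """Surprisal of such a match in bits: n * log2(k)."""
    return n_symbols * math.log2(alphabet_size)


if __name__ == "__main__":
    # A 20-digit match against any fixed decimal reference string,
    # whether the reference is pi or an arbitrary pre-chosen string.
    print(chance_match_probability(20))
    print(round(match_surprisal_bits(20), 2), "bits")
```

Under this model the result depends only on the length of the match and the alphabet size, not on which reference string was chosen.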

Shannon information is determined by reference to a known "random"
source of string production - a source that produces maximum Shannon
information.

No, no, no. That is very badly wrong. As I said above, Shannon modeled
information sources as if they were random variables. Specifically, Markov
random variables. That does not mean information sources really are Markov
random variables. With very few exceptions, they are *not* Markov random
variables. Shannon's theory only *pretends* information sources are Markov
random variables because it makes the math easy, and it is a decent
approximation that works pretty well.

Just because the reference is a "pretend" or hypothetical reference
doesn't mean it isn't a reference. A pretend reference is still a
reference.

Furthermore, the amount of information produced by an information source
depends on how surprised a receiver will be. This depends on the relative
probabilities of the different symbols the information source can produce.
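
As a small illustration of "surprise" depending on symbol probability, here is a sketch of the standard surprisal formula, -log2(p). The function name and the example probabilities are made up for illustration.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Self-information of an outcome with the given probability, in bits."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


if __name__ == "__main__":
    # Rarer symbols carry more information when they do occur.
    for label, p in [("fair coin face", 0.5),
                     ("one of ten equiprobable digits", 0.1),
                     ("symbol seen 1% of the time", 0.01)]:
        print(f"{label}: {surprisal_bits(p):.3f} bits")
```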

You mean the "pretend" information source? Right? Again, the amount
of receiver "surprise" depends upon the receiver's comparing what is
received to what is expected to be produced by the pretend information
source. And, there you have it - a reference is indeed required.

There *is* no maximum on Shannon information. If you want more information,
just watch the random variable for a longer time.

There is maximum Shannon information for a set finite period of time -
that's the point.

It is the information *entropy* (average information) that can have a
maximum.

You mean informational entropy has a maximum regardless of the period
of time involved. That's because there is in fact maximum SI for each
point in time. If there weren't, you couldn't calculate an average
over a span of time.

The entropy is maximum if the information source produces symbols
equiprobably.

Exactly. . . Again reference to this "source" is required.

For a binary source, this means producing either 0's or 1's
with a 50% probability. For a decimal source, this means producing any of
the 10 digits with a 10% probability.
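
The equiprobable case can be checked directly with the usual entropy formula, H = -sum(p * log2(p)). This is a minimal sketch; the function name and the biased distribution are chosen only for illustration.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits per symbol, of a discrete distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability symbols contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    fair_bit = [0.5, 0.5]
    biased_bit = [0.9, 0.1]          # illustrative non-uniform source
    uniform_digit = [0.1] * 10

    print("fair binary source:  ", entropy_bits(fair_bit))
    print("biased binary source:", round(entropy_bits(biased_bit), 4))
    print("uniform decimal:     ", round(entropy_bits(uniform_digit), 4))
```

The fair binary source reaches the 1 bit per symbol ceiling and the uniform decimal source reaches log2(10) bits per symbol; any bias pulls the value below that ceiling.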

Right . . . Which is simply assumed to be the case via use of an
imaginary source that actually does this. In real life, however, this
cannot be perfectly assumed.

Shannon's theory also defines the information as it is received, distinct
from the information that left the transmitter, when noise is present.

Yes - also assuming the character of the "noise".

"In the Shannon approach, however, the method of encoding objects is
based on the presupposition that the objects to be encoded are
outcomes of a known random source. It is only the characteristics of
that random source that determine the encoding, not the
characteristics of the objects that are its outcomes."

http://homepages.cwi.nl/~paulv/papers/info.pdf

That quote is about Shannon's Coding Theorem, and is not relevant to the
definitions of or calculations for information or entropy under Shannon's
Mathematical Theory of Information. The quote is a reference to a means for
recoding the output of a random variable to maximize its entropy and
optimize channel usage.

I don't see where you get this from this passage since the preceding
passage reads:

"Both theories aim at providing a means for measuring
'information'. They use the same unit to do this: the bit. In both
cases, the amount of information in an object may be interpreted as
the length of a description of the object."

This sounds to me like the authors are indeed talking about
definitions and measurements of "information". It is just that
Shannon information is concerned with the source (pretend or not)
while Kolmogorov complexity is concerned with the resulting string or
"object".

This means that Shannon information is more about the type of source
it will take to transmit a particular type of string rather than the
string itself.

No. Shannon information is about the symbol probabilities of the source in
question (not the "type" of source), the rate at which they are delivered,
and the interest of the receiver. You need a receiver attempting to copy the
information source for information to exist at all, in Shannon's model.

In order to propose symbol probabilities, you have to propose
something about a source that is able to produce said probabilities.

So, to transmit a number like Pi, where all the
symbols seem to appear with equal frequency, the source needed to
transmit a sequence like pi will have to be able to produce all
possible numbers with a similar character frequency.

No. That is absolute hogwash. A PC hooked up to the Internet can output any
8-bit character frequency pattern you program into it.

Yes, and it can also output pi to the same number of digits.

That is not a requirement for a source to be able to produce the digits of
pi. You simply have
to program a source to produce the digits of pi, in whatever numeral system
you decide to use. If it produces pi, the symbols output while pi is running
will tend toward uniform distribution simply because pi has that property.
It has nothing to do with the abilities of the source.
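
For anyone who wants to see this, here is a rough sketch of a "source" programmed to emit the digits of pi, with a tally of how often each digit appears. It uses Machin's formula in integer arithmetic, which is just one convenient choice and not anything the thread specifies; the function names and the guard-digit count are assumptions of this sketch.

```python
from collections import Counter


def arctan_inverse_scaled(x: int, unity: int) -> int:
    """Approximate arctan(1/x) * unity with the Taylor series,
    using integer arithmetic only."""
    x_squared = x * x
    power = unity // x          # unity / x^(2j+1), starting at j = 0
    total = power
    divisor = 3
    sign = -1
    while power:
        power //= x_squared
        total += sign * (power // divisor)
        sign = -sign
        divisor += 2
    return total


def pi_digits(n_decimals: int) -> str:
    """Return '3' followed by n_decimals decimal digits of pi, via
    Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)."""
    guard = 10                  # extra digits to absorb truncation error
    unity = 10 ** (n_decimals + guard)
    pi_scaled = 4 * (4 * arctan_inverse_scaled(5, unity)
                     - arctan_inverse_scaled(239, unity))
    return str(pi_scaled // 10 ** guard)


if __name__ == "__main__":
    digits = pi_digits(1000)
    counts = Counter(digits)
    for d in "0123456789":
        print(d, counts[d])
```

The same few lines of program text emit as many digits as requested, and the tally of digits is a property of the output, not of the machine running it.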

"The fundamental problem of communication is that of reproducing
at one point either exactly or approximately a message selected at
another point. Frequently the messages have meaning; that is they
refer to or are correlated according to some system with certain
physical or conceptual entities. These semantic aspects of
communication are irrelevant to the engineering problem. The
significant aspect is that the actual message is one selected from a
set of possible messages. The system must be designed to operate for
each possible selection, not just the one which will actually be
chosen since this is unknown at the time of design."

So, you see, if a system must be able to operate regardless of whether pi
was chosen or some other number with equal probability, it must be set
up to handle all possibilities that could be chosen, at random.
Therefore, from the perspective of a receiver who does not yet know
what sequence is going to be sent, the receiver has to be able to
receive not only pi, but all other sequences that are equally
probable.

In other words,
this source must be able to produce not only pi, but all possible
numbers in infinite sequence space - to include truly "random" and
"non-computable" numbers like sigma.

Utter nonsense. Pi is a computable transcendental. A finite algorithm can
produce as many digits of pi as you like. Sources that produce the digits of
pi, to any arbitrary precision, can be realized by finite algorithms. That
is not true for uncomputable numbers.

That's true, but it seems like you are moving into KC here. Pi is
only one of an ensemble of possible messages where all messages are
equally probable. The fact that pi can be compressed into a simple
algorithm seems irrelevant from the perspective of SI.

"Shannon's classical information theory assigns a quantity of
information to an ensemble of possible messages. All messages in the
ensemble being equally probable, this quantity is the number of bits
needed to count all possibilities. This expresses the fact that each
message in the ensemble can be communicated using this number of bits.
However, it does not say anything about the number of bits needed to
convey any individual message in the ensemble."

http://homepages.cwi.nl/~paulv/papers/info.pdf
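
The "number of bits needed to count all possibilities" in that passage can be sketched as follows, for an ensemble of equally probable messages. The function name is illustrative, and the ensemble sizes are examples only.

```python
def bits_to_index(ensemble_size: int) -> int:
    """Smallest number of bits that gives each of ensemble_size
    equally probable messages its own distinct label: ceil(log2(N))."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one message")
    return (ensemble_size - 1).bit_length()


if __name__ == "__main__":
    # All decimal strings of length 20 form an ensemble of 10**20 messages.
    print(bits_to_index(10 ** 20))
    # The count is the same whichever member is sent, be it the opening
    # digits of pi or any other 20-digit string in the ensemble.
    print(bits_to_index(8), bits_to_index(9))
```

Note that the function takes only the size of the ensemble as input. It never looks at an individual message, which is the limitation the quoted passage describes.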

It seems like you are trying to do just that. In your argument for pi
being compressible to a finite algorithm, it seems like you are
arguing that the number of bits needed to convey pi to the ensemble is
smaller than pi. While this is true, Shannon information theory "does
not say anything about the number of bits needed to convey pi". That
notion seems to require that one move into the realm of KC.

By the way, where you wrote "sigma" did you possibly mean "omega"?

Yes . . . Omega.

Again, it is all about the source or the reference that is chosen.

Strictly speaking, with a single string, the Shannon information is only
estimated.

That's true, but the estimate is based on the type of source needed to
produce such a string.

No. It is based on the measured digits in a realized string. That allows an
estimate of the probabilities of the symbols produced by the source.
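
A minimal sketch of that kind of estimate: count the symbols in one realized string, turn the counts into relative frequencies, and plug those into the entropy formula. The function names are illustrative, and this simple plug-in estimate treats symbols as independent, which is an assumption rather than something the string itself guarantees.

```python
from collections import Counter
import math


def estimate_symbol_probabilities(realized: str) -> dict:
    """Relative frequency of each symbol in one realized string."""
    if not realized:
        raise ValueError("need at least one symbol")
    counts = Counter(realized)
    total = len(realized)
    return {symbol: count / total for symbol, count in counts.items()}


def plug_in_entropy_bits(realized: str) -> float:
    """Entropy estimate (bits per symbol) from the observed frequencies."""
    probs = estimate_symbol_probabilities(realized)
    return -sum(p * math.log2(p) for p in probs.values())


if __name__ == "__main__":
    # A short run of the opening digits of pi, used only as sample data.
    sample = "31415926535897932384626433832795028841971693993751"
    print(estimate_symbol_probabilities(sample))
    print(round(plug_in_entropy_bits(sample), 4), "bits per symbol (estimate)")
```

With a short string the estimate is noisy, so it should be read as an estimate of the source's symbol probabilities and not as an exact property of the source.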

I'm not sure I follow. It almost seems like you are saying what I just
said in a different way.

Sean Pitman
www.DetectingDesign.com
