Re: Complex Specified Information - Pitman Formula
- From: hersheyh <hersheyhv@xxxxxxxxx>
- Date: Thu, 26 Jul 2007 18:52:25 -0700
On Jul 26, 6:01 pm, Seanpit <seanpitnos...@naturalselection.0catch.com> wrote:
On Jul 26, 2:08 pm, hersheyh <hershe...@xxxxxxxxx> wrote:
On Jul 26, 11:57 am, Seanpit <seanpitnos...@naturalselection.0catch.com> wrote:
On Jul 25, 5:32 pm, hersheyh <hershe...@xxxxxxxxx> wrote:
The reference sequences are determined, by you, ahead of time before
you go out to analyze any other sequences. The reference sequences
are based on non-random strings that are known to be produced by
simple algorithms - like pi or like 0101010 . . .
IOW, you would know it if the SETI signal were repeated digits of pi
in base 10, but would not be able to recognize pi in base 2 or the
other reference you give. Using *your* idea, you would declare any
other signal as "random" and unrelated to the 'reference'. Is *that*
what you claim that SETI is doing?
As I've pointed out many times now, a match to a reference string, by
itself, is not enough to detect ET. A maximum Pitman CSI number does
NOT equal ET or ID for that matter. What it does indicate is non-
random bias. Try to remember this point this time.
'Non-random bias' is apparently nothing more than 'degree of
similarity'. The more similar the 'reference' and 'target' sequences
(with both being nothing other than an arbitrary choice on your part)
are to each other, the higher the Pitman CSI number.
How many times do I have to tell you that there is no "target" string
and that I only choose the reference string, not the test string?
As I remember, 'target' was your term, not mine. But since you prefer
'tested', I will use that term now.
The
test and the reference strings are chosen independently . . . AND I
don't pick the test string(s).
But you do arbitrarily choose the set of 'reference' strings.
Of course, you
do muck it up with your first term, the size of total sequence space,
which is basically irrelevant and tells us nothing of any utility.
It tells the odds of a randomly produced test string ending up with a
match to the reference string.
Not really. The odds of a randomly produced test string ending up as
a match to the reference string are 1/(total sequence space size), not
total sequence space minus some value involving hd. The reference
string is already arbitrarily chosen, so the odds of the reference
string are 1. The odds that any other sequence (or even the same
sequence), chosen randomly from a universe of total sequence space,
matches that reference sequence (assuming each sequence is
present only once) are 1/(the size of total sequence space). There
would be no subtraction of anything having to do with hd in the
calculation of the odds of some randomly chosen sequence matching a
pre-chosen reference sequence.
The larger the sequence space size, the
lower the odds of a randomly produced match.
I certainly agree that the odds of a match are lower as the size of
total sequence space increases. The odds of picking a match for any
arbitrarily chosen (even randomly chosen) 'reference' sequence (again
assuming that each sequence is present only once) are always going to
be 1/the total number of sequences in sequence space.
But then what is the term involving hd doing as a subtraction? If you
want to express the odds of finding any sequence within x hd units
away from your reference sequence, you would calculate the number of
sequences that are 0, 1, 2, 3, and .... x hd units away from your
reference sequence and *divide* that number by the size of total
sequence space. This would, however, be the same calculation and give
the same result regardless of whether the 'reference' sequence were a
sequence that has some special meaning to you (such as the first 100
digits of pi in base 10) or was randomly chosen from total sequence
space.
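To make that calculation concrete, here is a minimal Python sketch. The function names `hamming_ball_size` and `odds_within`, and the toy parameters (100 positions, 10 symbols), are assumptions chosen for illustration; they are not part of the formula under discussion. The count uses the standard result that exactly C(L, d) * (k - 1)^d sequences of length L over k symbols sit d hd units from any fixed sequence.

```python
from fractions import Fraction
from math import comb


def hamming_ball_size(length, alphabet, x):
    """Count the sequences that differ from a fixed reference at <= x positions."""
    return sum(comb(length, d) * (alphabet - 1) ** d for d in range(x + 1))


def odds_within(length, alphabet, x):
    """Exact odds that a uniformly random sequence lands within x hd units."""
    return Fraction(hamming_ball_size(length, alphabet, x), alphabet ** length)


# Hypothetical example: sequences of 100 decimal digits.
print(odds_within(100, 10, 0))          # exact match: 1 / 10**100
print(float(odds_within(100, 10, 5)))   # anything within 5 hd units
```

Note that the reference sequence itself never appears as an argument: only the length, the alphabet size, and x enter the count, and the result is a division by total sequence space, not a subtraction.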
But you seem to be saying that by choosing 'reference' sequences that
have meaning to you, you somehow change the odds of finding a match,
be that match a perfect match or one that includes any sequence within
x hd units of the 'reference'. That simply is not true.
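This can be checked by brute force on a space small enough to enumerate. The sketch below is only an illustration under assumed toy parameters (5 decimal digits, x = 1, and an arbitrary fixed seed); the names `hamming` and `count_within` are hypothetical. It counts every sequence within x hd units of a 'meaningful' reference (the first digits of pi) and of a randomly drawn one.

```python
from itertools import product
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(ch1 != ch2 for ch1, ch2 in zip(a, b))


def count_within(reference, alphabet, x):
    """Brute-force count of all sequences within x hd units of reference."""
    return sum(
        hamming(reference, candidate) <= x
        for candidate in product(alphabet, repeat=len(reference))
    )


ALPHABET = "0123456789"
meaningful = tuple("31415")                      # first digits of pi
rng = random.Random(0)                           # fixed seed, arbitrary choice
arbitrary = tuple(rng.choice(ALPHABET) for _ in range(5))

for ref in (meaningful, arbitrary):
    print("".join(ref), count_within(ref, ALPHABET, 1), "of", len(ALPHABET) ** 5)
```

Both references give the same count, so the odds of a random sequence falling within x hd units are the same whether or not the reference has any meaning to the person who picked it.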
So, again, what *exactly* are you measuring here? Why is there this
subtraction and why do you think that by choosing particular
'reference' sequences you are somehow changing the odds of a match?
At best (and I find your equation pretty much meaningless), the hd
part of that equation does show us the degree of similarity of a test
sequence to a reference sequence. But there are certainly better ways
to do that.
Now, is it possible for a set of reference strings to miss a non-
random sequence?
Huh? You aren't claiming that your set of reference strings are able
to *identify* a 'random' sequence at all
The method isn't set up to identify random sequences, but non-random
biased test sequences.
It looks like it is set up to identify the degree of identity between
two sequences with the addition of a term that looks like a
calculation of total sequence space thrown in for no apparent reason.
But do explain what each part of your equation is measuring and why it
is relevant. And why you think that your choice of 'reference'
sequence changes the odds of something or other.
unless your claim is that you
can identify and use all 'non-random'
(whatever that means) sequences
as your reference set.
I specifically explained that identification of all non-random or
"biased" sequences is impossible.
I say whatever you mean by 'non-random'
because the numbers in the sequence for pi are about as random as they
come.
That's not true. The numbers in the sequence for pi have a uniform
distribution, but the sequence itself is not random. It is perfectly
predictable and computable.
Take any stretch of numbers in pi and see if they can predict
any other similarly sized non-overlapping stretch of numbers in pi.
There is no repeatability in pi and thus the string of numbers in pi
is pretty much *random* despite being predictable by a simple
algorithm.
The definition of "random" is "non-predictable". Therefore, since pi
is in fact predictable, it is non-random.
Pi is calculated. The sequence of numbers in pi is not predictable as
a sequence.
And simple algorithms produce fractals.
Fractals are not random.
And a pattern of
random mutation and neutral drift over time would also not produce a
*predictable* determinative single result.
There is no "also" since random mutations and neutral drift would
produce a non-predictable result - unlike fractals that are produced
by repetitions of a simple algorithm.
If you ran the experiment
over again you would get a different result for such a process.
If the result is not predictable, it is random from that perspective.
And the sequence of pi is random in that you cannot predict one
stretch from another. There is no repetitiveness.
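Both sides of this disagreement can be looked at in a few lines of Python. The sketch below uses Gibbons' unbounded spigot algorithm to generate decimal digits of pi; that algorithm is a choice made here for illustration, not something specified in this thread, and the name `pi_digits` and the sample size of 1000 are likewise assumptions. It generates the digits twice and tallies how often each digit occurs.

```python
from collections import Counter
from itertools import islice


def pi_digits():
    """Yield decimal digits of pi (3, 1, 4, ...) via Gibbons' spigot algorithm."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (
                10 * q, 10 * (r - n * t), t, k,
                (10 * (3 * q + r)) // t - 10 * n, l,
            )
        else:
            q, r, t, k, n, l = (
                q * k, (2 * q + r) * l, t * l, k + 1,
                (q * (7 * k + 2) + r * l) // (t * l), l + 2,
            )


first_run = list(islice(pi_digits(), 1000))
second_run = list(islice(pi_digits(), 1000))

print(first_run == second_run)               # same algorithm, same digits
print(sorted(Counter(first_run).items()))    # tally of each digit 0-9
```

The first line of output speaks to the "computable" point: rerunning the algorithm reproduces the digits exactly. The tally speaks to the "distribution" point. Neither output, by itself, settles which of the two properties deserves the word "random".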
All your numerology can *really* do is identify whether or not a
sequence is reasonably close to one or another of the sequences you
chose to be 'reference' sequences; that is, it identifies degree of
similarity.
That's right. And this greater degree of similarity of the unknown
compared to the known is good evidence of non-random biased production
of one or the other or both.
Then what the heck do you need the term that involves total sequence
space as an addition for (as opposed to a division, where it would at
least make *some* sense)?
There actually are programs that can do this much better. And they
can handle more than a single 'reference' and a single 'target'.
My program can handle as many references and targets as your computer
can hold.
In
fact, they can arrange sequences in a nested hierarchy of similarity
on the assumption of the number of single event changes required to
produce a pattern. They come with terms like "maximal parsimony".
And these programs are actually used to identify the nature of changes
in actual proteins (usually controlled for function) in actual
organisms. Again, the problem for your brand of creationism is not,
in general, the rarity of 'novel' functions. Those are very rare
indeed. It is the vast amount of difference that is selectively
effectively neutral but which produces change in patterns *of
similarity* that are so closely related to each other that they
*cannot* be due to chance. The only non-chance explanations that make
sense are historical (and largely vertical) descent, which requires
the time-frames that geology gives us, and deliberate deception by a
designer, if the time-frame is too short.
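For anyone who has not met the term, here is a minimal sketch of what a 'parsimony' score is, using Fitch's small-parsimony count on a fixed tree. It is not the code of any of the programs referred to above; the four taxa `A` to `D`, their toy sequences, the two candidate trees, and the function names are all invented for illustration.

```python
def fitch_site_changes(tree, states):
    """Minimum number of changes at one site on a rooted binary tree (Fitch)."""
    def visit(node):
        if isinstance(node, str):            # leaf: its observed state, no changes
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                           # children can agree: no extra change
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]


def parsimony_score(tree, sequences):
    """Sum the per-site Fitch counts over an alignment (dict: name -> sequence)."""
    length = len(next(iter(sequences.values())))
    return sum(
        fitch_site_changes(tree, {name: seq[i] for name, seq in sequences.items()})
        for i in range(length)
    )


# Invented toy alignment; not real biological data.
toy = {"A": "AAGT", "B": "AAGC", "C": "CTGC", "D": "CTAC"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_score(tree_1, toy), parsimony_score(tree_2, toy))
```

The tree that needs fewer single-event changes is the more parsimonious one. Real programs also search over possible tree shapes, which this sketch does not attempt.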
I agree that the similar patterns are not due to chance - that they
are indeed biased since they have a relatively high CSI value. Of
course, as explained before, a high CSI, by itself, says nothing about
the likely origin of the bias. Your assumption that the bias was the
result of random mutation and function-based selection is not the only
option for the production of bias.
Well, again, I have specifically not ruled out a particularly
malicious designer intent on misleading us into thinking that the
process is historical descent (which largely involves random mutation
and neutral drift at the sequence level). Again, the pattern
*specifically* looks at proteins that all have the *same* function, so
selection is effectively irrelevant or a minor feature of the
pattern. And the pattern that arises is the one that would be
predicted by random mutation (which certainly exists) and neutral
fixation (which is unpreventable *except* by selection). That the
*same* pattern repeats again and again and again for different
proteins, each with a different function, shows that the pattern is
not strongly related to *function*.
Certainly! In fact, it is impossible to rule out
this possibility. No one can do it - not SETI scientists, not
anthropologists, biologists, chemists, physicists, or even IDists. No
one. It is impossible.
IOW, you will, in fact, generate *many* false negatives where you will
claim that some 'target' sequence cannot be derived from any of the
'reference' sequences you have tested because you have no idea what
'reference' sequences to use and are simply pulling them out of yer
arse in the first place.
The false positives will be extremely few relative to the true
positives. That's the strength of the CSI calculation. Again, the
reference sequences are chosen before the test sequences are presented
- completely independently.
Let's try the reference sequence for pi in
base ten! No. I don't see any signal sufficiently close to that.
But, if you did happen to see a radio signal coming from outer space,
you would know that this signal was not the result of some random
process. That would be very useful if it ever happened.
Searching for a needle in a haystack is very difficult if you don't
have a magnet.
Again, you have to be able to detect significant bias before you can
hope to detect any kind of artifact like ETI or ID of any kind.
So, again, the *fact* that all the different beta globins of
hemoglobin in many different organisms have highly significant
similarity/identity tells me what? And that they don't have identical
CSI despite having the same function in different organisms tells me
what? And that the pattern of changes in sequence that is far and
away the best fit for the observed sequence differences also largely
fits the branching pattern of other proteins and also the
morphological branching (humans and chimps closest, other primates
more distant, reptiles more distant, etc.) proposed by historical
divergences tells me what? Oh, I know...all of this was designed by
an evil designer to fool us into thinking historical descent.
Let's try the reference sequence for pi to the base two. No. I don't
see any signals close to that. Let's try pi to the base seven...
There are many different references you can include - not just based
on pi. It is just that they have to be independently derived.
Can you choose a random sequence as the 'reference'? How would that
change the CSI calculation? How would it change the odds of any
randomly chosen sequence being within x hd units away (a value which
your CSI does NOT calculate)?
After you have your set of reference strings, you can compare incoming
sequences to your set of reference sequences to see if the incoming
sequences are likely to be non-random in origin.
Again, you would only be able to detect 'targets' that were near
enough to your *biased* selection of 'reference' sequences to register
as 'sufficiently close'.
That's right . . .
And more importantly you are NOT, repeat NOT, determining anything
about the 'randomness' of the 'tested' sequences. You are ONLY
determining how close they are to one or another of your 'reference'
sequences. Closeness to a 'reference' sequence is NOT a measure of
the randomness of the 'target' sequence. It is ONLY a measure of
similarity between the two sequences.
Similarity between two independently derived sequences is in fact
evidence of bias.
NO. It is a measure of similarity or dissimilarity. But if by bias
you mean similarity or dissimilarity, why call it 'bias'?
This is in fact your own conclusion when you see
similarities between biological sequences. You assume a non-random
biased origin.
When I see similarity between biological sequences, I see
*similarity*. One possible explanation of *similarity* is common
ancestry. Another is common design. But in addition to *similarity*,
I also see differences. That is, proteins that perform the same
*function* can differ from 'not at all' to 'so much that they don't
look like the same sequence at all' (usually, of course, without
changing structure nearly as much). Moreover, I can examine the
pattern of changes and determine the most parsimonious pathway that
would produce those differences. And I can do that again for a
different protein. And I can do it for morphology (but less well).
The most parsimonious explanation consistent with all this evidence is
common descent. For sequence information, specifically, the proposed
mechanism is largely mutation (which certainly exists) and neutral
fixation over long time frames. This is a process which *certainly*
happens in the absence of selection to prevent it. Selection actually
works, largely -- but crucially, not always, to *prevent* evolutionary
change. Sequence change, if any fraction of a sequence is selectively
neutral (and the existence of many different sequences that have the
same function is clear evidence that a substantial amount of effective
neutrality exists), cannot be prevented given sufficient time.
You can't have it both ways Howard. If you yourself use sequence
similarity as evidence of common origin then you can't argue that
sequence similarities are not a measure of non-random origin.
The question is not that one can or cannot measure sequence
similarity. One certainly can (and that is the basis of sequence
homologies that support common descent -- or a deceiver designer). It
is whether what *you* are measuring with *your* CSI formula has any
meaning at all, other than some vague resemblance to measuring
something like similarity. I look at your formula as either a botched
attempt to re-invent the wheel and have some measure of 'sequence
similarity' you can call CSI or some bizarre idea that what you are
measuring really does change the odds depending on your arbitrary
choice of 'reference' to be just those sequences that have meaning to
you. Frankly, I don't have a clue as to what you think that formula
does. To me, it looks like GIGO, designed (and I know who this designer
is and what his motivations are) to produce whatever hypothetical
numbers suit the designer's need of the moment.
This really isn't that hard Howard. You're turning yourself into a
pretzel here.
No I'm not. I have explicitly said that there already are good
measures of sequence similarity, that they already have been used and
are used, and that I don't think your formula is worth ***.
< snip rest >
Sean Pitman
www.DetectingDesign.com