Re: Complex Specified Information - Pitman Formula



On Jul 26, 9:52 pm, hersheyh <hershe...@xxxxxxxxx> wrote:
On Jul 26, 6:01 pm, Seanpit <seanpitnos...@naturalselection.0catch.com> wrote:
On Jul 26, 2:08 pm, hersheyh <hershe...@xxxxxxxxx> wrote:

On Jul 26, 11:57 am, Seanpit <seanpitnos...@naturalselection.0catch.com> wrote:
On Jul 25, 5:32 pm, hersheyh <hershe...@xxxxxxxxx> wrote:

[snip]

The method isn't set up to identify random sequences, but non-random
biased test sequences.

It looks like it is set up to identify the degree of identity between
two sequences with the addition of a term that looks like a
calculation of total sequence space thrown in for no apparent reason.
But do explain what each part of your equation is measuring and why it
is relevant. And why you think that your choice of 'reference'
sequence changes the odds of something or other.

[snip repetitive stuff]

And more importantly you are NOT, repeat NOT, determining anything
about the 'randomness' of the 'tested' sequences. You are ONLY
determining how close they are to one or another of your 'reference'
sequences. Closeness to a 'reference' sequence is NOT a measure of
the randomness of the 'target' sequence. It is ONLY a measure of
similarity between the two sequences.

Similarity between two independently derived sequences is in fact
evidence of bias.

NO. It is a measure of similarity or dissimilarity. But if by bias
you mean similarity or dissimilarity, why call it 'bias'?

This is in fact your own conclusion when you see
similarities between biological sequences. You assume a non-random
biased origin.

When I see similarity between biological sequences, I see
*similarity*. One possible explanation of *similarity* is common
ancestry. Another is common design. But in addition to *similarity*,
I also see differences. That is, proteins that perform the same
*function* can differ from 'not at all' to 'so much that they don't
look like the same sequence at all' (usually, of course, without
changing structure nearly as much). Moreover, I can examine the
pattern of changes and determine the most parsimonious pathway that
would produce those differences. And I can do that again for a
different protein. And I can do it for morphology (but less well).
The most parsimonious explanation consistent with all this evidence is
common descent. For sequence information, specifically, the proposed
mechanism is largely mutation (which certainly exists) and neutral
fixation over long time frames. This is a process which *certainly*
happens in the absence of selection to prevent it. Selection actually
works largely (but, crucially, not always) to *prevent* evolutionary
change. If any fraction of a sequence is selectively neutral (and the
existence of many different sequences that have the same function is
clear evidence that a substantial amount of effective neutrality
exists), sequence change cannot be prevented given sufficient time.

You can't have it both ways, Howard. If you yourself use sequence
similarity as evidence of common origin then you can't argue that
sequence similarities are not a measure of non-random origin.

In fact, I am explicitly saying that your equation is worthless and a
re-invention of the wheel, except for the minor detail that your re-
invented 'wheel' (your formula) is square and has no axle and is thus
worthless as a wheel. I have no problem at all with identifying a
degree of sequence similarity that is non-chance (that is, is unlikely
to be due to chance but, instead, must have a *causal* explanation
that links the 'test' sequence to a particular 'reference' sequence).

In fact, dope slap to me, looking at sequence identity or homology is
one of the first things that *real* biologists do when they get the
sequence of a 'new' or 'novel' protein. They perform a BLAST or
similar sequence analysis to compare the new or 'test' sequence to the
population (and I mean population and not sample) of sequences that
have already been analysed. This is like comparing a new word, say
"bull***", to a dictionary of English words and seeing if there are
any sequence matches. I suspect that you would find at least two
words that have statistically significant similarity to parts of this
'test' word. That is, scientists compare the new sequence to a
dictionary of sequences that have already been found to be present
(and which are often useful, although present is quite good enough) in
other living organisms. This is, in fact, a comparison to a
'reference' collection of sequences. They set parameters to exclude
any matches which are due to chance, and look for stretches of
significant matching within a sequence as well as overall matching.
For proteins, this amounts to requiring sequence identity
significantly higher than about 15-20%. The reason this is higher
than the 5% you would use is that aa's are not present in equimolar
amounts in real proteins, but since this makes it *more* difficult to
get a significant match, you can hardly complain.
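
To make the 5% figure concrete: if two unrelated sequences are lined
up position by position without gaps, the chance that any one position
matches is the sum of the squared residue frequencies. A rough sketch
of that arithmetic is below. The skewed composition is invented purely
for illustration (it is not measured from real proteins), and
`chance_identity` is just a name picked for this example.

```python
def chance_identity(freqs):
    """Per-position match probability for two unrelated, ungapped sequences.

    `freqs` holds relative residue abundances; they are normalised here,
    so counts or percentages both work.
    """
    total = sum(freqs)
    return sum((f / total) ** 2 for f in freqs)


# 20 equally common amino acids: 20 * (1/20)^2 = 1/20, i.e. the 5% figure.
uniform = [1.0] * 20

# HYPOTHETICAL uneven composition (percentages summing to 100), for illustration only.
skewed = [9, 9, 8, 8, 7, 7, 6, 6, 5, 5, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1]

p_uniform = chance_identity(uniform)
p_skewed = chance_identity(skewed)
```

Any departure from equal frequencies pushes the sum above 1/20, so the
chance baseline only goes up when composition is uneven. This covers
only the per-position baseline; it says nothing about the extra matches
that come from searching over many possible alignments.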

Not surprisingly to me, but apparently to you, very often *real*
scientists find significant or highly significant matches (non-chance
relationship between sequences) between new sequences and previously
recorded sequences. There are certain general features to these
matches. First, evolutionary closeness of the organisms in time since
divergence is the most significant factor. If one examines any
sequence (functional or not) in humans, there is an extremely high
probability of finding a sequence in chimpanzees so similar that the
similarity cannot possibly be attributed to chance. The further back
the divergence of the
organisms (in standard evolutionary terms), the greater the degree of
sequence dissimilarity, even for proteins that perform the very same
function in all the organisms. At some point, identifying similarity
becomes difficult for *some* sequences that perform a particular
function. However, often these proteins have retained similar
*structure* in addition to *function*. That is why *sequence* is less
informative for *function* than *structure* is.

Moreover, they *often* find *sequence* similarity (often in smaller
patches or moieties within a larger sequence) even in proteins that no
longer perform the same *function* (the two globins of hemoglobin and
myoglobin for example, or the different flagellin proteins of the
eubacterial flagella). Typically this similarity is more pronounced
when one looks at structure. Some of the *sequence* differences in
*these* comparisons are undoubtedly due to selection at the small
number of sites where selectively favored change occurred. But much
of the difference is undoubtedly just more of the random fixation of
neutral changes that selection neither affects nor prevents.

Now tell me again why your formula is better than using a BLAST
program to search for similarity of a test sequence (any new sequence)
with a reference dictionary of sequences that have already been
discovered in organisms? What will you learn by your formula that
would not be better analysed by the existing systems for identifying
*statistically significant* similarity?
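
For anyone who wants to see what "statistically significant
similarity" against a reference dictionary means in miniature, here is
a toy sketch. It is NOT BLAST: it compares sequences position by
position with no gaps, and it estimates significance by shuffling the
test sequence, which is a swapped-in stand-in for the statistics real
tools use. The sequences, the names `ref_close` and `ref_far`, and the
function names are all made up for illustration.

```python
import random


def identity(a, b):
    """Fraction of matching positions, compared position by position (no gaps)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def shuffle_p_value(test, ref, trials=1000, seed=0):
    """Estimate how often a shuffled copy of `test` matches `ref` at least as well.

    Shuffling keeps the composition of `test` but destroys its order.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = identity(test, ref)
    letters = list(test)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if identity(letters, ref) >= observed:
            hits += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return (hits + 1) / (trials + 1)


def search(test, references, trials=1000, seed=0):
    """Score `test` against every reference; most significant match first."""
    results = [
        (name, identity(test, ref), shuffle_p_value(test, ref, trials, seed))
        for name, ref in references.items()
    ]
    return sorted(results, key=lambda r: r[2])


# HYPOTHETICAL sequences, invented for this example.
references = {
    "ref_close": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "ref_far": "GSHMLEDPVRWQNCTAYKLPGFDEIHSMTKRVA",
}
test_seq = "MKTAYLAKQRQISFVRSHFSRQLEDRLGLIEVQ"  # ref_close with 3 substitutions

results = search(test_seq, references)
for name, ident, p in results:
    print(f"{name}: identity={ident:.2f}, p~{p:.4f}")
```

The point of the sketch is only that the comparison is against a
collection of known sequences and that "significant" means "better
than shuffled versions of the same letters manage by chance".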

The question is not whether one can measure sequence similarity. One
certainly can (and that is the basis of the sequence homologies that
support common descent -- or a deceiver designer). It is whether what
*you* are measuring with *your* CSI formula has any meaning at all,
other than some vague resemblance to measuring something like
similarity. I look at your formula as either a botched attempt to
re-invent the wheel, so that you have some measure of 'sequence
similarity' you can call CSI, or some bizarre idea that what you are
measuring really does change the odds depending on your arbitrary
choice of 'reference' to be just those sequences that have meaning to
you. Frankly, I don't have a clue as to what you think that formula
does. To me, it looks like GIGO, designed (and I know who this designer
is and what his motivations are) to produce whatever hypothetical
numbers suit the designer's need of the moment.



This really isn't that hard, Howard. You're turning yourself into a
pretzel here.

No I'm not. I have explicitly said that there already are good
measures of sequence similarity, that they already have been used and
are used, and that I don't think your formula is worth ***.

< snip rest >

Sean Pitman
www.DetectingDesign.com

