Re: Maximum, Average, and Likely Minimum Gap Distances



On Jul 13, 11:12 am, Seanpit <sean...@xxxxxxxxx> wrote:
On Jul 13, 6:26 am, John Harshman <jharshman.diespam...@xxxxxxxxxxx> wrote:



I'm talking about the entire sequence minimum needed to realize a
particular type of function.  Where have you been?  Why do you think I
keep talking about the likely minimum size needed for a function like
CytoC or lactase or rotary flagellar motility to work?   These
different types of functions have different minimum structural
threshold requirements, obviously. One might argue, and be at least
somewhat reasonable at the same time, that the likely minimum size
requirement for CytoC functionality in a given life form is 80aa.

Might one? How would one argue this?

Read the papers I've listed for you.  Or, provide some evidence of
your own to even suggest that the likely minimum CytoC size from the
perspective of any living thing is significantly less than 80aa.  So
far, all you have is incredulity with no real evidence to back
yourself up, and certainly nothing that has actually been published.

Your 80 aa above is nothing but a WAG.

One
could not make that argument for the flagellar motility function - not
remotely.  Why?  Because the likely minimum structural threshold
requirement needed to achieve the rotary flagellar motility function
is on the order of several thousand fairly specified residues at
minimum.  For lactase it is on the order of several hundred - at
minimum.

Here you have gone from sheer number of residues to number of "fairly
specified" residues, as if that's the same thing. You keep flipping back
and forth among different versions, and this is what makes your
arguments so ambiguous.

I've always presented the minimum size requirement as a certain number
of fairly specified residue positions.  The 80aa minimum size
requirement for CytoC isn't 80 absolutely specified residue
positions.  How has that not been clear to you, especially after I
directly pointed out to you that only about 27 of these 80 minimum
positions were invariant?

Again, the ratio that Yockey presents and from which you got the 30 aa
number is a measure of the degree of sequence specificity required for
the specified named function (namely, everything that modern
cytochrome c does). You got the 30 aa number by taking the 20th root
of the inverse of the Yockey ratio (which is the number of non-
cytochrome c function sequences per cytochrome c function sequence).
The 20th root of the inverse of the Yockey ratio (how many sequences
lacking cyt c function there are per sequence with cyt c function) is
also a measure of the "degree of specificity" of finding a sequence with
cytochrome c function in the universe of sequences that size.
Regardless of whether one assumes that cyt c is a protein with only
completely invariant and completely variable aa sites or the more
realistic scenario of each aa having some intermediate degree of
variability, the same "degree of specificity" value holds. That makes
the model of cytochrome c as if it were represented by a protein with
30 invariant sites and 70 freely variable sites merely a mathematical
simplification, an easy visualization, of all the possible
intermediate states of partial variability. They *all* would have the
same "degree of specificity", which is measured by the Yockey ratio as
well as your unnecessary mathematical manipulations of the Yockey
ratio.

And the *effective maximum gap size* of all proteins of 100aa size
that have the same "degree of sequence specificity" is *equivalent*
to the 30 aa number.
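The arithmetic in dispute here can be checked with a short Python sketch. It assumes that each site independently accepts a fixed number of the 20 amino acids and that composition is uniform. The tolerance counts and helper names are hypothetical and are not measured cytochrome c values. The sketch reads "20th root of the inverse" as the base-20 logarithm of the inverse, which is the operation that turns a ratio of 20^-n back into n. Under that reading, a 100-site protein with 30 invariant and 70 free sites and a 100-site protein with no invariant sites at all give the same fraction, and therefore the same equivalent count of 30.

```python
import math

def fraction_from_tolerances(allowed_per_site):
    """Fraction of all 20**L sequences that satisfy every site, assuming
    each site independently accepts `k` of the 20 amino acids."""
    fraction = 1.0
    for k in allowed_per_site:
        fraction *= k / 20
    return fraction

def equivalent_invariant_sites(functional_fraction):
    """Base-20 logarithm of the inverse ratio: a fraction of 20**-n maps to n."""
    return math.log(1.0 / functional_fraction, 20)

# Hypothetical 100-site proteins (illustrative numbers, not cytochrome c data).
# Model A: 30 invariant sites and 70 freely variable sites.
model_a = [1] * 30 + [20] * 70
# Model B: no invariant sites; 60 partially variable sites that alternately
# accept 4 or 5 amino acids (4 * 5 = 20), plus 40 freely variable sites.
model_b = [4, 5] * 30 + [20] * 40

for name, model in [("A", model_a), ("B", model_b)]:
    frac = fraction_from_tolerances(model)
    print(name, f"{frac:.3e}", round(equivalent_invariant_sites(frac), 6))
```

Each pair of sites accepting 4 and 5 residues constrains the sequence as much as one invariant site does. The single ratio therefore cannot say which arrangement of constraint actually holds.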

None of the systems I've presented are absolutely
specified. They all have a fair degree of flexibility.  As far as what
I mean by "fair degree", read the Durston paper and note the concept of
FSC density in terms of "Fits" per residue position.
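The Fits-per-site idea can be summarized in a simplified reading. Each aligned position scores the null-state uncertainty (log2 20 bits, all amino acids equally likely) minus the Shannon uncertainty of the residues actually observed there. The Python sketch below scores three made-up alignment columns. Real analyses use large alignments and corrections that this ignores, and the columns are hypothetical.

```python
import math
from collections import Counter

# Null-state uncertainty per site: all 20 amino acids equally likely.
NULL_BITS = math.log2(20)

def site_fits(column):
    """Fits at one aligned position: null uncertainty minus the Shannon
    uncertainty of the residues observed in that column."""
    total = len(column)
    observed = -sum((n / total) * math.log2(n / total)
                    for n in Counter(column).values())
    return NULL_BITS - observed

# Hypothetical alignment columns (NOT real cytochrome c data): one invariant
# site, one split evenly between K and R, one with 8 different residues.
columns = ["CCCCCCCC", "KKRRKKRR", "ADEGKLST"]
per_site = [site_fits(c) for c in columns]
total_fits = sum(per_site)
print([round(f, 3) for f in per_site], round(total_fits, 3))

# Dividing by NULL_BITS re-expresses the total as "invariant-site equivalents".
print(round(total_fits / NULL_BITS, 3))
```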

Not surprisingly, the Durston paper does essentially what Yockey did
and got essentially the same type of information. But, of course, as
I pointed out, the Yockey ratio (and all the derivatives you generate
and the Durston paper) measure the ratio of sequences with a specific
named function to the universe of all sequences of that size. That
ratio provides no measure at all of your hypothetical distance between
functional islands. Not the average distance. Not the closest
distance (the one most likely to evolve). [Ignoring that your analogy
assumes that evolution only works with 1 aa steps.] You have no
measure of "average gap size". The Yockey ratio (or its equivalents)
cannot provide that number. Which is why you pull it out of yer arse
or use the only gap size that the Yockey ratio can provide: the
effective maximal gap size for a protein with that "degree of
specificity".

The average gap size is based on the ratio of sequences that would be
able to produce the function in question, or, more relevantly, the
total number of all potentially beneficial sequences vs. non-
beneficial sequences.

Or, more simply, your "average gap size" is just the number of
constrained sites. A normal person would consider this to be the maximum
gap size.

It isn't the maximum gap size.  For a 100aa system, it is quite
possible to have another system that shares absolutely no sequence
homology at all - thereby having a gap size of 100 residue location
differences.

You need to distinguish the gap between sequences, which you don't care
about, from the gap between islands, which you do. If, out of those 100
residues, only 30 are "fairly specified", then the gap size is at most
30, because changing only those 30 residues will move from one island to
another. Therefore in that case, 30 is the maximum gap size -- the
maximum distance between those two islands.
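Here is a toy Python model of that point. An "island" is taken to be every sequence that meets a set of per-position constraints. That is an assumption, since real fitness landscapes need not factor this way, and the constraints and helper name are invented. However many residues separate two particular sequences, the distance from any sequence to the island can never exceed the number of constrained positions.

```python
def distance_to_island(sequence, constraints):
    """Fewest residue changes needed to put `sequence` on the island.

    `constraints` maps a position to the set of residues the island accepts
    there; unconstrained positions accept anything and never need to change.
    """
    return sum(1 for pos, allowed in constraints.items()
               if sequence[pos] not in allowed)

# Hypothetical island on 10-residue sequences with 3 constrained positions.
island = {0: {"C"}, 3: {"H"}, 7: {"K", "R"}}

member = "CAAHAAAKAA"      # satisfies all three constraints
unrelated = "WWWWWWWWWW"   # differs from `member` at all 10 positions
print(distance_to_island(member, island))     # 0
print(distance_to_island(unrelated, island))  # 3, not 10
```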

You do have a point here.  It is the number of fairly specified
residue positions that is key.  A function that requires little or no
sequence specificity would have a very small maximum gap distance and
smaller still average and likely minimum gap distances.

And that is the only thing the Yockey ratio can provide: the ratio of
sequences that have a specific named function(s) to the universe of
sequences of that size. That is your measure of "degree of sequence
specificity". The simple mathematical manipulation you generate,
taking the 20th root of the inverse, does not change what is being
effectively measured: degree of sequence specificity for a named
function(s) in the universe of all possible sequences that size.

For CytoC, in particular, it seems that all of the residue positions
have a certain degree of specificity.  

Which the Yockey ratio takes into account.

Some have more, some have
less.  However, a given position will not tolerate some amino acid
options at all without a significant loss of CytoC function.  Some
options would simply be too destabilizing to the overall function of
the system.

Now, you might be able to find two or three positions in CytoC that
could tolerate all 20aa options without a complete loss of function,
but the point is essentially the same.

That is the maximum gap size.  Now, you might argue that
at least some of these differences are functionally neutral, and that's
true.  Increased flexibility at various positions increases the
overall number of potential sequences that can produce the type of
function in question - making it easier to find by a random search of
sequence space.  I.e., it makes the size of the island or islands with
the function in question larger.

Yes. So the absolute number of residues is not relevant to the gap size,
right? It's the number of "fairly specified" residues that counts.

That's right.  An increase in size alone is meaningless to the
argument.  It has to be an increase in fairly specified residue
positions to create a linear increase in the maximum, average, and
likely minimum gap distances.

And the Yockey ratio *is* the measure of "degree of specificity" or
how many sequences out of all possible sequences have the named
function (unless, of course, some entirely different structure or
sequence could perform the same function, a fact we do not know since
cyt c effectively has prevented any alternative mechanism which would
have to *initially* be better than modern cyt c).

That would seem to require only that *something* be in them, if indeed
there is such a requirement.

Not quite true.  For the CytoC function it is true that only about 27
or so positions are completely non-variable. However, most of the
other positions are also very restrained as well - to only a handful
of options.  There are a few that allow 8 or 9 residues, but even this
degree of flexibility isn't limitless.  And, this is only considering
single replacement events - one at a time.  Studies show that if more
than one position is replaced at the same time, the constraints are
even more restrictive.

This is why the likely size minimum for CytoC is more like 80aa rather
than 27aa.  It also means that the maximum gap distance between a
different type of functional system with a similar minimum size
requirement of 80aa and CytoC functionality is 80aa differences, not
27.

How did you compute this number? How did you compute the likely gap size
of 30? And why did you say previously that the maximum gap size for
cytochrome c was 100?

I didn't argue that the likely gap size for a 100aa system at the
level of specificity of CytoC would be 30 residue differences.
That's Howard's strawman version of what I actually said.  What I
really said is that 30 residue differences is the likely average gap
distance between 100aa systems at this level of specificity or FSC.

Yet the way you calculated that number, it instead represents a simple
mathematical manipulation (the 20th root of an inverse) of the Yockey
measure of *degree of sequence specificity* or FSC of cytochrome c.
It is, therefore, a measure of *degree of sequence specificity* for
cytochrome c. The *degree of sequence specificity* and the number of
amino acids that *effectively measure that degree of sequence
specificity* is NOT "average gap size". It is *effectively* ALL (or
the mathematical equivalent of all) the invariant sites.

The average gap distance isn't the minimum likely gap distance.  The
minimum likely gap distance depends upon the number of protein-based
systems of this size in the gene pool and is always smaller than the
average gap distance.
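The relationship claimed here can be made concrete under a deliberately simple model. The model defines an island by d invariant sites and treats each pre-existing sequence as an independent random draw with uniform amino-acid frequencies, which real gene pools are not. Under those assumptions the average distance is d·19/20, and the expected closest distance among k sequences shrinks as k grows. The Python sketch below computes both. The value d = 30, the values of k, and the helper names are illustrative only.

```python
from math import comb

def tail_probability(d, p, m):
    """P(X >= m) for X ~ Binomial(d, p)."""
    return sum(comb(d, j) * p**j * (1 - p) ** (d - j) for j in range(m, d + 1))

def expected_min_distance(d, k, p=19 / 20):
    """Expected smallest distance to an island with `d` invariant sites among
    `k` independent random sequences, using E[min] = sum_m P(X >= m)**k.

    `p` is the chance that a random residue misses a given invariant site.
    """
    return sum(tail_probability(d, p, m) ** k for m in range(1, d + 1))

d = 30
for k in (1, 10, 1000, 10**6):
    print(k, round(expected_min_distance(d, k), 2))
```

With k = 1 this reduces to the plain average, d·19/20 = 28.5. Every larger pool gives a smaller expected minimum.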

Since you have no measure of "average gap distance" but only a measure
of *degree of sequence specificity* that amounts to effectively the
equivalent of all the sites that would produce that *degree of
sequence specificity* if evolution worked by random assembly, the above
is quite moot.

I also didn't say that the likely maximum gap distance for CytoC was
100 residue differences.  It isn't.  I used the 100aa number because
that is the number Yockey used to estimate the ratios of CytoCs in
100aa sequence space.

And I say that you don't understand what the numbers you produce
mean. And when push comes to shove, all you do is pull a number out
yer arse.

What you are talking about is the maximum possible random walk
distance - which is infinite regardless of the absolute number of
residue differences.   It is just that as the gap distance gets
smaller, the odds that the random walk distance will in fact be
infinite decrease exponentially.

So you're measuring only Euclidean distances here.

That's right . . .  However, a linear increase in the Euclidean
distance translates into an exponential increase in the average random
walk distance.
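The claim can be checked by taking the random-walk framing at face value: single-residue steps and no selection, which is exactly the assumption the other side disputes. The distance involved is a count of differing residues. The Python sketch below is a hypothetical simplification with an illustrative function name. The walk mutates only the d specified sites, each time choosing one uniformly and changing it to one of the 19 other amino acids. It computes the expected number of steps exactly with fractions.

```python
from fractions import Fraction

def expected_steps_to_island(d, start):
    """Expected number of single-residue mutations for a blind walk that
    starts `start` mismatches away to first match all `d` specified sites.

    Each step picks one of the d specified sites uniformly and changes it to
    one of the 19 other amino acids, also uniformly.
    """
    # steps_down[j]: expected steps to go from j mismatches to j - 1, from
    # T_j = 1 + p_up * (T_{j+1} + T_j) + p_stay * T_j.
    steps_down = [Fraction(0)] * (d + 2)
    for j in range(d, 0, -1):
        p_down = Fraction(j, d) * Fraction(1, 19)  # a wrong site becomes right
        p_up = Fraction(d - j, d)                  # a right site becomes wrong
        steps_down[j] = (1 + p_up * steps_down[j + 1]) / p_down
    return sum(steps_down[1:start + 1], Fraction(0))

for d in range(1, 7):
    print(d, float(expected_steps_to_island(d, d)))
```

In this model, each additional specified site multiplies the expected walk length by roughly 20. Starting one mismatch away, the expectation works out to exactly 20^d - 1 steps.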



But the maximum possible distance between two *islands* is equal
to the number of constrained sites in the target sequence, and that's
what Howard means by "maximum distance".

In order to know the distance between two islands of unknown position
in sequence space, you have to know something about the ratio of
islands vs. non-islands (or quarters vs. non-quarters).  This ratio
will tell you the average expected distance between any particular
starting point and an island in sequence space.  This isn't the
maximum possible distance.  It is the average linear distance that is
expected to exist between any chosen starting point and any one of the
quarters in the circle.  The greater the degree of sequence
flexibility, the more quarters there are in the circle and the less
the expected average distance between a chosen starting point and the
quarters in the circle.

You mean, in this analogy, the greater the degree of sequence
flexibility, the bigger the quarters are.

Either way, the statistics are the same.

One problem with your
procedure, if your analogy is at all useful, is that you're assuming
each island to consist of points (quarters) randomly distributed in
sequence space, when they are in fact tightly clustered.

Sean has presented no way to know how far apart the islands of
functionality are even in general or average terms. And certainly has
not even attempted to find the smallest gap size between functionally
useful islands, even though that is what evolution would find first.
Not the smallest "likely" distance based on some abstract mathematical
model with silly assumptions that have not been stated clearly, but
the smallest actually available distance based on real organismal
genomes.

Families of single proteins are indeed quite clustered at lower levels
of functional complexity.  However, the distance between family
clusters of proteins, even at low levels is quite significant.
Getting from one cluster of islands to the next cluster of islands is
a bit problematic for random walk.  It isn't impossible at lower
levels, just like it isn't impossible to get all the way across 3-
letter word space without having to swim for it very much, but it
isn't as easy as getting from one island to the next in the same
family cluster.

You get ridiculous stuff like the above, which assumes that the only
way to get from one "cluster" to another is by a series of single aa
changes.

The real problem arises once one starts moving beyond the level of the
single-protein family cluster at the level of a few hundred fairly
specified residue positions (forgive me if I don't always type out
"fairly specified" each and every time I present this idea.  I figure
that most people can remember what I mean from one paragraph to the
next).  

That might be possible if you were capable of holding a clear thought
from one paragraph to another rather than one muddled idea based on
bogus numerology that you muddle some more.

Once one starts considering functional systems beyond the
1000aa level of complexity (again, 1000aa means "fairly specified aa",
in case you forgot since the previous sentence), such systems of
higher complexity usually start requiring multiple proteins to form
them, as in systems like rotary flagellar motility, non-rotary
flagellar motility, ATPase, intracellular vesicle transport, etc.
Such complex multi-protein systems are not nearly as clustered
together in sequence space as were lower-level systems (comparable to
multi-word phrases, sentences, or paragraphs in a written human
language system).


Yadda. Yadda. Yadda. Words used as talismans against evil. Based on
bogus or vague numerology, misinterpretation of real science, and a
poor analogy of how evolution works morphing into just plain strawman
evolution.

And islands too
seem not to be randomly distributed, but are themselves clustered. You
just can't use the Poisson distribution to calculate distances under
these conditions. A major assumption has been violated.
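The objection that clustering breaks a Poisson-style calculation can be illustrated with a deterministic one-dimensional toy in Python. The island positions and helper name are invented: ten islands are either spread evenly over [0, 1) or packed into a tight cluster. The number of islands, and so the ratio of islands to non-islands, is identical in both cases. Yet the average distance from a starting point to the nearest island differs roughly tenfold. A formula that uses only that ratio cannot tell the two arrangements apart.

```python
def mean_distance_to_nearest(islands, n_starts=10_000):
    """Average distance from evenly spaced starting points on [0, 1) to the
    nearest island."""
    total = 0.0
    for i in range(n_starts):
        x = (i + 0.5) / n_starts
        total += min(abs(x - s) for s in islands)
    return total / n_starts

n = 10
spread = [(i + 0.5) / n for i in range(n)]       # evenly spread islands
clustered = [0.5 + 0.001 * i for i in range(n)]  # same count, packed together

print(round(mean_distance_to_nearest(spread), 4))
print(round(mean_distance_to_nearest(clustered), 4))
```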

Not really a problem.  While there is certainly some clustering, which
would cause a decent modification in the calculations at lower levels,
this clustering becomes less and less pronounced at higher and higher
levels of functional complexity - as you yourself can determine by
noticing an absolute increase in the number of required functional
residue positions with the increasing size of fairly specified
systems.

Regarding the size of the islands, have you read any of the referenced
papers I gave you?

Not the ones you're thinking about now. It seems to me that, at most,
you know the sizes of three islands, one of them cytochrome c. Are there
more?

These estimates can be reasonably extrapolated to many other types of
functions.  The Durston paper, in particular, analyzes the FSC of
dozens of proteins.

Regarding the distribution of quarters in the circle, the actual
location of the quarters in the circle is unknown.  What is known is
that the overall distribution is somewhat clustered.  However, it is
also known that this clustering effect gets weaker and weaker at
higher and higher levels of functional complexity (i.e., minimum size
and/or specificity requirements).

I believe that's equivalent to an answer of "no" to my second question.

You'd be wrong then.  It is known that the clustering effect becomes
weaker and weaker.  It is also known that even at low levels of
functional complexity, clusters or families of islands are not
themselves very tightly clustered.

Evidence? Since you are the only one that supports your model, I take
the word "known" with more than a grain of salt.

There are significant
distances even between families of islands at the level of a few hundred
fairly specified residues.  That is why although evolution between
families happens at such levels in observable time, it isn't that
common.  And, it shows an exponential decline in evolutionary
potential - even at these low levels to the point of complete
disappearance well shy of the 1000aa level.

Yadda. Yadda. Yadda.

Sean Pitman
www.DetectingDesign.com

