Re: Maximum, Average, and Likely Minimum Gap Distances
- From: Seanpit <seanpit@xxxxxxxxx>
- Date: Sun, 13 Jul 2008 08:12:17 -0700 (PDT)
On Jul 13, 6:26 am, John Harshman <jharshman.diespam...@xxxxxxxxxxx>
wrote:
I'm talking about the entire sequence minimum needed to realize a
particular type of function. Where have you been? Why do you think I
keep talking about the likely minimum size needed for a function like
CytoC or lactase or rotary flagellar motility to work? These
different types of functions have different minimum structural
threshold requirements - obviously. One might argue, and be at least
somewhat reasonable at the same time, that the likely minimum size
requirement for CytoC functionality in a given life form is 80aa.
Might one? How would one argue this?
Read the papers I've listed for you. Or, provide some evidence of
your own to even suggest that the likely minimum CytoC size from the
perspective of any living thing is significantly less than 80aa. So
far, all you have is incredulity with no real evidence to back
yourself up - Certainly nothing that has actually been published.
One
could not make that argument for the flagellar motility function - not
remotely. Why? Because the likely minimum structural threshold
requirement needed to achieve the rotary flagellar motility function
is on the order of several thousand fairly specified residues at
minimum. For lactase it is on the order of several hundred - at
minimum.
Here you have gone from sheer number of residues to number of "fairly
specified" residues, as if that's the same thing. You keep flipping back
and forth among different versions, and this is what makes your
arguments so ambiguous.
I've always presented the minimum size requirement as a certain number
of fairly specified residue positions. The 80aa minimum size
requirement for CytoC isn't 80 absolutely specified residue
positions. How has that not been clear to you? - especially after I
directly pointed out to you that only about 27 of this 80aa minimum
were invariant? None of the systems I've presented are absolutely
specified. They all have a fair degree of flexibility. As far as what
I mean by "fair degree" read the Durston paper and note the concept of
FSC density in terms of "Fits" per residue position.
The average gap size is based on the ratio of sequences that would be
able to produce the function in question - or more relevantly, the
total number of all potentially beneficial sequences vs. non-
beneficial sequences.
Or, more simply, your "average gap size" is just the number of
constrained sites. A normal person would consider this to be the maximum
gap size.
It isn't the maximum gap size. For a 100aa system, it is quite
possible to have another system that shares absolutely no sequence
homology at all - thereby having a gap size of 100 residue location
differences.
You need to distinguish the gap between sequences, which you don't care
about, from the gap between islands, which you do. If, out of those 100
residues, only 30 are "fairly specified", then the gap size is at most
30, because changing only those 30 residues will move from one island to
another. Therefore in that case, 30 is the maximum gap size -- the
maximum distance between those two islands.
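The distinction drawn above, between the gap separating two sequences and the gap separating two islands, can be sketched in a few lines of Python. This is only an illustration: the two 10-residue sequences and the list of "fairly specified" positions are made-up toy values, not data from any paper cited in this thread.

```python
def hamming(a, b, positions=None):
    """Count the positions at which sequences a and b differ.

    If positions is given, only those indices are compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    idx = range(len(a)) if positions is None else positions
    return sum(1 for i in idx if a[i] != b[i])


# Hypothetical toy sequences with no residue in common at any position.
seq_a = "ACDEFGHIKL"
seq_b = "MNPQRSTVWY"

# Hypothetical set of "fairly specified" (constrained) positions.
constrained = [0, 3, 7]

full_gap = hamming(seq_a, seq_b)                 # every position counts
island_gap = hamming(seq_a, seq_b, constrained)  # only constrained positions count

print(full_gap, island_gap)
```

With these toy values the sequence-to-sequence gap is 10 while the gap over the constrained positions is 3, which is the sense in which the number of constrained sites, not the total length, bounds the distance between islands.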
You do have a point here. It is the number of fairly specified
residue positions that is key. A function that requires little or no
sequence specificity would have a very small maximum gap distance and
smaller still average and likely minimum gap distances.
For CytoC, in particular, it seems that all of the residue positions
have a certain degree of specificity. Some have more, some have
less. However, a given position will not tolerate some amino acid
options at all without a significant loss of CytoC function. Some
options would simply be too destabilizing to the overall function of
the system.
Now, you might be able to find two or three positions in CytoC that
could tolerate all 20aa options without a complete loss of function,
but the point is essentially the same.
That is the maximum gap size. Now, you might argue that
at least some of these differences are functionally neutral - and that's
true. Increased flexibility at various positions increases the
overall number of potential sequences that can produce the type of
function in question - making it easier to find by a random search of
sequence space. I.e., it makes the size of the island or islands with
the function in question larger.
Yes. So the absolute number of residues is not relevant to the gap size,
right? It's the number of "fairly specified" residues that counts.
That's right. An increase in size alone is meaningless to the
argument. It has to be an increase in fairly specified residue
positions to create a linear increase in the maximum, average, and
likely minimum gap distances.
That would seem to require only that *something* be in them, if indeed
there is such a requirement.
Not quite true. For the CytoC function it is true that only about 27
or so positions are completely non-variable. However, most of the
other positions are also very restrained as well - to only a handful
of options. There are a few that allow 8 or 9 residues, but even this
degree of flexibility isn't limitless. And, this is only considering
single replacement events - one at a time. Studies show that if more
than one position is replaced at the same time, the constraints are
even more restrictive.
This is why the likely size minimum for CytoC is more like 80aa rather
than 27aa. It also means that the maximum gap distance between a
different type of functional system with a similar minimum size
requirement of 80aa and CytoC functionality is 80aa differences, not
27.
How did you compute this number? How did you compute the likely gap size
of 30? And why did you say previously that the maximum gap size for
cytochrome c was 100?
I didn't argue that the likely gap size for a 100aa system at the
level of specificity of CytoC would be 30 residue differences.
That's Howard's strawman version of what I actually said. What I
really said is that 30 residue differences is the likely average gap
distance between 100aa systems at this level of specificity or FSC.
The average gap distance isn't the minimum likely gap distance. The
minimum likely gap distance depends upon the number of protein-based
systems of this size in the gene pool and is always smaller than the
average gap distance.
I also didn't say that the likely maximum gap distance for CytoC was
100 residue differences. It isn't. I used the 100aa number because
that is the number Yockey used to estimate the ratios of CytoCs in
100aa sequence space.
What you are talking about is the maximum possible random walk
distance - which is infinite regardless of the absolute number of
residue differences. It is just that as the gap distance gets
smaller, the odds that the random walk distance will in fact be
infinite decrease exponentially.
So you're measuring only Euclidean distances here.
That's right . . . However, a linear increase in the Euclidean
distance translates into an exponential increase in the average random
walk distance.
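One way to see why a linear increase in gap distance can mean a much faster increase in search effort is to count how many sequences lie within a given number of residue differences of a starting point. The sketch below is an illustration only, not the calculation used in this thread: it assumes distance is counted as the number of differing positions (Hamming distance), a 20-letter amino acid alphabet, and a hypothetical 100aa sequence length.

```python
from math import comb


def ball_size(length, radius, alphabet=20):
    """Number of sequences within `radius` substitutions of a given sequence.

    For each k <= radius: choose which k positions differ, then pick one of
    the (alphabet - 1) other residues at each of those positions.
    """
    return sum(comb(length, k) * (alphabet - 1) ** k for k in range(radius + 1))


# Hypothetical 100aa system: candidate count as the radius grows one step at a time.
for d in range(0, 6):
    print(d, ball_size(100, d))
```

Each extra step in radius multiplies the number of candidate sequences by a large factor, so the volume a blind search has to cover grows far faster than the distance itself. Whether a real evolutionary walk is blind in this sense is exactly what is under dispute here; the code only counts sequences.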
But the maximum possible distance between two *islands* is equal
to the number of constrained sites in the target sequence, and that's
what Howard means by "maximum distance".
In order to know the distance between two islands of unknown position
in sequence space, you have to know something about the ratio of
islands vs. non-islands (or quarters vs. non-quarters). This ratio
will tell you the average expected distance between any particular
starting point and an island in sequence space. This isn't the
maximum possible distance. It is the average linear distance that is
expected to exist between any chosen starting point and any one of the
quarters in the circle. The greater the degree of sequence
flexibility, the more quarters there are in the circle and the less
the expected average distance between a chosen starting point and the
quarters in the circle.
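The quarters-in-a-circle reasoning above can be turned into a rough numerical sketch. It rests on a strong assumption that is itself contested in this thread: that functional sequences are scattered uniformly at random through sequence space. Under that assumption, if islands make up 1 in N sequences, the nearest one is expected at about the smallest radius whose neighbourhood contains N sequences. The function names and the example ratios are hypothetical.

```python
from math import comb


def sequences_within(length, radius, alphabet=20):
    """Count sequences within `radius` substitutions of a starting sequence."""
    return sum(comb(length, k) * (alphabet - 1) ** k for k in range(radius + 1))


def expected_radius(length, one_in, alphabet=20):
    """Smallest radius whose neighbourhood holds at least `one_in` sequences.

    ASSUMPTION: functional sequences occur at a rate of 1 in `one_in` and are
    uniformly distributed (no clustering). This is a heuristic, not a proof.
    """
    for r in range(length + 1):
        if sequences_within(length, r, alphabet) >= one_in:
            return r
    return length


# Hypothetical ratios for a 100aa system: rarer islands give a larger expected gap.
for exponent in (10, 20, 40):
    print(exponent, expected_radius(100, 10 ** exponent))
```

More sequence flexibility means a smaller `one_in`, and so a smaller expected radius, which matches the "more quarters, less distance" statement. Clustering of islands would break the uniformity assumption and change the numbers.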
You mean, in this analogy, the greater the degree of sequence
flexibility, the bigger the quarters are.
Either way, the statistics are the same.
One problem with your
procedure, if your analogy is at all useful, is that you're assuming
each island to consist of points (quarters) randomly distributed in
sequence space, when they are in fact tightly clustered.
Families of single proteins are indeed quite clustered at lower levels
of functional complexity. However, the distance between family
clusters of proteins, even at low levels is quite significant.
Getting from one cluster of islands to the next cluster of islands is
a bit problematic for random walk. It isn't impossible at lower
levels, just like it isn't impossible to get all the way across 3-
letter word space without having to swim for it very much, but it
isn't as easy as getting from one island to the next in the same
family cluster.
The real problem arises once one starts moving beyond the level of the
single-protein family cluster at the level of a few hundred fairly
specified residue positions (forgive me if I don't always type out
"fairly specified" each and every time I present this idea. I figure
that most people can remember what I mean from one paragraph to the
next). Once one starts considering functional systems beyond the
1000aa level of complexity (again 1000aa means "fairly specified aa" -
in case you forgot since the previous sentence), such systems of
higher complexity usually start requiring multiple proteins to form
them - as in systems like rotary flagellar motility, non-rotary
flagellar motility, ATPase, intracellular vesicle transport, etc.
Such complex multi-protein systems are not nearly as clustered
together in sequence space as were lower-level systems (comparable to
multi-word phrases, sentences, or paragraphs in a written human
language system).
And islands too
seem not to be randomly distributed, but are themselves clustered. You
just can't use the Poisson distribution to calculate distances under
these conditions. A major assumption has been violated.
Not really a problem. While there is certainly some clustering, which
would cause a decent modification in the calculations at lower levels,
this clustering becomes less and less clustered at higher and higher
levels of functional complexity - as you yourself can determine by
noticing an absolute increase in the number of required functional
residue positions with the increasing size of fairly specified
systems.
Regarding the size of the islands, have you read any of the referenced
papers I gave you?
Not the ones you're thinking about now. It seems to me that, at most,
you know the sizes of three islands, one of them cytochrome c. Are there
more?
These estimates can be reasonably extrapolated to many other types of
functions. The Durston paper, in particular, analyzes the FSC of
dozens of proteins.
Regarding the distribution of quarters in the circle, the actual
location of the quarters in the circle is unknown. What is known is
that the overall distribution is somewhat clustered. However, it is
also known that this clustering effect gets less and less pronounced at
higher and higher levels of functional complexity (i.e., minimum size
and/or specificity requirements).
I believe that's equivalent to an answer of "no" to my second question.
You'd be wrong then. It is known that the clustering effect becomes
less and less clustered. It is also known that even at low levels of
functional complexity the clustering effect between clusters or
families of islands isn't very clustered. There are significant
distances even between families of islands at the level of a few hundred
fairly specified residues. That is why although evolution between
families happens at such levels in observable time, it isn't that
common. And, it shows an exponential decline in evolutionary
potential - even at these low levels to the point of complete
disappearance well shy of the 1000aa level.
Sean Pitman
www.DetectingDesign.com