Re: Pitman numerology



On Dec 4, 5:34 pm, "R. Baldwin" <res0k...@xxxxxxxxxxxxxxxxxxxx> wrote:

You misunderstood PiP's explanation, which I thought was very clear. A Venn
diagram is a diagram of sets, not spaces. Members of a set can be far apart
from each other in a space. Distance is not relevant to set membership.

That's the problem. Just because you have a set of members doesn't
mean that any of these very rare members is easy to find by RM in a
very large sequence space given their very wide average separation
within that space.

The issue here is whether transformations within the space preserve set
membership.

Let's call the set of all points in sequence space S. Let's call the set of
all points in protein space P. P is a subset of S, consisting of all
sequences that code for chemically and thermodynamically viable proteins.

Right . . .

You have been arguing that P is much much smaller than S, right? In other
words, |P| << |S|, where |x| is the set cardinality function. If so, that
means |~P| >> |P|, where |~P| is the complement of P in S.
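As a toy illustration of the cardinality relation (not taken from the thread), the sketch below enumerates a tiny sequence space and counts a hypothetical P. The 4-letter alphabet, the length of 6, and the in_P predicate are all assumptions chosen only so that the numbers are easy to check. They are not a model of real protein viability.

```python
from itertools import product

ALPHABET = "ACGT"  # toy alphabet (assumption; real proteins use 20 residues)
LENGTH = 6         # toy sequence length (assumption)

def in_P(seq: str) -> bool:
    # Hypothetical stand-in for "codes for a viable protein": starts with ATG.
    return seq.startswith("ATG")

# S: every sequence of the given length over the alphabet.
S = ["".join(t) for t in product(ALPHABET, repeat=LENGTH)]
P = [s for s in S if in_P(s)]
not_P = [s for s in S if not in_P(s)]  # ~P, the complement of P in S

print(len(S), len(P), len(not_P))  # 4096 64 4032
```

Because ~P is defined as the complement, |~P| = |S| - |P| always holds. Whenever |P| is a small fraction of |S|, |~P| is therefore much larger than |P|.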

Now, the Thirumalai and Klimov article explains what it means to have set
membership in P. A sequence is in P if the resulting amino acid chain has
the dual properties of thermodynamic stability and kinetic accessibility.

Now, suppose we have two protein lineages, and call the sets of sequences
for each A and B. Clearly, P contains both A and B. Clearly, S contains P.

Right . . .

You have also been arguing (and I agree) that point mutations are unlikely
to transform a sequence in A to a sequence in B, unless A and B just happen
to be adjacent. There is a pretty good likelihood that a point mutation
will map from A to A.

Right . . .

In other words, if m is an element of the set M of point mutations on S,
Prob ( m : A -> A ) is pretty large, and
Prob ( m : A -> B ) is pretty small.
I think this agrees pretty well with things you've written.
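A minimal sketch of how Prob ( m : A -> A ) and Prob ( m : A -> B ) could be measured for a single starting sequence. It assumes every single-point mutation is equally likely. The sets A and B are hypothetical (sequences starting with ATG and with CCC respectively) and were chosen only to be easy to verify by hand.

```python
ALPHABET = "ACGT"  # toy alphabet (assumption)

def in_A(seq: str) -> bool:
    return seq.startswith("ATG")  # hypothetical lineage A

def in_B(seq: str) -> bool:
    return seq.startswith("CCC")  # hypothetical lineage B

def point_mutants(seq: str):
    """All sequences differing from seq at exactly one position."""
    for i, old in enumerate(seq):
        for new in ALPHABET:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]

def transition_probs(seq: str):
    """Fractions of single-point mutants of seq that land in A and in B."""
    mutants = list(point_mutants(seq))
    p_AA = sum(in_A(m) for m in mutants) / len(mutants)
    p_AB = sum(in_B(m) for m in mutants) / len(mutants)
    return p_AA, p_AB

print(transition_probs("ATGACG"))  # (0.5, 0.0)
```

In this toy example, B is three substitutions away from A, so no single point mutation can reach it. Half of the mutants, those that leave the ATG prefix untouched, stay in A.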

Correct . . .

But, we know there are a number of macromutations, such as indels,
inversions, and duplications, that are not likely to transform a
sequence from A to A.

Right . . .

The question is, how often do these macromutations preserve set membership
in P? If we preserve set membership, then we DON'T CARE how far apart the
sequences are.

The odds of preservation of set membership by a large indel mutation
are very much related to the density of targets (i.e., "P") within
sequence space (S). That's the whole point. Odds are very good that
a large indel mutation will remove you from A and drop you into a spot
in sequence space that is not a member of set P.

Let m' be an element of M', the set of macromutations on S.
Let p = Prob ( m': P -> P )
Let q = Prob ( m': P -> ~P ) = 1-p
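The definitions of p and q can be made concrete by computing p exactly for one particular macromutation on a toy space. In this sketch, m' is taken to be an inversion of a contiguous segment, and the average runs uniformly over every member of P and every possible segment. The alphabet, the length, the membership test, and the choice of inversion as the macromutation are all assumptions for illustration only.

```python
from itertools import product

ALPHABET = "ACGT"  # toy alphabet (assumption)
LENGTH = 6         # toy sequence length (assumption)

def in_P(seq: str) -> bool:
    # Hypothetical membership test for P (assumption): starts with ATG.
    return seq.startswith("ATG")

def inversions(seq: str):
    """Hypothetical macromutation m': reverse each contiguous segment of length >= 2."""
    n = len(seq)
    for i in range(n):
        for j in range(i + 2, n + 1):
            yield seq[:i] + seq[i:j][::-1] + seq[j:]

def exact_p() -> float:
    """p = Prob(m': P -> P), averaged uniformly over members of P and over inversions."""
    stay = total = 0
    for t in product(ALPHABET, repeat=LENGTH):
        s = "".join(t)
        if not in_P(s):
            continue
        for m in inversions(s):
            total += 1
            stay += in_P(m)
    return stay / total

p = exact_p()
q = 1 - p  # Prob(m': P -> ~P)
print(f"p = {p:.3f}, q = {q:.3f}")
```

The value of p depends entirely on how P and m' are defined. For example, a membership test based only on residue composition would give p = 1 under inversions, because reversing a segment does not change composition.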

If p = 1 and q = 0, then all macromutations retain set membership in P, and  
the search space is restricted to P.

If only . . . That's the problem. The odds that q = 0 are nil. The
odds that q is essentially equal to 1 are extremely good.

Clearly, if this condition holds (and I don't suggest it does), then your
drunkard's walk model falls apart.

That's right. The problem is that this model doesn't hold, either
mathematically or in demonstrable real-world observations.

If p = 0 and q = 1, then all macromutations result in junk. If this
condition holds (and I don't suggest it does, either), then your drunkard's
walk model is valid.

Exactly. This model is exactly what is supported by the extreme
rarity of potential targets in a gigantic sequence space. For
example, what if there were only two possible targets in all of
sequence space, with unknown locations. You're standing on one of
them. What are the odds that a macromutation would happen to land you
on the other one? This really isn't that hard of a problem.
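The two-target question can be worked as plain arithmetic. Assume that exactly two targets exist and that a single jump is equally likely to land on any of the other |S| - 1 sequences. The chance of hitting the other target is then 1/(|S| - 1). The protein-sized case below (20 residues, length 100) is an illustrative assumption, not a figure from the thread.

```python
import math
from fractions import Fraction

def prob_hit_other_target(alphabet_size: int, length: int) -> Fraction:
    """Chance that one uniformly random jump from one target lands on the other.

    Assumes exactly two targets, and that the jump is equally likely to land on
    any of the other |S| - 1 sequences.
    """
    size_S = alphabet_size ** length
    return Fraction(1, size_S - 1)

# Toy check: 4 letters, length 3, so |S| = 64 and the odds are 1/63.
print(prob_hit_other_target(4, 3))  # 1/63

# Illustrative protein-sized case (assumption: 20 residues, length 100).
odds = prob_hit_other_target(20, 100)
print(f"about 1 in 10^{math.log10(odds.denominator):.1f}")  # about 1 in 10^130.1
```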

Now consider a uniform distribution. If p = |P|/|S| and q = |~P|/|S|, then
p is approximately zero, and the drunkard's walk model remains valid. This
appears to be an unwritten assumption of your model: that p = |P|/|S|.
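The assumption p = |P|/|S| corresponds to a macromutation that ignores its starting point and lands uniformly at random in S. The sketch below checks this by simulation on a toy space. The alphabet, the length, the membership test, and the uniform_macromutation model are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"  # toy alphabet (assumption)
LENGTH = 6         # toy sequence length (assumption)

def in_P(seq: str) -> bool:
    # Hypothetical membership test for P (assumption): starts with ATG.
    return seq.startswith("ATG")

def uniform_macromutation(seq: str, rng: random.Random) -> str:
    """Hypothetical m': ignores its input and returns a uniformly random point of S."""
    return "".join(rng.choice(ALPHABET) for _ in range(len(seq)))

rng = random.Random(0)  # fixed seed so the run is reproducible
trials = 100_000
start = "ATGACG"        # any member of P
hits = sum(in_P(uniform_macromutation(start, rng)) for _ in range(trials))
p_hat = hits / trials
exact = 4 ** (LENGTH - 3) / 4 ** LENGTH  # |P|/|S| = 64/4096
print(p_hat, exact)
```

Under this model, the simulated p_hat converges to |P|/|S|. The disagreement in the thread is over whether real macromutations behave like this uniform jump.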

Essentially yes. If the actual location of the targets in sequence
space is not known, the odds of finding one are based on a uniform
distribution model. The only way this model could be overcome is by
showing that targets tend to be extraordinarily clustered into one tiny
corner of sequence space. This is demonstrably, unarguably not true.

But what if 1 > p >> |P|/|S|? Then selection comes into play because
sequences in P have a differential advantage over sequences in ~P, and just
as with p = 1, your drunkard's walk model falls apart. (The notation ">>"
means "much greater than"). This is what I suggest.

Selection cannot come into play at all until you actually find a
target. A target, by definition, is selectably beneficial in terms of
function. Until such a target is found by purely random means,
there is no selection guidance for the random walk.

Also, I'm not sure what your ~P notation means.

Here is the heart of the matter, then. What kinds of macromutations
preserve the dual properties of thermodynamic stability and kinetic
accessibility, and thereby preserve set membership in P? Is there a
biochemical reason to expect that 1 > p >> |P|/|S|?

If a large subset of macromutations preserve membership in P, then the
drunkard's walk model is garbage.

That's right . . .

Taking this a step further, let P' be the set of sequences within close
distance of P, and ~P' its complement on S. Sequences in P' have a
differential advantage over sequences in ~P' because they can easily reach
sequences in P by point mutation. So really, we only need to preserve
membership in P' under mutation.
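To make P' concrete, one option is to read "within close distance" as "at most one point mutation away". P' is then P together with every single-point mutant of a member of P. The distance threshold, the toy space, and the membership test below are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # toy alphabet (assumption)
LENGTH = 6         # toy sequence length (assumption)

def in_P(seq: str) -> bool:
    # Hypothetical membership test for P (assumption): starts with ATG.
    return seq.startswith("ATG")

def point_mutants(seq: str):
    """All sequences at Hamming distance exactly 1 from seq."""
    for i, old in enumerate(seq):
        for new in ALPHABET:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]

S = ["".join(t) for t in product(ALPHABET, repeat=LENGTH)]
P = {s for s in S if in_P(s)}

# P': members of P plus everything one point mutation away from P
# ("close" taken to mean Hamming distance <= 1, an assumption).
P_prime = set(P)
for s in P:
    P_prime.update(point_mutants(s))

print(len(P), len(P_prime), len(S) - len(P_prime))  # 64 640 3456
```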

The problem is that you cannot differentially preserve P' sequences
over any other non-target sequence just because P' is close to P. NS
doesn't know that. Therefore P' sequences are not going to be
selectably preserved at all. There simply is no selectable advantage
for being "warmer" or closer to a potential target in sequence space.
NS doesn't play the child's game "you're getting warmer". It only
plays the game, "you're there or you're not".

The facts of evolution tell us that there is a biochemical reason to expect
that 1 > p >> |P|/|S|. That means that evolution has a way to efficiently
search large spaces, by spending more time in P and less time in ~P.

Based on what? If your ~P notation means not-P, or not a target
sequence, then upon what is this assertion based? Do you have any
statistical or demonstrable reason why this might be true for
macromutations? As far as I can tell, the odds that a macromutation
of any kind will hit upon P beyond the 1000 fsaar level of functional
complexity are very, very low indeed.

We know that viruses can efficiently search sequence space. That is why
they are so difficult to control. If the drunkard's walk model were
correct, then viruses could not do this. Hence, there MUST BE a biochemical
reason that p >> |P|/|S|. There MUST BE a large enough subset of
macromutations that preserve set membership in P. Your model MUST BE
invalid or viruses could not do what they do.

Again, viruses are able to search very low-level sequence space quite
efficiently indeed. Why? Because low-level sequence space has a much,
much higher ratio of P to ~P in S, that is, of potential targets to
non-targets in sequence space. In fact, the reason viruses are so
difficult to control has to do with the mechanism of antivirals.
Antiviral medications, like most antibiotic medications, target
specific sequences within the virus. All the virus has to do is to
slightly modify the target sequence to block the effect of the
medication. And presto, resistance is gained.

This sort of function (blocking a drug's function) is so
sequence-specific that it can usually be achieved without the virus or
the bacterium having to completely lose the function of the sequence
in question. All that happens is that the sequence moves to a slightly
different spot on the same island cluster of target sequences.

This is why such resistance-type functions are about as low-level as
you can get in evolutionary novelty. This sort of evolution is so
easy because the odds of success are extremely good. This is a far,
far cry from a megamutation producing a novel system of function that
isn't based on, and does not require, the loss or destruction of
some other pre-existing system within the organism.

What's more (and this is where Thirumalai and Klimov's article comes
in), circle A is probably not that far from circle B to begin with,
because both of them have to fit within the innermost of the two
outer circles. If you weren't constrained by NS to compact, nicely
folding sequences, you might expect them to be much farther apart,
constrained only by the outer circle.

Again, you're mistaken in thinking there is only one Klimov circle
here and that the starting points and targets are all within this one
small corner of sequence space. This view isn't correct. The islands
of potentially viable proteins are spread out all over sequence
space. They simply aren't isolated into one tiny corner of it.

Again, you misunderstood what PiP wrote.

I understood it just fine. I think you're just wishing it made more
sense than it does.

Sean Pitman
www.DetectingDesign.com
