Re: Sean Pitman: definitions wanted
- From: "RobinGoodfellow" <lmeyerguz@xxxxxxxxx>
- Date: 7 Dec 2005 16:40:46 -0800
Seanpit wrote:
> RobinGoodfellow wrote:
>
> <snip>
>
> > > Obviously the authors themselves think that extrapolation to the rest of the
> > > protein is at least reasonable. So do I.
> >
> > The extrapolation is not entirely unreasonable, but it must be taken
> > with a *huge* grain of salt. The authors themselves understand this
> > quite well. Judging by how many creationist and ID web sites out there
> > took the 1e-60 figure and ran with it, however, it appears that this
> > understanding is sadly lacking in the ID community.
>
> The 1e-63 figure was actually published by Sauer et al. as their
> most reasonable estimate of specificity. Of course this is a very
> rough estimate, but it is what the authors believe to be the most
> reasonable interpretation of their own work.
Virtually all scientists who publish articles related to the topics you
and I discuss seem to think that evolution is the most reasonable
interpretation of their work. You don't seem to have any problem
arguing against all those interpretations. Your appeal to authority of
the authors in this instance is sheer hypocrisy, Sean.
> Now, you may disagree, suggesting that the actual ratio is much better than this, but you
> have no published work to back yourself up here.
I don't, do I? So, am I to assume that since you have no published
work challenging evolutionary interpretations of the data, those
interpretations are correct as well? Okey-dokey.
> The best published numbers available really do counter your own assertions.
I've given you a number of reasons why "the best published" numbers are
not very good, and tell us virtually nothing when it comes to
addressing questions of evolvability. The authors themselves
essentially agree with me, certainly on the first point, and probably
on the second as well. If your best retort is to cling to your
argument from authority, so be it.
> <snip>
>
> > > Given a single random walker, this walker cannot be in more than one
> > > place at a time. Therefore, at any given time, the walker will be in
> > > one level or another of sequence space. The odds that the random
> > > walker will be on a novel beneficial island at this point in time is a
> > > function of the ratio of beneficial sequences at that level of sequence
> > > space.
> >
> > No, the odds are a function of the *distribution* of the "beneficial
> > sequences", as well as the distributions of such sequences at all
> > adjacent "levels" above and below, since even a single "walker" can
> > potentially change level at every step. And since in evolution, there
> > are multiple "walkers" at the same time, the odds are also a function
> > of the position of all the other walkers. These are just a few things
> > that your calculations never explicitly model.
>
> The odds that a particular walker will hit a beneficial island at any
> given point in time are very much related to the density of sequences
> in sequence space. Certainly it does matter what the starting point is
> and if the islands are or are not clustered in one tiny corner of
> sequence space. However, the evidence available is very clear that
> such clustering does not occur.
You guard this evidence jealously then. I've yet to see anything
remotely convincing: and in fact, everything I've seen suggests quite
the opposite, including your links below.
> Even at low levels one can clearly see
> that beneficial islands spread out in sequence space (see linked
> references). At higher levels, there simply is not this high-level
> homology that would have to be there if you were indeed correct in your
> stacking-the-deck model.
>
> http://www.zbh.uni-hamburg.de/wurst/protspace/
> http://www.lbl.gov/Publications/Currents/Archive/Apr-01-2005.html
How exactly do these visualisations of the structural relationships
between proteins support your position in any way, shape, or form?
Your notions of "sequence space" are completely absent from such maps.
All that this shows is that functionally related proteins are clustered
together based on structural similarity, which is of course to be
expected, whether you accept evolution or ID. Of course, the map also
shows a very dense distribution of protein folds, including quite a bit
of intermixing between alpha, beta, and alpha-beta folds, suggesting
possible evolutionary relationships between proteins thus intermixed.
While some regions of the space are more clustered than others, there
are no huge isolated islands with vast gaps between them, at least from
a structural perspective. But I am sure that those gaps will get bigger
and really, really insurmountable right where we can't see them. All I
have to do is close my eyes and trust you.
> > > It is not your terminology, but your concepts of sequence space that
> > > seem to be either wrong or misleading. You argue that there really are
> > > no levels in sequence space at all - that it is just all one big
> > > infinite space. This is mistaken. There are indeed levels in the
> > > overall infinity of sequence space that are truly finite.
> >
> > Of course there are "levels", Sean. You can arbitrarily define these
> > levels however you like, and make them either finite or infinite as
> > you prefer. However, the fact remains that you are having difficulty
> > consistently defining such levels (or even defining your "sequence
> > space") in a manner where each "level" corresponds to some degree of
> > complexity. Below [note: I snipped away that paragraph for brevity, but
> > please feel free to restore it if you feel that the following is a
> > misunderstanding / misrepresentation of your view], you claim that a
> > 1000-aa sequence corresponds not to a single point in sequence space,
> > but to an entire blob spanning the 400th to 1000th level of sequence
> > space, because the sequence encodes a protein with a function that has
> > a 400 minimum length requirement. If a sequence does not correspond to
> > a single point in your sequence space, then what does?
>
> The minimum functional requirements are what we are talking about. The
> minimum functional requirements are indeed found at a specific level
> within sequence space - on an island within that level and no lower
> level.
Did you not understand my question? Again, I ask: what corresponds to
a single point in your sequence space? It doesn't appear to be a
sequence, which leads one to the paradoxical conclusion that when you
say "sequence space" you don't actually mean "space of all possible
sequences". The way out of this paradox is to conclude that you don't
really know what you are talking about. But, if you'd like to prove me
wrong, tell me: where in your sequence space would I place the human
hemoglobin? What complexity level would it correspond to? What is the
overall number of sequences at that level of complexity?
I am not just trying to be difficult, Sean. These are just very basic
questions about your model that you don't appear to have thought
through. For instance, it seems that you still haven't figured out
that you need to separate the notions of sequence space and function
space, and that one does not immediately equate to another.
> > > > So, do you think that the distribution of "beneficial sequences" in the
> > > > "space" of level L is independent of the distribution of such sequences
> > > > at level L-1?
> > >
> > > Not completely independent, but at higher and higher levels this does
> > > become more and more true. This is where the lava lamp imagery comes
> > > into play. At higher levels the columns of vertically connected
> > > sequences start to break apart into completely separated blobs. These
> > > blobs may span many levels of sequence space themselves, but they are
> > > completely separate from all other blobs in sequence space at these
> > > higher levels.
> >
> > The problem is, your calculations do not even begin to model this "lava
> > lamp" effect. You simply assume that at each level, the distribution
> > of beneficial sequences is uniform random, and base your 20^N
> > calculation off that assumption.
>
> That's not true. The distribution is not completely random. There are
> groups and clusters, but these do start to disperse at higher and
> higher levels. What is not true is that all the beneficial islands
> cluster themselves in one tiny corner of sequence space like you
> suggest. They do take on a rather scattered appearance as one moves up
> the ladder.
Your calculations are based off the assumption of completely random
distribution. Your unsupported assertions that islands become
scattered as one moves up the ladder, even if true, do not change the
fact that your calculations only model the scenario where the
distribution is uniformly random, and thus we should expect to see no
clustering whatsoever.
> > If you wish to model the lava lamp
> > effect, rather than just wave your hands about it, you might at least
> > try to give some figures about the expected volume of each "blob" (and
> > not just within each level, but between multiple levels as well), the
> > distribution of these blobs in your yet-to-be-defined sequence space,
> > the average *mutational* distance between each pair of blobs (weighted,
> > of course, by the probability of each type of mutation you consider),
> > and the average separation, in terms of mutational distance, between a
> > pair of blobs as a function of increasing, yet-to-be-well-defined,
> > complexity. That would take your model in the right direction in terms
> > of realism. We can then discuss why that model would, too, be highly
> > inadequate.
>
> Take the flagellar system of motility, for example. It creates a
> sizable blob within sequence space. But, relatively speaking, the
> minimum requirements make this blob extremely tiny relative to the
> overall sequence space size at this minimum level. There is nothing at
> the levels below or sideways that come remotely close to this blob. It
> is truly isolated on all sides. This model is quite adequate to
> illustrate the problem.
You yourself claim that the proposed steps in the evolution of the
flagellum are each several dozen residues apart. I could change as many
residues in a globin (where they would constitute a much larger
percentage of the overall "system size"), and still retain a protein
with a similar function. Howard Hershey points out that the flagellum
is quite close to a system containing both a motor and a rotatable
pore, both of which already have independent selectable utility. I
fail to see these vast neutral gaps of yours, and your continued
refusal to actually answer my questions about blob size, shape, and
distribution of sequences within each blob doesn't really help your
position.
> > > That's true - and, therefore, it is relatively easy to find blob
> > > islands of such types even in sequence spaces that are rather huge.
> > > However, finding such islands will not help you find the rarer more
> > > specified islands any faster.
> >
> > Unless of course, the large blob islands happen to intersect, or be
> > close to, some of the rarer, smaller islands. How do your calculations
> > account for this possibility? Some math here would be nice.
>
> It's the lower ratios as well as the fairly widely spaced distributions
> of beneficial islands Leonid. At higher levels the homologies simply
> aren't close enough for such intersections to exist. There is no
> homology between the flagellar system of motility and any of its
> subsystem parts that is significantly homologous enough to easily give
> rise to the flagellum this side of trillions upon trillions of years of
> average time.
"Trillions upon trillions". Sad to say, Sean, this is as close to a
calculation as you get throughout this entire post.
> > So, why can't a large blob serve as a bridge between two smaller blobs,
> > even at the same "level"?
>
> It's not the size of the blobs that's important. It's the average
> distance between blobs that's important. At higher levels, the blobs
> simply do not intersect any more like they did at lower levels.
I am talking about relative "blob" sizes here. If "a blob" represents
a function with a high overall density in "sequence space" (what you
claim happens with all template-matching functions, although you seem
to think that transcription factor binding doesn't qualify as such),
then it may serve as an evolutionary pathway between several
lower-density blobs. Just one of a myriad scenarios you fail to take
into account.
> > Much as you would like this, you cannot
> > simply look at the positions of small "blobs" in isolation: blobs may
> > be positioned next to one another in "sequence space", and they may all
> > shift in response to environmental pressures, so their positions and
> > shapes are far from fixed.
>
> That's true, but their average distances do not change in any
> significant degree.
So you keep saying.
> > To use your lava lamp analogy, shake up the
> > lamp enough, and you'll see blobs changing form, dissipating, arising,
> > and morphing with each other.
>
> Yep, but at higher levels, levels where the blobs are very rare, it
> doesn't matter much as the change in location and shape doesn't really
> bring them any closer together - on average.
First, you've never even attempted to quantify how rare these blobs are
at high levels. Second, you've never demonstrated, either
mathematically or empirically, that higher-level blobs get separated
from lower level blobs, even if they do get separated from other blobs
at their level. Third, you can keep saying "on average" until you are
blue in the face, but the fact remains that you have no idea how to
compute these averages, and until you do, no one who does not subscribe
to your views a priori is going to believe you.
> > Of course, until you give us some
> > indications on how to define each blob, and how to determine their
> > shape, volume, and distribution, the whole point is moot.
>
> That's where the ratios of Sauer and Yockey come into play.
Before, I've given you scenarios where the ratios as computed by Yockey
could be generated by a trivial process of "template-matching"
evolution. The same could be said of Sauer's results, except the
"template-matching" would be the result of co-evolution of the
transcription factor and its binding site. Your retort was that
you'll know template matching when you see it. Please don't
construe this as my having trust issues, Sean, but I just don't believe
you.
[snip]
> > > Not true - unless the distributions, by some amazing stroke of luck,
> > > were to somehow head right in the direction of the random walkers.
> >
> > Care to quantify "amazing stroke of luck"?
>
> I already have.
I must have missed those specific probability calculations. I'm sure
you'll have no trouble repeating them. Let me guess: it's 20^-N, isn't
it?
> > Selection constrains the "random walkers" (which they aren't really,
> > but never mind) from heading away from their "beneficial islands". The
> > islands themselves, however, are free to move in response to
> > environmental conditions, and new islands may arise, and old ones alter
> > shape, in response to the positions and number of some the "random
> > walkers" (i.e. evolving genomes). But do such things matter to you?
>
> Oh, so you restrict the walkers and move the islands? How does that
> help the walkers find the islands any faster than if the walkers moved
> and the islands stayed put?
What part of "new islands may arise and old ones may change shape" was
unclear?
> You see, the odds are the same.
Your math in the previous paragraph was just too darn complicated to
follow, Sean. The stochastic analysis is beyond anything I've ever
seen before! Just give me a few more years, and perhaps I'll see that
the odds are the same. Repeating this assertion frequently and
forcefully might help, too - why change what works?
> <snip>
>
> > > All of these links discuss random walk in terms of distance - which is
> > > actually quite similar to discussing random walk in terms of time.
> > > Why? Because, generally speaking, it takes a certain amount of time to
> > > cover a certain distance in a random walk.
> >
> > Most amusing. I didn't bother to download the huge powerpoint
> > presentations, but in the other three links (one of which is an
> > undergraduate CS project - not much in the way of a scientific paper or
> > a textbook chapter, BTW), "distance" refers to the Euclidean distance
> > on the lattice or in space where the random walk takes place, *not* to
> > the number of random walk steps.
>
> The Euclidean distance is divided up into discrete steps in these
> examples. It is not covered in a single step.
>
> > And you are correct, that to cover a
> > certain "distance", whatever your metric of such might be, a random
> > walk will need a certain number of steps (i.e. time).
>
> The steps on a lattice or space are distances. The time per step can be
> measured. Distance and time are therefore related.
Yes, they are *related*. They are not the same thing. The number of
steps is referred to not as "distance" but as "time". Accept it and move
on.
> > However, this
> > time is not necessarily *exponential* in the distance covered.
>
> Yes, it is. As far as the total number of steps are concerned, time and
> total distance walked increase in equivalent degrees. If a drunk
> walker has a distance meter and takes one step per second, with a 1
> meter stride, how far would he have walked and how much time would it
> take, on average, for him to find a target that averages 10, 20, 30 and
> 50 meters away on a 2D surface?
To reach any point a distance N meters away? If you read the
Wikipedia link, you'll see the answer is on the order of N^2 steps. To
reach a specific point N meters away? That depends on the mesh size of
the lattice on which the walk is taking place, and whether the walk is
allowed restarts. Assuming a mesh size of one meter, there are
approximately N^2 possible targets a distance N meters away. A random
walk with restarts from the origin - which models the multiple random
walkers in an evolutionary process all originating from the common
ancestral point - will therefore take on the order of N^4 steps to
reach a specific target (roughly N^2 steps per attempt, times N^2
equally likely targets). Which, of course, is not exponential in N. Is
that the answer you were hoping for, Sean?
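As a rough check on the N^2 scaling, here is a minimal Python sketch. It
is not taken from the Wikipedia page, and the function name
mean_squared_displacement is made up for illustration. It assumes a
simple walk on a 2D square lattice with unit steps, and it computes the
exact mean squared displacement after t steps, which is a related but
simpler quantity than the expected time to first reach distance N. Under
those assumptions the mean squared displacement equals t, so covering a
typical distance N takes on the order of N^2 steps.

```python
from collections import defaultdict
from math import sqrt

# The four unit moves of a simple walk on a square lattice (assumed model).
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def mean_squared_displacement(t):
    """Exact E[x^2 + y^2] after t steps, starting from the origin.

    Propagates the full probability distribution over lattice points,
    so no random sampling is involved.
    """
    dist = {(0, 0): 1.0}
    for _ in range(t):
        nxt = defaultdict(float)
        for (x, y), p in dist.items():
            for dx, dy in STEPS:
                nxt[(x + dx, y + dy)] += p / 4
        dist = nxt
    return sum(p * (x * x + y * y) for (x, y), p in dist.items())


if __name__ == "__main__":
    for n in (2, 5, 10):
        t = n * n
        # Root-mean-square distance after n^2 steps, to compare against n.
        print(n, t, sqrt(mean_squared_displacement(t)))
```

By the same scaling, targets averaging 10, 20, 30 and 50 meters away
correspond to on the order of 100, 400, 900 and 2500 steps: quadratic
growth, not exponential.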
> You see, time and distance both increase exponentially with each linear
> increase in average target distance from the starting point.
What I see is that you have no idea what you are talking about. And
that you still are struggling to grasp the difference between "time"
and "distance".
> > For
> > instance, in the example detailed in the WikiPedia page, the random
> > walk is expected to take on the order of N^2 steps to cover a distance
> > of N. Of course, that example bears exactly zero similarity to the
> > evolutionary scenario, but the stubborn notion you seem to have that
> > "random walk" implies "exponential number of steps" is just silly.
>
> Not true. Random walk does imply an exponential increase in the
> average number of steps with each linear increase in the required
> multi-character difference in sequence space.
I guess the above calculation is just a figment of my deranged
imagination. I'm so glad to have you tell me what is correct and what
is not, Sean. Please don't bother to back up your assertions with
calculations of your own - it'll only ruin their pristine beauty!
[snip]
> > So, to clarify, do you mean to say that adding a new protein to an
> > already functional high-level system is so unlikely to produce any new
> > benefits as to be realistically impossible in the established
> > evolutionary time frames? Or am I misunderstanding you again?
>
> You got it.
And so, we have come to the inevitable conclusion that evolution of
novel proteins is impossible, since every organism starting with the
simplest prokaryote is already a highly complex functional system, and
adding a new functional protein to such a system can't be done this
side of a zillion years. That's all, folks!
I suppose that nylonase, much like penicillinase, has always been
there, or maybe the designer(s) just got bored one day and decided to
get busy.
> > > > If the
> > > > latter is true, we would expect to find little or no homology between
> > > > beneficial sequences between higher and lower levels. Is that what we
> > > > observe?
> > >
> > > Yes - this *is* exactly what we observe. Take the flagellar system of
> > > motility again, for example. There are indeed lower-level homologies
> > > to this system, but none of them come very close to the minimum level
> > > of flagellar motility. They are like your P and Q functions where Q is
> > > completely homologous to 100-res of P, but P is 1000-res long. Well, P
> > > is not at all close to Q even though P contains Q. The next closest
> > > steppingstone function on any side of the island of flagellar motility
> > > is very far away.
> >
> > No, the scenario is more in line with the following: P contains Q1,
> > which is 100-res long, Q2 which is 200 res long, Q3 and Q4 of 300 res
> > each, and a 100-res domain with no sequence homology.
>
> Ok . . .
>
> > Furthermore,
> > we know that combining Q1 and Q2 would yield independent utility, as
> > would Q3 and Q4.
>
> Right . . .
>
> > We don't know about the final 100 residue segment,
> > and it seems important for the function of P, but we do observe that
> > the structural features in that segment are common throughout
> > biological systems and should not be very difficult to evolve from
> > another structure.
>
> Ok . . .
>
> > On the other hand, from within the context of your
> > calculations, we would see no more homology from P to any smaller
> > functional protein than could be expected at random.
>
> Not true. I'm not saying that homologies do not exist. The lava lamp
> does have columns and interconnected blobs. However, this
> interconnected state is much greater at lower levels than it is at
> higher levels.
Your calculations, however, do not take the homologies, and therefore
the clustering of the "blobs", into account at all. You say one thing,
Sean, but compute another.
> > If you claim that
> > larger systems necessarily require the use of existing smaller
> > functional components (i.e. the designer is somehow constrained from
> > creating novel components to address the unique challenges posed by the
> > complex system), then you are essentially conceding that the
> > distribution of beneficial sequences is necessarily non-uniform,
> > thereby breaking your calculations.
>
> I've always said that such interconnections exist at lower levels, that
> there are indeed island clusters and interconnecting bridges. It is
> just that these interconnections break down at higher levels. They do
> not stay so clustered.
I know that's what you're saying, Sean: you don't have to keep
repeating yourself. I just don't believe you. You've provided no hint
on how to compute this clustering effect or how to correlate it with
some complexity metric, nor any evidence whatsoever of the level of
"functional complexity" at which these clusters break down.
> The subsystems of P all have very low odds of coming together just
> right and these low odds become even lower, exponentially lower, as
> they approach P. Even the closest subsystem of P is nowhere near P.
Let's try modeling a simple evolutionary scenario. Suppose there are
two functional protein complexes Q1 and Q2, each coded by an
L-nucleotide sequence, in a bacterial population of N individuals with
genome size S, where L<<S. Suppose the alleles coding for Q1 and Q2
have initial frequencies f1 and f2, and both Q1 and Q2 confer a
selective advantage to the bacteria. Suppose, further, that Q1 and Q2
can merge to form a selectively advantageous complex P, if a fortuitous
gene transfer places them within a distance of d nucleotides of one
another (if you'd like, set d=1). Finally, suppose that the bacteria
transfer genes at the rate of t transfers per individual per
generation. We would like to compute how many generations it would
take for P to emerge in the population, given parameters N, L, S, f1,
f2, d, and t. To make things easier on you, we can assume that
shuffles of a single genome are not allowed within the context of this
problem. With me so far?
Before I go any further, I'd like you to tackle this problem, Sean.
Don't just tell me that the time is exponential in L. Try to come up with
a functional form. Let's see what your model can do!
And before you ask what the odds are of Q1 and Q2 being in the
population in the first place, realize that just as the odds of P
emerging depend on Q1 and Q2, so do Q1 and Q2 depend on their
precursors, and so on. In other words, all the probabilities in
evolution are conditional, and cannot be simply approximated as
N(Q1)/20^L, where N(Q1) is the number of functional Q1 sequences. But,
that shouldn't matter to you, as you claim that in this scenario, the
number of generations until P arises should be exponential in L, even
if Q1 and Q2 are there. So, let's see what you've got!
[snip]
> > Actually, what you describe is exactly a *random walk* within the
> > starting point island. "Random sampling" would entail generating a
> > completely new random sequence at each step, rather than slightly
> > modifying the old sequence to get a new one.
>
> Random sampling is off the island - into the surrounding non-beneficial
> sequence space. Every time a mutation lands a sequence into the
> non-beneficial sequence space, nature destroys that sequence. The rest
> of the population, still being on the island, is saved by natural
> selection, to search randomly, by random sampling of sequence space,
> another day. This is not random walk.
Sean, for the love of the Designer, if you are going to discuss
mathematical concepts, learn the definitions! Random walk refers to an
iterative process, where a sequence is changed one step at a time.
Random sampling refers to generating an entire sequence de novo from the
space of all possibilities. You keep inventing your own terminology
without realising it, and while this is somewhat amusing at first, it
makes communication needlessly difficult.
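To make the two definitions concrete, here is a minimal Python sketch
applied to protein sequences. The helper names (random_walk_step,
random_sample, hamming), the 100-residue length and the fixed seed are
all illustrative assumptions, not part of anyone's model in this thread.
A walk step alters one residue of the existing sequence; a sample draws
a whole new sequence from all 20^L possibilities.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def random_walk_step(seq, rng):
    """One walk step: change exactly one position to a different residue."""
    i = rng.randrange(len(seq))
    choices = [a for a in AMINO_ACIDS if a != seq[i]]
    return seq[:i] + rng.choice(choices) + seq[i + 1:]


def random_sample(length, rng):
    """One sample: draw an entire sequence uniformly from all 20**length."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)  # fixed seed, only so the run is repeatable
start = random_sample(100, rng)
walked = random_walk_step(start, rng)
sampled = random_sample(100, rng)

print(hamming(start, walked))   # a single walk step moves distance 1
print(hamming(start, sampled))  # a fresh sample differs at ~19/20 of sites
```

A mutation that natural selection later keeps or discards is still a
walk step in this sense: the new sequence is derived from the old one.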
> > > The only way random walk can occur is if a non-beneficial neutral
> > > sequence is somehow maintained by a population, out of sight of the
> > > forces of natural selection, for the purposes of randomly walking
> > > through sequence space.
> >
> > No, both scenarios describe random walks: the only things that are
> > different are the volumes of sequence space available to the "walkers".
>
> Not true. One is random walk, the other is random sampling.
Oh sure. When constrained by natural selection, mutation works by
producing entire new sequences de novo, rather than by slightly
modifying existing sequences. How could I not see this earlier?
[snip]
> > > Why don't you explain how I've misunderstood the relevance between the
> > > definitions of random walk and uniform random sampling?
> >
> > Draw, or imagine, a large square lattice - say, 50x50. To simulate a
> > random walk, start at the center of the lattice, and walk on the
> > lattice for 10 steps, at each step going off randomly in 1 of 4
> > directions. To simulate random sampling, throw 10 darts anywhere at
> > the lattice. Is the set of points visited by the walk different from
> > the set of points hit by the darts? If, after the first 10 steps, you
> > wanted to maximize your chances of staying close to the origin, which
> > method would you pick: random walk, or random sampling? Which would
> > you pick if you wanted to find yourself close to the point with
> > coordinates (10,10)?
>
> You can also set limits on random sampling around the target - and it
> would still be random sampling. Just because the random sampling does
> not often go very far beyond the bounds of the beneficial island, when
> it comes to random mutations, does not mean that it is random walk.
> You misunderstand the definitions.
Of course I do, since you keep inventing your own. Seriously, Sean,
get yourself a nice introductory textbook on stochastic processes - it
won't bite, I promise! If you would like specific recommendations, I'd
be happy to oblige.
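The lattice thought experiment quoted above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
lattice cells are indexed 0..49 with the walk starting at (25, 25), the
target (10,10) is read as absolute lattice coordinates, and the names
walk, darts and manhattan are made up here.

```python
import random

SIZE = 50                        # 50x50 lattice, cells indexed 0..49
CENTER = (SIZE // 2, SIZE // 2)  # assumed starting point of the walk
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def walk(steps, rng):
    """Random walk: each point is one lattice move from the previous one."""
    x, y = CENTER
    visited = []
    for _ in range(steps):
        dx, dy = rng.choice(MOVES)
        x, y = x + dx, y + dy
        visited.append((x, y))
    return visited


def darts(count, rng):
    """Random sampling: each point is drawn independently from the lattice."""
    return [(rng.randrange(SIZE), rng.randrange(SIZE)) for _ in range(count)]


def manhattan(p, q):
    """Lattice distance, i.e. the minimum number of walk steps from p to q."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


rng = random.Random(1)  # fixed seed, only so the run is repeatable
walk_points = walk(10, rng)
dart_points = darts(10, rng)

# The walk can never be more than 10 lattice steps from where it started.
print(max(manhattan(p, CENTER) for p in walk_points))
# The darts carry no memory of the center and land anywhere on the lattice.
print(max(manhattan(p, CENTER) for p in dart_points))
```

Under these assumptions the point (10,10) is 30 lattice steps from the
center, so a 10-step walk cannot reach it at all, while each dart has a
1 in 2500 chance of hitting it. To stay near the origin, pick the walk;
to have any chance at a distant target in 10 tries, pick the darts.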
Cheers,
Leonid.