Re: Part 1 (of 3): What are major aspects of evolutionary theory?
- From: anon1@xxxxxxx
- Date: Wed, 11 Jan 2006 15:54:40 -0800
> Say species A has one version of a sequence while species B has two
> versions. This could happen in two ways (at least -- these are just the
> most parsimonious): 1) condition in A is ancestral, and the sequence
> was duplicated on the lineage leading to B; 2) condition in B is
> ancestral, and one sequence (don't know which one) was deleted in the
> lineage leading to A.
That's why with only two species, and no *other* way to polarize that
particular chain of links between them, you have no idea whether it was
a duplication or a deletion event: you don't have enough points
to trace the dup-then-diverge pattern or the diverge-then-delete-one
pattern.
dup-then-diverge     diverge-then-delete-one
       |                      / \
       |                     /   \
       *                    /     x
      / \                  /
     /   \                /
With a whole series of points, either actual measurements in a lab, or
reconstructed states at internal nodes in a parsimony tree connecting
actual measurements in fossils or living genomes, you can see the
difference in those two patterns. But with just two data points (states
of two nodes), no way. (Of course if you kept track of *when* you
measured states in a lab, you *know* the answer already, but you can
discard that data, and use the theory to predict which way time went,
then bring the timing back in to check if theory was correct.)
> >>>If we see a large number of indels that are all in the same direction,
> >>Same direction? Ah, you mean all inferred gaps in the same species.
> > I'm not sure what you mean, so I'll give an example of what I mean:
> >      Taxon A    Taxon B
> >  1:  present    missing
> >  2:  present    missing
> >  3:  present    missing
> >  4:  present    missing
> >  5:  present    missing
> >  6:  missing    present
> >  7:  missing    present
> >  8:  missing    present
> > At sites 1,2,3,4,5 the direction is present-on-left.
> > At sites 6,7,8, the direction is present-on-right.
> > 1,2,3,4,5 are in one direction, while 6,7,8 are in the other direction.
> Yes, I understand what you mean.
Good.
> In order to analyze these sequences you have to align them. That
> means you are putting what you think are homologous sites next to each
> other. If one species is missing some sequence, you put in gaps (think
> of them as spacers) in that sequence to cover the missing parts.
That's a stupid way of doing it, presumably done in order to make use
of existing software that is stupidly designed. Yes, in this one case
I'm claiming current professional practice is stupid and they should be
doing it a different way. I already said *how* they should do it, so I
won't repeat that proposed new/better methodology here.
> So you end up with something like this:
> ACGTACGTACGTACGTACGTACGTACGTACGTACGT----ACGTACGTACGT
> ACGTACGTACGT----ACGTACGTACGTACGTACGTACGTACGTACGTACGT
> ACGTACGTACGT----ACGTACGTACGTAC--ACGTACGTACGTACGTACGT
Those are ***short*** indels, not what we're talking about here. Any
***short*** indel is so likely to turn up at random in twenty different
places, through purely stochastic mechanisms, that it needs to be ignored
when comparing genomes, as indeed is done here. But in cases where
the missing short sequence is simply a tandem cycle lengthening of
either adjacent part, you should probably do something slightly
different, such as deleting from *all* genomes the entire parts that
are identical modulo cycle-lengthening. Hmm, actually you get the
same result either way, so the way described above is good enough.
We're talking about really large indels, where it's virtually
impossible for a sequence to suddenly appear out of nowhere, so if you
see two copies one place and only one copy elsewhere you shouldn't just
ignore that problem as done above. Again, I already described the
proposed new/better methodology so I won't repeat here.
> > A 5:3 ratio is not conclusive as to the arrow of time.
> > But a 500:2 ratio would be pretty conclusive that there were 500
> > deletions and 2 virus-insertions, not vice versa.
> > (This is in the absence of any SINE-insert or DUP-then-diverge data
> > between these two taxa, which would be better evidence of time-arrow.)
> That would be true if you were assured that deletions were much more
> likely than insertions. But how are you assured of this?
I'm talking *only* about really long indels, not those short strawman
indels you showed in the example above. Do you understand that?
If a really long duplication happens, at first the two copies are very
very similar, and over time they diverge from each other. This is
actually measurable if we have the data (observed or reconstructed)
along a chain of adjacent nodes along a path in a tree.
If a really long immigration occurs from a known source, the inserted
segment is very similar to the source but not so close to any
homologous segment already in the destination genome, yet still after
the immigration event, the two copies diverge.
If the same thing happens, but the source isn't known, we can't observe
the very similar sequence between immigrated copy and source, but still
we can observe the disparity between immigrated sequence and aboriginal
sequence, and we can observe the subsequent divergence from each other.
But if we're looking at a deletion event, backwards in time so we
mistakenly believe it's an immigration event, we seem to see
immigration followed by convergence of the two copies towards each
other, which is so grossly unlikely as to be unreasonable.
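The divergence argument above can be sketched in code. This is only a minimal illustration, assuming we already have one dissimilarity number per node (measured or reconstructed) between the two copies, for consecutive nodes that all carry both copies. The function name `polarity_from_divergence` and the `min_change` threshold are hypothetical, not part of any existing package.

```python
from typing import Optional, Sequence


def polarity_from_divergence(
    divergence: Sequence[float], min_change: float = 0.0
) -> Optional[str]:
    """Guess which way time runs along a chain of two-copy nodes.

    divergence[i] is the dissimilarity between the two copies at node i.
    Copies are expected to drift apart over time, so the direction in which
    divergence grows is taken as the direction of time.
    Returns "left-to-right", "right-to-left", or None when the trend is
    mixed or too small to call (min_change is an assumed noise threshold).
    """
    steps = [b - a for a, b in zip(divergence, divergence[1:])]
    if not steps:
        return None  # a single node cannot show a trend
    if all(s > min_change for s in steps):
        return "left-to-right"
    if all(s < -min_change for s in steps):
        return "right-to-left"
    return None


print(polarity_from_divergence([0.01, 0.03, 0.06]))
print(polarity_from_divergence([0.02, 0.02, 0.02]))
```

A chain read the wrong way round shows apparent convergence, which is exactly the case described above as too unlikely to accept, so the function reports the opposite direction instead.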
Summary so far: Even one (1) such blatantly obvious example of
duplicate-then-diverge or immigrate-then-diverge or diverge-then-delete
is enough to polarize a link/branch. If we have several such, we expect
100% agreement between them, and if they disagree with each other we
suspect a botched measurement somewhere. So here's a good set of data:
Node:  A        B        C        D        E        F
       1        1        1 DUP->  2 div->  2 div->  2
       1        1 IMM->  2 div->  2 div->  2 div->  2
       1 DUP->  2 div->  2 div->  2 div->  2 div->  2
       2 div->  2 DEL->  1        1        1        1
       2 div->  2 div->  2 div->  2 DEL->  1        1
Add this one new line and we now have a botched set needing explanation:
       1        1 <-DEL  2 <-div  2 <-div  2 <-div  2
But suppose we don't have any such strongly polarizing chains of nodes
in our data. We just have a bunch of indels near the ends of our chains
with no room to show divergence in either direction in the narrow
region where there are two copies:
       1        1        1        1        1 indel  2
       2 indel  1        1        1        1        1
or not enough mutations over the 2-copy region to show divergence in
either direction, hence no way to distinguish between immigrate-diverge
in one direction and diverge-delete in the opposite direction:
       1        1 indel  2 =====  2 =====  2 =====  2
       2 =====  2 =====  2 indel  1        1        1
Then in such cases, with *only* that kind of unpolarized indel, we'd
need to use the preponderance-of-direction method we're disputing here.
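One way to put a number on "preponderance of direction" is a two-sided sign test. This is a sketch of a swapped-in standard technique, not something proposed earlier in the thread. It assumes each indel is an independent vote and that, with no real polarity, either direction is equally likely. The names `sign_test_p_value` and `vote_polarity` and the cutoff `alpha` are hypothetical.

```python
from math import comb
from typing import Optional


def sign_test_p_value(left: int, right: int) -> float:
    """Two-sided sign test: chance of a split at least this lopsided
    if each indel pointed left or right with probability 1/2."""
    n = left + right
    k = max(left, right)
    # Exact upper tail of Binomial(n, 1/2), computed with integers.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def vote_polarity(left: int, right: int, alpha: float = 0.01) -> Optional[str]:
    """Return the majority direction, or None if the vote is too close.

    alpha is an assumed cutoff; the thread does not fix one.
    """
    if sign_test_p_value(left, right) >= alpha:
        return None  # leave the link/branch unpolarized
    return "left" if left > right else "right"


print(sign_test_p_value(5, 3))   # the 5:3 example
print(vote_polarity(5, 3))
print(vote_polarity(500, 2))     # the 500:2 example
```

For the 5:3 split the p-value is about 0.73, so the vote is inconclusive. For 500:2 it is vanishingly small, so the majority direction is returned. Which event type that direction stands for (deletion or insertion) still rests on the heuristics argued in the text, not on the test itself.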
One more case: If we see an indel where the two copies are identical,
we may presume that to be a duplication event we're observing. But it
could still be a deletion event after a recent unobserved duplication
event. For example, what really happened:
       1        1 <-DEL  2 <====  2 <====  2 <====  2 <====  2 <-DUP  1
and what we observe (the two rightmost nodes above are *not* observed):
       1        1 indel  2 =====  2 =====  2 =====  2
and what we thought was most likely, which was mistaken:
       1        1 DUP->  2 =====  2 =====  2 =====  2
so with that observation, we must leave the polarity undetermined.
Of course *any* indel not involving a copy elsewhere in the same genome
is non-polarized, for example:
       1        1        1 indel  0        0        0
There's no way to know whether that is a deletion of the only copy of a
sequence, or an insertion of a new sequence not previously present in
this genome. (I'm still assuming this sequence doesn't match any
sequence in a possible source, so we can't know for sure where it came
from, so we can't know for sure it came from anywhere.)
So what if we have a whole set of non-polarized indels?
First consider the 1-indel-2 case:
       1        1        1        1        1 indel  2
       1        1        1        1        1 indel  2
       1        1        1        1        1 indel  2
       1        1        1        1        1 indel  2
       1        1        1        1 indel  2 =====  2
       1        1        1 indel  2 =====  2 =====  2
       1        1 indel  2 =====  2 =====  2 =====  2
       1 indel  2 =====  2 =====  2 =====  2 =====  2
all directed the same way, here 1-left 2-right. Furthermore I'm
assuming that the two "copies" in each case are dissimilar enough that
we can eliminate the chance these indels were in fact duplications
immediately followed by massive mutation across a single link/branch
followed by near total stasis across all the subsequent links. What's more
likely, that a bunch of deletion events occurred, getting rid of one
copy of each previously-existing pair (which is safe because they might
be similar enough to be redundant and so getting rid of one or the other
isn't harmful), or that a bunch of immigration events happened where in
every case the introduced sequence just happened to be very similar to
one already present? If these are retrotransposons, all bets are off,
but if these are just ordinary sequences, I'd say the chance of such a
large set of immigrations chancing to nearly match (but not identically
match) what was already there is essentially zero. So these must have
been all deletion events.
Now consider the 0-indel-1 case instead:
       1        1        1        1        1 indel  0
       1        1        1        1 indel  0        0
       1        1        1 indel  0        0        0
       1        1 indel  0        0        0        0
What's more likely, that a bunch of deletions happened, while not a
single immigration or duplication anywhere in that chain, or that a
bunch of immigrations happened, with not a single deletion or
duplication anywhere in that chain, while in both cases the background
rate of point mutations was so small that we can't detect divergence
across pairs of long-ago-duplicated sequences? I don't know. I think
several deletion events across different links/branches of the chain
are more likely than several virus infections across different
links/branches of the chain each of which introduced foreign sequence
from unknown sources.
Summary of my claim:
(1) If we can measure significant divergence (between two near-copies of
sequences) in a particular direction along such a chain, regardless of
whether any DUP or DEL or IMM event occurred:
       1        1        1 DUP->  2 div->  2 div->  2
       1        1 IMM->  2 div->  2 div->  2 div->  2
       2 div->  2 div->  2 div->  2 DEL->  1        1
       2 div->  2 div->  2 div->  2 div->  2 div->  2
then we know the polarity for sure, and any contradiction in this data
is serious cause for concern, unless we've crossed by the root along
this chain (see later).
(2) In the absence of such a clear case of divergence, if we have lots of
clearly polarized indels such as some kinds I cited earlier, then we
can use majority vote and expect a very strong consensus, or else have
cause for concern if the vote is too close.
(3) In the absence of either divergence or self-polarizing indels, we
use the heuristic that deletions are more common than virus-vector
immigrations, and so we conduct a vote: a strong preponderance of
direction gives us weak support for polarity, while a weak preponderance
of direction leaves the link/branch/chain unpolarized.
(4) If we see any sudden change of direction, from the middle toward
the ends, but consistent direction within either of the two parts:
       2 <-div  2 <-DUP  1        1        1        1
       2 <-div  2 <-div  2 <-div  2 DEL->  1        1
       2 <-div  2 <-div  2 <-div  2 div->  2 DEL->  1
       2 <-div  2 <-div  2 <-div  2 div->  2 div->  2
       1        1 <-DEL  2 <-div  2 div->  2 div->  2
       1        1        1        1        1 DUP->  2   (2 identical copies here)
then we can be sure the root is out the third branch from that node
where the direction changed in this chain. Or if we see a bi-polarity
within a single link/branch, but uniform polarity away from that link
within all other links, then the root is within that one link via a
branch point we don't show because we don't have any outgroup in our
data.
(5) I've drawn all these examples as single chains, not showing the
larger tree which contains them, because it's much easier to show
chains than trees here in an ASCII text medium. But really what you'd
do is draw such strong >>> or weak > polarizations in the entire
unrooted tree and see where the root obviously lies. Here I'll draw the
tree with all links/branches running horizontally so there's plenty of
room for such polarization to be drawn in ASCII:
                Bonobo-----+---<<>---+---<>>------Human
                           |         |
                Chimp------/         |
                                     |
  Gorilla-------<<>------+---<>>>>---/
                         |
                         \----<<<<<>>---+----<>>>>>>>>--------Orangutan
                                        |
Gibbon#1--------+-<<<<<<<<<<<<>>>>>>>>>-/
                |
Gibbon#2--------/
That's all hypothetical, if and when we complete enough of the genomes
of the various species to allow comparisons across duplicated genes to
show divergence (strong polarization) and DEL events (weak
polarization), and if it turns out there is enough self-polarization
within such closely related taxa. I think you can see you don't need an
outgroup to find which link/branch contains the root, right? (Note
because of lack of outgroup here, root is *within* an internal branch,
not out the third branch from a node as in my examples above which
assumed we included an outgroup but didn't know for sure that it was
really an outgroup.) As to whether such definite polarization would be
found among that group of taxa, I am not willing to predict at this time.
This is just an example here.
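The root-location rule in (4) can be sketched for a single chain as follows. This is a rough illustration only, assuming each link has already been given a polarity by the methods above. The function name `locate_root` and the link codes (">" for time running left to right, "<" for right to left, "<>" for both directions seen within one link, "?" for unpolarized) are made up for this example.

```python
from typing import List, Optional, Tuple


def locate_root(links: List[str]) -> Optional[Tuple[str, int, int]]:
    """Find where the root attaches to a chain of polarized links.

    links[i] joins node i and node i+1. Time runs away from the root, so
    links left of the root should read "<" and links right of it ">".
    Returns ("node", a, b) if the root hangs off a side branch somewhere
    from node a to node b (a == b when no unpolarized link intervenes),
    ("link", i, i) if the root lies inside the bi-polar link i, or None
    if no change of direction is visible. Raises ValueError if the
    polarities cannot be explained by a single root.
    """
    split = [i for i, p in enumerate(links) if p == "<>"]
    lefts = [i for i, p in enumerate(links) if p == "<"]
    rights = [i for i, p in enumerate(links) if p == ">"]
    if len(split) > 1:
        raise ValueError("more than one bi-polar link")
    if split:
        s = split[0]
        if any(i > s for i in lefts) or any(i < s for i in rights):
            raise ValueError("polarities contradict the bi-polar link")
        return ("link", s, s)
    if not lefts or not rights:
        return None  # root is off one end of the chain, or undetermined
    if max(lefts) > min(rights):
        raise ValueError("polarities point toward each other")
    return ("node", max(lefts) + 1, min(rights))


# Fourth chain in (4): three "<" links, then two ">" links.
print(locate_root(["<", "<", "<", ">", ">"]))
```

On a full tree the same idea would be applied to every path, and the root placed where all the arrows point away from it. The sketch handles only one chain at a time and treats any contradiction as an error rather than trying to resolve it.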