Re: Part 1 (of 3): What are major aspects of evolutionary theory?
- From: John Harshman <jharshman.diespamdie@xxxxxxxxxxx>
- Date: Fri, 06 Jan 2006 18:11:40 GMT
anon1@xxxxxxx wrote:
>>most fossils, of all ages, are marine, and that marine sediments are
>>interbedded with terrestrial ones. So you are forced to imagine a
>>pre-flood world in which there was land on top of ocean on top of land
>>on top of ocean.
>
>
> Aha, that's a good refutation to my strawman hypothesis of layering of
> fossils due to elevation where the various species lived at the time of
> a Flood. Thanks. So I'll have to think up a new strawman hypothesis to
> make the geological column data fit a non-evolutionary mode.
[snip]
Sorry, too boring to read. Why are you doing this?
>
>
>>there are in fact programs that just draw trees, but you
>>have to give them trees as input.)
>
> That makes no sense, unless you're talking only about the
> visual-rendering program which converts a table-description of a tree
> into a raster image showing the same data in visual form.
That's what I'm talking about, except that they generally aren't
table-descriptions or raster images. But yes, I am distinguishing
"tree-drawing", a visual exercise, from phylogenetic analysis, which
tells us what tree we ought to be drawing.
> I'm more
> concerned with the program that takes a matrix relating different
> characters against different species or individual and performs
> mathematical analysis to find a minimal (character-difference) spanning
> tree connecting all the species/individuals.
You are talking about a minimum-evolution tree; in discrete-character
terms, a parsimony tree. Careful. That's not the same as a
minimum-spanning tree.
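To make the distinction concrete: a minimum spanning tree joins the observed taxa directly to one another, while a parsimony tree puts the observed taxa at the tips and infers hypothetical ancestors at its internal nodes. The following is a minimal sketch of the former only, assuming a made-up four-taxon alignment and Hamming distance (count of differing sites) as the edge weight; the names `hamming` and `prim_mst` are illustrative, not taken from any phylogenetics package.

```python
def hamming(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def prim_mst(seqs: dict[str, str]) -> list[tuple[str, str, int]]:
    """Minimum spanning tree over the observed sequences (Prim's algorithm).

    Every vertex is an observed taxon: no hypothetical ancestors are created,
    which is what a parsimony tree does differently.
    """
    names = list(seqs)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge joining the growing tree to a taxon not yet in it.
        u, v = min(
            ((a, b) for a in names if a in in_tree
                    for b in names if b not in in_tree),
            key=lambda e: hamming(seqs[e[0]], seqs[e[1]]),
        )
        edges.append((u, v, hamming(seqs[u], seqs[v])))
        in_tree.add(v)
    return edges


# Hypothetical toy alignment, for illustration only.
toy = {"A": "AAAA", "B": "AAAT", "C": "AATT", "D": "ATTT"}
for u, v, d in prim_mst(toy):
    print(f"{u} -- {v}: {d} difference(s)")
```

Here the spanning tree simply chains the observed taxa A-B-C-D. A parsimony analysis of the same data would instead search over trees with A, B, C and D as terminal nodes joined through inferred internal nodes.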
> I don't care so much about
> the program that actually draws a picture of that already-computed
> mathematical tree. (And I don't really care how the minimal-spanning
> program writes its output, whether it uses XML or Fortran formatted
> output to express the mathematical tree it has constructed.)
Usually it's output as a string in what's called "Newick" format. Here's
a sample: (A,(B, C)). That's a three-taxon, rooted tree in which B and C
are sister groups and A is the outgroup.
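A minimal sketch of reading that sample into nested tuples follows. It assumes only the simple case shown: taxon names, commas, parentheses and whitespace, with an optional trailing semicolon. Real Newick also allows branch lengths, internal-node labels and quoted names, none of which this hypothetical `parse_newick` handles.

```python
def parse_newick(s: str):
    """Parse a minimal Newick string such as "(A,(B, C))" into nested tuples.

    Supports names, commas, parentheses and whitespace only; branch lengths,
    internal-node labels and quoted names are not supported.
    """
    s = s.strip().rstrip(";")
    pos = 0

    def peek() -> str:
        return s[pos] if pos < len(s) else ""

    def skip_ws() -> None:
        nonlocal pos
        while peek().isspace():
            pos += 1

    def node():
        nonlocal pos
        skip_ws()
        if peek() == "(":                  # internal node: list of children
            pos += 1
            children = [node()]
            skip_ws()
            while peek() == ",":
                pos += 1
                children.append(node())
                skip_ws()
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            return tuple(children)
        start = pos                        # terminal node: a bare name
        while peek() and peek() not in "()," and not peek().isspace():
            pos += 1
        if start == pos:
            raise ValueError(f"expected a taxon name at position {pos}")
        return s[start:pos]

    tree = node()
    skip_ws()
    if pos != len(s):
        raise ValueError("trailing characters after tree")
    return tree


print(parse_newick("(A,(B, C))"))  # ('A', ('B', 'C'))
```

In the nested result, B and C share the inner tuple (sister groups) and A sits alone at the top level (the outgroup).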
>>the justification for minimum-evolution or parsimony methods.
>
> The idea that in a very high dimensional space, a branching sequence of
> random walks from a single starting point is impossibly unlikely to
> ever intersect itself except immediately after a branching event when
> the different paths haven't diverged far from the starting point, in
> fact it's impossibly unlikely for two divergent random walks to ever
> get significantly closer to each other to where a minimal-spanning-tree
> algorithm would use such a convergent-evolution link instead of
> strictly following only the true-evolution links? Hence it's possible
> to uniquely recover the *actual* evolutionary paths just by looking at
> the character/DNA data.
Sorry, you have deleted too much context. And you are too divorced from
biological meaning for me to care.
>>However, this doesn't relate at all to your odd attempt to draw a
>>line from B1 to B2.
>
> I'm simply saying that if B1 and B2 were the only data presented, the
> program would draw a link directly between them, but with A1 and A2
> also present, with the need to link all four together, it's more efficient
> to link B1-A1-A2-B2 and *not* to include any direct B1-B2 link. How
> does that not relate??
It bears no real resemblance to how phylogenetic analysis works. Here's
how it works: You start with a bunch of alternative trees. You evaluate
them all under some optimality criterion, of which parsimony (smallest
total of required changes) is one. You pick the tree that is best under
that optimality criterion. All else is bells and whistles, mostly ways
to keep from having to evaluate all possible trees.
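The procedure can be sketched for the smallest interesting case, four taxa, where only three unrooted trees exist. This is a minimal sketch, assuming a made-up four-taxon alignment and using Fitch's algorithm for unordered characters as the parsimony criterion: each candidate tree is scored and the lowest-scoring one is kept. The helper counting unrooted binary trees, (2n-5)!!, shows why exhaustive evaluation soon becomes impractical and why the "bells and whistles" matter. All function names are illustrative.

```python
from math import prod


def fitch_site(node, chars: dict[str, str], i: int):
    """Fitch down-pass for site i: return (candidate state set, changes)."""
    if isinstance(node, str):          # terminal node: its observed state
        return {chars[node][i]}, 0
    left, right = node                 # binary internal node
    ls, lc = fitch_site(left, chars, i)
    rs, rc = fitch_site(right, chars, i)
    common = ls & rs
    if common:                         # children can agree: no extra change
        return common, lc + rc
    return ls | rs, lc + rc + 1        # disagreement costs one change


def fitch_length(tree, chars: dict[str, str]) -> int:
    """Parsimony length: total changes summed over all sites."""
    n_sites = len(next(iter(chars.values())))
    return sum(fitch_site(tree, chars, i)[1] for i in range(n_sites))


def n_unrooted_trees(n: int) -> int:
    """Number of unrooted binary trees on n >= 3 labelled taxa: (2n-5)!!."""
    return prod(range(1, 2 * n - 4, 2))


# Hypothetical four-taxon alignment, for illustration only.
toy = {"A": "AACT", "B": "AACA", "C": "GGCA", "D": "GGTA"}

# The three unrooted four-taxon trees, each written with an arbitrary root;
# the Fitch length of an unrooted tree does not depend on where it is rooted.
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
scores = {tree: fitch_length(tree, toy) for tree in candidates}
best = min(scores, key=scores.get)
print(scores)
print("most parsimonious:", best)
print("unrooted trees for 10 taxa:", n_unrooted_trees(10))
```

With this toy alignment the tree grouping A with B scores 4 changes and the other two score 6. Already at ten taxa there are over two million unrooted trees, which is why real programs use heuristic searches rather than scoring every tree.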
>>Insertions do most likely come from somewhere.
>
> Except for extremely short insertions, such as a single base, which might
> come from a break in the DNA strand followed by some random molecule
> attaching to the end of one strand and then the anti-copy mechanism
> mistakenly making an anti-copy of that random molecule as if it were an
> existing DNA base. I agree all long sequences are unlikely to be
> generated de novo by such fooling of the anti-copy mechanism, so we can
> assume they come from actual DNA or RNA strands that were part of some
> genome somewhere else (other place in genome of same cell, or a pilus,
> or a virus, etc.)
>
>
>>But generally you don't know where they come from,
>
>
> Most of the time they come from elsewhere in the genome of the same
> cell. If they are sufficiently long, sequence comparison will show
> exactly where they came from.
Only if you have the whole genome to look at. Generally you don't. Very
few species have had their entire genomes sequenced. If we had to wait
for whole-genome sequencing before resolving phylogenies we would have a
very long time still to wait.
> Only a very few insertions should come
> from other cells or from viruses. If we see a large number of indels
> that are all in the same direction,
Same direction? Ah, you mean all inferred gaps in the same species.
> and each indel is completely
> different in pattern from any of the others, and not a single one of
> them matches any DNA elsewhere in the same cell, I think we can assume
> they are deletion events, not insertion events. Tandem repeats would
> follow patterns, and known SINEs would be from a very limited menu of
> possibilities active in that particular species, so we can eliminate
> those as likely if the indels are totally unstructured and
> uncorrelated.
I will agree that if we had the entire genomes of the species being
compared, and went through a lot of computational trouble, we could
probably polarize a great many indels that we now cannot.
> If we see 90% of the indels in a particular direction matching segments
> of DNA elsewhere in the same genome, and the remaining 10% of that
> direction are of unknown source, I'd accept that 10% are unknown virus
> vector or other non-local copy&paste.
Yes, I suppose this would be possible in principle, if indeed we had the
entire genome sequence.
>>and thus there is no way to tell an insertion from a deletion without
>>rooting the tree.
>
> I disagree. Duplication-insertion events are easy to diagnose, and
> hence decide which direction time's arrow goes. SINE insertions
> likewise are easy to diagnose and decide time's arrow. Large numbers of
> indels all going the same direction and *none* of them matching any
> duplication or tandem repeat or SINE etc. would clearly be diagnosed as
> all deletions, not insertions. The clearest case would be a lot of
> indels in one direction all clearly duplication or SINE inserts, and a
> lot of indels in the opposite direction not explainable via any such
> mechanism, and tandem indels going both ways which we can ignore,
> clearly showing that the first group are indeed insertions while the
> second group are deletions, and the tandem indels are then assigned
> accordingly.
>
>
>>The fatal problem (though there are many others) is that "copies of
>>something from somewhere" are distinguishable only if you know where
>>they come from.
>
> Perhaps we'll just have to leave this question unresolved until
> Venter's whole-ecosystem shotgun sequencing project has catalogued 99%
> of all sequences worldwide, and then there'll be a simple test via
> database lookup for any sequence to tell where it could have come from.
Don't hold your breath. A genome project for every species? Not in your
lifetime.
> Unless you accept my assumption that in any large (ten or more)
> collection of indels, there are solid explanations for several of them,
> either copies from elsewhere on same genome, or SINE inserts, and these
> are sufficient to establish the time-direction for that particular link
> as well as to provide true evidence not only that evolution happened
> but specifically that evolution happened via a mechanism we see in the lab.
>
> Do you have any summary statistic, regarding the complete unrooted tree
> of life, how many links between internal nodes display sufficient
> known-mechanism indels that we can use that information to assign a
> time-direction to that particular link?
No. And please stop calling them links. They're branches.
>>>If you score different kinds of changes,
>>>you should be able to estimate which way time occurred between two
>>>genomes which differ by any insert/delete/dup.
>>
>>The chances of actually doing that might indeed increase if you had two
>>or more entire genomes to examine. But usually we don't.
>
> I was reading in a recent issue of _Science_ that we already have a
> hundred or more complete genomes. I presume most of them are very small
> genomes, such as viruses or bacteria, only about ten or so really large
> genomes such as mustard or mammal. So what do you mean by not having
> even two complete genomes?
I mean not the two particular complete genomes for which you are trying
to figure out indel polarity. That's what I mean by "usually". There
are very few such pairs. If you want to do it with human and chimp, you
might be able to.
>>I never use time-directed indels to root the tree.
>
> So what do you use to root it? You have prejudice about what came
> before what else, based on what some other expert claims and your
> respect for such authority, with no evidence (except fossils) to
> support that prejudice, and you just go with your prejudice as if it
> were Gospel?
I go with the phylogenetic structure that has been determined before me.
If I had to start from scratch every time, there would be no scientific
progress. Generally, all I have to do is suppose that birds are
monophyletic, and that crocodiles are outside that clade. Would you
grant that? This can be tested by bringing in a mammal outgroup for
those sequences that can actually be aligned that far. We could argue
about whether bird monophyly is a reasonable assumption if you like, but
I would find it very boring.
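Rooting with an outgroup, as described, amounts to placing the root on the branch leading to the outgroup and reading the rest of the tree away from it. A minimal sketch follows, assuming an unrooted tree stored as an adjacency list; the taxon sample, the topology and the internal-node labels `x`, `y`, `z` are hypothetical, chosen only for illustration and not an analysis result. The sketch then checks whether the birds come out as a clade.

```python
def root_on_outgroup(adj: dict[str, list[str]], outgroup: str):
    """Root an unrooted tree on the branch leading to `outgroup`.

    `adj` maps every node (terminal or internal) to its neighbours.
    Returns nested tuples: (outgroup, ingroup subtree).
    """
    (attach,) = adj[outgroup]          # a terminal node has one neighbour

    def build(node, parent):
        children = [n for n in adj[node] if n != parent]
        if not children:               # terminal node
            return node
        return tuple(build(c, node) for c in children)

    return (outgroup, build(attach, outgroup))


def terminals(tree) -> set[str]:
    """All terminal nodes below (and including) `tree`."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(terminals(c) for c in tree))


def is_clade(tree, taxa: set[str]) -> bool:
    """True if some node of the rooted tree has exactly `taxa` below it."""
    if terminals(tree) == taxa:
        return True
    return not isinstance(tree, str) and any(is_clade(c, taxa) for c in tree)


# Hypothetical unrooted tree; x, y, z are unnamed internal nodes.
adj = {
    "Mammal": ["x"], "Crocodile": ["x"], "x": ["Mammal", "Crocodile", "y"],
    "Emu": ["y"], "y": ["x", "Emu", "z"],
    "Chicken": ["z"], "Duck": ["z"], "z": ["y", "Chicken", "Duck"],
}
rooted = root_on_outgroup(adj, "Mammal")
print(rooted)
print("birds monophyletic:", is_clade(rooted, {"Emu", "Chicken", "Duck"}))
```

Because the root is placed on the mammal branch, the crocodile lands outside the bird group, which is the arrangement assumed in the text.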
>>You can use time-directed indels (SINEs being the only such that you
>>have mentioned) to root a tree.
>
> They don't completely root the tree. At best they certify that certain
> branches of the tree are true clades, and that the true root lies
> somewhere in the relatively small part of the tree that connects those
> clades together. Do you know how many total species are known, and
> hence the total size of the unrooted tree connecting them all, and do
> you furthermore know the statistics of the known-clades that result
> from knowing the SINE-insert data, and hence the number of branches in
> the undirected portion of the tree?
No. And anyway I reject your basic premise, remember? We don't need
SINEs to root the tree.
>>You can use a rooted tree to determine the direction of any
>>non-time-directed indels on it.
>
> Given that only part of the tree is rooted, the known clades as given
> by the SINEs, this method determines the direction of every link within
> any one of the clades, but doesn't help anywhere in the undirected
> backbone that connects the various clades together.
There is no distinction between "backbone" and "clades". And again,
remember that I'm not supposing that SINEs are the only way to root a
tree, or even part of a tree.
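Once a tree is rooted, an indel scored as present/absent can be polarized by reconstructing its state at the root. In the sketch below, absent at the root means insertion, and present at the root means deletion. It assumes a binary rooted tree written as nested tuples and hypothetical taxa and scorings. It uses Fitch parsimony as the reconstruction method, which is only one choice among several; likelihood-based ancestral reconstruction is another.

```python
def fitch_down(node, state: dict[str, bool]):
    """Fitch down-pass for one presence/absence character on a rooted
    binary tree; returns (candidate states at this node, changes)."""
    if isinstance(node, str):
        return {state[node]}, 0
    left, right = node
    ls, lc = fitch_down(left, state)
    rs, rc = fitch_down(right, state)
    if ls & rs:
        return ls & rs, lc + rc
    return ls | rs, lc + rc + 1


def polarize_indel(rooted, has_segment: dict[str, bool]) -> str:
    """Call an indel an insertion or a deletion from the inferred root state."""
    root_states, _ = fitch_down(rooted, has_segment)
    if root_states == {False}:
        return "insertion"   # segment absent at the root, gained later
    if root_states == {True}:
        return "deletion"    # segment present at the root, lost later
    return "ambiguous"       # root state not resolved by parsimony


# Hypothetical rooted tree (Mammal as outgroup) and indel scorings.
tree = ("Mammal", ("Crocodile", ("Emu", ("Chicken", "Duck"))))
gap_a = {"Mammal": False, "Crocodile": False, "Emu": False,
         "Chicken": True, "Duck": True}
gap_b = {"Mammal": True, "Crocodile": True, "Emu": True,
         "Chicken": True, "Duck": False}
print(polarize_indel(tree, gap_a))
print(polarize_indel(tree, gap_b))
```

Neither indel needs to be self-polarizing: the direction comes entirely from the root.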
>>"Directed link" is not a term I would use, nor am I sure of its meaning.
>
> If you feed in the character and/or DNA data for current species only,
> you get an unrooted tree, comprised of leaf nodes (the current species
> themselves),
We usually call those "terminal nodes". Systematics terminology differs
somewhat from graph theory terminology. Get used to it.
> internal nodes (the new nodes generated by the program),
> half-internal links (between internal nodes and leaf nodes),
Terminal branches.
> and
> full-internal links (between two different internal nodes).
Internal branches.
> The
> half-internal links are of course directed, from the internal node to
> the current species.
No they aren't. The root could lie anywhere along that branch, even
(theoretically) at the terminal node itself.
> But initially there's no reason to decide which
> direction the full-internal links go, so they are all undirected links
> initially. If and when you have solid evidence to allow you to decide
> which direction some full-internal link goes, *then* you change that
> link from undirected to directed in your database. So my term "directed
> link" means any link for which you know the direction of time across
> that link. This term refers to our information. In nature of course
> *every* link was directed, but we aren't omniscient so we don't know
> the direction of every link, so in our model some links are undirected
> until we get more information sometime in the future, or they remain
> undirected forever if we never get sufficient information to assign a
> direction to them.
If you used standard terminology, it would be easier for me to
understand you.
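In standard terms, a fully resolved unrooted tree with n terminal nodes has n - 2 internal nodes, n terminal branches and n - 3 internal branches, so 2n - 3 branches in all. Since the root may lie on any of those branches, terminal ones included, one unrooted tree corresponds to 2n - 3 distinct rooted trees. The sketch below builds a hypothetical "caterpillar" tree (node names `T1`..`Tn` and `I1`..`I(n-2)` are made up) and checks the counts.

```python
def caterpillar(n: int) -> dict[str, set[str]]:
    """Adjacency sets for a fully resolved unrooted 'caterpillar' tree.

    Terminal nodes are T1..Tn (n >= 3); internal nodes are I1..I(n-2).
    """
    adj: dict[str, set[str]] = {}

    def link(a: str, b: str) -> None:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for k in range(1, n - 2):              # internal branches I1-...-I(n-2)
        link(f"I{k}", f"I{k + 1}")
    link("T1", "I1")                       # two terminals on the first
    link("T2", "I1")                       # internal node,
    for k in range(2, n - 2):              # one on each middle node,
        link(f"T{k + 1}", f"I{k}")
    link(f"T{n - 1}", f"I{n - 2}")         # and two on the last
    link(f"T{n}", f"I{n - 2}")
    return adj


def count_parts(adj: dict[str, set[str]]) -> dict[str, int]:
    """Count nodes and branches using systematics terminology."""
    terminal = {v for v, nbrs in adj.items() if len(nbrs) == 1}
    internal = set(adj) - terminal
    branches = {frozenset((u, v)) for u in adj for v in adj[u]}
    terminal_branches = sum(1 for b in branches if b & terminal)
    return {
        "terminal nodes": len(terminal),
        "internal nodes": len(internal),
        "terminal branches": terminal_branches,
        "internal branches": len(branches) - terminal_branches,
    }


for n in (4, 5, 10):
    print(n, count_parts(caterpillar(n)))
```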
>>No, internal nodes are not species, even presumed species. They are
>>hypothetical common ancestors, which may or may not be species.
>
> In the case of the LCA of current species that all engage in meiosis, and
> presumably all ancestors back to the root of eukaryotes also engaged in
> meiosis, what else would an LCA be except a species?
A complicated question, in which we would have to argue extensively
about the meaning of "species", which I really don't want to get into.
>>Extending the meaning of "species" through time is itself problematic.
>
> Extending the meaning so that a species lasts over time is indeed
> problematic, and perhaps meaningless. But at any particular epoch
> (single time), it's possible to define the species at that particular
> time in a logical manner. Yes it's a fuzzy definition due to ring
> species and in-progress species-splits, but still in some sense the
> term can be defined just the same as it's defined today.
True in principle for any time slice. But of course the necessary data
are present only for the current time slice. Best not to introduce
pointless concepts, especially when they're unnecessary.
>>I counted the weights exactly as the data allow. Only the SINEs were
>>self-polarizing. All else is your delusion.
>
> So if I asked you: DNA-segment-duplication events are self-polarizing,
> because it commonly happens that a single segment of DNA gets duplicated
> and then the two copies gradually diverge over time, but it *never*
> happens that two unrelated segments of DNA gradually drift toward an
> exact match and then at the moment they exactly match suddenly one
> of the two copies is exactly/totally deleted. Agree [ ] / Disagree [ ]
> You'd check Disagree??
I'd ask you what you thought you were talking about. It appears that you
believe we are assuming that we have an entire sequenced genome to
search for similar segments. That assumption is just unwarranted in
almost all cases.
>>You seem to think that a SINE on one branch polarizes that
>>branch only, when in fact it supplies a root for the entire tree
>>subsequent to that particular insertion.
>
> No, I don't think polarizing a single link has no consequences
> downstream. Indeed my opinion is as you stated there. Polarizing a
> single link establishes one of the two sides of the tree, namely the
> *after* side, as a true clade, which thereby defines the direction
> along all links within that clade. That true-clade is thereby
> eliminated from consideration as the true location of the root. The
> root must be somewhere in the other side of the tree, the *before*
> side.
Good.
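The logic just agreed on can be sketched as follows. If time runs from node `before` to node `after` across one branch, every terminal on the `after` side forms a clade, and the root is confined to the `before` side. This is a minimal sketch on a hypothetical six-taxon unrooted tree: taxa A-F and internal nodes u-x are made-up labels. The polarizing event (a SINE insertion, say) is assumed rather than detected.

```python
def build_adj(edges):
    """Adjacency lists from an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def side(adj, start, blocked):
    """Terminal nodes reachable from `start` without crossing into `blocked`."""
    seen, stack, found = {blocked, start}, [start], set()
    while stack:
        node = stack.pop()
        if len(adj[node]) == 1:            # terminal node
            found.add(node)
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return found


def polarize_branch(adj, before, after):
    """Return (clade, root_side) for a branch polarized before -> after."""
    return side(adj, after, before), side(adj, before, after)


# Hypothetical unrooted tree: A,B-u; u-v; C-v; v-w; D-w; w-x; E,F-x.
adj = build_adj([("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"),
                 ("v", "w"), ("D", "w"), ("w", "x"), ("E", "x"), ("F", "x")])
clade, root_side = polarize_branch(adj, "v", "w")
print("clade:", sorted(clade), "root lies among:", sorted(root_side))
```

Every branch inside the clade {D, E, F} is then directed away from w, while the branches on the A/B/C side remain undirected until further evidence arrives.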
>>>So do you know of any such force-direction mutations across major links
>>>in the middle of the unrooted tree of life as it is currently known,
>>>which could be used to root portions of the tree and thereby establish
>>>those portions as true clades, and thereby restrict the global root to
>>>the rest of the tree that connects those clades together, ...
>>
>>No. I'm sure there are gene duplications that would be useful for the
>>purpose. For example, all vertebrates share two duplications of the
>>entire single ancestral HOX cluster.
>
> So now you're agreeing with me that duplications have intrinsic
> direction (self polarized), which is the opposite of what you said just
> earlier.
No it isn't. What you were talking about earlier were just indels that
you assumed were polarized in some way. Duplications do have a
direction. However, this is complicated by the fact that copies can be
both gained and lost, and orthology can be a tricky question at times.
Again, this problem would be ameliorated if you had whole genomes
available for examination, but for most species you don't.
>>I don't know if any protists have SINEs.
>
> What about duplications followed by later divergence of the copies?
Plenty of that.
>>>By the way, I presume the unrooted metazoan tree must look like this:
>>> Parazoa--------+---------Radiata
>>>                |
>>>            Bilateria
>>>where the root is presumed to be somewhere within Parazoa, right?
>>
>>Radiata is not a term in modern use.
>
> It means Cnidaria + Ctenophora. I'll re-draw the unrooted tree to show
> them separately to please you.
>>Nor is Parazoa.
>
>
> It means Porifera + Placozoa. I'll re-draw the unrooted tree to show
> them separately to please you.
>
> Porifera--+--------+---------Bilateria
>           |        |
>       Placozoa     +--Cnidaria
>                    |
>               Ctenophora
> Do you see an unrooted tree there, or can't you see it? (Earlier you
> said you don't know what an unrooted tree is, despite my frequent
> drawing such trees for you. Well here's another.)
You are imagining things. That is indeed an unrooted tree. It's not
necessarily the true unrooted tree, though. Several of the branches you
show are contentious, as I have explained. The root probably lies within
Porifera.
> Also, do you agree all five taxa I've used there are modern usage?
Yes.
> Now the big question: Per the latest data, have I drawn the correct
> unrooted tree connecting those five taxa? If not, which unrooted tree
> would you draw to connect those same five taxa?
It's unclear. All three possible resolutions of the
Bilateria/Cnidaria/Ctenophora trichotomy are currently in contention.
>>... it's unclear whether they are a clade or whether one or the other
>>is closer to Bilateria than the other.
>
> All I'm asking about here is the **unrooted** tree. With an unrooted
> tree, "closer to" is meaningless, so please don't use such nonsense
> wording in this part of the discussion.
OK. It's unclear if they form a possible bipartition of the unrooted
tree. Happy?
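Each of the three contending resolutions can be written as the bipartition it would add to the unrooted tree. The following is a minimal sketch, assuming the {Porifera, Placozoa} split as drawn above is held fixed. It only enumerates the candidates and makes no claim about which one the data favour.

```python
from itertools import combinations

TAXA = frozenset({"Porifera", "Placozoa", "Bilateria", "Cnidaria", "Ctenophora"})
TRIO = ("Bilateria", "Cnidaria", "Ctenophora")


def candidate_splits(taxa: frozenset, trio: tuple) -> list[tuple[frozenset, frozenset]]:
    """One bipartition per way of pairing two of the three lineages:
    the pair on one side of an internal branch, all other taxa on the other."""
    return [(frozenset(pair), taxa - frozenset(pair))
            for pair in combinations(trio, 2)]


for pair, rest in candidate_splits(TAXA, TRIO):
    print(sorted(pair), "|", sorted(rest))
```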
> Note if we include a sixth taxon, any protist, then on the assumption
> that protists came before any animals, and on the assumption that
> Porifera lie between the protist ancestor and the rest of the animals,
> that sixth taxon would be *inside* the Porifera node on the unrooted
> tree, forcing the Porifera node to be broken into multiple nodes,
> causing the tree to be re-drawn something like this:
>
> Protist--+----------+-------------+--------+---------Bilateria
>          |          |             |        |
>    SomePorifera MorePorifera  Placozoa     +--Cnidaria
>                                            |
>                                       Ctenophora
>
> As an interim measure, before we have the Porifera-related internal
> nodes fully resolved, we might have an unrooted tree like this:
>
>    SomePorifera    MorePorifera
>                 \ /
> Protist----------*----------+--------+---------Bilateria
>                  |          |        |
>         EvenMorePorifera Placozoa    +--Cnidaria
>                                      |
>                                 Ctenophora
>
> So anyway, how would you draw the unrooted tree of just the five animal
> taxa, not including the one protist taxon, based on the latest
> information? The way I drew it 43 lines earlier, or some other way?
I would reduce the resolution.
> OK now I switch to *rooted* trees, true cladograms, are you clear on that?
>
>>>Now I saw a rooted tree long ago that showed:
>>> --Protists
>>>   |--ModernProtists
>>>   `--Animalia
>>>      |--Porifera
>>>      `--(unnamed)
>>>         |--Placozoa
>>>         `--(unnamed)
>>>            |--Radiata
>>>            `--Bilateria
>>>But somebody here (you?) said we now believe the root is *within* Porifera.
>>
>>Yes. This is a bizarre tree in many other ways, such as the monophyly of
>>ModernProtists, and the use of Radiata as a clade name.
>
> If Cnidaria + Ctenophora form a single clade not including any other
> phylum, and if the old name for that single clade is "Radiata", what do
> you have against continued use of that name?
Radiata is actually an old name for a number of groups that also
included echinoderms.
> I'll re-draw it to clearly
> show that Radiata is the smallest clade including both Cnidaria +
> Ctenophora if that pleases you.
>
> As for ModernProtists: Sorry, I'll re-draw that as an unresolved node,
> and I'll rename the topmost node ProtistsAndAnimals.
> Here is the result with both those changes:
>
> --ProtistsAndAnimals
>   |--SomeModernProtists
>   |--MoreModernProtists
>   |--EvenMoreModernProtists
>   |--YetSomeMoreModernProtists
>   |--YetAnotherBunchOfModernProtists
>   `--Animalia
>      |--Porifera
>      `--(unnamed)
>         |--Placozoa
>         `--Eumetazoa
>            |--Radiata (or just call this node "unnamed" if you wish)
>            |  |--Cnidaria
>            |  `--Ctenophora
>            `--Bilateria
>
> Now presumably you would break up Porifera into multiple clades which
> connect at various points along the branch from Animalia down toward
> Eumetazoa, correct?
That's what at least some evidence is currently showing.
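The revised cladogram can also be written in the Newick format mentioned earlier. Newick handles both the unresolved node (more than two children) and internal-node names, which are placed after the closing parenthesis. The following is a minimal sketch, assuming a (label, children) representation of our own with hypothetical helper names; the unnamed node gets an empty label. Names containing spaces or punctuation would need quoting, which this sketch does not do.

```python
def leaf(name: str):
    return (name, [])


def clade(name: str, *children):
    return (name, list(children))


def to_newick(node) -> str:
    """Serialize a (label, children) tree; internal labels follow ')'."""
    label, children = node
    if not children:
        return label
    return "(" + ",".join(to_newick(c) for c in children) + ")" + label


cladogram = clade(
    "ProtistsAndAnimals",
    leaf("SomeModernProtists"),
    leaf("MoreModernProtists"),
    leaf("EvenMoreModernProtists"),
    leaf("YetSomeMoreModernProtists"),
    leaf("YetAnotherBunchOfModernProtists"),
    clade(
        "Animalia",
        leaf("Porifera"),
        clade(
            "",  # the unnamed node
            leaf("Placozoa"),
            clade(
                "Eumetazoa",
                clade("Radiata", leaf("Cnidaria"), leaf("Ctenophora")),
                leaf("Bilateria"),
            ),
        ),
    ),
)
print(to_newick(cladogram) + ";")
```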