Re: Part 1 (of 3): What are major aspects of evolutionary theory?



> > So-far we have discussed duplicated segments of DNA, whose gradual
> > drift away from each other after duplication shows the arrow of time,
> > but that works only for very recent duplications where we have the DNA
> > sequences of ancestors and/or several very near modern
> > individuals/species which can be used to reconstruct the genomes at the
> > presumed branch nodes.
> This is word salad, I'm afraid. "Reconstruct the genomes"? "DNA
> sequences of ancestors"? And what you are saying (if I can understand
> it) is not at all what I was talking about.

Suppose we have an unrooted tree with three terminal nodes whose genome
we know and one inner (branch) node connected to all three. Suppose
we've figured out where a segment of DNA in one terminal node matches
the other two:
J: CCTCCAGTCCTATCCAATCTACTGTACTTT    /J
K: CCTCAAGTCCTATCCAATCTACTGTACATT  K-M
L: CCTCAAGTCCTATCGAATCTACTGTACTTT    \L
       x         y            z
Three positions, x,y,z, have base differences, the rest are identical
across all three genomes. Now let's guess what the genome at the inner
node would be. I claim that majority rules is a good guess:
M: CCTCAAGTCCTATCCAATCTACTGTACTTT
Suppose J was the common ancestor of K and L. Then x mutated from C to
A before the inner node M, and both descendants K and L inherited
copies of that mutated base. Also y didn't mutate until after the inner
node M, from C to G only in reaching descendant L. Also z didn't mutate
until after the inner node M, from T to A only in reaching descendant
K. Thus three point mutations, once each, explain all variation. Any
other explanation would require more than three mutations total, in
particular would require two different mutations on exactly the same
base. The same argument works whether K was the common ancestor of J
and L, or whether L was the common ancestor of J and K. The argument is
totally symmetric with respect to the three nodes, so we don't have to
root the tree before applying the majority-rule rule of thumb.

In the case of all the other bases except x y or z, the argument is
even stronger. Assuming the inner node matches all three terminal nodes
requires no mutations at all, whereas assuming it differs requires
three point mutations all at the same base with one before the inner
node exactly cancelled by each of the others, very very unlikely.
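
To make the majority-rule guess concrete, here is a minimal Python sketch using the three sequences above. The function names (majority_consensus, hamming) are just illustrative helpers of mine, and the '?' marker for a three-way disagreement is an assumption, since that case doesn't arise in this example.

```python
from collections import Counter

J = "CCTCCAGTCCTATCCAATCTACTGTACTTT"
K = "CCTCAAGTCCTATCCAATCTACTGTACATT"
L = "CCTCAAGTCCTATCGAATCTACTGTACTTT"

def majority_consensus(seqs):
    """Per-position majority base; '?' where no base has a strict majority."""
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count * 2 > len(column) else "?")
    return "".join(consensus)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

# Guess for the inner node M.
M = majority_consensus([J, K, L])

# Zero-based positions where the three terminal nodes disagree (x, y, z).
variable_sites = [i for i, col in enumerate(zip(J, K, L)) if len(set(col)) > 1]

# Point mutations needed between M and its three neighbors.
total_mutations = sum(hamming(M, s) for s in (J, K, L))

print(M, variable_sites, total_mutations)
```

The consensus matches the M written above, and the total comes to three point mutations, one on each link.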

Now suppose we know only two of the terminal nodes adjacent to an inner
node. Whenever those two agree, we don't need to know the third,
majority rule says the inner node probably matches those two,
regardless of whether the third terminal node matches the first two or
not.
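
A small sketch of the two-neighbor case, assuming we know only K and L from the example above. Where they agree the inner node is settled; where they differ I mark the position '?' (my own convention) because the third neighbor would have to break the tie.

```python
K = "CCTCAAGTCCTATCCAATCTACTGTACATT"
L = "CCTCAAGTCCTATCGAATCTACTGTACTTT"

def infer_from_two(a, b, unknown="?"):
    """Inner-node guess from two known neighbors; 'unknown' where they disagree."""
    return "".join(x if x == y else unknown for x, y in zip(a, b))

partial_M = infer_from_two(K, L)

# Zero-based positions that still need the third neighbor to decide.
undecided = [i for i, base in enumerate(partial_M) if base == "?"]

print(partial_M, undecided)
```

Only the two positions where K and L disagree (y and z) stay open; position x is already decided without looking at J.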

Now consider a larger tree, and we'll show just a single base for each
terminal node:
H=[A]  J=[T]  K=[T]
  |      |      |
  +------+---+--+--L=[T]
  |          |
  +---N=[A] M=[T]
  |
  +---P=[A]
  |
  Q=[A]
By majority rule, we can assume the inner node linking K and L has base
T, then using that we apply majority rule again to conclude the inner
node leading to M also has base T, likewise the inner node leading to J
has base T. From the other end, the inner node connecting P and Q
should have base A, likewise the inner node connecting to N has base A,
and the inner node under H has base A. Now we see a single mutation
along the link under H/J.

Now none of those conclusions are 100% sure. They are merely the most
parsimonious of the possible results.
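
The step-by-step reasoning above can be sketched in Python as repeated majority voting: any inner node with two agreeing known neighbors gets their base, and we repeat until nothing changes. The inner-node names (nH, nJ, nM, nKL, nN, nPQ) are labels I made up for the unnamed branch points in the diagram, and the edge list is my reading of that diagram.

```python
from collections import Counter

# Unrooted tree: each inner node has exactly three neighbors.
edges = [
    ("nH", "H"), ("nH", "nJ"), ("nH", "nN"),
    ("nJ", "J"), ("nJ", "nM"),
    ("nM", "M"), ("nM", "nKL"),
    ("nKL", "K"), ("nKL", "L"),
    ("nN", "N"), ("nN", "nPQ"),
    ("nPQ", "P"), ("nPQ", "Q"),
]
leaves = {"H": "A", "J": "T", "K": "T", "L": "T",
          "M": "T", "N": "A", "P": "A", "Q": "A"}

def propagate_majority(edges, known):
    """Assign a base to every node that has two agreeing, already-known neighbors."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    bases = dict(known)
    changed = True
    while changed:
        changed = False
        for node, adjacent in neighbors.items():
            if node in bases:
                continue
            votes = Counter(bases[n] for n in adjacent if n in bases)
            if votes:
                base, count = votes.most_common(1)[0]
                if count >= 2:  # two of three neighbors is a majority
                    bases[node] = base
                    changed = True
    return bases

bases = propagate_majority(edges, leaves)

# Links whose two ends carry different bases, i.e. where a mutation is implied.
mutated = [(a, b) for a, b in edges
           if a in bases and b in bases and bases[a] != bases[b]]

print(bases, mutated)
```

With these inputs the only link that changes base is the one between the node under H and the node under J.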

Next consider a case where the majority rule doesn't apply at an inner node:
  H    J=[A]  K=[T]
  |      |      |
  +------+---+--+--L=[T]
  |          |
  +---N=[A] M=[A]
  |
  +---P=[C]
  |
  Q=[C]
By majority rule, the node connecting K and L has base T, and the node
connecting P and Q has base C. But at the node above M, we have one
vote for A and one vote for T and one we don't know. So if the base is
A or T only one mutation is needed, whereas if the base is C or G two
mutations are needed, so we assume it's either A or T. Now look at the
inner node under J. One neighbor is A, the other is A or T. In the
former case, majority rule says it's A. In the latter case, it could be
A or T. In conclusion it could be A or T, with preference for A.
Likewise the node to the left of N could be either A or C. Now consider
the inner node under H. One neighbor is A or T with preference for A,
other neighbor is A or C. Fewest mutations would occur with A. Now we
can guess that H also probably has A.
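
As a cross-check on the hand reasoning, here is a brute-force Python sketch (not the local majority voting described above, but an exhaustive parsimony count): try every assignment of A/C/G/T to the six inner nodes and keep those needing the fewest mutations. The inner-node names are my own labels, the edge list is my reading of the diagram, and H is left out because its base is the unknown we are trying to guess.

```python
from itertools import product

# Links of the unrooted tree, omitting the link to the unknown terminal node H.
edges = [
    ("nH", "nJ"), ("nH", "nN"),
    ("nJ", "J"), ("nJ", "nM"),
    ("nM", "M"), ("nM", "nKL"),
    ("nKL", "K"), ("nKL", "L"),
    ("nN", "N"), ("nN", "nPQ"),
    ("nPQ", "P"), ("nPQ", "Q"),
]
leaves = {"J": "A", "K": "T", "L": "T", "M": "A",
          "N": "A", "P": "C", "Q": "C"}
inner = ["nH", "nJ", "nM", "nKL", "nN", "nPQ"]

def most_parsimonious(edges, leaves, inner, alphabet="ACGT"):
    """Return (minimum mutation count, list of all assignments achieving it)."""
    best_cost, best = None, []
    for combo in product(alphabet, repeat=len(inner)):
        bases = dict(leaves)
        bases.update(zip(inner, combo))
        cost = sum(bases[a] != bases[b] for a, b in edges)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [bases]
        elif cost == best_cost:
            best.append(bases)
    return best_cost, best

cost, solutions = most_parsimonious(edges, leaves, inner)

# Bases each inner node takes in at least one minimum-mutation assignment.
options = {node: sorted({s[node] for s in solutions}) for node in inner}

print(cost, options)
```

Counting over the whole tree at once, the node under H comes out as A, which agrees with the guess that H itself probably has A. This only scales to tiny trees (4^6 assignments here); it is meant as a sanity check, not a practical method.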

Now consider a highly conserved base in all three domains of life. If
we try to reconstruct the LCA of all three domains, at places where the
DNA bases are uniquely homologous, if all three domains have the same
base, we have high assurance the LCA also had that base, and if two of
the three agree but the third is different, we have moderately high
assurance the LCA agreed with the majority.

So what I envision from whole-ecosystem shotgun sequencing (see another
message I posted previously) is that we find segments of DNA bases
which fit entirely within single shotgun reads, and try to find which
are so very similar they must be homologous, and we build unrooted
trees for such, and use majority rule and variations thereof to predict
the genomes of the inner nodes of our trees.

> > (Which is a circular argument if you're trying to root the tree in the
> > first place.)
> Which nobody does.

If nobody ever roots the tree in the first place, then you can't use
the rooting somebody else already did, you're at square zero and can't
get past there.

> > Um, slight problem: With only three domains of life, there is only one
> > possible unrooted tree, and all three un[sic]rooted trees are satisfied by
> > that one unrooted tree, so even if all three different unrooted trees
> > applied to various homologous-gene-duplication groups, there'd be no
> > way to check anything based on modern DNA evidence. Or am I wrong?
> Yes, you are wrong, but I have no idea where your confusion lies. You
> are correct that there is only one unrooted tree. So where do you get
> your three unrooted trees from?
(That was a typo.)
> Did you mean three rooted trees?
(yes)
> If so, the duplication would root the tree and distinguish among
> them.
(So we're in agreement that a duplication would show that the root
occurred somewhere thataway in the original unrooted tree, on the side
where there's only one copy, away from the side where there are two
copies.)

> You have to know that X is a clade already before you can
> use y to root it.

So how do you ever know that X is a clade?

> But you want to know how to get into this system initially.

Yes, that's what I keep asking you, and you keep evading.

> If you go far enough out, even with the worst molecular clock, you
> reach a point at which it's not credible to root a tree at that much
> distance from the midpoint, and thus you have outgroups.

I agree with the basic idea, but I have no idea how to decide what is a
reasonable threshold for such guesses.

> Some characters do seem to root themselves too, or at least that's
> not a bad initial hypothesis. We might suppose that, for example, any
> protist would make an outgroup to Metazoa, and that Metazoa is
> monophyletic.

I don't agree with that argument. There could be a whole bunch of
protists, maybe even a phylum or two, which are just degraded metazoa.
But even a single protist descended from metazoa would make metazoa
non-monophyletic (not a clade, only 99.999% of a clade).

> Now in fact some protists might conceivably be descended from
> Metazoa. But if most of them are not, and we pick a bunch of different
> examples, we would find a contradiction that let us see this. Only if
> all protists are descended from Metazoa will we have a potential
> problem.

Let P represent major groups of protists, and let M denote major groups
of metazoa, in the following condensed view of a large unrooted tree:
 PPP     MM
  PP--MMMMM--PP
        |
       PPP
      PPPPP
       PPP
We have no way to know which of the three groups of protist is
ancestral and which are degraded metazoa. Accordingly we have three
possible roots for metazoa and have no clue which is correct.

On the other hand, if after sequencing corresponding parts of *every*
species of eukaryotes whatsoever, the unrooted tree looks like this:
PPPPPP MMMMMMMMM
PPPPPPPPPPPP--MMMMMMM
PPPPPPPPP MMMMMMMMM
P PPP P M M
I.e. just a single clump of each, then I would guess maybe the root is
somewhere in that one protist half-tree, and so we know the root of the
metazoa tree.

Of course it's going to be amusing if it comes out like this:
PPPP MM
PPPPPPPPPPP--MMMMM M MMMMMMMMM
PPP PPPP MMMM MMMM MMMMMMMMMM
P P PP--MMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMM
MMMMMMMMMMMMMMM MMM MMMMMMMMMMM
MMMM MMMMMMMMMMMMMMMMMM
M M M MMMMMMMMMMM
MMMMMMMM MMMMM M
whereby most of metazoa is one clade descended from one kind of protist
but a small group of metazoa is a different clade descended from a
different kind of protist. It would be especially amusing if it turns
out that convergent evolution made the two clades look nearly the same
in the way they develop toward adulthood despite using very different
mechanisms to evoke such similar development. For example, there are
two things that can be done with the pore that develops from the first
invagination of the blastula, it can be mouth or anus, and it's
possible there were three times this evolved, two of one way and one of
the other way, and so the small clade may duplicate the development
style of half the large clade.

So how long do we have to wait until a sufficient representative of
*every* group of metazoa and protist has been sequenced (corresponding
DNA segments, not whole genome) to where we can fit them all into a
single unrooted tree and know which of the cases applies?

Do you know of any online Web page with an unrooted tree that covers a
good fraction of all known protist and metazoa phyla, everything
sequenced to date, as a preliminary view of this? (And if it also
includes plants and fungi, so much the better.)

> You remember that we had many human sequences, not just "Human" as a
> terminal taxon. The question was whether "Human" was a clade, and
> whether chimps were an outgroup.

Are you talking about nuclear DNA, or mitochondrial DNA? Mitochondrial
DNA is only a very tiny portion of the whole human genome. But nuclear
DNA is involved in meiosis so there's no such thing as a "clade" within
a single species. Each species must be considered an "individual", a
single node, when drawing either rooted or unrooted trees with nuclear
DNA, so the question you ask is meaningless.

I'm going to assume you're restricting this question to mitochondrial
DNA, where it makes any sense. By the way, is the 3.1 billion figure
for human genome the total of nuclear and mitochondrial, or is that the
subtotal for nuclear only? How large is the mitochondrial portion?
<http://www.actionbioscience.org/evolution/ingman.html>
* Mitochondria have their own genome of about 16,500 bp that exists
outside of the cell nucleus. Each contains 13 protein coding
genes, 22 tRNAs and 2 rRNAs.
Is that correct, only 16.5k bp total for human mitochondria?
I guess the 3.1 billion figure to that many significant digits would
apply to either the sub-total or the grand total equally well, because
the mitochondria part is essentially zero at that scale.
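
A quick arithmetic check of that guess in Python. I'm assuming here that 3.1 billion is the nuclear subtotal; the 16,500 figure is the one quoted above.

```python
MITO_BP = 16_500              # mitochondrial genome size quoted above
NUCLEAR_BP = 3_100_000_000    # assumption: 3.1 billion is the nuclear subtotal

# Share of the grand total contributed by the mitochondrial genome.
fraction = MITO_BP / (NUCLEAR_BP + MITO_BP)

# Both figures rounded to two significant digits.
rounded_nuclear = f"{NUCLEAR_BP:.1e}"
rounded_total = f"{NUCLEAR_BP + MITO_BP:.1e}"

print(f"mitochondrial share: {fraction:.2e}")
print(rounded_nuclear, rounded_total)
```

The share is roughly 5 parts per million, so the subtotal and the grand total round to the same two-digit figure.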
So how many humans, and how many chimps, have gotten their
mitochondrial DNA totally sequenced, so that all of them could be put
into a single unrooted tree? What does that tree look like?
One clump of chimp connected via a single link to one clump of human?
If, after drawing the tree, we compare different branches of it, do we
discover any lateral gene flow, or is it consistent with none at all?

Anyway, back to questions of clades: Suppose we get lots of
mitochondrial genomes of all five species, and the unrooted tree looks
like this:
                                              H H H H H
BBBBBBBBBBB-----------+-------+--------+--HHHHHHHHHHHHHHHHHHHHH
  B B B               |       |        |        H H H
BB BB B             OOOO   GGGGGG  CCC-+-CCCCC  HHHHHHHHHH
 B B BBB             O      G G    C C     C     H H H
                                          CC     HHH H
Then we can say for sure at most one of those species is not a clade.
(Treating all species of gibbon as if a single species here, but
showing each species of chimp separately.)

I think in general, when we get an unrooted tree like that, the best we
can say is that all but one of the clumps is a clade, and the remaining
clump may or may not also be a clade, and we have no idea which clump
is that one maybe-clade. Maybe it's best to draw only unrooted trees
and let the reader guess which if any of the clumps is not a clade.
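
Here is a Python sketch of the "all but one clump is a clade" claim, under simplifying assumptions of mine: each species clump is reduced to two hypothetical samples (B1, B2, and so on), the inner-node names (B*, x1, ...) are made up, and the root is tried on every link in turn.

```python
# Simplified unrooted tree: five clumps hanging off a three-node backbone.
edges = [
    ("B*", "B1"), ("B*", "B2"), ("B*", "x1"),
    ("x1", "O*"), ("O*", "O1"), ("O*", "O2"),
    ("x1", "x2"),
    ("x2", "G*"), ("G*", "G1"), ("G*", "G2"),
    ("x2", "x3"),
    ("x3", "C*"), ("C*", "C1"), ("C*", "C2"),
    ("x3", "H*"), ("H*", "H1"), ("H*", "H2"),
]
clumps = {s: frozenset({s + "1", s + "2"}) for s in "BOGCH"}

adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

def clades_when_rooted_on(edge, adj):
    """All clades (as sets of terminal nodes) if the root sits on the given link."""
    found = set()

    def walk(node, parent):
        children = [n for n in adj[node] if n != parent]
        if not children:
            members = frozenset([node])
        else:
            members = frozenset().union(*(walk(c, node) for c in children))
        found.add(members)
        return members

    a, b = edge
    walk(a, b)
    walk(b, a)
    return found

# For every possible root position, list the clumps that fail to be clades.
results = {}
for edge in edges:
    clades = clades_when_rooted_on(edge, adj)
    results[edge] = [name for name, members in clumps.items()
                     if members not in clades]

worst = max(len(failed) for failed in results.values())
all_clades_roots = [edge for edge, failed in results.items() if not failed]

print(worst, all_clades_roots)
```

In this toy tree no root position breaks more than one clump, and a root anywhere on the backbone or on a link joining a clump to it leaves all five as clades.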

> >>If humans still form a single group relative to the chimp when the
> >>tree is rooted on a gorilla
> > Huh? The tree is rooted on a branch between two clades, not on a
> > single-species clade.
> Don't be pedantic. This is the commonly used terminology. Nobody is
> thereby claiming that gorillas are ancestral to humans. You could say
> "rooted on the branch leading to a gorilla", but that's needlessly long.

Oh, thanks for explaining the slightly illogical jargon which confused me.

> The question at hand is whether Homo sapiens is a clade, and whether
> a chimp is a safe outgroup

Which makes sense only if you're restricting discussion to mitochondria.

> As long as there is a branch separating the "Human" part
> of the tree from all others, our conjecture that humans are a clade is
> unfalsified.

So long as you restrict discussion to asexual reproduction, such as
mitochondria, that makes sense. With a link between two parts of an
unrooted tree, at most one part can contain the true root, so at most
one part is not a clade.

But if we start talking about nuclear DNA, then each species is a
single node, it makes no sense to talk about clade or tree when dealing
with meiotic-crossing DNA. The only interesting questions are whether
traditional/natural groups of more than one species, such as "placental
mammals" or "birds" or "bats" or "monotremes" or "tetrapods" or
"chordates" or "metazoa", are clades (as originally defined, without
any "adjustments" of moving some small groups into or out of the
natural group to fix its cladeness).

> Are you saying that species can't be paraphyletic?

If it's a sexually reproducing species, in the sense that every member
can potentially mate with every other member (either directly if of
different sex, or with one intermediary if of same sex), and if in fact
there is travel between different local populations to avoid clades in
the local-population sense, then it makes no sense to even ask the
question in regard to nuclear DNA.

Each species is a single clade because it's a single individual for the
sake of computing trees or cladograms with nuclear DNA.