For Sean Pitman: Some real nested hierarchies



Here is a protein-coding region, tas2r38, which codes for a taste receptor protein. The sequence is 1002 bases long but highly conserved; only 40 sites vary among the species here, and only 10 of those are relevant to a nested hierarchy. All 10 are shown below:

Homo_sapiens cggtatccag
Gorilla_gorilla cggtatccga
Hylobates_klossii tgacaaccaa
Pongo_pygmaeus tgacaactga
Pan_troglodytes cagtgttcgg
Pan_paniscus cagtgtttgg

These 10 sites are enough to uniquely determine a tree, shown below:


                        /----------- Homo sapiens(1)
                 /--60--+      /---- Pan troglodytes(5)
        /---96---+      \--96--+---- Pan paniscus(6)
/-------+        \------------------ Gorilla gorilla(2)
|       \--------------------------- Pongo pygmaeus(4)
\----------------------------------- Hylobates klossii(3)

The numbers on the branches are bootstrap support values, which run from 0 to 100% and measure how consistently the data support that particular branch. Eight of these sites fit the tree perfectly; only two (numbers 8 and 9) do not. Of the 10 sites, 9 are silent. All are certainly within the bounds of what Sean thinks natural processes are capable of doing.
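The count of fitting sites can be checked directly from the alignment above. The following is a minimal Python sketch, not the analysis used for the post: the clades are typed in by hand from the tree as drawn, and the names `ALIGNMENT`, `TREE_CLADES`, `site_split` and `fits_tree` are made up for illustration. A two-state site fits the tree if one of the two groups it defines is exactly one of the tree's clades.

```python
# The 10 informative tas2r38 sites, copied from the alignment in the post.
ALIGNMENT = {
    "Homo_sapiens":      "cggtatccag",
    "Gorilla_gorilla":   "cggtatccga",
    "Hylobates_klossii": "tgacaaccaa",
    "Pongo_pygmaeus":    "tgacaactga",
    "Pan_troglodytes":   "cagtgttcgg",
    "Pan_paniscus":      "cagtgtttgg",
}

# Internal groupings of the tree as drawn in the post (entered by hand).
TREE_CLADES = [
    frozenset({"Pan_troglodytes", "Pan_paniscus"}),
    frozenset({"Homo_sapiens", "Pan_troglodytes", "Pan_paniscus"}),
    frozenset({"Homo_sapiens", "Pan_troglodytes", "Pan_paniscus",
               "Gorilla_gorilla"}),
]


def site_split(alignment, index):
    """Return the two groups of species defined by a two-state site."""
    groups = {}
    for species, sequence in alignment.items():
        groups.setdefault(sequence[index], set()).add(species)
    if len(groups) != 2:
        raise ValueError("this sketch only handles two-state sites")
    first, second = groups.values()
    return frozenset(first), frozenset(second)


def fits_tree(alignment, index, clades):
    """A site fits if either side of its split is one of the tree's clades."""
    first, second = site_split(alignment, index)
    return any(first == clade or second == clade for clade in clades)


n_sites = len(next(iter(ALIGNMENT.values())))
# Report sites using the post's 1-based numbering.
misfits = [i + 1 for i in range(n_sites)
           if not fits_tree(ALIGNMENT, i, TREE_CLADES)]
print(misfits)  # prints [8, 9]
```

Site 8 groups Pongo with Pan paniscus and site 9 groups Homo with Hylobates, and neither grouping appears in the tree; the other eight sites each mark out one of the three clades listed.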

So what is Sean's explanation for this nested hierarchy?

Here is another protein-coding region, semenogelin II, which codes for a protein that is a component of seminal fluid. This one is a bit more variable. Of 1746 sites, 134 are variable, and 22 are relevant to a nested hierarchy.

These 22 sites also uniquely determine a tree, here:


                       /--- Pan troglodytes(1)
                /--88--+--- Gorilla gorilla(3)
        /--100--+---------- Homo sapiens(2)
/-------+------------------ Pongo pygmaeus(4)
+-------------------------- Hylobates lar(5)

Semenogelin is quite a bit more variable than tas in amino acid sequence: of 22 differences, half are amino acid changes, 10 of which are compatible with the tree and 1 of which is not. The others are silent, and here also 10 fit the tree while 1 does not.

There are some differences between the two trees. The semenogelin data set lacks a bonobo sequence and has a different species of gibbon from tas2r38. More importantly, the relationships among the three African ape genera (Gorilla, Pan, Homo) are different. I'll explain that in a moment. But the real message is that this sequence shows (with that one exception) the same nested hierarchy as tas2r38. Now why should that be?

And in fact I could repeat this exercise a thousand times with different genes, and they would all show the same relationship among species, except that a minority would differ in the arrangement among the three African apes.

Why the ambiguity among Pan, Gorilla, and Homo? Because the history of a gene doesn't have to be quite the same as the history of the species that contain it. Different forms of the same gene (alleles) can arise within a species, and these differences can be retained through a speciation event, or even through two if the time between them is short enough. Eventually the differences sort out so that only one gene lineage has descendants in any given species, but they may sort out differently from gene to gene. The interval between the split of Gorilla from the Pan/Homo lineage and the later split of Pan from Homo is short enough for this mismatch between gene and species phylogenies to occur. While a majority of genes show the species pattern (Gorilla vs. Pan and Homo), about 20% of genes show each of the other patterns (Pan vs. Gorilla and Homo, or Homo vs. Gorilla and Pan, as seen in semenogelin). I would be interested in Sean's explanation of this phenomenon too.
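The sorting process described above has a standard textbook form for three species under the neutral coalescent: if the interval between the two splits is T (measured in units of 2N generations, N being the effective population size), each of the two mismatched gene trees has probability exp(-T)/3 and the matching one gets the rest. The Python sketch below just evaluates that formula. It assumes things the post does not state (constant population size, random mating, no gene flow after the splits), the function names are hypothetical, and it takes the "about 20% each" figure at face value rather than deriving it.

```python
import math


def gene_tree_probabilities(internode_length):
    """Three-species coalescent with the interval between splits in 2N units.

    Returns (probability the gene tree matches the species tree,
             probability of EACH of the two mismatched gene trees).
    """
    p_each_mismatch = math.exp(-internode_length) / 3.0
    p_match = 1.0 - 2.0 * p_each_mismatch
    return p_match, p_each_mismatch


def internode_length_for(p_each_mismatch):
    """Invert the formula: the interval that gives this mismatch frequency."""
    return -math.log(3.0 * p_each_mismatch)


# Interval implied by "about 20% of genes show each of the other patterns".
interval = internode_length_for(0.20)
p_match, p_each = gene_tree_probabilities(interval)
print(round(interval, 3), round(p_match, 2), round(p_each, 2))
```

Under these assumptions, 20% for each mismatched pattern corresponds to an interval of about half of 2N generations and leaves 60% of genes following the species tree. As the interval grows, the mismatched fraction shrinks toward zero, which is why the effect shows up only where two splits came close together.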

Note that none of these differences fall within the range of what Sean considers to require ID as an explanation. No "1000 fairly specified residues" here. In fact the differences among the DNA sequences are slight and easily explained by random mutation and natural selection, and the differences among the amino acid sequences are slighter still.

So anyway, that's what a nested hierarchy looks like. Note that it's a property of the character data, that is, of a rectangular matrix of characteristics x species. In a perfect nested hierarchy, the sets defined by characters would never partially overlap: any two would be either disjoint or one inside the other. Here, we have a bit of homoplasy (convergent evolution) and a bit of gene tree vs. species tree discordance, which only slightly cloud the picture. The bootstrap tests show the decidedly hierarchical nature of the data, as does agreement among trees.
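The point that nesting is a property of the matrix itself, with no tree required, can be illustrated with a pairwise test on the tas2r38 sites. This is a rough Python sketch using the usual compatibility check for two-state characters (sometimes called the four-gamete test), which is a swapped-in illustration and not the bootstrap analysis from the post: two sites can sit on the same tree unless all four combinations of their states turn up among the species. The names `SEQUENCES`, `site_column` and `compatible` are invented for the example.

```python
from itertools import combinations

# The 10 informative tas2r38 sites from the post, in the order listed there.
SEQUENCES = {
    "Homo_sapiens":      "cggtatccag",
    "Gorilla_gorilla":   "cggtatccga",
    "Hylobates_klossii": "tgacaaccaa",
    "Pongo_pygmaeus":    "tgacaactga",
    "Pan_troglodytes":   "cagtgttcgg",
    "Pan_paniscus":      "cagtgtttgg",
}


def site_column(site):
    """States of all species at one site (1-based, as numbered in the post)."""
    return tuple(sequence[site - 1] for sequence in SEQUENCES.values())


def compatible(site_a, site_b):
    """Two two-state sites conflict only if all four state pairs occur."""
    pairs_seen = set(zip(site_column(site_a), site_column(site_b)))
    return len(pairs_seen) < 4


# For every site, list the sites it cannot share a tree with.
conflicts = {site: [] for site in range(1, 11)}
for a, b in combinations(range(1, 11), 2):
    if not compatible(a, b):
        conflicts[a].append(b)
        conflicts[b].append(a)

for site, others in conflicts.items():
    print(site, others)
```

On this data every conflict involves site 8 or site 9. Sites 1 to 7 and 10 are mutually compatible, so the sets they define are perfectly nested without reference to any tree.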

So now I invite Sean to do the same thing with some kind of human-generated system: get the character data for the bottom-level elements in that system and show that it has a nested hierarchical pattern in the same way I have done for two primate genes. Remember not to choose characters for the patterns they display; this would be biased. I picked my characters and my genes at random. Their strongly hierarchical structure is a property of the entire genomes of these organisms.
