Re: Sean Pitman and nested hierarchy



Charles Brenner wrote:
On Mar 5, 1:47 pm, John Harshman <jharshman.diespam...@xxxxxxxxxxx>
wrote:
Charles Brenner wrote:
On Mar 5, 10:52 am, John Harshman <jharshman.diespam...@xxxxxxxxxxx>
wrote:
Charles Brenner wrote:
On Mar 4, 6:18 pm, John Harshman <jharshman.diespam...@xxxxxxxxxxx>
wrote:
Charles Brenner wrote:
On Mar 4, 12:40 pm, John Harshman <jharshman.diespam...@xxxxxxxxxxx>
wrote:
Charles Brenner wrote:
On Mar 3, 6:59 pm, John Harshman <jharshman.diespam...@xxxxxxxxxxx>
wrote:
Charles Brenner wrote:
On Mar 3, 8:27 am, John Harshman <jharshman.diespam...@xxxxxxxxxxx>
wrote:
Charles Brenner wrote:
On Mar 2, 8:21 pm, John Harshman <jharshman.diespam...@xxxxxxxxxxx>
wrote:
Charles Brenner wrote:
On Feb 26, 9:36 am, John Harshman <jharshman.diespam...@xxxxxxxxxxx>
wrote:
[I thought I'd start a new thread since Sean isn't replying in the old
one. ...
[the following seems to me to justify a big snip]
Let's at least agree on what we're talking about.
We're talking about a likelihood ratio framework. Data from living things or
fossils are observed, and there are two hypotheses about how life came
about --
Hn = common descent from natural processes
Hg = God created life
No. That's not what the two hypotheses are, unless you're stating them
badly. The two hypotheses are 1) common descent and 2) separate
creation.
Ok. I accept the correction.
The salient and interesting fact about the data is that it indicates a
nested hierarchy (NS). NS is a more or less inevitable consequence of
Hn, but only one possible consequence of Hg. So the data supports Hn
over Hg, but by a lot or by a little? If the NS property of the data
is improbable under Hg, then the data is strong evidence supporting Hn
over Hg.
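To make the likelihood-ratio framing concrete, here is a minimal Python sketch. The probabilities in it are hypothetical placeholders, not estimates anyone in this thread has made. The only point is that the ratio P(NS | Hn) / P(NS | Hg) grows as NS becomes less probable under Hg.

```python
def likelihood_ratio(p_data_given_hn: float, p_data_given_hg: float) -> float:
    """Return P(data | Hn) / P(data | Hg); values above 1 favor Hn."""
    if p_data_given_hg <= 0:
        raise ValueError("P(data | Hg) must be positive")
    return p_data_given_hn / p_data_given_hg

# Hypothetical: NS assumed to be nearly certain under Hn.
p_ns_given_hn = 0.99
for p_ns_given_hg in (0.5, 0.01, 1e-6):
    lr = likelihood_ratio(p_ns_given_hn, p_ns_given_hg)
    print(f"P(NS|Hg) = {p_ns_given_hg:g}: LR = {lr:g}")
```

The whole dispute below is about the denominator: nobody disputes P(NS | Hn), but P(NS | Hg) is exactly the number neither side can pin down.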
In order to get a handle on how improbable NS is as a consequence of
Hg, you posit the universe of all possible data states and imagine
them to be equally likely. That's quite unlike what one does in real
life (and most of my work is related to the theory or practice of
likelihoods of DNA data), but I've agreed to play along with your
point of view -- though of course I argue that the "equally likely"
provision doesn't make sense (mathematically, let alone
scientifically). However, if God created life then the patterns in the
data correspond to what God did -- the scheme that God chose. Are we
agreed so far?
No, unless by "god created life" you mean that there is no common
descent, which is a confusing way to say it.
That's what I should have meant. (Speaking of confusing, my editor
apologizes for introducing the abbreviation NS for "nested
hierarchy".)
And we're specifically talking about the patterns in character data,
i.e. similarities and differences among species. These in turn imply
connections among the species, or perhaps lack thereof. So I have
simplified by reducing the possible patterns to different assemblies of
connections among species.
There is an infinitude of possible data sets.
Is this true? Only if individual data sets can increase without bound.
Can they?
I don't see why not. Also, if you allow that the data can include
continuous measurements then there are infinitely many different
possible data sets even of bounded size.
I was thinking of the data sets as genomes.
It's natural to want to
bin the data to cope with it. In your case part of the motive for
doing so is to have a finite set, because for a finite set at least
the concept of "uniform probability distribution" is well-defined.
However, lifting this distribution from the finite set back to the
real data does not impose a uniform distribution on the real data --
just some arbitrary distribution that is tailored for the purpose of
your argument.
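The point about lifting a uniform distribution from bins back to the underlying data can be checked with a toy example. In this Python sketch the "data states" are just the integers 0 to 9, and the binning function is an arbitrary, hypothetical choice. Putting equal probability on each bin, and (as a further assumption) spreading it evenly within each bin, gives the data states unequal probabilities whenever the bins differ in size.

```python
from collections import Counter
from fractions import Fraction

def lift_uniform_bins(states, bin_of):
    """Give each bin equal mass, split evenly among the states in that bin.

    Returns a dict mapping each state to its induced probability.
    """
    sizes = Counter(bin_of(s) for s in states)
    n_bins = len(sizes)
    return {s: Fraction(1, n_bins) / sizes[bin_of(s)] for s in states}

# Hypothetical binning: state 0 alone in bin "A", states 1-9 share bin "B".
states = range(10)
probs = lift_uniform_bins(states, lambda s: "A" if s == 0 else "B")
print(probs[0], probs[5])  # prints "1/2 1/18": unequal, though the bins were "uniform"
```

Any other way of splitting mass within a bin is just as arbitrary, which is the sense in which the induced distribution on the real data is tailored rather than uniform.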
In short, what you call a "simplification" is a way of being sneaky.
Not intentionally so.
(It's also very bizarre. Unless I quite misunderstand, a few of the
graphs are trees representing natural hierarchies, which might
actually come about as a consequence of some God-scheme we could think
of.
I don't understand what you mean by saying it's bizarre. Can you explain?
My remarks in parentheses just above and below were supposedly the
explanation. To expand slightly: Since the vast majority of the graphs
are ones that have zero probability (in my opinion), they are just
stuffing. By positing equal probability (assuming Hg) for each graph
you thereby artificially exaggerate the apparent rareness of (the
handful of graphs representing) nested hierarchies as consequences of
Hg.
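To get a feel for how much a flat distribution over graphs shrinks the share of tree-shaped outcomes, here is a rough count in Python. It assumes, purely for illustration, that the candidate graphs are simple undirected graphs on n labeled species and that the "tree-like" ones are the labeled trees, counted by Cayley's formula n**(n-2). Real phylogenetic output (rooted trees, internal ancestor nodes, reticulations) would be counted differently.

```python
from fractions import Fraction

def tree_fraction(n: int) -> Fraction:
    """Share of simple undirected graphs on n labeled vertices that are trees."""
    if n < 2:
        raise ValueError("need at least two vertices")
    all_graphs = 2 ** (n * (n - 1) // 2)  # each possible edge present or absent
    labeled_trees = n ** (n - 2)          # Cayley's formula
    return Fraction(labeled_trees, all_graphs)

for n in (3, 5, 10):
    print(n, float(tree_fraction(n)))
```

The share of trees collapses quickly as n grows, which is exactly the effect of the equal-probability assumption at issue here: the result is driven by how many non-tree graphs are counted as "stuffing".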
What are your criteria for considering some graphs to have zero probability?
I'll bet some graphs are literally impossible. The graphs are produced
by an algorithm (to be accurate one or another favored heuristic) that
tries to fit the genomic data into a pattern. Regardless of whether
there are any limitations or preferences on the model of creation --
on "God" -- can phylogenetic analysis really ever produce a graph
including a loop? (But I already gave a weaker form of this argument
in my previous response, below, so perhaps I have not understood your
question.)
Yes, it can, depending on the algorithm used. Now it's true that all the
most commonly used algorithms will always produce a tree (often a
single, fully bifurcating one) regardless of data input, but there are
others that will produce loops or unconnected points. See, for example,
John Alroy's program CTA. And there are of course loops in reality --
hybrid species -- for which there are programs. And there is also
SplitsTree, which produces a big set of loops. Trust me. Programs are
written to fit the data, not the other way around. If the data suggested
some weird graph, there would be a program to analyze it.
I think we've been a little imprecise about nomenclature. I assume the
graphs we have in mind are in fact directed graphs -- they don't
merely say that A and B seem to be descended one from the other, but
specifically in which direction.
No. That's not necessary. Phylogenies are directed graphs. But I was
considering the more general case.

When I said "loop" I had in mind a
circle of inheritance, as opposed to a set of edges like A->B->Z and
A->C->Z (which I've heard called "articulation"; is that a usual word
for it?)
Is it possible you mean "reticulation"? But anyway, that's exactly what
I thought you meant by "loop".

which could arise from hybridization. A devious creator could,
I suppose, create data that mildly suggests a circle: the junk DNA of B
looks like it descended from A in being substantially like a mutated,
partly broken-up and rearranged version of A; C looks like a
descendant of B, ... and A looks like a descendant of Z:
A->B->C->...->A. Does John Alroy's program cater to this? Would it ever
produce the result A->B->A?
No. It produces, if I recall, undirected graphs (i.e. unrooted trees).

I have an alternative (rather abstract, tedious to try to write down)
argument in mind involving consideration of a multiplicity of binning
schemes that we might devise, of which your graph idea would be only a
non-distinguished example. But it may not be of interest, especially
if the answer above is satisfactory.
Sadly, it isn't.
My abstract argument comes down in effect to saying, why would John
Astor's program be so special?
Alroy. I have no idea what you meant by that.

Alroy. I mean special in that among all the arbitrary classification
schemes you could devise, why would this one produce bins (the various
graphs which are the possible outcomes of the program) of uniform
probability when applied to genomic data that is a consequence of Hg?

I think that instead of focusing on the algorithms, which after all must adapt to the data, you should focus on the data.

Let's imagine that my creationist adversary (C.A., whom you are ably
representing) won't buy any of my logical or intuitive attempts to
argue that some (directed) graphs (nearly) cannot occur as
representations of the genomic data.
Hey, how come I have to be the creationist?

Ok, I advocate for serial killers, I guess I can advocate for a
creationist just for today.

Now, for any particular creation-
scheme (G), some kinds of graphs are quite likely and others are not.
However, when I try to say that certain graphs -- tangled loops for
example -- are unlikely from any G whatever, C.A. raises the objection
that I'm imposing a limitation on G. After all, for all we know G's
method involves intentionally designing the junk DNA just so as to
create a tangled loop in the graph. (We're assuming for the sake of
argument that such is mathematically possible.)
I don't understand what a tangled loop is, unless you mean to imply a
directed graph with a circle in it, A>B>C>A, for example. It seems to me
that would be possible, as long as different parts of the genome were
compared for each piece of the loop.

By "tangled loop" I had in mind a directed graph with a whole mess of
connected circles, reminiscent of a tangled fishing line.

OK, so why would that be unlikely?

Very well. From the C.A. perspective God or whatever chooses a
separate-creation-scheme G and there is some probability distribution
(unknown to you, me, or C.A.) across all possible choices for G.
Depending on which G is chosen, the various bins (=choices of graph)
that CTA can produce are more or less likely to occur. In other words,
the probability distribution on the bins of CTA is a projection from
the probability distribution of G's. Is that probability distribution
approximately uniform?
The algorithm CTA is man-made, in this case by John Alroy. While it may be a
natural-seeming way to arrange the data from the perspective of
evolutionary biologists with their typical interests, in the sense
that that's an arbitrary interest, CTA is an arbitrary method of
arrangement. Instead of CTA, we could use some different binning
algorithm including algorithms which make no pretense of doing a
similar thing. Anything that computes some bin number based on the
entirety of the genomic data (perhaps providing that it tends to
compute about the same bin number from a substantial subset of the
genomic data) will do.
Let me stop you here. I would contend that there are methods of
determining which binning methods fit the data better. Nested-hierarchy
data demand to be dealt with by a nested-hierarchy algorithm, for example.

If you're suggesting that CTA is "natural" rather than arbitrary
because it's designed for nested-hierarchy situations, that would be
irrelevant because we're assuming the hypothesis of Hg (separate
creation).

Again you confuse the method of creation with the pattern made by the data. Separate creation can produce any pattern whatsoever, including a nested hierarchy. If the data show a nested hierarchy, an algorithm that produces nested hierarchies is appropriate. (CTA was brought up precisely because it's capable of producing other sorts of graphs, by the way, including reticulations and disconnected sub-graphs.)

Each such algorithm defines a projection from
the probability distribution of G onto a set of numbered bins. Since
the bins are man-made and arbitrary, no matter what the probability
distribution on G may be, it can't tend to be uniform when projected
by an arbitrary algorithm. For one particular algorithm, by chance, the
probability distribution for the various bins may be uniform, but not
in general across algorithms. (For example two algorithms could differ
mainly in that one of them collapses half the bins of the other into a
single bin. The probability distribution of the bins cannot
simultaneously be uniform for both algorithms.) Therefore, it would be
far-fetched to suppose that the bins of some particular binning scheme
such as CTA would have a uniform distribution.
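The parenthetical example about collapsing bins can be checked directly. In this hypothetical Python sketch, a distribution over schemes G is mapped through two binning algorithms, and algorithm B merges half of algorithm A's bins into one. The eight equally likely schemes are an illustrative assumption only; whatever the distribution on G, if A's four bins each get 1/4, B's merged bin gets 1/2.

```python
from collections import defaultdict
from fractions import Fraction

def push_forward(dist, binning):
    """Map a distribution over schemes onto bins: P(bin) = sum of P(g) landing there."""
    out = defaultdict(Fraction)  # Fraction() == 0
    for g, p in dist.items():
        out[binning(g)] += p
    return dict(out)

def is_uniform(bin_probs):
    return len(set(bin_probs.values())) == 1

# Hypothetical: eight schemes, equally likely, purely for illustration.
schemes = {g: Fraction(1, 8) for g in range(8)}

def algorithm_a(g):
    return g // 2  # four bins of two schemes each

def algorithm_b(g):
    a = algorithm_a(g)
    return 0 if a < 2 else a  # A's bins 0 and 1 (half of them) merged into one

a = push_forward(schemes, algorithm_a)
b = push_forward(schemes, algorithm_b)
print(is_uniform(a), is_uniform(b))  # prints "True False"
```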
You have lost me entirely. Who are you arguing with again? About what?

Fair point. To briefly summarize:

You raised a likelihood argument for common descent. The data shows a
nested hierarchy. Two hypotheses are Hn common descent and Hg separate
creation. Under Hn such data would be expected. Anyone including my
creationist buddy would agree. Under Hg it would be very surprising.
Therefore the data strongly supports Hn. I'm with you so far but is
the creationist?

To seal the deal you made an argument that since "god" (the agent
under Hg) can do anything, the possibilities for the data have a flat
probability distribution -- hence the probability of the particular
outcome of nested hierarchy is negligible. I objected that "flat
probability distribution" is meaningless pseudo-mathematics applied to
most infinite sets, but you clarified that the set you had in mind is
the set of possible outcomes of a program like CTA -- graphs -- of which
there are only finitely many. So equal probability for all the graphs
is at least mathematically possible.

Nonetheless I still object that "can do anything" is not tantamount to
"equal probability of anything" and in particular -- I am quite
willing to attribute limitations to god -- I think that most graphs
are inherently implausible. For example, what kind of a cockamamie
separate creation scheme would produce genomic data that leads to a
lot of tangled loops?

What's cockamamie about that? Is it more cockamamie than a nested hierarchy without common descent? Some might consider it an elegant and complex pattern, therefore aesthetically appealing.

However, my buddy C.A. who is squeamish about
making assumptions about god has to phrase the argument a bit
differently. Rather than making the positive statement that your
graphs are not of equal probability, C.A. would simply say that
there's no reason to believe that they are.

Indeed, to believe that they are is implicitly to impute quite a lot
in terms of the tendencies of god. Whatever god's possible schemes,
they tend to produce certain patterns of genomic data which in turn
CTA maps (better word than "projects" which I used before) to various
graphs. Hence the probability distribution of the various graphs can
be mapped inversely back to tendencies of god to choose types of
schemes. Moreover, any different program than CTA (and no reason it
should have even a vaguely similar intent -- we are assuming Hg here
remember) would map back to a totally different set of tendencies for
god's possible schemes. Hence my question, what's so special about
CTA?

Nothing, and I made no claim that there was. Algorithms are attempts to deal with features of the data. It's the data we should be talking about, and asking what they imply. Getting the structure out of the data is a problem in inference, for which algorithms are tentative solutions. Now in practice we seldom have this problem because the real data are better characterized than the unknown number of possibilities that god could have produced. Data are mostly a nested hierarchy with a bit of homoplasy, differential sorting of gene trees, and horizontal transfer thrown in.

No reason, therefore, that C.A. should be the least tempted to
accept that the graphs are equally probable outcomes under Hg just
because you say so, nor in particular that a nested hierarchy graph is
improbable.

Of course, personally I agree with your conclusion. I just don't buy
the fake mathematics part of the argument. To refute C.A. I think you
have to make specific arguments to specific objections.

How would C.A. attempt to make sense of the phylogenetic data? Would he merely throw up his hands and disclaim all ability to learn anything about the creator's process? That seems to be the alternative.
