Re: Part 1 (of 3): What are major aspects of evolutionary theory?
- From: anon1@xxxxxxx
- Date: Thu, 15 Dec 2005 20:10:12 -0800
> If these blocks are only a few thousand bases, or even a few hundred
> thousand, then nothing really changes. If they're millions of bases,
> then we get problems.
The average seems to be appx. 15 thousand bases per block based on the
data I found a few days ago (200 thousand blocks total). So we're OK by
your standard. But the 200 thousand breaks between blocks have been
accumulating over time since the bottleneck, so many of those blocks
may be newly appearing as a result of recent splits between old blocks,
and there may be extremely tight linkage between many pairs or groups
of adjacent blocks whereby over any experimental period there are
usually no crossovers whatsoever, and any technique for measuring
centiMorgan distance would show zero instead of the correct very small
distance. However most of the breaks between blocks may be hot spots
that break many many times over that time span, often enough to produce
measurable centiMorgan measure, and only the not-so-hot spots would
gradually accumulate new first-time breaks over that time span. I
haven't seen those hotness statistics yet.
> The average human gene is about 20kbp (the bulk of that being introns)
> and there's about twice that space on average between genes. So even
> with these blocks, unless they're very big, we should get recombination
> between adjacent genes.
Golly gee, it seems the average size of haplotype blocks is only
slightly smaller than the average size of single genes. With strong
linkage between adjacent blocks, it seems that entire genes are
selected as a unit most of the time, and only rarely different exons
within a gene are recombined, and virtually never would any single exon
ever be split (assuming that by selection pressure it turns out that
the breaks between blocks don't happen within single exons; I haven't
seen that data yet either).
I suspect that our ancestors have evolved a nice tidy mechanism to
reduce the breakage within genes, especially within exons, while still
allowing whole genes to be recombined as gene-sized units. This
happened sometime after the LCA with birds, since birds don't have that
mechanism. I think it's time we learn about crocodile haplotype blocks,
to check whether they have our adaptation or not. Did mammal-like
reptiles break off from crocodiles before or after theropods/birds
broke off from crocodiles? Is there a tree-of-life search engine where
we can enter three or more genus names and get back a tree showing
paths to those specific genera while omitting all the other side
branches?
> http://cnx.rice.edu/content/m11317/latest/
The human genome is 2.91-billion base pairs in length.
OK, that conflicts with the over 3.1 billion bp that I saw elsewhere.
Which Web page is correct per most recent info?
The average size of a human gene is around 27,000bp, with typical
ranges between 20,000 and 50,000bp.
It would be nice to know how the haplotype blocks, of slightly shorter
average length, and the tight clusters of haplotype blocks, probably
somewhat longer average length, align against each other. Are the hot
spots (for recombination) usually between genes, or within genes?
If within genes, in introns, or right inside exons?
Like the human genome, the
mouse genome is large, 2.5Gb, only 14% smaller than the human genome.
Tiny typo there: should read 2.5Gbp (base pairs, not bytes).
Approx. 99% of mouse genes have a
direct, assignable human homologue.
Is that 99% of the total predicted genes, or 99% of the known-function genes?
> >>How do you compare large block changes with point changes on a
> >>fair basis? For example, does a single block of 50 bases that gets
> >>duplicated count the same as 50 separate SNPs, or just 1 or 2 SNPs?)
> Depends on what you are trying to count.
Somebody else quoted a statistic on how much of the difference was
accounted for by SNPs and how much by everything else. I was asking
what counting method *that* particular statistic was based on.
> If you want (for some reason)
> to count simple sequence differences, it might count as 50. If you want
> to count evolutionary events (usually a more useful thing to count),
> then it's one. If you want to count the probability of it having
> happened identically more than once (i.e., homoplasy), that gets to be a
> complicated question.
If I were computing a statistic and then publishing the result, I'd
make it clear what counting method I used. I might even use two
different counting methods and show how the result varies with the
counting method. <satire> Fortunately I'm not doing this kind of
research, and trying to publish my results, because journal editors
don't want such counting-method details published so they'd refuse my
paper until I deleted the explanation and I'd refuse to delete it, and
my paper would consequently never get published. </satire>
If I were *reviewing* a paper to decide if it was fit to be published,
I would expect the method of calculation to be included, and would
issue a complaint to the author if it wasn't present, and leave it to the
publisher whether to agree with my desiderata or not, whether to return
the manuscript to the author for slight change to include that, or
publish as-is with the info not included.
> All those other kinds of diversity count for much fewer evolutionary
> events, or single mutations, than do the SNPs.
Yeah, but I want to know at least to an order of magnitude *how* much
lower they are. For example, if there are actually 10000 SNPs and only
4 block-copies, each of length 50, and no other mutations, then by two
ways of accounting:
- 10000 + 4 = 10004 total, 4/10004 = 3.9984006E-4 = 0.04%
- 10000 + 4*50 = 10000 + 200 = 10200 total, 200/10200 = .01960784314 = 2%
Now if I see a published report saying 2% of genetic difference is due
to block-copies, how do I know the second line above is the true
meaning, rather than the following alternate line that gives the same
result, 200 different block-copies, each of size 50?
- 10000 + 200 = 10200 total, 200/10200 = .01960784314 = 2% published
On the other hand, if they used the top method of accounting, saying
0.04% with no explanation, but I guess they were using the bottom
method, I would conclude this was the truth: only about 1 block-copy
against 120 thousand SNPs:
- 120000 + 1*50 = 120000 + 50 = 120050 total, 50/120050 = 4.1649313E-4 = 0.04%
So you see if they don't say what counting method was used, I can be
off by more than an order of magnitude in understanding what they are
really saying.
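To make the ambiguity concrete, here is a minimal Python sketch of the two
accounting methods, using the made-up numbers from the first example above
(10000 SNPs, 4 block-copies of length 50). The function names are invented
for illustration only and do not come from any published method.

```python
def share_by_events(snps, block_copies):
    """Share of differences due to block-copies, one event per block-copy."""
    return block_copies / (snps + block_copies)


def share_by_bases(snps, block_copies, block_len):
    """Share of differences due to block-copies, counting every copied base."""
    changed = block_copies * block_len
    return changed / (snps + changed)


# Hypothetical numbers from the example: 10000 SNPs, 4 block-copies of 50 bases
by_events = share_by_events(10000, 4)
by_bases = share_by_bases(10000, 4, 50)

print(f"counted as events: {by_events:.2%}")
print(f"counted as bases:  {by_bases:.2%}")
print(f"ratio between the two methods: {by_bases / by_events:.0f}x")
```

On the same data the two methods disagree by a factor of about 49, which is
why a bare percentage with no stated counting method is so hard to interpret.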
> I would guess that there are very few loci without polymorphisms.
It depends on whether "locus" means a single base pair, or a huge chunk
of DNA that doesn't recombine internally. Only about one base pair out
of one thousand has any SNP (3 million SNPs out of 3 billion base pairs).
But on the average each haplotype block has about 13 or 14 SNPs
accumulated since the bottleneck. So depending on your definition of
"locus", either every locus has a SNP, and most have more than ten
SNPs, or 99.9% of loci don't have even one SNP. The dictionary
definition is pretty much worthless for this purpose:
4. The position that a given gene occupies on a chromosome.
A gene doesn't occupy **a** position, it occupies a whole sequence of
tens of thousands of consecutive positions. Or if you count it
occupying only the exon locations, then still it occupies about a
thousand locations clumped over a span of tens of thousands of base
pairs. Let me see if I can find any better standard definition of
"locus" on Google ...
<http://helios.bto.ed.ac.uk/bto/glossary/lm.htm#l>
locus(Plural loci)
The position of a gene, DNA marker or genetic marker on a
chromosome. See gene locus.
Not any better, sigh.
<http://www.biology-online.org/dictionary/genetic_locus>
genetic locus
(Science: genetics) The position of a gene in a linkage map or on a
chromosome.
Still not any better.
So did you mean "locus" = "the entire span of a gene", or what?
If that's what you meant, I'd agree there are very few genes without
SNPs, although a majority of such SNPs would be in introns, and a
majority of SNPs in exons would be neutral changes due to coding
synonyms, and whether there is even one SNP that affects phenotype is
anybody's guess until the specific study is done on that particular gene.
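How much the answer swings with the definition of "locus" can be checked
with a few lines of Python. This is only a sketch using the rounded figures
quoted in this thread (3 billion bp, 3 million SNPs, 200 thousand blocks);
the constant names are invented for illustration. With these rounded inputs
the per-block figure comes out at 15, a little above the 13 or 14 quoted
above, which presumably came from less rounded data.

```python
# Rounded figures quoted in this thread, not fresh data
TOTAL_BP = 3_000_000_000   # base pairs in the genome
TOTAL_SNPS = 3_000_000     # known SNPs
TOTAL_BLOCKS = 200_000     # haplotype blocks

snps_per_bp = TOTAL_SNPS / TOTAL_BP          # "locus" = single base pair
snps_per_block = TOTAL_SNPS / TOTAL_BLOCKS   # "locus" = haplotype block
bp_per_block = TOTAL_BP / TOTAL_BLOCKS       # average block length

print(f"SNPs per base pair:   {snps_per_bp}")
print(f"SNPs per block:       {snps_per_block}")
print(f"base pairs per block: {bp_per_block}")
```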
> If the probability of a mutation is in the
> neighborhood of 10^-9 per site, per individual, per generation (and
> that's a reasonable ballpark figure),
Yeah, I think I saw that same ballpark estimate months/years ago too.
> then most sites will experience
> multiple mutations per generation, somewhere within the population. That
> makes total polymorphism pretty much a meaningless number.
As an exact number, yes it changes every day. But to a few significant
digits it's stable for decades. I don't feel like doing the math right
now, all burned out on math, you understand, right?
> And that's
> why mean polymorphism is the figure we count, and why frequency is
> important.
Well each new mutation occurs in only a single individual, and the only
reason most of them nowadays stick around a while is because of
population "explosion" so that each new mutation has a good chance of
growing to several copies within a couple generations, after which it
has very little chance of going extinct so long as the population
continues to double every 1.5 generations as it's been doing lately.
The distribution is somewhat like the Zipf distribution in general
nature, that is lots and lots of low-count alleles, only a few
high-count alleles, yet most of the total count is included in those
very few high-count alleles. The low-count tail is a mix of
this-generation mutations that occur only once, previous generation
mutations that occur once or twice, two-previous generation mutations
that occur twice or three times, etc. Anyway it isn't just the
gross/average frequency that's important, it's also the shape of that
Zipf-like distribution, since the frequency of count=1 polymorphisms is
what determines the per-generation polymorphism-extinction rate but the
rest of the distribution helps maintain the count=1 number alongside
new mutations. But my intuition says the new mutations dominate over
the diffusion from higher counts in maintaining the count=1 number.
> The great majority of all neutral mutations disappear.
In a rapidly growing population, such as humans during the past
century, even more so as soon as we expand into space, this may not be
true. If each individual sires 5 children (with help of mate of course,
so the expansion factor is 2.5 per generation), each child has a 50%
chance of getting any particular new mutation from that one parent, so
there's only 1/32 chance of the new mutation going extinct already,
5/32 chance of one child with mutation, 10/32 chance of two children
with copies, 10/32 chance of three, 5/32 chance of four, and 1/32
chance all five children get copies. By the generation after that, with
appx. 6 copies on the average (2.5 * 2.5 = 6.25), it's very unlikely the
new mutation will disappear, and after that longterm survival of the
neutral mutation is virtually assured. Again I am all burned out on
math (just barely got that binomial distribution of (p+q)**5
calculated) and don't feel like computing the asymptotic chance of the
new mutation *ever* going extinct with neutral drift and 2.5 expansion
of population per generation.
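For what it's worth, the asymptotic figure left uncomputed above can be
sketched as a simple branching process. This is a minimal sketch under
strong assumptions that go beyond the post: every carrier has exactly 5
children, always with a non-carrier mate, each child inherits the mutation
independently with probability 1/2, and this continues indefinitely. Under
those assumptions the eventual extinction probability is the smallest
solution of q = pgf(q), where pgf is the generating function of the binomial
distribution worked out above.

```python
from math import comb

CHILDREN = 5      # children per carrier (assumption taken from the example)
P_INHERIT = 0.5   # chance that each child inherits the new mutation

# Distribution of the number of carrier children: the (p+q)**5 expansion
offspring = [
    comb(CHILDREN, k) * P_INHERIT**k * (1 - P_INHERIT) ** (CHILDREN - k)
    for k in range(CHILDREN + 1)
]


def pgf(s):
    """Probability generating function of the carrier-children count."""
    return sum(p * s**k for k, p in enumerate(offspring))


# Iterating q -> pgf(q) from q = 0 converges to the smallest fixed point,
# which is the probability that the mutation's lineage ever dies out.
q = 0.0
for _ in range(200):
    q = pgf(q)

print([f"{p * 32:.0f}/32" for p in offspring])
print(f"eventual extinction probability: {q:.4f}")
```

Under these idealized assumptions the eventual extinction probability comes
out a little under 4%, only slightly above the 1/32 chance of being lost in
the very first generation.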
> the frequency of a new mutation is 1/2N, and the probability
> of a neutral mutation eventually becoming fixed is its frequency.
> Therefore a new mutation has a probability of 1/2N of being eventually
> fixed, and a probability of (1 - 1/2N) of becoming extinct. Is that a
> tendency?
That's only with fixed total population size. It hasn't applied to
humans since about a thousand years ago.