Re: Howard Hershey's Challenge of Sean Pitman's Assumptions
- From: hersheyh <hersheyhv@xxxxxxxxx>
- Date: Thu, 13 Dec 2007 21:17:37 -0800 (PST)
On Dec 13, 2:36 pm, Seanpit <seanpitnos...@naturalselection.0catch.com> wrote:
On Dec 11, 10:18 am, hersheyh <hershe...@xxxxxxxxx> wrote:
Here is Sean's description of the calculation of "average gap
size"
*****
"Well, first we have to calculate the likely gap size. Using an
average between the calculations of Yockey and Sauer, the ratio of
potential beneficial vs. non-beneficial for 100aa systems is about
1e-40. This creates a ratio for a 1,000aa system of about
1e-40^(1000/100) = 1e-400. So, the average gap size between
potentially beneficial sequences at this level would be about 308
residue differences - i.e., 20^308 = 1e400."
*****
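For anyone who wants to follow the arithmetic in the quoted passage, here is a minimal sketch that reproduces it. It only retraces the numbers as quoted; it takes no position on whether they mean what they are claimed to mean. The variable names are illustrative, and the work is done in log10 because 1e-400 underflows an ordinary float.

```python
import math

# Figures taken from the quoted calculation; names are illustrative only.
LOG10_RATIO_100AA = -40        # claimed ratio of 1e-40 at 100 aa
SCALE = 1000 / 100             # exponent applied for a 1,000 aa system

# Work in log10: 1e-400 is too small for a 64-bit float.
log10_ratio_1000aa = LOG10_RATIO_100AA * SCALE

# "Gap size" as quoted: the n for which 20**n equals 1e400.
gap_residues = -log10_ratio_1000aa / math.log10(20)

print(f"log10 of ratio at 1,000 aa: {log10_ratio_1000aa}")
print(f"n such that 20^n = 1e400:   {gap_residues:.2f}")
```

The second figure comes out a little under the quoted 308, which is consistent with the "about" in the original.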
I have often accused Sean of numerology, or essentially of pulling
numbers out of his arse and claiming that they represent things that
they, in fact, do not represent. Take a look at the above calculation
which is the mathematical basis of just about all his sequence
arguments. You might think, unless you *actually* do some thinking,
that Sean is performing some hard-nosed mathematical analysis here
rather than merely manipulating numbers and assigning "scientific-
sounding" names to them. Thus, unless you actually look at what the
numbers mean, you might be fooled into thinking that Sean actually is
calculating "likely gap size" or "average gap size" by doing the above
mathematical manipulations of numbers, which are assumed to be based
on hard evidence, and that he is accurately telling us what the
numbers mean. Nothing could be further from the truth. In fact, I
think that the first person Sean 'fooled' by his mathematical
manipulations was Sean himself. He was fooled because he did not
actually think about what he was doing and let the fact that the
numbers he got seemed to "prove his point" cloud his thinking to the
point that he actually does think that the ratio he calculated was
"potential beneficial vs. non-beneficial", that there is his
exponential relationship between length and the above ratio, and that
he has calculated the "average gap size". Sean probably does think
that his numerology is a "mathematical proof" of the impossibility of
evolution.
Let me summarize the problems I see with the above calculation:
1) The calculations of Yockey and Sauer do NOT give us the ratio of
"potential beneficial vs. non-beneficial" sequences for 100aa systems.
They give us an estimate of the number of sequences that have
cytochrome c function and related sequence divided by the total number
of sequences at 100 aa level.
Yockey's estimate is in fact dealing specifically with CytoC
functionality. I've noted this several times on my website myself.
Sauer and Olsen are dealing with lambda repressor functionality, not
CytoC.
Then why did you claim that *these* ratios represented "the ratio of
potential beneficial vs. non-beneficial for 100aa systems" [see your
statement above to determine if I am out-of-context]?
What these and other similar estimates indicate is that a
certain degree of required sequence specificity produces a certain
ratio of sequences in sequence space that could produce some useful
degree of functionality of the type in question.
That is NOT what your claim was. Your claim was that this ratio
represented the ratio of *all* "potentially beneficial" sequences to
non-beneficial sequence space. Don't you read what you wrote? You
did not say that this ratio represents all 100 aa sequences that have
all the properties needed to provide the function of cytochrome c in a
modern context. You did not say that this ratio represents all 100 aa
sequences that have all the properties needed to act as a lambda
repressor against a modern lambda gene site. For a simple example, if
you change a particular aa in cytochrome c, that may change the energy
level of electron that the heme takes up (the heme is the actual
electron-transporting element), thus eliminating the "cytochrome c
function" without changing the ability of the protein to bind to heme
significantly. Is a protein that binds heme, but doesn't function as a
cytochrome c "beneficial"? Only context can say.
No. Each of these ratios is an estimate of the ratio of sequences that
produce *a specific* modern function in a particular modern context to
total sequence space. Surely you can see that there is a difference
between saying "some useful degree of functionality *of the type in
question*" and "all potentially beneficial sequences". Note that your
description in your statement makes no reference at all to "useful
degree of functionality of the type in question". I am not a mind-
reader. You CLAIMED that your ratio represented "potential beneficial
vs non-beneficial" sequences. Even you know that isn't the case.
That alone means that the rest of your calculation is GIGO. I don't
have to go any further. But I did and will again. Because it is so
much fun to point out that, wrt making a meaningful calculation, you
performed three mathematical steps and whiffed with each swing. You
are batting 0 for 3.
Obviously, this doesn't tell us how many potentially beneficial
sequences of all kinds exist in 100aa sequence space. But, what it
does do is illustrate a pattern of an exponentially declining ratio of
beneficial vs. non-beneficial with increasing size and/or specificity
requirements.
Then you agree with me that the ratio you presented is NOT the ratio
you claim to have presented when you said that the ratio was of
"potential beneficial vs. non-beneficial sequences". You, I presume,
will go back and correct your appendix to accurately state that you
have no idea what the ratio of "beneficial sequences vs non-beneficial
sequences" is and the ratio you are presenting makes no claim to
represent such a number. [Again, you don't have to say
"Acknowledgement goes to H. Hershey and his many clones for preventing
me from foolishly claiming that this ratio actually was the ratio of
"beneficial vs. non-beneficial sequences" when you correct this
obvious mistake.] Thus any calculation that derives from this ratio
does not, even if the rest of the calculation were correct, calculate
the "average gap size" since the number you do need is, indeed, an
estimate of potentially beneficial to total sequences, from which you
can calculate the ratio of "potential beneficial to non-beneficial
sequences".
In order to avoid this exponential increase, one would
have to hypothesize that the number of different potentially
beneficial sequences/structures increases at pretty much the same rate
as the size of sequence/structure space increases with each increase
in minimum structural threshold requirements. That notion simply
isn't tenable to anyone who approaches this problem with a remotely
candid mind. It isn't true in any language/information system that we
know of and it isn't true for genetically based information systems
either.
And, to point out the obvious, this demonstrates that you are merely
asserting the *particular* (and highly unlikely) exponential ratio you
use because you do not ACTUALLY have any evidence about what the real
relationship is.
If the number of "potential beneficial" sequences were to grow two-
fold for a ten-fold increase in total sequence space, that would also
result in an "exponential" decrease in the ratio of "potential
beneficial to total" sequences. But it would be much slower than the
ratio you used. You, apparently out of thin air, assume that there is
absolutely NO increase in the number of beneficial sequences with
increases in length. Why you didn't choose to arbitrarily declare
that the number of "potential beneficial" sequences *decreased* with
increasing length I don't know. Why you claim that it is impossible
for the number of "potential beneficial" sequences to increase in a
linear relationship I don't know. I don't know because you have
presented precisely zero evidence to support your particular
exponential math that declares that there is zero increase in the
number of "potential beneficial sequences" as total length increases.
Zero. Nada. Not a shred.
That's the point. There is a clear pattern of exponentially
increasing non-beneficial sequences relative to each increase in
potentially beneficial sequences with each increase in minimum
structural threshold requirements. This point should be so obvious as
to be beyond argument.
Sorry. Your mathematical claim is quite a bit more specific. Your
claim is that the ratio is exponential *and* that the number of
sequences that have your *specified function* (which we can agree is
NOT "potential beneficial sequences") does not change with increases
(or decreases?) in length.
Mathematically, if we call the
estimated number of sequences that have cytochrome c function C, and T
is the totality of sequence space for 100 aa long peptides, then the
equation that Sean is claiming represents a "potential beneficial vs.
non-beneficial" ratio is actually C/T. T can be mathematically
estimated as 20^le, where le is the length of the sequence, since
there are 20 possible amino acids at each position. Converting to
base 10, this would be 10^(le*log20). Log 20 is about 1.3. So the
ratio, R(le), of what Sean calls "potential beneficial vs. non-
beneficial" sequences is really the equation:
R(le) = C/10^(1.3*le) = 10^-40 when le = 100.
Moreover, this relationship (the ratio of sequences with cytochrome c
function and sequence to all possible 100 aa long sequences) has ONLY
been determined (well, estimated) for the case where le = 100.
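The base-10 conversion can be checked numerically, since 20^le = 10^(le * log10(20)). The sketch below is only a sanity check of that identity at le = 100; the function name is made up for illustration, and the implied value of C is simply what the quoted 10^-40 ratio would require, not an independent estimate.

```python
import math

def log10_sequence_space(length: int, alphabet: int = 20) -> float:
    """Return log10 of alphabet**length (size of sequence space)."""
    return length * math.log10(alphabet)

log10_T = log10_sequence_space(100)      # log10 of 20^100
digits = len(str(20 ** 100))             # exact integer cross-check

# C implied by R(100) = C/T = 10^-40 (assumption: taken from the quoted ratio).
log10_C = log10_T - 40

print(f"log10(20)        = {math.log10(20):.4f}")
print(f"log10(T) at 100  = {log10_T:.1f}")
print(f"digits in 20^100 = {digits}")
print(f"implied log10(C) = {log10_C:.1f}")
```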
That's because 100aa is fairly close to the minimum structural
threshold requirement needed for CytoC. You can't produce a
beneficial degree of CytoC functionality with just 50aa - no matter
how they are arranged relative to each other.
Irrelevant. That is not the point. I wouldn't care if it were 90 or
70 or whatever. The point is that you only have data (such that it
is) for this one point. You cannot determine what the relationship is
between this ratio and increasing length when you only have one data
point. You need at least two data points to decide if the
relationship is linear, or exponential, or something else.
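To make the one-data-point problem concrete, here is a rough sketch. It assumes a purely hypothetical family of growth rules in which the number of functional sequences is multiplied by some factor g every time total sequence space grows tenfold. Neither g nor the function name comes from the thread. Every choice of g reproduces the single estimate at 100 aa exactly, yet the extrapolations to 1,000 aa differ by hundreds of orders of magnitude.

```python
import math

LOG10_20 = math.log10(20)
KNOWN_LENGTH = 100            # the only length with an estimate
KNOWN_LOG10_RATIO = -40.0     # quoted estimate at that length

def log10_ratio(length: int, growth_per_tenfold: float) -> float:
    """log10 of (functional count / total sequences) at `length`.

    Hypothetical model: the functional count is multiplied by
    `growth_per_tenfold` for each tenfold growth of sequence space.
    """
    # Tenfold increases in sequence space relative to the known point.
    decades = (length - KNOWN_LENGTH) * LOG10_20
    return KNOWN_LOG10_RATIO + decades * (math.log10(growth_per_tenfold) - 1.0)

for g in (1.0, 2.0, 5.0, 10.0):
    at_100 = log10_ratio(100, g)
    at_1000 = log10_ratio(1000, g)
    print(f"g = {g:>4}: log10 ratio at 100 aa = {at_100:.1f}, "
          f"at 1,000 aa = {at_1000:.1f}")
```

With only the 100 aa estimate in hand, nothing in the data distinguishes one value of g from another.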
2) Problems with calling the numerator "potential beneficial
sequences". It should, in fact, be obvious to anyone that C, the
number of sequences that have cytochrome c-like function and sequence,
is NOT all possible "potential beneficial sequences". I think even
Sean sees that.
Of course it's not. But, statistically, this point is pretty much
irrelevant to the main issue.
The ratio you present is most certainly NOT irrelevant if you are
going to use this *ratio* to claim that a number derived from this
ratio represents "average gap size" between beneficial sequences.
Again, given that one is looking for
targets that require a minimum of at least 100aa with an equivalent
degree of specificity as that required by CytoC functionality, the
actual ratio of beneficial vs. non-beneficial is not going to be
significantly different from the ratio of CytoC to non-CytoC in
sequence/structure space.
No, Sean. You are looking for the ratio of "potentially beneficial"
sequences to total potential sequences. The ratio of sequences that
perform all the functions involved in what a modern cytochrome c does
in a modern context as a fraction of total sequence space cannot
possibly give you the ratio you need.
I've tried to explain this concept to you before, but I'll try again.
Either you say that there is absolutely no way to get any idea at all
as to the likely ratio of total targets to non-targets (which removes
the scientific basis for your proposed mechanism by the way) or you
try to use the available evidence to get as good an idea as you
can.
Actually, *my* mechanism says that the ratio of beneficial targets to
non-targets is irrelevant, because what counts is the "minimum actual
gap size" or even the "minimum possible gap size" in some cases. And
that number is idiosyncratic and dependent upon precise local
conditions and genomes. *My* mechanism says that your numerology is
irrelevant. But that is also what your math says, since the numbers
you generate have no relationship to what you claim they represent.
One approach to determining the most likely number of potentially
beneficial targets at a given threshold level is to consider how many
total novel beneficial systems are in existence today in all living
things at a given level.
If you knew that the ratio you presented was bogus, why didn't you
present your "correction", such as it is, right from the get go? I
find more likely a scenario that you are so clueless about what you
are calculating that you actually believed that this ratio was the
ratio of "potential beneficial vs non-beneficial sequences".
This number can be roughly known and it is
not more than a trillion for the 100aa level. Even if it was 10
trillion, it wouldn't make any significant difference. The reason for
this is that 10 trillion uniquely different 100aa systems with the
minimum degree of specificity of CytoC would take up no more than 1 in
1e27 sequences in sequence space. That is still a very tiny fraction
of the available space of just over 1e130 sequences. The potential
targets are still vastly outnumbered by non-beneficial sequences.
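The arithmetic behind the "1 in 1e27" figure can be retraced as follows. This is a sketch of the quoted reasoning only, under its own assumptions: every one of the 10 trillion systems is taken to occupy the same 1e-40 fraction of 100aa sequence space, and the systems are taken not to overlap. Exact fractions are used so no rounding is involved.

```python
from fractions import Fraction

systems = 10 * 10**12                    # "10 trillion" distinct systems
fraction_each = Fraction(1, 10**40)      # quoted CytoC-level ratio per system

# Combined fraction of sequence space, assuming no overlap between systems.
combined = systems * fraction_each

sequence_space = 20 ** 100               # all 100aa sequences

print(f"combined fraction = 1 in {combined.denominator // combined.numerator:.0e}")
print(f"sequence space exceeds 1e130: {sequence_space > 10**130}")
```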
My counterpoint would be that *almost* any aa sequence that forms one
(or perhaps a couple of) thermodynamically preferred structures has
the capacity to bind to one or more biologically relevant epitopes.
The benefit or lack thereof of such binding is conditionally
determined and not an inherent feature that is invariant. The problem
with proteins that don't form a stable structure is not that they bind
to or interact with too few biological structures, but that they bind
to too many. And my further point is that most proteins do indeed form
a few thermodynamic minimum structures rather than be completely
structureless.
This ratio only gets exponentially worse with increasing minimums.
Keep the specificity requirement the same and raise the number of
amino acid residues, the ratio drops exponentially. Keep the number
of residues the same and raise the specificity, the ratio drops
exponentially. That's the whole point.
But is it even *possible* to calculate the number of "potential
beneficial sequences"? The answer is 'sort of', but only in a squishy
soft way that allows one to say that "potential beneficial sequences"
are a heck of a lot more frequent than cytochrome c-like sequences.
There might appear to be a "heck of a lot more" beneficial targets
than CytoC targets - - at first approximation. But, when you compare
this "heck of a lot" to the total size of sequence space, it is a tiny
little speck of dust in the bottom of the bucket. It isn't remotely
close, relative to the vast horde of non-beneficial sequences, to
being a "heck of a lot".
You have to start thinking in relative terms here.
First we have the problem that "beneficial" is not an inherent feature
of a sequence, but a conditional one. Even the cytochrome c sequence
is not 'beneficial' in every situation or organism (think anaerobes).
And "potential" is important, since we cannot assume that a sequence
is useless until or unless we have a context to put it in.
The context is any living thing in any particular environment. Take
your pick. Whatever context you choose will have the same problem.
The ratios are not going to be significantly different regardless of
context. Bringing up the context is therefore irrelevant - a red
herring.
The context is all the living things that have ever existed in all the
environments that have ever existed. We are talking about "total
sequence space", so we have to talk about "total possible benefit
space". And a 300 aa protein that binds heme using only 100 of its
300 aa's would be just as likely to be 'beneficial' as one with only
100 aa's. And, in some contexts, one that binds far more weakly than
modern heme binding proteins do would be 'beneficial' whereas in any
modern cell it would not be. What is 'beneficial' in the context of
natural selection depends on what the competition has (or lacks).
But, if you understand how proteins actually work to produce
'function' in organisms, we can think about this problem in a more
reasoned way than Sean has (i.e., by not assuming that the number of
cytochrome c-like sequences is a good stand-in for all "potential
beneficial sequences"). Proteins 'function' by providing surfaces
with affinity for biologically relevant structures or parts of
biologically relevant structures. Enzymes work because proteins bind
the substrates and products of the reaction less well than the
stressed intermediate. [Which is also why compounds related to
substrates or products but with equal or higher affinity for the
protein surface can be toxins or antibiotics.] Proteins also form
multimers and complex structures because of the affinity of small
patches of aa's on each protein for each other. Which biological
structure an amino acid sequence has affinity for determines its
'function'. Moreover, because most biologically relevant structures
are small, the number of aa's involved in any particular binding
feature is also small. That is, the 'function' of proteins is a
consequence of the binding of epitopes by a small number of aa's.
That, of course, is the reason we can say, for example, that the
sequence of FliG involved in binding to FliF is a particular small
stretch of aa's within FliG, not the entire protein. That is, most
functional proteins have more than one functional surface and, often,
these surfaces can be modified without affecting the binding
properties of other parts of the protein. That is, a functional
protein can typically be thought of as a conglomerate of smaller
sequences that each has a binding surface. Change in one aa often
does not radically change the affinities of other parts of the protein
(or even the one it is involved in).
Again, this is completely irrelevant to the fact that different types
of systems have different minimum structural threshold requirements.
This is true of all language/information systems. For example, it is
true of the English language that a change of most single characters,
taken one at a time, will most likely not completely destroy the
particular intended functionality or meaning of the paragraph.
Will you f**king forget about false analogies with English! Deal with
proteins. And the word "system" has no meaning here. What you seem
to mean is "some enzyme activity that exists in and has co-evolved in
some organism". Unlike English words, proteins do not cease to be
proteins because they change. And the parts of proteins that did not
change often do not lose *all* function, even if the protein no longer
has the original function.
The same thing is true of biosystems. However, there is a limit
beyond which change or removal of characters will completely destroy
the functionality in question. This limit is what I call the
structural threshold. Every type of functional system has such a
limit and this limit is different for different types of systems.
Some systems have greater limits while others have lower limits.
Those systems that require greater limitations are exponentially rarer
in sequence/structure space.
But this knowledge does not tell us how far away a protein with a
related function or with functions that include, say, the heme binding
but not the electron transport functions of cytochrome c is.
This concept is actually quite simple and downright intuitive. It
really isn't some great mystery.
Even if it is "quite simple" and "downright intuitive", that is not
your claim in the quoted mathematics. The claim I am dissecting is
whether you can use the ratio of cytochrome c or lambda repressor
sequences that retain sufficient activity in a modern system (which
you misleadingly claim represents "potential beneficial sequences") to
total sequence space in order to calculate "average gap distance". I
am pointing out 1) that the ratio is not what you claim it is. 2)
That the relationship of that ratio to length of the aa sequence is
unevidenced and merely assumed. And 3) that the number you call
"average gap size" isn't. Focus, Sean. More to the point, you have
already admitted that the ratio you presented in mathematical analysis
is not what you claimed it was. That means you need to change what
you claim.
Now, if we recognize that the 'function' of any stretch of aa's in a
protein is binding an epitope that has the potential to be
biologically relevant, we can ask if there are proteins in which there
is a reason to have a stretch of aa's that bind radically different,
but biologically relevant, epitopes. The obvious place to look is the
immunoglobulins (but self-sterility alleles might be a good second).
[Actually, because the stretch of aa's in the variable region is long
and we can only detect binding that binds quite tightly, recognizing a
binding requires an even tighter affinity between protein surface and
epitope and typically requires a larger than average epitope.] There
is a stretch of aa's in immunoglobulins that varies by mutation and/or
other chance mechanisms. The question then is, what fraction of these
randomly generated variable sequences within the immunoglobulin molecule
produce completely functionless immunoglobulins that cannot bind any
biologically relevant structure? Actually, that cannot be answered,
per se, since most of the *actual* randomly generated immunoglobulins
will be functionally useless in any individual's lifetime because the
individual will not come in contact with the epitope recognized by
that variant. But we *can* ask if there are any biologically relevant
epitopes of sufficient size (other than those for self, which actually
do form, but, in general are eliminated) that the immunoglobulin system
*cannot* recognize. The answer appears to be "Not many." Antibody
binding even occurs to structures that have never been produced in
nature.
I can't believe you are using the example of immune system evolution.
That's just classic.
While immune system evolution is a real example of evolution in
action, it isn't an example of evolution of higher-level systems where
more than sequence matching to some pre-formed template is required. The immune
system works in a very similar way to Dawkins's famous "Methinks it is
like a weasel" evolution algorithm. Along comes a foreign antigen
epitope that is typically about 20 residues in size.
And how big is lactose?
So, the total
number of possible antigen epitopes is about 20^20 or
104,857,600,000,000,000,000,000,000 or ~100 trillion trillion. Since
there are trillions of different possible antigen epitopes, how does
one's immune system cope with such a variety of potential enemies?
Well, there are many immune cells produced by the body. In humans, in
particular, about 10^12 lymphocytes are present at any given time.
And fewer than that that have different sequences.
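The epitope count quoted above is easy to verify with exact integer arithmetic. The sketch below just evaluates 20^20 and compares it with the quoted figure of about 10^12 lymphocytes; both inputs are taken from the text as given, and the variable names are illustrative.

```python
# 20 possible residues at each of ~20 positions, as quoted.
epitope_space = 20 ** 20

# Quoted number of lymphocytes present at any given time.
lymphocytes = 10 ** 12

print(f"possible 20-residue epitopes: {epitope_space:,}")
print(f"epitopes per lymphocyte:      {epitope_space // lymphocytes:,}")
```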
Not
all the T-cells have different Y-shaped receptors, but many of them
do. Chances are that if enough non-self enemies get into the body at
least one of the immune cells will recognize the non-self marker
sequences or "antigens" located on this invader as "foreign" to at
least some useful degree.
Yep. Do note that it is not the entire antibody that binds an
'epitope', but only a relatively short sequence. It is the
distinction between 'epitopes' bound that represents the "functional"
difference between these sequences. And "at least to some useful
degree" is *all* that is needed whether the binding is 'just' binding
or is 'binding' that leads to a catalytic speeding up of a reaction or
represents interaction of subunits in a multimer. Evolution does not
require that one land on an optimal sequence immediately.
The odds that a single T-cell will
recognize a random epitope to at least some useful degree is about 1
in 10^12. So, does this mean it would take a trillion different T-
cells to cover all possible invaders? Well, no. The reason is
because an average cell or foreign invader "bug" has about 10^12
different antigen epitopes. So, on average, a single T-cell will
recognize at least one of the potential antigen epitopes of a foreign
invader.
That's why the immune system is actually likely to recognize all
foreign invaders to at least some degree of usefulness - even at
initial exposure.
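The "on average" step in the quoted argument is an expected-value statement. A rough sketch, assuming each epitope on the invader is an independent trial with the quoted 1-in-10^12 odds (independence is an assumption added here, not something stated in the thread), gives both the expected number of matches and the chance that a single cell matches at least once.

```python
import math

p_match = 1e-12            # quoted odds that one T-cell recognizes one epitope
epitopes = 10 ** 12        # quoted number of epitopes on an invader

# Expected number of recognized epitopes for a single T-cell.
expected_matches = p_match * epitopes

# P(at least one match) = 1 - (1 - p)^n, computed stably for tiny p.
p_at_least_one = -math.expm1(epitopes * math.log1p(-p_match))

print(f"expected matches per cell: {expected_matches:.3f}")
print(f"P(at least one match):     {p_at_least_one:.3f}")
```

Under these assumptions a single cell is not guaranteed a match, but with something like 10^12 lymphocytes in circulation the chance that none of them recognizes the invader is negligible.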
The point is that a very, very, very wide range of biologically
relevant epitopes can be recognized and bound to a "useful", but not
necessarily "optimal", degree by a limited number of sequences about
100 aa in length. That is all that evolution needs to do as well.
Like the immune system, after you have the "some degree of
usefulness", generating optimization is simple.
After this point, improved immune system
recognition and defense is simply a matter of random mutations and
improved character matching from one generation of immune cells to the
next. This process is not at all different from what Richard Dawkins
did with his evolution algorithm where each single additional
character match provides the individual with improved reproductive
advantage. This means that that gap between what currently exists as a
starting point and the next closest potentially beneficial sequence is
always only one character change away. Immune system evolution is
therefore predictably rapid and efficient. No big surprise
http://www.detectingdesign.com/immunesystem.html
The problems come when one is trying to demonstrate how novel systems
of function evolve where template matching can't be used - like in the
evolution of high-level systems like flagellar motility.
Flagellar motility is due to epitope binding that is every bit as much
a matter of 'template matching' as the binding of an immunoglobulin to a
foreign epitope.
There are no
templates upon which to build such systems where each and every single
character change will be recognized as more beneficial than the last.
That is merely unsupported assertion.
That's why such systems end up having to cross vast gaps that can only
be traversed by dozens of non-beneficial character changes.
That is unsupported assertion.
Would that be possible if Sean's calculation of the ratio of
"beneficial to non-beneficial sequences" were a good estimate? No way
in hell. A ratio of one "potentially beneficial" sequence to 10^40
"non-beneficial sequences" at the 100 aa level (which is a little
longer than the two variable regions, H and L, together) would mean
that to generate a single *potentially beneficial* immunoglobulin that
binds a potential biologically relevant epitope, you would have to
produce 10^40 different mutations in 10^40 different cells. That,
however, is significantly more than the number of cells in your body
(which is about 10^14). In fact, that is most likely more than the
number of cells in all the humans that ever existed. And that,
according to Sean, is what would be needed to generate a *single*
potentially beneficial sequence by random mutation of the variable
region sequence of immunoglobulins.
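The mismatch in scale can be put in numbers. The sketch below is deliberately generous: it assumes every one of the roughly 10^14 cells in a body carries its own independent variable-region variant, which overstates the real repertoire. Even so, a 1-in-10^40 hit rate would leave the expected number of useful variants per body vanishingly small.

```python
from fractions import Fraction

hit_rate = Fraction(1, 10**40)   # quoted "beneficial" ratio at ~100 aa
body_cells = 10 ** 14            # approximate cells in a human body

# Expected useful variants if every cell carried an independent variant
# (a generous upper bound, assumed here for illustration).
expected_hits = hit_rate * body_cells

# Number of such bodies needed to expect a single useful variant.
bodies_needed = 1 / expected_hits

print(f"expected useful variants per body: 1 in {expected_hits.denominator:.0e}")
print(f"bodies needed for one hit:         {float(bodies_needed):.0e}")
```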
You are confusing different types of functions here that do not have
the same minimum structural threshold requirements. Simple antigen
binding isn't a very complex function.
ALL protein functions involve a protein providing a surface (or
several different surfaces) that interact with some biologically
relevant structure. I see no difference between an antibody binding to
an antigen and a FliG binding to FliF. In fact, the binding of FliG
to FliF involves *fewer* aa residues. Can you name a single protein
that does not accomplish its 'functions' (and the plural is
intentional) by providing a surface (or several, generally independent
surfaces in some cases) that interacts with some biologically relevant
structure?
Simple binding to at least a
useful degree does not require more than 20 or so very loosely
specified amino acid residue positions. The immune system function
and overall antibody functionality requires a greater minimum than
this, but the basic binding function is very simple. In the same way,
basic binding of a protein to a lactose sugar molecule is also very
simple and does not require many residues nor does it require high
sequence specificity. However, if the function in question requires
more than mere binding, as in the lactase function, a great deal more
size and specificity is needed - i.e., at least 380 fairly
specifically arranged residues.
Again, you are trying to compare apples to oranges here. Different
types of functions have different minimum structural requirements.
So you assert. But the comparison with the immune system is not the
main point here, even if I were wrong about my ideas about the *real*
ratio of "potential beneficial vs non-beneficial" sequences. The main
point is that you have made a calculation of "average gap size" that
involves taking a ratio that you assert means something that even you
admit it does not mean. And then assuming, in the absence of
evidence, a specific (and unlikely) relationship between that ratio
and length of the sequence. And then taking a base-20 logarithm and simply
calling it "average gap size".
In fact, once you understand the basis behind protein 'function', it
may well be that the ratio of "*potential* beneficial" to "total
sequence space" would be close to one.
Now that is wishful thinking if I ever saw it.
Not at all. If every protein that has a particular sequence also has
a preferred thermodynamic minimum structure (or even a few), that
means that every protein forms specific surfaces that have the
*potential* to bind to *some* biologically relevant molecule. In many
cases, binding a biologically relevant molecule *is* beneficial in and
of itself.
Again, we are talking
specifically about systems that have certain MINIMUM size and
specificity requirements.
I think you are claiming that there is only one "function" in any
enzyme rather than that different parts of a protein have different
subfunctions that, together, generate what you call *the* or
teleological function. You seem to be assuming that function is
something vaguely distributed over the entire structure rather than a
consequence of subfunctions due to specific sites that can be useful
either independently or in fewer combinations. But again, whether or
not I am right or wrong in estimating a much different ratio of
"potential beneficial sequences" to total sequence space, you are
still wrong. Your calculation of "average gap size" is still GIGO.
Just from the fact that the ratio you present is not what you claim it
is. Even before we look at the rest of the calculation.
You are trying to compare systems with
different minimum thresholds to each other when the ratio in question
only concerns systems with the same minimum requirements. Systems
with higher minimum requirements will be linearly farther away from
other systems within the same level as well as from potentially
beneficial targets within the level just above and just below.
Each new protein sequence
resulting from the change of a single aa would still have a structure
that would typically (in the parts that weren't changed) still bind to
particular biologically useful structures. And we know for a fact
that in some cases loss of binding a particular substrate *is* what
causes a change to be *actually* 'beneficial'.
A function gained by loss of pre-existing binding is very easy to
evolve. This is the basis for most forms of antibiotic resistance.
Such evolution happens very commonly and rapidly.
The number of *actual*
beneficial sequences, however, is clearly much smaller than the number
of *potentially* beneficial sequences, but the benefit of a sequence
is highly dependent on *actual* conditions. Again, 'beneficial' is a
conditional state involving the interaction of sequence and
environment, not an inherent state dependent solely on sequence.
Although true, this concept is essentially irrelevant to the problem
at hand - as noted above.
Something clearly does not compute here. Of course, once you
recognize that Sean's ratio does not mean what he claims it means,
"the ratio of potentially beneficial vs. non-beneficial sequences",
you can recognize a major flaw in his argument. The number he uses in
the numerator is irrelevant to the claim that he is making for that
number.
Once you start comparing apples to apples you will see that your
argument doesn't hold water.
Whether I am right or wrong here is irrelevant. My being wrong would
not make your calculation anything other than GIGO. This is not a
dichotomy. We can both be wrong. In your case, that is a certainty,
since we already know that you are calling the ratio you use something
it isn't.
You can't compare systems that have
different size or specificity requirements to each other to obtain the
"ratio". The ratio of interest involves comparing systems with the
same minimum requirements to sequence space at that level.
In short, the numerator value, as an estimate of the number of
"potentially beneficial sequences", is actually a number (an estimate
of the number of sequences 100 aa long that have cytochrome c activity
and sequence similarity) that doesn't have any clear relationship to
the claim. Ergo, GIGO.
The notion that the ratio of targets to the total number of
sequences is 1:1 is what is absolute GIGO.
So any protein that forms a specific (or even several) structure lacks
what property, do you think, that would allow it to interact (as a
surface) with some biologically relevant structure? My claim is that
all that is needed for a protein to have "potential benefit" is for
that protein to interact with some biologically relevant structure
(actually it should also include interact with structures that are not
currently relevant but might be in the future or have been so in the
past). For a protein to have "actual benefit", of course, it must
interact with some biologically relevant structure in the right
context. But if the ratio you want and claim you need to calculate
"average gap size" is the ratio of "potentially beneficial sequences"
to total sequences, then you have to include any and all proteins that
can form a surface that interacts with some biologically relevant
structure.
But, like I said, it really doesn't matter if I am wrong on this. My
being wrong would not make your ratio right.
2) Problems with the denominator.
[snip mostly because we both agree that the numbers *if* you are right
about the relative ratio would not differ significantly]
3) Problems with the extension of the equation to larger sequence
sizes.
Even if you *accurately* describe what Sean's equation *really*
measures: Namely, the ratio of cytochrome c-like sequences divided by
total sequence space or the mathematically similar ratio of cytochrome
c-like sequences per non-cytochrome c-like sequences, this ratio ONLY
holds for the length of 100 aa's. That is because that is the only
data point we have been given.
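For reference, the extrapolation being criticized can be written out as a minimal sketch, working in log10 so the numbers stay manageable. The function name and parameters are made up for illustration. The rule ratio(L) = ratio(100)^(L/100) is the one from the quoted calculation, and it is an assumption of that calculation, not something measured.

```python
def extrapolated_log10_ratio(length_aa, base_length_aa=100, base_log10_ratio=-40):
    """log10 of the ratio under the ASSUMED rule ratio(L) = ratio(100) ** (L / 100).

    Only the 100 aa value (about 1e-40) is an input; every other value is
    produced by the assumed rule, not by data.
    """
    return base_log10_ratio * length_aa / base_length_aa


for length in (100, 200, 1000):
    print(length, extrapolated_log10_ratio(length))
```

The 1,000 aa value comes out as an exponent of -400, matching the quoted 1e-400. The point of writing it out is that a single input value goes in and everything else is generated by the rule.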
Not if you define the levels you are considering as having an
equivalent degree of specificity. Again, it doesn't matter what
degree of specificity you choose. If you keep that degree of minimum
specificity the same, and increase the minimum number of required
residues, the ratio of targets vs. non-targets will drop
exponentially.
Sean, however, *claims* (apparently out of thin air) that this ratio
decreases exponentially with increases in total length.
It does if you keep the specificity requirement constant.
More
importantly, he *claims* that the exponential decrease in the ratio
exactly matches the increase in total sequence space.
I never said the match was exact. It is indeed close at higher
levels, but not exact.
That is, for
every ten-fold increase in total sequence space, the ratio of number
of cytochrome-like sequences per total sequence space (or non-
cytochrome c-like sequences) decreases ten-fold.
You don't seem to understand that we are only talking about the
minimum structural threshold requirements for a system. CytoC has a
minimum of about 100aa (more like 80, but that's beside the point).
Other systems with equivalent specificity requirements that ALSO have
greater minimum size requirements, like a few hundred, will be
exponentially rarer than CytoC in their own sequence spaces.
Essentially, this
amounts to a claim that C, the number of cytochrome c-like sequences
that Sean claims represents ALL "potentially beneficial sequences"
doesn't change at ALL with an increase in protein length!
What? I'm not talking about CytoC functionality at higher levels
because CytoC functionality doesn't have a higher minimum than 100aa.
Higher level systems are NOT CytoC systems because CytoC is not a
higher-level system. CytoC is a fairly low-level system that actually
requires a minimum structural threshold of less than 100 fairly
specified residues.
The fact remains that that is what you have done. I didn't say that
the sequences are cytochrome c. I said that the math claims that
however you calculated the ratio, when the length of the protein is
increased, your math declares that the number (that is number in the
numerator) that you called "potential beneficial sequences" (which
actually are indistinguishable from the number of cytochrome c-like
species in your ratio) doesn't change at all.
When asked about this, Sean merely claims that this particular
exponential relationship between the ratio of "potentially beneficial
vrs. non-beneficial sequences [sic]" is "obvious" and anyone who
questions it is brain-dead or a liar. Well, it may be "obvious" to
him. But it sure ain't obvious to me. Especially since no data at
all are presented that demonstrates that this particular exponential
relationship *is* the relationship. Rather than, say, even a
relationship in which the number of cytochrome c-like sequences
increases in lockstep with the increase in length, producing a
constant *ratio* of "potentially beneficial to non-beneficial"
sequences as size increases.
So, basically, in asserting a specific exponential relationship for
which Sean has presented no evidence whatsoever, he generates numbers
that, to say the least, are of questionable validity. Ergo, GIGO.
You don't seem to grasp the concept of the minimum structural
threshold requirement.
It doesn't f**king matter. You are saying, by using the particular
exponential relationship you do, that if the denominator (total
sequence space or the numerically similar non-beneficial sequences)
increases 10-fold, the *ratio* decreases approximately 10-fold. That
requires that the numerator remains unchanged. Mathematically. That
is essentially saying that the number of "potential beneficial
sequences" is fixed and independent of the size of total sequence
space. Mathematically.
You could have said that the number of beneficial sequences only
increases 5-fold for every 10-fold increase in total sequence space.
That would have resulted in an exponential decrease in the ratio, but
a slower one. You could have said that the number of beneficial
sequences decreased 2-fold for every 10-fold increase in total
sequence space (which is a function of length). You could even have
said that the number of beneficial sequences increases 10-fold for
every 10-fold increase in total sequence space (i.e., that the
relationship is actually linear). The fact is that there has been NO
EVIDENCE presented to favor ANY of these relationships. All we have
is your assertion that the particular ratio you chose is the correct
one. And the one you chose mathematically declares that the number of
beneficial sequences does not change with changes in total sequence
space.
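The alternatives listed above can be checked with a few lines of arithmetic. This is only a sketch of the ratio algebra (ratio = numerator / denominator); the function name and scenario labels are hypothetical, and nothing here says which scenario is true of real proteins.

```python
from fractions import Fraction


def ratio_change(numerator_factor, denominator_factor=10):
    """Factor by which numerator/denominator changes when each part is scaled."""
    return Fraction(numerator_factor) / Fraction(denominator_factor)


# Each scenario scales the numerator while the denominator grows 10-fold.
scenarios = {
    "numerator unchanged": 1,
    "numerator up 5-fold": 5,
    "numerator down 2-fold": Fraction(1, 2),
    "numerator up 10-fold": 10,
}

for label, factor in scenarios.items():
    print(f"{label}: ratio changes by a factor of {ratio_change(factor)}")
```

Only the unchanged-numerator case gives a 10-fold drop in the ratio for a 10-fold increase in the denominator. A 5-fold increase in the numerator halves the ratio, a 2-fold decrease cuts it 20-fold, and a 10-fold increase leaves it constant.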
The question is, how many potentially
beneficial target functions require at least 100 amino acid residues
with a degree of specificity of 1e-40? What is the answer to that
question? Now, compare this to the question of how many potentially
beneficial target functions require at least 200 amino acid residues
with the same degree of specificity? The answer to this second
question will be exponentially lower than the answer to the first
question. CytoC does not qualify as an answer to the second question
because its threshold limit is at 100aa, not 200aa. Do you grasp
that concept?
I don't give a flying f**k whether you think the numerator of the ratio you
determined represents the number of cytochrome c sequences (which, of
course, even you admit that it does), the number of lambda repressor
sequences, or the number of potentially beneficial sequences. The
mathematical relationship you describe when you do your exponential
increase is one which mathematically says that whatever you called the
numerator in your original ratio, that number remains constant
regardless of the increase in the denominator. That is, if for every
ten fold increase in the size of the denominator, the ratio decreases
ten-fold, the only way that can happen is if the numerator, regardless
of what you call it, remains unchanged.
Until you do, there is no point dealing with the rest of your "points"
Just one more point. Namely that your determination of "average gap
size" is no such thing.
- because they are all related to this basic misunderstanding of yours
regarding minimum structural threshold requirements. You just don't
seem to understand this concept.
And you don't seem to understand how math works in science. Namely
little details like not calling a ratio the ratio of "potential
beneficial vs. non-beneficial sequences" when you have no way of
measuring or even estimating that ratio. Little things like not
knowing the mathematical consequence of claiming a very specific
exponential relationship with increasing length and the need for
supporting evidence to make that claim.
Your calculation of "average gap size" is bullshit numerology, Sean.
It was bullshit numerology from the moment it became clear that the ratio
you declared to be the ratio of "potential beneficial vs. non-beneficial
sequences" was not really that at all. The fact that the next two
steps are also bullshit is merely adding to the stink adhering to the
bottom lines of these calculations.
Personally, I think you know you cannot answer the last comment
either.
Sean Pitman
www.DetectingDesign.com
4) What does his "average gap size" *really* mean?
Let us *assume*, purely for the sake of argument only, that Sean's
numbers actually measure what he claims and assume also, purely for
the sake of argument, that Sean is correct in his "exponential"
increase. I think that I have demonstrated that this is hardly likely
to be true, but let's go deeper into the land of unreality and
mathematical mumbo-jumbo. Let's assume that what Sean has generated
is indeed a ratio not too far from the actual values of "beneficial to
non-beneficial sequences". That means that the reciprocal of that
ratio tells you how many non-beneficial sequences exist per single
beneficial sequence. Sean then takes the base-20 logarithm of this
number and claims, ta-da, that the result is the "average gap size"
between beneficial sequences. But is it, really? I think not. What
Sean has is a new population size (the number of sequences of a given
size that do not have beneficial function per sequence that does).
This population size is, of course, smaller than the total sequence
space for a protein of length le. So what the base-20 logarithm of that
population size *really* means is that, *if we had a total sequence
space of that size* and were to reverse the calculation of total
sequence space, 20^le, we could find out what length le for a sequence
would produce a total sequence space of that size. Sean *claims* that
this number (the number of aa's needed to produce a total sequence
space of the size determined by the number of non-functional sequences
per functional ones) is "average gap size". Am I missing something?
I see no reason at all to think that such a number represents "average
gap size". Ergo, GIGO.
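The arithmetic in question is easy to reproduce. Here is a minimal sketch, assuming the 1e400 figure from the quoted calculation and a 20-letter amino acid alphabet; the function name is hypothetical. It only inverts 20^le, which is all the calculation does. It says nothing about whether the result deserves the name "average gap size".

```python
import math


def length_for_space(log10_space, alphabet=20):
    """Length le such that alphabet ** le equals 10 ** log10_space."""
    return log10_space / math.log10(alphabet)


le = length_for_space(400)
print(le)

# Exact integer check that 308 is the smallest whole length whose
# total sequence space reaches 1e400.
print(20 ** 307 < 10 ** 400 <= 20 ** 308)
```

The result lands between 307 and 308, which is where the quoted "about 308 residue differences" comes from: it is the sequence length whose total sequence space is about 1e400, nothing more.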
So to summarize, Sean has calculated a number he calls "average gap
size" by calling a ratio something it isn't, making unlikely
assertions about the relationship of this ratio to changes in length,
and then coming up with a number that doesn't seem to be "average gap
size" at all. IOW, a GIGO ratio with an assumed GIGO relationship
with increasing length which is used to produce a GIGO number.
<sarcasm on>This, of course, represents the very best of creation
science. That is because the numbers produced are the ones that a
creationist wants. It doesn't matter how these numbers are derived
nor what they mean. <sarcasm off>