Re: Hyperdimensional Sequence/Structural Space - for Howard Hershey



On Jul 4, 7:53 am, hersheyhv <hersh...@xxxxxxxxxxx> wrote:

< snip >

Sure, most of these aren't really viable, but random
mutations don't know that.

Natural selection does. And that means that real structure space does
not include any lengths of sequences that cannot produce *a*
structure, such as a fold or helix (except as a linker between domains
that either interact with each other specifically or between domains
that have independent functions).

The non-viable random mutations take time, Howard. They add to the
average time needed to find the next viable and beneficial sequence
or structure.

And the ones that are viable cannot be represented accurately, as far
as the true "distance" to their closest neighbors is concerned, in
three dimensions.

In fact, the method described by Choi and Kim does exactly that and
performs it *better* than other measures of structure and *much*
better than sequence homology does. It more accurately places data
points with similar function near each other than any of the other
methods.

The method used by Choi and Kim does not represent the true distances
that exist between the dots they've projected from hyperdimensional
space onto just three dimensions. The actual distances between these
dots are much, much larger than the figures make them appear. It is
kind of like projecting objects that exist in three dimensions onto
just two dimensions. The objects would appear in two dimensions to be
much closer together than they actually are in three-dimensional
space.
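
The projection analogy can be checked numerically. What follows is
only a minimal sketch, assuming plain Euclidean distance and an
orthogonal projection that simply drops coordinates; it is not the
scaling procedure used by Choi and Kim, and the helper names
(euclidean, project) and the random test points are made up for
illustration. Under those assumptions, projecting can shrink a
distance but never enlarge it.

```python
import math
import random


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def project(p, k):
    """Orthogonal projection: keep the first k coordinates, drop the rest."""
    return p[:k]


# Deterministic 3-D -> 2-D case: the two points differ only in the
# coordinate that the projection throws away.
p3, q3 = (0.0, 0.0, 0.0), (0.0, 0.0, 5.0)
dist_3d = euclidean(p3, q3)
dist_2d = euclidean(project(p3, 2), project(q3, 2))

# Hypothetical higher-dimensional case: random points in 50 dimensions
# (an arbitrary choice), projected down to 3.
random.seed(0)
dims = 50
a = tuple(random.random() for _ in range(dims))
b = tuple(random.random() for _ in range(dims))
dist_full = euclidean(a, b)
dist_proj = euclidean(project(a, 3), project(b, 3))

print("3-D distance:", dist_3d, " after projection to 2-D:", dist_2d)
print("50-D distance:", dist_full, " after projection to 3-D:", dist_proj)
```

The sketch only shows that dropping dimensions loses distance
information in general. How much is lost in any particular published
figure depends on the projection method actually used.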

Therefore, three-dimensional
representations are not truly reflective of the actual distances that
exist in sequence space between various islands of realized and
potentially beneficial structures or sequences.

You'd also be able to know the truth of this if you did any structural
comparisons or sequence comparisons between systems with greater and
greater minimum structural threshold requirements.

Again, to quote the people who, unlike you, *have* done both structure
and sequence comparisons between systems and looked at the role of
size (chain length) in Fig. 4c:

Which, by the way, you claimed had not taken place . . .

"We also notice that the protein chain lengths correlate significantly
(Spearman's rank correlation coefficient r = 0.3098,
P < 2 x 10^-16) with the ages of CSAs (Fig. 4c). These
observations combined with the assumption that the present-day
proteins represent the entire spectrum of proteins at different
stages of evolution from their respective CSAs, we propose a
scenario for the evolution of protein structural classes: ancestral
proteins of small short secondary structures primarily in three
classes ([alpha, beta, and alpha + beta] classes) evolve to
medium-sized proteins of four classes ([the above 3] and alpha/beta
classes) in roughly similar proportions, then to larger proteins with
a preponderance in alpha/beta class, as schematically shown in
Fig. 5."

Basically this is an indication of the amount of evolution that is a
consequence of domain addition rather than single nucleotide change.
Large proteins do not evolve by single nucleotide or aa addition.
They evolve by combining pre-existing domains.

You mean that they are assumed to evolve this way - based on homology
studies like this one? Again, this assumption has not been
demonstrated beyond the 1000aa threshold.

Beyond this is the obvious fact that the larger the protein sequence
in this study, the more character differences separate it from all of
the other beneficial sequences - on all sides. That fact is clearly
demonstrated in this paper, and this is the whole point I've been
trying to make!

You'd understand this if you understood anything about the nature of
hyperdimensional sequence and structure space. In order for the
clustering to actually be in one tiny corner of overall space, like
you imagine, the maximum number of structural or sequence differences
between any two proteins, any two at all, would be no more than a
dozen or so. That's not the case and that is clearly not what is
being demonstrated in this paper. The paper demonstrates a maximum
separation between the most distantly separated proteins of over 1800
differences - spanning pretty much the entire diameter of sequence
space - "length" and "breadth".
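
To make the "maximum number of differences" idea concrete, here is a
minimal sketch, assuming differences are counted as a Hamming distance
between equal-length sequences. That is an assumption for
illustration, not the structural measure used in the paper, and the
toy sequences and function names are hypothetical. The largest
pairwise distance in a set (its "diameter") stays small when the
sequences are tightly clustered and approaches the sequence length
when they are not.

```python
from itertools import combinations


def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))


def diameter(seqs):
    """Largest pairwise Hamming distance within a set of sequences."""
    return max(hamming(s, t) for s, t in combinations(seqs, 2))


# Hypothetical 8-residue toy sequences, not taken from any real protein.
clustered = ["MKTAYIAK", "MKTAYIAR", "MKSAYLAK"]
spread = clustered + ["GHWPDNCE"]  # differs from the first at every position

print("diameter of clustered set:", diameter(clustered))
print("diameter with an outlier: ", diameter(spread))
print("maximum possible for length 8:", len(clustered[0]))
```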

Sean Pitman
www.DetectingDesign.com

