Re: The Revised Pitman CSI Formula



On Jul 21, 7:46 am, "R. Baldwin" <res0k...@xxxxxxxxxxxxxxxxxxxx>
wrote:
"Seanpit" <seanpitnos...@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:1185016739.746464.227510@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

I tried putting in other values for N. The larger N gets,
the closer that ratio gets to unity.

Yeah, you're right . . .

So, I'll have to revise my formula as follows:

For binary sequences CSI =

X^n - n! / ((n-hd)! hd!)

For example:

X = 2
n = 10

Sequence space = 2^10 = 1024

HD0 = 1 seq ( CSI = 1023 )
HD1 = 10 seq ( CSI = 1014 )
HD2 = 45 seq ( CSI = 979 )
HD3 = 120 seq ( CSI = 904 )
HD4 = 210 seq ( CSI = 814 )
HD5 = 252 seq ( CSI = 772 )
HD6 = 210 seq ( CSI = 814 )
HD7 = 120 seq ( CSI = 904 )
HD8 = 45 seq ( CSI = 979 )
HD9 = 10 seq ( CSI = 1014 )
HD10= 1 seq ( CSI = 1023 )

X = number of possible characters per position
n = length of the sequence (number of positions)
hd = Hamming Distance
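A minimal Python sketch of the revised binary formula. It reads the second term as n! / ((n-hd)! hd!), the binomial coefficient, which is the reading the n = 10 table above uses. The function name `csi` is hypothetical and not from the post.

```python
from math import comb


def csi(x: int, n: int, hd: int) -> int:
    """Revised formula as stated in the post: X^n - n! / ((n - hd)! * hd!).

    math.comb(n, hd) computes n! / ((n - hd)! * hd!) exactly with integers,
    i.e. the number of length-n binary strings at exactly Hamming distance hd
    from a fixed reference string.
    """
    return x ** n - comb(n, hd)


if __name__ == "__main__":
    X, n = 2, 10
    print(f"Sequence space = {X}^{n} = {X ** n}")
    for hd in range(n + 1):
        # one row per Hamming distance, in the same layout as the table
        print(f"HD{hd} = {comb(n, hd)} seq ( CSI = {csi(X, n, hd)} )")
```

Running it reproduces the table row by row, including the symmetry around HD5.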

Do you intend to use this same formula for non-binary sequences?

Since this metric has a different output characteristic than your last one,
what is your new hypothesis about what it does?

The hypothesis is still the same and always has been. As a string
gets longer it occupies a much larger sequence space - a space that
represents my definition of "complexity" in this context. The total
number of strings at exactly 1 step (HD = 1) from this reference
string is smaller than the number at exactly 2 steps, and so on, up
to the average HD in the case of binary strings (as illustrated
above). These differences, or the ratios between the counts at HD1,
HD2, HD3 . . ., become more and more pronounced with increasing
sequence size - as do the CSI values. This means that the greater the
CSI value, the more reliable the hypothesis of non-random bias of the
reference and the compared strings.
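For binary strings, the ratio between the counts at successive Hamming distances simplifies to (n - hd) / (hd + 1). For example, the HD2/HD1 ratio is 45/10 = 4.5 at n = 10 but 780/40 = 19.5 at n = 40. The short sketch below (the helper name `hd_ratio` is hypothetical) computes these ratios exactly so the growth with n can be inspected directly.

```python
from fractions import Fraction
from math import comb


def hd_ratio(n: int, hd: int) -> Fraction:
    """Ratio of the number of length-n binary strings at Hamming distance
    hd + 1 to the number at distance hd (requires 0 <= hd < n).

    Algebraically this equals (n - hd) / (hd + 1).
    """
    return Fraction(comb(n, hd + 1), comb(n, hd))


if __name__ == "__main__":
    for n in (10, 20, 40):
        # ratios HD1/HD0, HD2/HD1, HD3/HD2 for each sequence length
        print(n, [float(hd_ratio(n, hd)) for hd in range(3)])
```

Fractions are used so that the identity with (n - hd) / (hd + 1) holds exactly rather than up to floating-point rounding.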

That's the hypothesis, anyway. I just needed to find a formula that
would actually represent it. My original formula was an attempt to
represent this hypothesis, but its second term (- X^hd) wasn't really
the number of sequences at a given HD in sequence space, as I had
intended. Calculating that number seems to require factorials.
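A brute-force check of that point. It assumes, arbitrarily, that the reference string is all zeros; by symmetry any reference gives the same counts. The sketch enumerates all 2^10 binary strings and tallies them by Hamming distance. The tallies match the binomial counts (1, 10, 45, ...), while the old X^hd term gives 1, 2, 4, .... The function name `hd_counts` is hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def hd_counts(x: int, n: int) -> Counter:
    """Tally every length-n string over an alphabet of size x by its
    Hamming distance from a fixed reference string (all zeros here)."""
    reference = (0,) * n
    counts = Counter()
    for seq in product(range(x), repeat=n):
        distance = sum(a != b for a, b in zip(seq, reference))
        counts[distance] += 1
    return counts


if __name__ == "__main__":
    counts = hd_counts(2, 10)
    for hd in range(11):
        # columns: HD, brute-force count, binomial count, old X^hd term
        print(hd, counts[hd], comb(10, hd), 2 ** hd)
```

Exhaustive enumeration only scales to small n, but it is enough to confirm which counting formula is correct.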

Therefore, the modified formula X^n - n! / ((n-hd)! hd!) seems to be
more in line with my hypothesis. It could also be modified for other
values of X, but I don't have the time right now to work that part
out. Perhaps later - or perhaps you can offer a suggestion in this
regard.
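One possible extension to non-binary sequences, offered here only as an assumption and not as the formula proposed in the post, uses a standard counting result. Over an alphabet of X characters, the number of length-n strings at exactly Hamming distance hd from a reference is C(n, hd) * (X - 1)^hd: choose the hd differing positions, then pick one of the X - 1 other characters at each. Substituting that count into the same X^n - (count) structure gives the sketch below. For X = 2 it reduces to the binary formula. Both function names are hypothetical.

```python
from math import comb


def seqs_at_hd(x: int, n: int, hd: int) -> int:
    """Number of length-n strings over an x-letter alphabet that differ from
    a fixed reference string in exactly hd positions: choose the hd
    positions, then any of the x - 1 other letters at each of them."""
    return comb(n, hd) * (x - 1) ** hd


def csi_general(x: int, n: int, hd: int) -> int:
    """Hypothetical extension of the post's formula to any alphabet size:
    X^n minus the number of strings at exactly Hamming distance hd.
    For x == 2 this equals X^n - n! / ((n - hd)! * hd!)."""
    return x ** n - seqs_at_hd(x, n, hd)


if __name__ == "__main__":
    # example: 4 possible characters per position, sequence length 10
    for hd in range(11):
        print(hd, seqs_at_hd(4, 10, hd), csi_general(4, 10, hd))
```

A useful sanity check is that the counts over all hd from 0 to n add up to X^n, the full sequence space.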

Sean Pitman
www.DetectingDesign.com
