Byte difference pre-processing, any advantage?



Let's say I wanted to experiment with taking byte-wise differences of
strings as a pre-processing step, to see whether there is any
advantage to be gained.

Does anyone know whether this will help or not?

I kind of assume that, since byte differencing isn't really used to
help create more patterns, the technique doesn't really help!
But then I've not seen anyone say that it doesn't.

Basically, let's say you have a string like this:

ABCDEFE

You could store the first byte plain: 65
The next byte would be stored as its difference from the first: 1
And all the rest up to the F would be: 1
The last byte would be: -1 though, or 255 if you use numeric
wrap-around (modulo 256).
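
A minimal sketch of the scheme above, in Python (my choice of language; the names delta_encode and delta_decode are made up for illustration, and the wrap-around variant is assumed so every value fits in one byte):

```python
def delta_encode(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte, modulo 256."""
    out = bytearray()
    prev = 0  # the first byte is differenced against 0, so it is stored plain
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def delta_decode(deltas: bytes) -> bytes:
    """Undo delta_encode by keeping a running sum, modulo 256."""
    out = bytearray()
    prev = 0
    for d in deltas:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


encoded = delta_encode(b"ABCDEFE")
print(list(encoded))          # [65, 1, 1, 1, 1, 1, 255]
print(delta_decode(encoded))  # b'ABCDEFE'
```

The transform is lossless and the output is the same length as the input, so any gain has to come from whatever compressor runs afterwards.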

The nice thing about this pre-processing is that it's very easy and
quick to perform.

I'm wondering though, would it help increase the number of patterns
available, in the same manner that BWT does?
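
One way to find out empirically would be to compress the same data with and without the pre-processing and compare sizes. The sketch below assumes zlib as the back-end compressor and uses two made-up sample inputs; compressed_sizes is a hypothetical helper, and I am not quoting any results here, since the outcome will depend on the data.

```python
import zlib


def delta_encode(data: bytes) -> bytes:
    """Byte-wise difference from the previous byte, modulo 256."""
    out = bytearray()
    prev = 0
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def compressed_sizes(data: bytes):
    """Return (size of zlib(data), size of zlib(delta_encode(data)))."""
    plain = len(zlib.compress(data, 9))
    delta = len(zlib.compress(delta_encode(data), 9))
    return plain, delta


# Two illustrative inputs: a steadily rising byte ramp and some repeated text.
samples = {
    "ramp": bytes(i % 256 for i in range(4096)),
    "text": b"the quick brown fox jumps over the lazy dog " * 50,
}

for name, sample in samples.items():
    plain, delta = compressed_sizes(sample)
    print(f"{name}: plain={plain} bytes, delta={delta} bytes")
```

Running it on real files instead of these toy samples would give a better idea of whether the idea pays off for a particular kind of data.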

For what it's worth, a similar lossless concept called BOCU is used to
compress Unicode text. This one actually DOES decrease the size of
the result, though :)
