Re: comp algorithms




"Bill Cunningham" <nospam@xxxxxxxxx> wrote in message
news:5MfKg.212$I71.178@xxxxxxxxxxx


a C example would be long...


I am saying a array, eg, that takes 4 times as much space as the input
block
(if each offset is 32 bits).


eg:

foobar
is 6 bytes.

initially, one starts with an array holding the offset of each byte:
0 1 2 3 4 5

as such:
0=foobar
1=oobarf
2=obarfo
3=barfoo
4=arfoob
5=rfooba

sorting by the associated string values gives the order:
4=arfoob
3=barfoo
0=foobar
2=obarfo
1=oobarf
5=rfooba

so, now your array has these offsets:
4 3 0 2 1 5

Ok any particular reason why this order? Or is it arbitrary? OR am I
missing something?


sorting.

note that, just above, the strings have been sorted alphabetically.
this is a very important part of BWT (and is often the slowest as well).


note that in a previous implementation, I had implemented a variation of
quicksort. but, other possibilities exist as well.

one potentially faster idea, as usually it seems most of the sorting work is
done in the first few chars, would be to do sorting initially as:
you scan along, and count all the occurances of the first char of each
string;
you step along, finding the initial position of each char in the output (via
adding the counts of all prev chars);
now one can step along, adding each string to the current position
associated with that string, and incrementing the position.


eg, intput:
byte *buf; //block buffer
int *ibuf; //unsorted int buffer
int *tbuf; //temp buffer
int cnt; //count for items in ibuf

temp:
int cnts[256];
int pos[256];

for(i=0; i<256; i++)cnts[i]=0;
for(i=0; i<cnt; i++)cnts[buf[ibuf[i]]]++;

j=0;
for(i=0; i<256; i++) { pos[i]=j; j+=cnt[i]; }

for(i=0; i<cnt; i++)tbuf[pos[ibuf[i]]++]=ibuf[i];
for(i=0; i<cnt; i++)ibuf[i]=tbuf[i];

for(i=0; i<256; i++)
sort(buf, ibuf+pos[i], tbuf+pos[i], cnts[i]);


.



Relevant Pages

  • Re: FASTEST way to try all strings (a until ZZZZZZZZZZZZZZZZZZZZZZZZ)
    ... > It will be a very huge table so I in my opinion. ... > When it would be used, than it should be converted to a string, however ... >> How would an array of Byte be any faster then an array of Char? ... >> array of Byte is needed, however the OP suggested Chars (A to Z, a to z ...
    (microsoft.public.dotnet.languages.vb)
  • Re: FASTEST way to try all strings (a until ZZZZZZZZZZZZZZZZZZZZZZZZ)
    ... > It will be a very huge table so I in my opinion. ... > When it would be used, than it should be converted to a string, however ... >> How would an array of Byte be any faster then an array of Char? ... >> array of Byte is needed, however the OP suggested Chars (A to Z, a to z ...
    (microsoft.public.dotnet.general)
  • Re: communication via gpib-drivers in c#
    ... > is there a way converting this decimal type with the ascii-code in it into ... > string or an array of chars? ...
    (microsoft.public.dotnet.languages.csharp)
  • communication via gpib-drivers in c#
    ... is there a way converting this decimal type with the ascii-code in it into a ... string or an array of chars? ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Comparing Two Files line by line and word by word
    ... Sort them, ... out at sorting if you sort yourself or do it after sorting. ... an array of strings where every string is a word, ... array, and generating a string from it containing all the words, ...
    (comp.lang.c)