Re: comp algorithms
- From: "cr88192" <cr88192@xxxxxxxxxxxxxxxxxx>
- Date: Sun, 3 Sep 2006 07:22:16 +1000
"Bill Cunningham" <nospam@xxxxxxxxx> wrote in message
news:5MfKg.212$I71.178@xxxxxxxxxxx
block
a C example would be long...
I am saying a array, eg, that takes 4 times as much space as the input
(if each offset is 32 bits).
eg:
foobar
is 6 bytes.
initially, one starts with an array holding the offset of each byte:
0 1 2 3 4 5
as such:
0=foobar
1=oobarf
2=obarfo
3=barfoo
4=arfoob
5=rfooba
sorting by the associated string values gives the order:
4=arfoob
3=barfoo
0=foobar
2=obarfo
1=oobarf
5=rfooba
so, now your array has these offsets:
4 3 0 2 1 5
Ok any particular reason why this order? Or is it arbitrary? OR am I
missing something?
sorting.
note that, just above, the strings have been sorted alphabetically.
this is a very important part of BWT (and is often the slowest as well).
note that in a previous implementation, I had implemented a variation of
quicksort. but, other possibilities exist as well.
one potentially faster idea, as usually it seems most of the sorting work is
done in the first few chars, would be to do sorting initially as:
you scan along, and count all the occurances of the first char of each
string;
you step along, finding the initial position of each char in the output (via
adding the counts of all prev chars);
now one can step along, adding each string to the current position
associated with that string, and incrementing the position.
eg, intput:
byte *buf; //block buffer
int *ibuf; //unsorted int buffer
int *tbuf; //temp buffer
int cnt; //count for items in ibuf
temp:
int cnts[256];
int pos[256];
for(i=0; i<256; i++)cnts[i]=0;
for(i=0; i<cnt; i++)cnts[buf[ibuf[i]]]++;
j=0;
for(i=0; i<256; i++) { pos[i]=j; j+=cnt[i]; }
for(i=0; i<cnt; i++)tbuf[pos[ibuf[i]]++]=ibuf[i];
for(i=0; i<cnt; i++)ibuf[i]=tbuf[i];
for(i=0; i<256; i++)
sort(buf, ibuf+pos[i], tbuf+pos[i], cnts[i]);
.
- Follow-Ups:
- Re: comp algorithms
- From: cr88192
- Re: comp algorithms
- References:
- comp algorithms
- From: Bill Cunningham
- Re: comp algorithms
- From: cr88192
- Re: comp algorithms
- From: Bill Cunningham
- Re: comp algorithms
- From: cr88192
- Re: comp algorithms
- From: Bill Cunningham
- Re: comp algorithms
- From: cr88192
- Re: comp algorithms
- From: Bill Cunningham
- comp algorithms
- Prev by Date: Re: Software solves NYT crossword puzzle
- Next by Date: Re: comp algorithms
- Previous by thread: Re: comp algorithms
- Next by thread: Re: comp algorithms
- Index(es):
Relevant Pages
|