Re: A truely BIJECTIVE BWT is here!
- From: biject <davvid_a_scott@xxxxxxxxx>
- Date: Mon, 17 Dec 2007 22:07:17 -0800 (PST)
On Dec 17, 6:03 pm, Klaus Stengel <Klaus.Sten...@xxxxxxxxxx> wrote:
.....
In case we have to compare strings of different lengths while sorting,
we need to keep in mind that we're actually comparing infinite cyclic
character sequences. That's why we need to extend the shorter one by
repeating it until a difference occurs or both were repeated at least
once. For example if we want to compare "AXYA" with "AXY" we have to
expand "AXY" by repeating it: "AXYAXY". Now "AXYA" is again shorter
and equals "AXYAXY" in all available characters, so we have to repeat
it too: "AXYAAXYA". Now we see that "AXYAAXYA" is smaller than
"AXYAXY", as the fifth character "A" < "X". In the sorted list "AXYA"
must be positioned before "AXY".
For this paticular for of code the "return 0;" repeated cycles like
|0|0|0| mapped to one cycle |000|
EVERY CYCLE UNIQUE so if you write the compare function
(and I did not yet) to take advantage of this fact you will only
compare till a DIFFERENCE they can never be the same.
I guess it's just a matter of taste if you want to split up repetitive
cycles. If you want to merge longer repetitive cycles, e.g. |FOO|FOO|
you'll need to do a full cycle comparison somewhere to rule out any
non-repetitive elements, which can be expensive. However I've still no
idea why this should guarantee that every cycle will be unique then. Or
did you mean something entirely different? And what would be the
advantage of doing that?
depending on how you parse the "return " statement differece in
the two versions of code. You could break a single cycle that
repedative
intro several cycels.
But other than this one type of case you never get something like
if partituion ...xyz.....xyz....
a set of cycles like ...|xyz|....|xyz|... not vaiid if different
intervening string
like cycles only come from continuous strecthes in
the string. Other than that all cycle are unique. Which means
the compare for things from diffent cycles are unique.
example say you have n cycles you could sort them seperately just
like old bwt and then merge them since they would be in the order of
how they are used. This could make for fast coding.
example to cycles say sort to {ABCD} not I mean anything not literal
and {abcd}
they might merge like {AabBCcdD] or whatever you get the picture
no partition has the first and last character the same
|x.....x| not valid unless all characters the same
no partitopn splts a run of the same character
......xxxx|xxxx... not valid except for the case if you
wsh of treating run of all the same character as seperate partions
one last fact if |xyz| is a cycle then there is no cycle |yzx|
as to the adavantagess of observing this is how one would code the
comprares for speed I did not do this. There are many ways to speed
it up.
The resulting transform is the last character of each cycle in the
sorted list, almost like the ordinary BWT. Saving some index is no
longer required.
--> 0002RBOOFA
I like
0002RBOOFA with its repeat of 000 and OO and no index better than
BWT
200RBO0OFA,6 with its extra "index" and only the 00 repeat great
example
Yes, in this particular case. Try it with "00FOOBAR20" instead of
"FOOBAR2000" as the input and you'll get "020RBO0OFA" as a result. I
guess any improvements in compressibility happen mostly by chance and
it's easy to generate counterexamples by trying different rotations of
the same input.
Yes when one picks the test one gets the results one wants.
The real question in many minds may be how does it work with real life
large files. I suspect on the average it will beat BWT,INDEX but
only time will tell. But one this for sure old BWT,INDEX is not good
enough to get to codes that match good PPM code. And it saves
a lot of coding for some indexing which every one does differently
and if one is doing lots of blocks those repeated indexs can be
a paing to code and handle especaill when this is functionaly
so easy to use. How many programers use qsort of heapsort
they don't need to know how to code it only how to call it.
This will be much easier to use.
Bye,
Klaus.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
.
- Follow-Ups:
- Re: A truely BIJECTIVE BWT is here!
- From: Klaus Stengel
- Re: A truely BIJECTIVE BWT is here!
- References:
- A truely BIJECTIVE BWT is here!
- From: biject
- Re: A truely BIJECTIVE BWT is here!
- From: Phil Carmody
- Re: A truely BIJECTIVE BWT is here!
- From: Mark Nelson
- Re: A truely BIJECTIVE BWT is here!
- From: Phil Carmody
- Re: A truely BIJECTIVE BWT is here!
- From: Klaus Stengel
- Re: A truely BIJECTIVE BWT is here!
- From: biject
- Re: A truely BIJECTIVE BWT is here!
- From: Klaus Stengel
- A truely BIJECTIVE BWT is here!
- Prev by Date: Re: A truely BIJECTIVE BWT is here!
- Next by Date: Re: compression companies
- Previous by thread: Re: A truely BIJECTIVE BWT is here!
- Next by thread: Re: A truely BIJECTIVE BWT is here!
- Index(es):
Relevant Pages
|