Re: A truely BIJECTIVE BWT is here!



On Dec 13, 1:56 pm, Mark Nelson <snorkel...@xxxxxxxxx> wrote:
On Dec 13, 8:43 am, Phil Carmody <thefatphil_demun...@xxxxxxxxxxx>
wrote:

I suspect this is impossible. In order to achieve an
improvement in compression, BWT needs to map less-
compressible vectors onto more-compressible vectors.
However, there are fewer of the latter.

I don't think this is quite correct Phil. If you look at the BWT
alone, it is truly a bijective function. It is a one-to-one mapping of
the set of input messages to the set of output messages, for all input
and output.

The output vectors that we get are more compressible ONLY because the
input data has a specific type of correlation - the chances that
symbol X appears after a given sequence of symbols, {a, b, c, ... } is
larger than would be suggested by mere chance. And of course, most
interesting files have this property. Files that aren't interesting
won't have this property, but they will still be transformed by BWT -
into equally uncompressible output vectors.

The reason that I am suspicious of David's claim is just that I don't
see how the transform can be reversible without additional
information.

Of course, I'm reluctant to flag it as impossible. If you were naively
reading the original paper without having ever heard of it before, I
would be very surprised if you were to get through the description of
the transform and then say "Aha! This would be totally reversible if I
just had a pointer to the new position of the last character in the
input sequence!" Well, maybe you would, but I sure didn't, and it
still seems pretty remarkable to me that it works at all.

I haven't even looked at David's code yet but I was hoping he could
give a words-only description of what he did to achieve his claim. But
I'll give David a break, he says his attempts at exposition are rarely
successful, and that's not likely to change overnight. What worries me
is my own personal conviction that there is a bijection between clear
writing in spoken language and clear writing in code.

|
| Mark Nelson -http://marknelson.us
|

I took your oriignal BWT compressor and run the Calgory test files
the
resuts are bleow. The BWTS bijective compress beat it in evry case
which was a surprise to me. It should be worse on the average then
pure
BWT but better on the average when you count the indexing which I feel
is not needed

the original test files

07/15/1991 12:55 PM 111,261 BIB
07/15/1991 12:56 PM 768,771 BOOK1
07/15/1991 12:56 PM 610,856 BOOK2
07/15/1991 12:56 PM 102,400 GEO
07/15/1991 12:56 PM 377,109 NEWS
07/15/1991 12:56 PM 21,504 OBJ1
07/15/1991 12:56 PM 246,814 OBJ2
07/15/1991 12:56 PM 53,161 PAPER1
07/15/1991 12:56 PM 82,199 PAPER2
07/15/1991 12:56 PM 46,526 PAPER3
07/15/1991 12:56 PM 13,286 PAPER4
07/15/1991 12:56 PM 11,954 PAPER5
07/15/1991 12:56 PM 38,105 PAPER6
07/15/1991 12:57 PM 513,216 PIC
07/15/1991 12:57 PM 39,611 PROGC
07/15/1991 12:57 PM 71,646 PROGL
07/15/1991 12:57 PM 49,379 PROGP
07/15/1991 12:57 PM 93,695 TRANS
18 File(s) 3,251,493 bytes

Marks compressed size

12/13/2007 03:05 PM 29,567 bib.m
12/13/2007 03:08 PM 275,831 book1.m
12/13/2007 03:11 PM 186,592 book2.m
12/13/2007 03:12 PM 62,120 geo.m
12/13/2007 03:12 PM 134,174 news.m
12/13/2007 03:12 PM 10,857 obj1.m
12/13/2007 03:12 PM 81,948 obj2.m
12/13/2007 03:13 PM 17,724 paper1.m
12/13/2007 03:13 PM 26,956 paper2.m
12/13/2007 03:13 PM 16,995 paper3.m
12/13/2007 03:13 PM 5,529 paper4.m
12/13/2007 03:13 PM 5,136 paper5.m
12/13/2007 03:13 PM 13,159 paper6.m
12/13/2007 03:15 PM 50,829 pic.m
12/13/2007 03:16 PM 13,312 progc.m
12/13/2007 03:16 PM 16,688 progl.m
12/13/2007 03:16 PM 11,404 progp.m
12/13/2007 03:16 PM 19,301 trans.m
18 File(s) 978,122 bytes


Scotts compressed output replaced BWT with BWTS
and UNBWT with UNBWTS so in index needed

12/13/2007 03:05 PM 29,563 bib.s
12/13/2007 03:08 PM 275,779 book1.s
12/13/2007 03:11 PM 186,555 book2.s
12/13/2007 03:12 PM 62,113 geo.s
12/13/2007 03:12 PM 134,166 news.s
12/13/2007 03:12 PM 10,845 obj1.s
12/13/2007 03:12 PM 81,930 obj2.s
12/13/2007 03:13 PM 17,710 paper1.s
12/13/2007 03:13 PM 26,938 paper2.s
12/13/2007 03:13 PM 16,979 paper3.s
12/13/2007 03:13 PM 5,515 paper4.s
12/13/2007 03:13 PM 5,122 paper5.s
12/13/2007 03:13 PM 13,147 paper6.s
12/13/2007 03:15 PM 50,815 pic.s
12/13/2007 03:16 PM 13,302 progc.s
12/13/2007 03:16 PM 16,682 progl.s
12/13/2007 03:16 PM 11,395 progp.s
12/13/2007 03:16 PM 19,298 trans.s
18 File(s) 977,854 bytes

batch file to create above

...\rle %1. %1.1
...\bwt %1.1 %1.2
...\mtf %1.2 %1.3
...\rle %1.3 %1.4
...\ari %1.4 %1.m
...\unari %1.m %1.4m
...\unrle %1.4m %1.3m
...\unmtf %1.3m %1.2m
...\unbwt %1.2m %1.1m
...\unrle %1.1m %1.mm
...\bwts -b200000 %1.1 %1.5
...\mtf %1.5 %1.6
...\rle %1.6 %1.7
...\ari %1.7 %1.s
...\unari %1.s %1.4s
...\unrle %1.4s %1.3s
...\unmtf %1.3s %1.2s
...\unbwts -b200000 %1.2s %1.1s
...\unrle %1.1s %1.ss


fc /b %1. %1.mm
fc /b %1. %1.ss

there was no errors and in each case the BWTS as a drop in
replacement did better than Marks BWT this even surprsed
me.

Actually the total difference was 263 bytes. That is very close
to space wasted by using the indexes.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
.