Re: Distance Coding Algorithm



On Oct 25, 5:15 pm, michael <mich...@xxxxxxxxxxxxxxxxxxxxxx> wrote:
On Oct 24, 4:49 pm, Arsène Lupin <deten...@xxxxxxxxx> wrote:



Hi, people!

I'm trying to compile a list of different algorithms that are used on
the second stage of Burrows-Wheeler Transform, to compare which are
more optimal, and for what I've seen Distance Coding is one of the few
algorithms that beat the Move-To-Front transformation. The problem is,
the reference about the subject is too vague, and even knowing that
there's a post of Edgar Binder explaining the algorithm, I just
couldn't come up with an algorithm of just that.

If someone has the algorithm and wanna share, that'd be very helpful.
But ideas of how to implement are also welcome (It'd be good having
the algorithm because we could put it on Wikipedia and everybody who
wants to study the idea could find it easier).

Thanks for all of you for the time,

Arsène

P.S.: I know two other algorithms better than Move-to-Front:
- Inversion Frequencies (which is not always better than MTF -
better for large files)
- Weighted Frequency Count
- ... any other?

Other BWT encoding methods:

M99
M03
QLFC
Double Weighted Frequency Count

- Michael

BBB uses an order 0 context model followed by a chain of adaptive
higher order models and arithmetic coding. This is explained in the
source code comments. http://cs.fit.edu/~mmahoney/compression/#bbb

The high compression ratio on the large text benchmark is due to a
block sorting algorithm that allows blocks to fill 80% of memory. For
smaller files compression is comparable to other BWT compressors.

-- Matt Mahoney


.



Relevant Pages

  • Re: Parallel application ran more slowly than the sequential one?
    ... > Could you tell me what are the reasons to make an parallel application ... Your parallel algorithm may be less efficient than your serial algorithm. ... loading/unloading MPI data structures, sharing files over NFS, process ... Compare the runtime of your parallel program to ...
    (comp.parallel)
  • Re: count of each word occurred
    ... > take each word in order and compare it with the rest of the list ... Another version of this algorithm takes advantage of the fact that it does ... loop over all words in the list ... a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq ...
    (alt.comp.lang.learn.c-cpp)
  • Re: Reading and writing a big file in Ada (GNAT) on Windows XP
    ... common cases (no mapping, single character patterns, and so on). ... block move and compare instructions because they fouled up the pipeline and ... So in this case a better algorithm is probably the way to go (and I ... the source length is>M and the pattern length is>N, ...
    (comp.lang.ada)
  • Re: ArrayList BinarySearch vs Contains
    ... whether a required sorting operation will actually reduce the overall ... performance of an algorithm so that it becomes worse than a linear ... A Linear search does a Fast Compare and a fast iteration to the next ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: how to compare between different training algorithms, to find the best algorithm???
    ... how to compare between different algorithms (such as ... discuss algorithm comparisons. ... If the training set is too large for LM, ... minimize an objective function (say MSE) over the population ...
    (comp.soft-sys.matlab)