Re: A Fast sorting algorithm for almost sorted data



The probability for entry i with occurrences j in list k is j/sum(all
entries in k). If you keep an explicit count (like the last number in
your example sequences), then you merely need to look up the count for
an item and divide it by your count for that sequence. The entire
probability updating and reordering algorithm is quite simple...

counts = [0,0,0,...,0]
count = sum(counts)
for posn in seen:
counts[posn] += 1
new = posn
while new and counts[new-1] < counts[posn]:
new -= 1
if new != posn:
counts[new], counts[posn] = counts[posn], counts[new]


The probability of the symbol in position i is merely counts[i]/count.
That will give you a value between 0 and 1, which you can then combine
as you do now.

I think i missunderstood your initial point... this is exactly what i
am doing now... i was under the impression that you thought by not
using the double value at all is better (i.e. faster) but i could not
see how to do so given that i need to combine the predictions of
different models.

are you saying i should sort each model and then combine them? I will
still need to sort the symbols based on the combined probabilities...


You can speed up the comparison phase with a binary search, or a
doubling followed by a binary search, and using the swapping method
above can get you down to an O(log(po-pn)) bubble mechanism, where po is
the original position and pn is the new position. It can have some
strange behavior...

[5_4, 5_3, 5_2, 5_1]

increment the last one

[5_4, 5_3, 5_2, 6_1]

shift it using the above algorithm...

[6_1, 5_3, 5_2, 5_4]

Note how the 5_4 and 6_1 swapped positions? Yeah.


Nice!, i will have to think on how this is actually working :) i dont
get it just at the moment. but sounds like it should be useful!

Making huffman coding reasonably fast can be difficult. That's one of
the reasons why I implemented a tree-based arithmetic coder. It is
fast, can handle a practically limitless number of entries, and is able
to save some fractions of a bit for each encoded symbol.

I had improved my huffman compressor/decompressor but as you said it
rapidly slowed down once my number of symbols started to get beyond
6000.

You don't necessarily need to use the arithmetic coding, but the
algorithms used to make arithmetic coding faster (see the algorithm
snippet that I included earlier in this message) are also able to
improve upon the algorithm you are currently using.

Thanks for all your input, it really has made me think and see how
little i know :P

Cheers

Nick

.



Relevant Pages

  • "Algorithmic Randomness, Quantum Physics, and Incompleteness"
    ... all finite sequences are to be found infinitely often and ... different members of the infinite set of random numbers. ... Just using random numbers with initial digits 1 thru 9, ... it generated by a shorter input algorithm than the length ...
    (sci.logic)
  • Re: compression - insights into infinite
    ... output sequences, your technology isn't reversible, plain and simple. ... For that, it needs an algorithm. ... Not very convenient for a compressor, ... he notice but he also remembers his promises, and I assure you, you ...
    (comp.compression)
  • Re: compression - insights into infinite
    ... output sequences, your technology isn't reversible, plain and simple. ... For that, it needs an algorithm. ... OUTPUT cell, residue, part A. The problem with this was that some ... Not very convenient for a compressor, ...
    (comp.compression)
  • Re: compression - insights into infinite
    ... output sequences, your technology isn't reversible, plain and simple. ... I'd suggest that maybe I'm measuring something erroneously or ... For that, it needs an algorithm. ... Not very convenient for a compressor, ...
    (comp.compression)
  • Re: Unshuffle algorithm
    ... I have an array of numbers made up of two sequences that ... extract separate arrays made up of these sequences. ... that had been converted to flat images without metadata. ... algorithm, but thanks very much for your interest. ...
    (comp.soft-sys.matlab)