Re: Compressing List of Tuples
- From: Josiah Carlson <josiah.carlson@xxxxxxxxxxxxx>
- Date: Thu, 31 May 2007 18:27:35 GMT
bullockbefriending bard wrote:
I could have sworn that permuting the elements in all possible ways
would allow for more potential matching, but through a bit of testing
myself, it doesn't seem to make any difference.
glad to hear you find the same thing. i don't have a CS background and
basically don't even know what i don't know, so your confirmation is a
relief. one day i must stop using the copy of cormen i bought from
amazon as a doorstopper and start at page 1.
CLRS is tough to get into. Goodrich and Tamassia have a much more readable intro to algorithms book, though it isn't nearly as complete.
What *does* make a difference is how the columns are permuted initially,
because as you choose a particular matching, you remove a different set
of choices for the matching. In a quick test among 262,144 6-tuples of
values randomly from 0-15, it was able to reduce it down to 180,475
elements, but randomly permuting the columns and doing it again produced
180,418 elements. Another run got me 180,475 and 180,489 elements.
quite so. based on some earlier suggestions i had constructed a
modified hamming graph linking tuples which differed in only one field
and with the aid of graphviz 'discovered' partial cubes (although
didn't know their name until your recent post) and then eventually
realised that just because i could see structures on the screen didn't
mean that i had a hope in hell of coming up with an algorithm to
extract them :). i'm going to go away and think if there is some smart
way i could at least extract some information from some graph
structure which tells me something about how i should permute columns
to achieve 'optimal' compression.
If you have a reasonable method of going from your tuples to bit strings, using the method of "Approximate Dictionary Queries" by Gerth Stolting Brodal and Leszek Gasieniec from 1996, you can generate all pairs of bit strings that differ by one bit in time proportional to the total number of bits passed.*
So, if you have some time, you can permute the columns in all possible
ways, running each one of them through the algorithm, then
reverse-permute the best result. Or you can randomly permute the
columns a few times, running each through the algorithm, and pick the
best one.
estimate something like 2.8 hours runtime in python to do 720x on
extremal datasets, but may well be fast enough in STL land.
unfortunately the underlying process is time critical and 10-20 mins
would be max acceptable runtime.
So run it for 10 minutes and choose the best result :) That would get you a solid 40 runs or so.
Using a bit of sloppy math related to the harmonic numbers, after about 40 runs, it is probable that after choosing the first, you would have only found better results about 5 or 6 times, and you would only get better results from the remaining 680 possible permutations 5 or 6 times more (basically, the chance of improving a previous result after having tried i permutations already is 1/i, assuming a random permutation as input).
one again thanks for giving me food for thought.
No problem. I'm glad to help.
- Josiah
* Incidentally, I do have a method that does the same thing, is significantly easier to understand and implement, but isn't an online algorithm; you need to give it all of the bit vectors up front. It's not yet published, but if you need this operation and find the Brodal/Gasieneic algorithm to be too difficult to implement, I could likely be convinced to help.
.
- References:
- Compressing List of Tuples
- From: bullockbefriending bard
- Re: Compressing List of Tuples
- From: Josiah Carlson
- Re: Compressing List of Tuples
- From: bullockbefriending bard
- Re: Compressing List of Tuples
- From: Josiah Carlson
- Re: Compressing List of Tuples
- From: bullockbefriending bard
- Compressing List of Tuples
- Prev by Date: Re: jacko
- Next by Date: Shortest way to represent this variable code to symbol mapping?
- Previous by thread: Re: Compressing List of Tuples
- Next by thread: Compressing List of Tuples
- Index(es):
Relevant Pages
|