Re: Compressing List of Tuples



Once again thanks for pointing me towards this very useful thread.
I've taken the trouble to modify your 3-tuple example to more closely
suit my 6-tuple problem.

You will see that my modified method (perhaps naively) only performs
one rotation through the 6 columns as it does the greedy merge. I'm
afraid I don't really understand your reference to permutations. Well,
I *do* know that there are 6 ways to order a the elements of a 3-tuple
and 720 ways to order the elements of a 6-tuple... but I'm not really
clear on why this should be relevant given that the innermost loop in
your example compares 5 elements of two consecutive tuples in order to
see if a merge condition exists in element 6 - I just can't see why
one would need to care about order of taking these?

There must be something I'm just not getting here. My modified method
achieves 'reasonable' (>20x) compression on a file with 140K+ tuples
(needless to say, these are not random and there is obviously some
hint of underlying structure). However, assuming I *have* somehow
missed part of the point of your algorithm, I'd really appreciate it
if you could point it out to me, as I certainly wouldn't mind
sacrificing longer run time for greater compression.

(In addition, as far as I can tell, your 3-tuple code and my above
method in 3-tuple form, perform identically on input data sets... )

Many thanks,

Peter

Following are your version from the thread you pointed me to + what
seems to work ~OK for me.

def foo(lolol):
#lolol for a list of lists of lists
for i in xrange(2):
for j in xrange(3):
lolol.sort()
k = 1
while k < len(lolol):
if lolol[k-1][:2] == lolol[k][:2]:
lolol[k-1][2].extend(lolol.pop(k)[2])
lolol[k-1][2].sort()
else:
k += 1

#rotate!
for k in lolol:
k.append(k.pop(0))

#reverse!
for k in lolol:
k.reverse()

return lolol


def combination_compression_algorithm_(combinations):
"""Greedily compress all combinations on one field, then next field,
etc.

Parameter combinations is structured so:

[
[[1],[2],[3],[4],[5],[6]],
[[1],[2],[3],[4],[5],[7]],
]

Given the above input, returned combinations should be:

[
[[1],[2],[3],[4],[5],[6,7]]
]
"""
for do_it_six_times in xrange(6):
# we want everthing to be in lexically ascending order of
# combinations: [[1],[2],[3],[4],[5],[6]] < [[1],[2],[3],[4],[5],
[7]]
# < [[1],[2],[3],[5],[14],[14]]
combinations.sort()
k = 1
while k < len(combinations):
# compare fields 0-4 of two consecutive combinations, e.g.
# [[1],[2],[3],[4],[5],[6]], [[1],[2],[3],[4],[5],[7]]
if combinations[k-1][:5] == combinations[k][:5]:
# [[1],[2],[3],[4],[5]] == [[1],[2],[3],[4],[5]] =>
# we append the elements of combinations[k][5]
# to the equivalent sublist in the preceding combination, we
# can we can get rid of combinations[k] and
# combinations[k-1] becomes [[1],[2],[3],[4],[5],[6,7]]
# surprisingly combinations.pop(k) doesn't seem to result
# in slow run times for lists of 100K+ combinations.
combinations[k-1][5].extend(combinations.pop(k)[5])
# following line seems not necessary since top of outer loop
# combinations.sort() puts everything into lexically ascending
# order such that above line will always maintain correct
# element order in combinations[k-1][5]
#combinations[k-1][5].sort()
else:
# greedy algorithm. we keep merging into combinations[k-1][5]
# until we run out of matches. only then do we increment k.
k += 1

# rotate combinations left so that (e.g.)
# combination [[1],[2],[3],[4],[5],[6]] => [[2],[3],[4],[5],[6],[1]]
# the outer loop has us do this 6 times, so that upon termination
# of outer loop, applied the combination[5] merge of the inner loop
# to each of the 6 elements of combination[] and each combination is
# returned to original starting order of elements.
for combi in combinations:
combi.append(combi.pop(0))

return combinations






A *very* similar question to this was posted a couple weeks ago...

http://groups.google.com/group/comp.compression/browse_thread/thread/...

The algorithm I provided in that thread relies on certain features of
the permutations of 3 elements to merge groupings of 3-tuples. If you
were to iterate over the 720 permutations of your 6-tuples, using the
(very simple) matching algorithm I provided, you may be able to get
fairly decent results.

- Josiah


.



Relevant Pages

  • Re: compressing tuples in scheme
    ... I need to write a function merge-tups which given lolol produces this: ... Except for the small matter of actually writing merge-tups in Scheme! ... online help) figured out how to do rotations, mappings over lists and ... in 8-10x compression when dealing with large files of 6-tuples. ...
    (comp.lang.scheme)
  • Re: Maximum char array size?
    ... Maybe I've chosen the wrong algorithm. ... byte-arrays 255 bytes or less in a binary file. ... Beginning of outer loop ... read in next 64 kbytes of file into buffer, ...
    (comp.lang.c)
  • compressing tuples in scheme
    ... I'm interested in merging lists of 3-tuples ... I need to write a function merge-tups which given lolol produces this: ... Except for the small matter of actually writing merge-tups in Scheme! ...
    (comp.lang.scheme)
  • Re: Norvigs use of APPPLY for MATRIX-TRANSPOSE puzzled me!
    ... your outer LOOP only DOES something, ... to create a lists of lists. ... (NIL NIL NIL NIL) ... How can I make that inner loop pass the constructed ...
    (comp.lang.lisp)
  • Re: dictionary containing a list
    ... of the list, not the pointer). ... has an outer loop and an inner loop, maybe (as I think another poster ... it's the catenation of all the correct lists, ...
    (comp.lang.python)