AND THE WINNER IS... [WAS] Re: How to get non-unique elements from an array?



On Wed, 19 Oct 2005, Ryan Leavengood wrote:

On 10/18/05, Ara.T.Howard <Ara.T.Howard@xxxxxxxx> wrote:

a couple of observations: zach's method destroys the array at each iteration and is unique in this respect. to be fair a 'dup' must be added to his approach. you arrays are composed in a quite regular fashion which give the algoritm's that favour linear search a huge advantage - a random distribution of dups makes the task much harder. i implemented changes for the above and now:

Thanks a lot for this Ara, you make some excellent points. I did think about how the linear nature of the test array might skew the results, but I never got around to randomizing them. I also assumed that all the algorithms worked correctly and non-destructively. I considered adding some tests to ensure they all returned the same result, but didn't bother.

also, your concept of 'big' doesn't seem that, er, big - running with
larger numbers is also revealing - i started a process with

I know it wasn't very big, but I figured it would be big enough to show how the algorithms scaled. Plus I'm using my work laptop and have to get some work done, so I can't wait all day while Ruby eats up all my CPU :)

don't worry -- your tax dollars funding my testing ;-)

Like Paul, I've found this thread very enlightening, and it will make me
think twice before using a "cute" solution (though you don't always need the
highest performer all the time.) Though James' solution is pretty cute and
fast, so good job.

Either way this thread was a very good indication of Ruby's TIMTOWTDI
nature.

agreed all the way around. the final (for me) results:

  small:

      harp:~ > ruby a.rb
                      user     system      total        real
      david       5.330000   0.000000   5.330000 (  5.350960)
      jeff        0.370000   0.000000   0.370000 (  0.373027)
      jegII       0.320000   0.010000   0.330000 (  0.324357)
      markvh      0.610000   0.000000   0.610000 (  0.606635)
      martin      0.300000   0.000000   0.300000 (  0.308640)
      paolo       0.540000   0.000000   0.540000 (  0.537689)
      park        0.400000   0.000000   0.400000 (  0.403531)
      paul        0.660000   0.000000   0.660000 (  0.662007)
      robert      0.640000   0.000000   0.640000 (  0.642736)
      samk1       0.410000   0.000000   0.410000 (  0.418733)
      samk2       4.730000   0.000000   4.730000 (  5.043174)
      simonk      1.060000   0.000000   1.060000 (  1.052031)
      simons      1.150000   0.000000   1.150000 (  1.160381)
      zach        0.640000   0.000000   0.640000 (  0.641282)
                      user     system      total        real
      david      12.600000   0.000000  12.600000 ( 12.637510)
      jeff        0.200000   0.000000   0.200000 (  0.203639)
      jegII       0.010000   0.000000   0.010000 (  0.007662)
      markvh      0.880000   0.000000   0.880000 (  0.872928)
      martin      0.010000   0.000000   0.010000 (  0.007404)
      paolo       0.030000   0.000000   0.030000 (  0.028461)
      park        0.010000   0.000000   0.010000 (  0.009939)
      paul        0.020000   0.000000   0.020000 (  0.019959)
      robert      0.010000   0.010000   0.020000 (  0.015718)
      samk1       0.010000   0.000000   0.010000 (  0.009769)
      samk2      10.050000   0.000000  10.050000 ( 10.088609)
      simonk      1.950000   0.000000   1.950000 (  1.949996)
      simons      1.420000   0.000000   1.420000 (  1.415801)
      zach        0.890000   0.000000   0.890000 (  0.890465)


big:

      harp:~ > ruby a.rb 424 42424
                      user     system      total        real
      david     616.620000   0.000000 616.620000 (626.621697)
      jeff       10.610000   0.000000  10.610000 ( 10.635370)
      jegII       3.380000   0.000000   3.380000 (  3.371940)
      markvh     43.970000   0.000000  43.970000 ( 44.051656)
      martin      3.120000   0.000000   3.120000 (  3.122285)
      paolo       7.210000   0.000000   7.210000 (  7.238185)
      park        4.270000   0.000000   4.270000 (  4.291279)
      paul        6.700000   0.000000   6.700000 (  6.729307)
      robert      7.780000   0.000000   7.780000 (  7.779309)
      samk1       4.510000   0.010000   4.520000 (  4.520682)
      samk2     432.290000   0.000000 432.290000 (433.597829)
      simonk     76.810000   0.040000  76.850000 ( 77.561346)
      simons     56.480000   0.000000  56.480000 ( 61.661304)
      zach       42.590000   0.010000  42.600000 ( 42.661407)
                      user     system      total        real
      david     1525.140000   0.010000 1525.150000 (1540.030511)
      jeff       19.840000   0.000000  19.840000 ( 19.871017)
      jegII       0.100000   0.000000   0.100000 (  0.096985)
      markvh    101.610000   0.000000 101.610000 (102.777395)
      martin      0.090000   0.000000   0.090000 (  0.091271)
      paolo      20.650000   0.610000  21.260000 ( 22.851724)
      park        0.130000   0.000000   0.130000 (  0.140809)
      paul        0.330000   0.000000   0.330000 (  0.333704)
      robert      0.210000   0.000000   0.210000 (  0.211370)
      samk1       0.130000   0.000000   0.130000 (  0.125668)
      samk2     1010.770000   0.000000 1010.770000 (1037.171224)
      simonk    178.140000   0.010000 178.150000 (178.632616)
      simons    123.920000   0.000000 123.920000 (124.054895)
      zach       97.430000   0.000000  97.430000 ( 97.559515)


so __martin__ and __jegII__ effectively tie -- the diff is not significant considering my cpu did some other, but not many, things during that time

it's interesting to note their solutions are an order of magnitude better that
the next fastest and __5__ orders of magnitude better than the worst.
defintitely a strong argument to always consider re-working algoritms before
using the c compiler and extconf.rb in anger.

regards.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================



.



Relevant Pages

  • Re: How to get non-unique elements from an array?
    ... to be fair a 'dup' must be added to his approach. ... you arrays are composed in a quite regular fashion which give the algoritm's that favour linear search a huge advantage - a random distribution of dups makes the task much harder. ... (destroys 'a') ... hour ago and it's still not complete so many of the algorithms obviously ...
    (comp.lang.ruby)
  • Re: Performance tuning
    ... implemented heuristics on algorithms to speed things up; ... on arrays - like sorting, but also like filtering, array ... I find it funny that switch uses a dictionary, ... then performance tuning for the sake of performance tuning ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Similarity searching
    ... a factor of N*log, if the arrays are big you won't in general have to compare to the last element so I don't see the factor of L**2, and I don't see where that C**2 factor is coming from at all. ... There are also sorting algorithms which don't compare each item to every other item, yet still generate a sorted result. ... If electricity comes from electrons, ...
    (comp.lang.fortran)
  • Re: problems with dup and clone...
    ... I have tried both dup and clone to copy the default objects but I always ... which reinitalised the instance arrays to and fix'ing the newly ... this program is a log scanner and I generate a class for each ...
    (comp.lang.ruby)
  • Re: Similarity searching
    ... a factor of N*log, if the arrays are big you won't in general have to compare to the last element so I don't see the factor of L**2, and I don't see where that C**2 factor is coming from at all. ... There are also sorting algorithms which don't compare each item to every other item, yet still generate a sorted result. ... Similarity searching is still an active science, but there are a lot of useful algorithms out there now. ...
    (comp.lang.fortran)