AND THE WINNER IS... [WAS] Re: How to get non-unique elements from an array?
On Wed, 19 Oct 2005, Ryan Leavengood wrote:
On 10/18/05, Ara.T.Howard <Ara.T.Howard@xxxxxxxx> wrote:
a couple of observations: zach's method destroys the array at each
iteration and is unique in this respect. to be fair a 'dup' must be added
to his approach. you arrays are composed in a quite regular fashion which
give the algoritm's that favour linear search a huge advantage - a random
distribution of dups makes the task much harder. i implemented changes for
the above and now:
Thanks a lot for this Ara, you make some excellent points. I did think about
how the linear nature of the test array might skew the results, but I never
got around to randomizing them. I also assumed that all the algorithms
worked correctly and non-destructively. I considered adding some tests to
ensure they all returned the same result, but didn't bother.
also, your concept of 'big' doesn't seem that, er, big - running with
larger numbers is also revealing - i started a process with
I know it wasn't very big, but I figured it would be big enough to show how
the algorithms scaled. Plus I'm using my work laptop and have to get some
work done, so I can't wait all day while Ruby eats up all my CPU :)
don't worry -- your tax dollars funding my testing ;-)
Like Paul, I've found this thread very enlightening, and it will make me
think twice before using a "cute" solution (though you don't always need the
highest performer all the time.) Though James' solution is pretty cute and
fast, so good job.
Either way this thread was a very good indication of Ruby's TIMTOWTDI
nature.
agreed all the way around. the final (for me) results:
small:
harp:~ > ruby a.rb
user system total real
david 5.330000 0.000000 5.330000 ( 5.350960)
jeff 0.370000 0.000000 0.370000 ( 0.373027)
jegII 0.320000 0.010000 0.330000 ( 0.324357)
markvh 0.610000 0.000000 0.610000 ( 0.606635)
martin 0.300000 0.000000 0.300000 ( 0.308640)
paolo 0.540000 0.000000 0.540000 ( 0.537689)
park 0.400000 0.000000 0.400000 ( 0.403531)
paul 0.660000 0.000000 0.660000 ( 0.662007)
robert 0.640000 0.000000 0.640000 ( 0.642736)
samk1 0.410000 0.000000 0.410000 ( 0.418733)
samk2 4.730000 0.000000 4.730000 ( 5.043174)
simonk 1.060000 0.000000 1.060000 ( 1.052031)
simons 1.150000 0.000000 1.150000 ( 1.160381)
zach 0.640000 0.000000 0.640000 ( 0.641282)
user system total real
david 12.600000 0.000000 12.600000 ( 12.637510)
jeff 0.200000 0.000000 0.200000 ( 0.203639)
jegII 0.010000 0.000000 0.010000 ( 0.007662)
markvh 0.880000 0.000000 0.880000 ( 0.872928)
martin 0.010000 0.000000 0.010000 ( 0.007404)
paolo 0.030000 0.000000 0.030000 ( 0.028461)
park 0.010000 0.000000 0.010000 ( 0.009939)
paul 0.020000 0.000000 0.020000 ( 0.019959)
robert 0.010000 0.010000 0.020000 ( 0.015718)
samk1 0.010000 0.000000 0.010000 ( 0.009769)
samk2 10.050000 0.000000 10.050000 ( 10.088609)
simonk 1.950000 0.000000 1.950000 ( 1.949996)
simons 1.420000 0.000000 1.420000 ( 1.415801)
zach 0.890000 0.000000 0.890000 ( 0.890465)
big:
harp:~ > ruby a.rb 424 42424
user system total real
david 616.620000 0.000000 616.620000 (626.621697)
jeff 10.610000 0.000000 10.610000 ( 10.635370)
jegII 3.380000 0.000000 3.380000 ( 3.371940)
markvh 43.970000 0.000000 43.970000 ( 44.051656)
martin 3.120000 0.000000 3.120000 ( 3.122285)
paolo 7.210000 0.000000 7.210000 ( 7.238185)
park 4.270000 0.000000 4.270000 ( 4.291279)
paul 6.700000 0.000000 6.700000 ( 6.729307)
robert 7.780000 0.000000 7.780000 ( 7.779309)
samk1 4.510000 0.010000 4.520000 ( 4.520682)
samk2 432.290000 0.000000 432.290000 (433.597829)
simonk 76.810000 0.040000 76.850000 ( 77.561346)
simons 56.480000 0.000000 56.480000 ( 61.661304)
zach 42.590000 0.010000 42.600000 ( 42.661407)
user system total real
david 1525.140000 0.010000 1525.150000 (1540.030511)
jeff 19.840000 0.000000 19.840000 ( 19.871017)
jegII 0.100000 0.000000 0.100000 ( 0.096985)
markvh 101.610000 0.000000 101.610000 (102.777395)
martin 0.090000 0.000000 0.090000 ( 0.091271)
paolo 20.650000 0.610000 21.260000 ( 22.851724)
park 0.130000 0.000000 0.130000 ( 0.140809)
paul 0.330000 0.000000 0.330000 ( 0.333704)
robert 0.210000 0.000000 0.210000 ( 0.211370)
samk1 0.130000 0.000000 0.130000 ( 0.125668)
samk2 1010.770000 0.000000 1010.770000 (1037.171224)
simonk 178.140000 0.010000 178.150000 (178.632616)
simons 123.920000 0.000000 123.920000 (124.054895)
zach 97.430000 0.000000 97.430000 ( 97.559515)
so __martin__ and __jegII__ effectively tie -- the diff is not significant
considering my cpu did some other, but not many, things during that time
it's interesting to note their solutions are an order of magnitude better that
the next fastest and __5__ orders of magnitude better than the worst.
defintitely a strong argument to always consider re-working algoritms before
using the c compiler and extconf.rb in anger.
regards.
-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================
.
Relevant Pages
- Re: How to get non-unique elements from an array?
... to be fair a 'dup' must be added to his approach. ... you arrays are composed in a quite regular fashion which give the algoritm's that favour linear search a huge advantage - a random distribution of dups makes the task much harder. ... (destroys 'a') ... hour ago and it's still not complete so many of the algorithms obviously ... (comp.lang.ruby) - Re: Performance tuning
... implemented heuristics on algorithms to speed things up; ... on arrays - like sorting, but also like filtering, array ... I find it funny that switch uses a dictionary, ... then performance tuning for the sake of performance tuning ... (microsoft.public.dotnet.languages.csharp) - Re: Similarity searching
... a factor of N*log, if the arrays are big you won't in general have to compare to the last element so I don't see the factor of L**2, and I don't see where that C**2 factor is coming from at all. ... There are also sorting algorithms which don't compare each item to every other item, yet still generate a sorted result. ... If electricity comes from electrons, ... (comp.lang.fortran) - Re: problems with dup and clone...
... I have tried both dup and clone to copy the default objects but I always ... which reinitalised the instance arrays to and fix'ing the newly ... this program is a log scanner and I generate a class for each ... (comp.lang.ruby) - Re: Similarity searching
... a factor of N*log, if the arrays are big you won't in general have to compare to the last element so I don't see the factor of L**2, and I don't see where that C**2 factor is coming from at all. ... There are also sorting algorithms which don't compare each item to every other item, yet still generate a sorted result. ... Similarity searching is still an active science, but there are a lot of useful algorithms out there now. ... (comp.lang.fortran) |
|