Re: Quicker finding strings? Alternative for array, hash, set?



On Mon, Feb 2, 2009 at 12:00 PM, Patrick Put <patrick.put@xxxxxxxxx> wrote:
I've been searching for this information already, but cannot really find
the answer I'm looking for.

I have to find values in a list of some 40,000 strings. Putting all
these in an array is not likely to be the quickest way.... and it isn't.
A set seems to be faster already. Maybe using a map where I only use the
keys is also faster. But I'm wondering: is there no such thing as a tree
to store and retrieve items in a fast way?

If I remember well, for n elements in a list it should take log(n) steps
then, where it takes n/2 just going through an array if I only want to
pick up the element. In my situation it is much closer to n because most
of the times the value is NOT in the list.

Any ideas?

If you are only concerned with lookup time, a hash has amortized O(1)
lookup time:

require 'benchmark'

words = ("aaaa".."zzzz").to_a # => 456,976 strings

hash = {}
words.each {|w| hash[w] = true}

TIMES = 10
Benchmark.bmbm do |x|
x.report("Array#find existing") do
TIMES.times do
words.find {|w| w == "mmmm"}
end
end
x.report("Array#find non-existing") do
TIMES.times do
words.find {|w| w == "mmmmm"}
end
end
x.report("Hash#[] existing") do
TIMES.times do
hash["mmmm"]
end
end
x.report("Hash#[] non-existing") do
TIMES.times do
hash["mmmmm"]
end
end
end



jesus@jesus-laptop:~/personal/programacion/ruby/Benchmarks/lib$ ruby
hash_array2.rb
Rehearsal -----------------------------------------------------------
Array#find existing 3.160000 0.700000 3.860000 ( 3.870061)
Array#find non-existing 6.280000 1.630000 7.910000 ( 7.935304)
Hash#[] existing 0.000000 0.000000 0.000000 ( 0.000019)
Hash#[] non-existing 0.000000 0.000000 0.000000 ( 0.000018)
------------------------------------------------- total: 11.770000sec

user system total real
Array#find existing 3.130000 0.710000 3.840000 ( 3.866618)
Array#find non-existing 5.820000 1.580000 7.400000 ( 9.195417)
Hash#[] existing 0.000000 0.000000 0.000000 ( 0.000030)
Hash#[] non-existing 0.000000 0.000000 0.000000 ( 0.000032)


Jesus.

.



Relevant Pages

  • Re: Idea for ECMA/C# Standard - compile time hash for performance
    ... especially allocating the whole array. ... The key is I don't want to burn a huge amount of memory and I want to incur ... the minimal amount of lookup time possible. ... > more, why not just use an array directly, even if it's a sparse array? ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Matching times and classes
    ... Microsoft Excel MVP ... Did you enter the formula as an array? ... time *and* if the end time is greater than or equal to the lookup time. ... Enter this array formula** in Daily Sch B2: ...
    (microsoft.public.excel.worksheet.functions)
  • Re: Matching times and classes
    ... Did you enter the formula as an array? ... type in a regular formula you hit the ENTER key. ... time *and* if the end time is greater than or equal to the lookup time. ... Enter this array formula** in Daily Sch B2: ...
    (microsoft.public.excel.worksheet.functions)
  • Re: Matching times and classes
    ... Did you enter the formula as an array? ... type in a regular formula you hit the ENTER key. ... time *and* if the end time is greater than or equal to the lookup time. ... Enter this array formula** in Daily Sch B2: ...
    (microsoft.public.excel.worksheet.functions)
  • Re: Traversing a hash with array refs as keys?
    ... Hash keys can only be strings. ... So is your key an array or an array reference? ... As mentioned above keys must be strings. ...
    (comp.lang.perl.misc)

Loading