Re: Hash tables



Jon Harrop <jon@xxxxxxxxxxxxxxxxx> writes:
I think it is fair to say that microsecond worst-case insertion time is not
a common requirement for a dictionary.

Maybe, I can only comment on the problems that I have to solve.


That is a tautology. My test program is designed solely to empower
programmers to be able to choose when to make the trade-offs themselves
based upon real performance measurements. My books and articles are often
unusually rich in such content because I believe it is very valuable.

Your program only tests two structures, uses a simple key and used a
relatively small number of entries. That does't mean the result
wasn't useful for that test but I don't think it is empowering
programmers to draw any conclusions about hash tables in general from
that test unless it is clear that the test results scale up or there
is a qualifier indicating that they don't.


Perhaps I can also qualify that keys are usually small constant-sized data

That would be a very good qualifier especially if small is quantified.
The simplest fixed size key in any problems I face is 8 bytes (20 for
IPv6) and typically much longer and and can be variable e.g. the from
tag, to tag and branch tag in a SIP call are all variable length
strings and collectively the minimum could be around 10 bytes but in
real calls typically would be upwards of 60 bytes.


Perhaps. Once you've loaded the first bit of your key then you've loaded a
whole cache line though. Furthermore, comparison is likely to terminate
early, particularly if you put the most randomized bytes first.

That's the purpose of inserting the hash first but that's a best case
optimization everthing still has to work consistently for the worst
case where the hacker sees the code and hits you with many packets
that hash to the same small set of values.


You originally said there were up to 8 million live sessions, which is 23
comparisons.

I agree using the lower bound of 10^6 gives a slight advantage to the
tree so let's be fair and use 10^8 and thus 23 comparisons. I don't
think it will change the outcome.


While the trie&hash _may_ win that one, I
for one could not call it either way without writing the code and
testing it since the winner may depend on which is most cache
friendly for the particular cache/size being used.

Agreed but I think these are very unusual requirements.

As noted above, we can only really speak to the problems we have to solve.
My problems may not be your problems or common when averaged over all
possible problems but it is a domain which whose services are being
used utilized as part of this conversation (at least at my end).


[1] The rehash on re-size + additional probes on collision
until/unless one is willing to re-size until the table is the same
size as the trie key in which case it isn't a hash table anymore.

I disagree.

Since you didn't elaborate I assume you don't want to argue the point.
.



Relevant Pages

  • Re: English versus German
    ... /* setup handler list binary searchable array hash table (in ... idea that code SHOULD be literate and readable. ... restriction obviously impacts on the spelling the programmers use. ... English) is alien to C. Sort of a Whorfian effect is going on and the ...
    (sci.lang)
  • Re: Hash order bug?
    ... I think this is just a common mistake among newbie-ish programmers. ... did I realize that the underlying data structure is a hash, ... to retrieve the key array and sort it first, ...
    (comp.lang.ruby)
  • Re: Magic number in Boolean
    ... public int hashCode() { ... Programmers are a superstitious lot. ... Powers of two, of course, are the most magical of all: ... about hash tables and prime numbers. ...
    (comp.lang.java.programmer)
  • Re: hash tables, non-random keys
    ... we are programmers, we don't need credentials... ... to obsessively test things and/or debate over test ... ways to optimize hash tables). ... For the recent part of the key space, ...
    (comp.programming)
  • Re: development process
    ... If that checked out code works well enough you tag ... >> communicate with your programmers or if your programmers do ... >> improving communication and the latter by educating or firing ... > Do you advocate firing people that make mistakes? ...
    (comp.lang.java.programmer)