Re: hash table size



"Andy Walker" <anw@xxxxxxxxx> wrote in message
news:fplbdt$ke4$1$8300dec7@xxxxxxxxxxxxxxxxxxx
In article <l5A*pO37r@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
David Richerby <davidr@xxxxxxxxxxxxxxxxxxxxxx> wrote:
In a well-written engine, the optimal hash table size really is `as
big as you can have it without swapping.'

Whoa, it's not *quite* as simple as that.

(a) Even in a well-written engine, you may want/need to "clear
out" the table from time to time. Initialising several terabytes
may take a little while even on a modern machine ....

Actually, current programs don't do that. They leave the hash alone. They
don't even initialize it when the program is first run.

Either they actually use the data from the previous search, or they
invalidate it at the time of access...

There are several ways to invalidate at access...

1) Change all the hash keys so you get a new random value. This isn't a
prefered choice because many people use the hash as a way to quickly check
for repetition.

2) The hash entry contains some sort of counter as part of the 'lock'.

3) At time of access into the hash, XOR in an additional key that changes
for every search.


This way old entries from the previous search (or uninitialized hash memory)
will very very rarely be considered valid.

The odds are so remote that you are more likely to encounter a normal false
hash hit during the search.

And if you actually check the returned move as valid (not everybody does
because it takes time and they feel the odds are acceptible), then the odds
are so remote that you are accessing old data as current data that you can
totally ignore it.

You have a better chance of getting random bit flip errors in the memory or
in the cpu.


As for terabytes... well, not too many people have access to machines with
terabytes of physical memory....


(b) Some of the standard operations on a hash table involve
indexing and finding remainders modulo the table size. These may
be more efficient if the table size is a power of 2, and if the
relevant numbers fit into [in C terms] ints rather than longs.

It is true that doing an AND is faster than a modulo. But the time
difference isn't really all that significant considering how long main
memory is going to take to access. And the savings can greatly outweigh the
few extra cycles a modulo would take.

But yes, many people do prefer a power of two size.


And as for the 'ints' vs. 'longs'... Sorry. That was true in the days of 16
bit compilers, but these days on 32 bit systems, both int & long will be 32
bits. And on a 32 bit system, 32 bits will be enough to access all of
physical memory. (With the exception of certain hacks that Intel did for
some servers to increase memory by paging it.) So there will be no need for
64 bit 'long long' which would involve a more complicated modulo.

And on 64 bit systems (like the powerful chess programs all use and even
many hobbiests use), int's and long's are both going to be 64 bits. (Well,
actually that needs to be qualified a bit, since the C99 standard is a
little vague on how to handle all 8, 16, 32, 64 and 128 bit data sizes that
can be done on a 64 bit system.)

And no 64 bit system is going to have more than 2^64 bytes of memory, so
there's no need to get into 128 bit 'long long'.


(c) If you manage to lock down all, or most, of the storage of
a machine, you may perhaps be unpopular, in a multi-user environment,

Well DUH.

However, on a multi-user environment you are likely to have memory
limitations imposed so you can't use more than your fair share. In fact,
you may only be given a couple meg of physical memory and all the rest of
your data will be paged in & out as needed. Definetly not a good thing for
large hash tables.

If you are on a system without that restriction, then it's probably not a
problem for other reasons, such as verbal agreements between users, etc.
etc.

And if you are on a system without that restriction and you don't have
agreements etc. and you do it anyway and cause problems... (shrug) Tough.
That's your problem when the sysadmin catches you.

with other users of the machine. *Your* program may not be swapping,
but *theirs* are! [Even on a home PC, the other users may be *you*,
trying to run other things, such as the engine that this program is
playing against.]

No.

If you are runing a chess program for a reason (such as a tournament), then
you know not to use it for other things.

If you intend to use it for other things, then just don't tell it to use all
your available memory for the hash. (Although your other programs will
reduce the cpu power available to the chess program, which may cost it a
full ply of search.)

And if you do want to run some other program and a chess program with
massive hash tables at the same time.... (shrug) Get a second computer.
It's not like they are expensive. A second computer can be bought for $400
that's more than good enough for basic usage, web surfing, etc. For just a
few dollars more, you could get a low end laptop instead.



.



Relevant Pages

  • Re: hash table size
    ... talking about chess programming ideas and what they do rather than actually ... you can clear out the memory. ... Just a couple cycles per hash check. ... The amount of physical memory each user gets is likely to be somewhat ...
    (rec.games.chess.computer)
  • Re: Firewall vs. IPS - Differences now (ISS, Intrushield 2.1?)
    ... > You risk running out of memory. ... That's like saying "it's trivial to DoS Aho-Corasic if you know the ... DoS's and improvements via use of the Jenkins hash are most illuminating. ... > replacement policy gives the worst behavior since an attacker can flood ...
    (Focus-IDS)
  • Re: [PATCH] tracing/kmemtrace: normalize the raw tracer event to the unified tracing API
    ... branch tracer needs that too for example. ... SLUB statistics and is only good for detecting allocation hotspots, ... to track would be the memory object itself. ... would be to do a static, limited size hash that could track up to N memory ...
    (Linux-Kernel)
  • Re: Ruby, memory and speed
    ... and then emit a summary on standard input. ... The hash will grow to about 500 000 keys. ... Is there any tuning of GCC so it kicks in less frequently when the memory consumption is large? ... Or could it be that the Hash algorithm chokes when the number of keys gets large? ...
    (comp.lang.ruby)
  • Re: My first chess program
    ... >> of hash tables difficult to get my head around. ... I'd recommend making it search first and ... Writing a chess program that makes half- ... programming links to work with, ...
    (rec.games.chess.computer)

Loading