Re: Implementation of pop count instruction
- From: "Dan Koren" <dankoren@xxxxxxxxx>
- Date: Tue, 27 Sep 2005 05:25:35 -0400
"Dan Koren" <dankoren@xxxxxxxxx> wrote in message
news:43390d28$1@xxxxxxxxxxxxxxxx
> "Laurent Desnogues" <l-desnogues_delme@xxxxxx> wrote in message
> news:dhav2r$lrf$1@xxxxxxxxxxxxxxxxxx
>> Dan Koren wrote:
>> [...]
>>>
>>> Take a look at http://aggregate.org/MAGIC and
>>> www.hackersdelight.org for various ideas on
>>> how to implement POPC using bit shifts and
>>> bitwise logical operations. I have recently
>>> coded and compared all the POPC algorithms
>>> presented there. Not surprisingly it turns
>>> out that the fastest one is to do byte by
>>> byte lookups through a 256-entry table,
>>> which can be easily parallelized and
>>> implemented in hardware. Just to get an
>>> idea, a double word (8 byte) POPC takes
>>> 1.67 ns on a 1.2 GHz Ultra SPARC III. I
>>> got similar results on PA-RISC: 2.08 ns
>>> at 900 MHz.
>>
>> You really should take results measured in tight loops with
>> a grain of salt when using table lookups. It often happens
>> that when your code is put in real code (as opposed to benchmark
>> loop code) the result is very different due to cache trashing,
>> even when the table is small.
>
> Hhmmm.... Do you realize the
> entire table is 256 bytes and
> fits completely in one or two
> cache lines?
>
Not to mention that this code is
executed so frequently by the
library routines that rely on
it that it stays in the cache.
When did you last check cache
sizes for large server systems?
The HP system on which I test
my libraries has 3 MB L1 cache
per chip (1.5 MB per core) and
32 MB L2 cache (shared).
dk
.
- Follow-Ups:
- Re: Implementation of pop count instruction
- From: Terje Mathisen
- Re: Implementation of pop count instruction
- From: Nick Maclaren
- Re: Implementation of pop count instruction
- From: Laurent Desnogues
- Re: Implementation of pop count instruction
- References:
- Implementation of pop count instruction
- From: Stephen Fuld
- Re: Implementation of pop count instruction
- From: Laurent Desnogues
- Re: Implementation of pop count instruction
- From: Dan Koren
- Implementation of pop count instruction
- Prev by Date: Re: Implementation of pop count instruction
- Next by Date: Re: Implementation of pop count instruction
- Previous by thread: Re: Implementation of pop count instruction
- Next by thread: Re: Implementation of pop count instruction
- Index(es):
Relevant Pages
|
|