Re: Implementation of pop count instruction
- From: Derek Gladding <derek-spammenot@xxxxxxxxxxxxx>
- Date: Sun, 25 Sep 2005 20:12:57 -0700
Stephen Fuld wrote:
> I am not a hardware guy, and this certainly isn't a homework problem, so
> please be gentle with me. :-)
>
> I have been thinking about how a pop count instruction would be
> implemented
> in hardware. I have come up with several possibilities, but none of them
> seem "optimal".
>
> One could have a tree of adders - so for a 64 bit register, 32 two bit
> adders feeding 16 four bit adders, etc. This works, but takes six cycles,
> which seems like a lot. (maybe it doesn't matter given how infrequently
> such an instruction is executed.)
>
> One could have a look up ROM, but it would have to be pretty big to save
> any
> time. For example, with a 64K by 5 ROM, one could do it in 4 lookups and
> pipeline all but the last the add, but this still takes five cycles. If
> you could use a dual ported ROM (are there such things?), you could cut it
> down to about 3 cycles.
>
> You could just use a humongo hunk of random logic and get the answer in a
> single cycle, but the amount of logic required seems to be way to large to
> be practical.
>
> I guess if you allowed analog circuitry, you could somehow input all the
> signals into some kind of A to D converter, but that seems to way out.
>
> So, for real CPUs that have this instruction, how is it implemented?
>
For the one that I have personal knowledge of: an initial layer of (4,2)
compressors then some fast adders in a tree as you describe.
32 bits -> 8 x (4,2) compressors -> 4 x 2 bit adders ...
- Derek
.
- References:
- Implementation of pop count instruction
- From: Stephen Fuld
- Implementation of pop count instruction
- Prev by Date: Implementation of pop count instruction
- Next by Date: Re: What do you think of Sun's Niagara
- Previous by thread: Implementation of pop count instruction
- Next by thread: Re: Implementation of pop count instruction
- Index(es):
Relevant Pages
|
|