Re: Xilinx 3s8000?



In article <1147022814.787257.294510@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Peter Alfke <alfke@xxxxxxxxxxxxx> wrote:

Ron wrote:
So to multiply two 704 bit numbers
together (depending upon how it's implemented of course) would require
roughly sixty 64-bit multiplies and a bunch of adds. ...

If I remember right, 704 is 11 times 64, so the multiplication would
take 121 of those 64-bit multipliers, not "roughly sixty"...

It depends on precise details of the implementation, and you have to
write moderately ugly code because the x86 multiply instruction
produces its outputs in fixed registers, but if you apply
Karatsuba-style techniques enough you can get down to below sixty
.... one 12x12 requires six 4x4 + some fiddling, one 4x4 requires three
2x2 + some fiddling, one 2x2 requires four, that's 72, or three +
extra fiddling to make 54.

However, Karatsuba techniques really work much better on FPGAs where
you can do the extra adds in parallel; I don't think you'll gain
anything from the three-level decomposition on Opteron.

As you might imagine, I would be ecstatic to see a few wide
multipliers appearing in FPGAs - a 64x64->128 unit isn't _that_ large
an IP block - but the market for number theorists isn't large enough
to pay for the masks, and RSA accelerators are produced in enough
volume for it to be worth making ASICs.

Tom
.



Relevant Pages

  • Re: Xilinx 3s8000?
    ... together (depending upon how it's implemented of course) would require ... roughly sixty 64-bit multiplies and a bunch of adds. ... write moderately ugly code because the x86 multiply instruction ...
    (comp.arch.fpga)
  • Re: Xilinx 3s8000?
    ... Ron wrote: ... together (depending upon how it's implemented of course) would require ... roughly sixty 64-bit multiplies and a bunch of adds. ...
    (comp.arch.fpga)