Re: DRC has announced its newest FPGA that drops into AMD's Socket 940
- From: "JJ" <johnjakson@xxxxxxxxx>
- Date: 28 Apr 2006 20:24:49 -0700
For any processor without substantial caches, one might assume every
5th opcode is a load or store; for a nice register-heavy design, maybe
every 10th opcode. With a classic SDRAM interface the performance will
be very poor. The usual thing to do is to gang up lots of very
expensive Brams into I and D caches, which gives up a lot of the parallel
bandwidth they each have when used separately. Even so, each core now
uses lots of Bram, some CPU logic and an SDRAM controller, and a good
chunk of the I/O is gone. That sort of system can be replicated maybe 4
times depending on I/O count, and none of the copies has any performance
to write home about. But one could put additional algorithmic content
next to each node.
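A rough way to see why the uncached case is so poor: if each opcode takes one clock and every load or store stalls for the full DRAM latency, the average clocks per opcode is 1 + (memory fraction times latency). The Python sketch below assumes a fully blocking in-order core and a hypothetical 10-clock SDRAM latency; neither figure comes from the post.

```python
def effective_cpi(mem_fraction: float, mem_latency_cycles: float) -> float:
    """Average clocks per opcode for a simple in-order core with no cache.

    Assumes every opcode takes 1 clock and each load/store additionally
    stalls for the whole memory latency (no overlap, no prefetch).
    """
    return 1.0 + mem_fraction * mem_latency_cycles


# Hypothetical: an uncached random SDRAM access costs ~10 PE clocks.
SDRAM_LATENCY_CLOCKS = 10

for label, frac in (("1 in 5 opcodes", 1 / 5), ("1 in 10 opcodes", 1 / 10)):
    cpi = effective_cpi(frac, SDRAM_LATENCY_CLOCKS)
    print(f"{label} load/store: CPI = {cpi:.1f}, {1 / cpi:.0%} of peak")
```

With those assumed numbers the 1-in-5 mix runs at about a third of peak and the 1-in-10 mix at about half, before counting refresh or row misses.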
Memory limits, and hence I/O pads, are the crux of the problem. My
Transputer design uses 1 Bram per PE, hence on paper maybe 554 PEs might
fit in the biggest FPGA, but that doesn't work. The LUT/Bram usage takes
it down to half that, and then assume the MMUs consume the rest of the
fabric in a regular tiling. Still, the memory traffic of 250-odd PEs
can't be funneled through maybe only 4 memory interfaces, even RLDRAM,
so either the PE count has to come way down, or more of the Brams have
to be used as local caches, which again gives up a lot of their
bandwidth.
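The funneling problem can be put as a simple supply/demand ratio. Each PE issues roughly clock times memory-fraction accesses per second, and only the share that misses in local Bram reaches the external channels. The sketch below is a back-of-envelope model only; the 150 MHz PE clock and 200 M random accesses per second per RLDRAM channel are hypothetical placeholders, not measured or quoted figures, and `pes_supported` is an illustrative name.

```python
def pes_supported(channels: int, accesses_per_channel: float,
                  pe_clock_hz: float, mem_fraction: float,
                  local_hit_rate: float = 0.0) -> float:
    """Number of PEs the external DRAM channels can keep fed.

    Each PE issues pe_clock_hz * mem_fraction memory ops per second;
    only the (1 - local_hit_rate) share that misses in local Bram
    has to go off-chip.
    """
    off_chip_per_pe = pe_clock_hz * mem_fraction * (1.0 - local_hit_rate)
    if off_chip_per_pe <= 0:
        return float("inf")  # nothing goes off-chip, so no channel limit
    return channels * accesses_per_channel / off_chip_per_pe


# Hypothetical figures, for illustration only.
CHANNELS = 4                  # "maybe only 4 memory interfaces"
ACCESSES_PER_CHANNEL = 200e6  # random accesses/s per RLDRAM channel (assumed)
PE_CLOCK_HZ = 150e6           # PE clock (assumed)
MEM_FRACTION = 1 / 10         # register-heavy design: every 10th opcode

for hit_rate in (0.0, 0.5, 0.8):
    n = pes_supported(CHANNELS, ACCESSES_PER_CHANNEL,
                      PE_CLOCK_HZ, MEM_FRACTION, hit_rate)
    print(f"local hit rate {hit_rate:.0%}: about {n:.0f} PEs supported")
```

Under those assumptions, 4 channels keep only about 53 uncached PEs busy, while an 80% local hit rate lifts that to about 267, which is the trade of Bram for cache described above.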
One way around the I/O limit I have been thinking of is to bring the
RLDRAM inside the FPGA. Since we can't do that, instead replicate the
RLDRAM logical architecture of n concurrent slower banks, using up all
the remaining Bram and aggregating it into caches that can be shared by
multiple PEs at the L1 level. Only when those miss does the L2 RLDRAM
come into play, so trading away PEs for Bram caches allows more
Transputer nodes to share the few RLDRAM interfaces.
    ((n*PE + MMU + Bram cache) * k + MMU + RLDRAM interface) * 4 or so.
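The nested layout in that formula can be sketched as a small Python model: n PEs share one Bram cache built from several independent banks, k such clusters share one RLDRAM interface, and about 4 interfaces fill the chip. The bank count, line size and low-order address interleaving are assumptions for illustration, and the class and method names are hypothetical. Simultaneous requests that land in different banks are served together; when they collide, the busiest bank sets the pace.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hierarchy:
    """Hypothetical parameters for the nested layout
    ((n*PE + MMU + Bram cache) * k + MMU + RLDRAM interface) * channels."""
    pes_per_cluster: int       # n: PEs sharing one banked Bram L1 cache
    clusters_per_channel: int  # k: clusters sharing one RLDRAM (L2) interface
    channels: int = 4          # "4 or so" external RLDRAM interfaces
    banks_per_cache: int = 8   # Brams used as independent banks (assumed)
    line_words: int = 4        # words per cache line (assumed)

    def total_pes(self) -> int:
        return self.pes_per_cluster * self.clusters_per_channel * self.channels

    def bank_of(self, addr: int) -> int:
        """Low-order interleave: consecutive lines go to consecutive banks,
        mimicking the n-concurrent-bank structure of RLDRAM."""
        return (addr // self.line_words) % self.banks_per_cache

    def cycles_to_serve(self, addrs: Sequence[int]) -> int:
        """Cycles to serve one cluster's simultaneous requests if each bank
        handles one request per cycle: the busiest bank sets the pace."""
        if not addrs:
            return 0
        return max(Counter(self.bank_of(a) for a in addrs).values())


if __name__ == "__main__":
    h = Hierarchy(pes_per_cluster=8, clusters_per_channel=8)
    print(h.total_pes())
    print(h.cycles_to_serve([i * 4 for i in range(8)]))   # consecutive lines
    print(h.cycles_to_serve([i * 32 for i in range(8)]))  # all map to bank 0
```

For example, 8 PEs per cluster and 8 clusters per channel give 256 PEs, close to the 250-odd figure above; 8 requests to consecutive lines finish in 1 cycle, while 8 requests that all map to bank 0 take 8.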
Q
I am curious how many separate memory channels people have actually
put onto the largest FPGAs. I suspect that at the high end, for
independent RLDRAM controllers, it is around 4, due to the specialized
use of the clock resources needed to make the DDR interfaces work. I
also wonder whether the serial-interface DRAMs that would allow many
more memory channels per FPGA have come out yet.
John Jakson
transputer guy