Re: FBDIMM vs DIMM - any compatibility?



On May 15, 6:55 am, a...@xxxxxxxxxxxxxxxxxxxxxxxxxx (Anton Ertl)
wrote:
a...@xxxxxxxxxxxxxxxxxxxxxxxxxx (Anton Ertl) writes:
Thomas Womack <twom...@xxxxxxxxxxxxxxxxxxxxxx> writes:
I believe you should order 4x1gig FBDIMM for the 2950, it has a
four-channel memory controller and my suspicion is that it will work
measurably faster if all four channels have the same size of memory on
them;

I doubt that (but I am not an expert). My guess is that the channels
are driven separately, or in pairs. Having more DIMMs in an FB-DIMM
channel is slower, so his system might be faster with 2*2GB+2*2*1GB
than with 4*2*1GB, although the performance difference is probably in
the noise for real applications.

Ok, the manual of our Supermicro X7DBE+ board speaks about two
branches (with two banks each), that can be used in an interleaved
way. It recommends putting in four modules at a time for best
performance through interleaving. If I interpret this correctly, by
interleaving they mean that consecutive 128-bit lines are placed in
alternating branches.
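
To make that reading concrete, here is a minimal sketch of the two address mappings. It assumes the interpretation above (16-byte, i.e. 128-bit, lines alternating between two branches). The function names and the non-interleaved layout (lower half of memory in branch 0, upper half in branch 1) are illustrative assumptions, not taken from the manual.

```python
LINE_BYTES = 16   # 128-bit lines, per the interpretation above
BRANCHES = 2      # the manual speaks of two branches


def branch_interleaved(addr):
    """Branch holding `addr` when consecutive lines alternate branches."""
    return (addr // LINE_BYTES) % BRANCHES


def branch_non_interleaved(addr, branch_size_bytes):
    """Branch holding `addr` when each branch covers one contiguous range.

    Assumed layout: branch 0 holds the first `branch_size_bytes`,
    branch 1 holds the range after that.
    """
    return addr // branch_size_bytes


if __name__ == "__main__":
    # A sequential stream touches both branches almost immediately when
    # interleaved, but stays in one branch when not interleaved.
    for addr in range(0, 4 * LINE_BYTES, LINE_BYTES):
        print(addr, branch_interleaved(addr),
              branch_non_interleaved(addr, 1 << 30))
```

With interleaving, a single sequential stream keeps both branches busy. Without it, two streams in different halves of memory each get a branch to themselves.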

Assuming that, I believe that while this may be helpful for
single-stream workloads, it is probably better for
multi-stream workloads not to use interleaving, because then the two
branches can have different banks open, resulting in twice the number
of open banks, and consequently lower latency when accessing these
banks; also, it means that two streams can be served at the same time,
rather than the second stream having to wait until the first has
finished a cache line (plus various latencies). I believe that is why
AMD is introducing two independent memory controllers on Barcelona.

The manual is not very clear about what is required. At one point it says
that "_you must install four modules at a time_", but in the next
sentence it says that this is just for optimum performance, and
further on it says that DIMMs of the same size and type are necessary
for interleaving.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
a...@xxxxxxxxxxxxxxxxxxxxxxxxxx Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html


1. FBDIMM's are not known for low latency, and adding more DIMM's to
each channel will slow you down to the tune of about 5 ns per DIMM
added - for each and every access.

2. The corollary to the "DIMM depth" issue is that more FBDIMM's in
the channel will give you more bank parallelism. In general, more
banks = fewer access conflicts, which means that in a heavily loaded
system, you can more easily get closer to your peak theoretical
bandwidth.

3. Assuming that performance (of a generic application) is independent
of memory system capacity, and purely a function of bandwidth and
latency (and money is not an obstacle), then some applications
(typically single-threaded apps with low BW demand) will perform better
with one DIMM per channel, while others (apps with more memory-level
parallelism and high BW demand) will perform better with two or more
DIMM's per channel.

4. Since those systems that have FBDIMM's all have multiple CPU's,
these systems are optimized for high BW loads and high capacity loads,
so I'd personally go for the parallelism to get the BW rather than save
on latency (again, assuming that money is not an obstacle, and capacity
is not an issue). Obviously, it depends on the workload of interest,
but 2 DIMM's deep per channel is probably a happy medium for
this class of systems. Having 4~16 CPU's all banging on only 8 or 16
banks of DRAM arrays will tend to cause a high number of array
conflicts, especially since these banks aren't wholly independent.

5. Blackford uses CAS-with-auto-precharge (closed page policy), so no
open banks in Blackford. Ergo, interleaving is always good, no worries
about bank conflicts against open banks. (Ref: "Intel 5000 series:
Dual Processor Chipsets for Servers and Workstations" page 19)

6. One stated advantage of the FBDIMM memory system is that you can
pile on a bunch of DIMM's in a channel, and get all the parallelism
out of those channels. Furthermore, (assuming that you have all DIMM's
populated) the FBDIMM protocol lets you merge bursts from different
DIMM's. So the FBDIMM channels can service multiple request streams
concurrently, much better than a "regular" DDR2 SDRAM memory system.
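
As a rough illustration of the trade-off in points 1 and 2, here is a back-of-the-envelope sketch. It uses the roughly 5 ns per added DIMM figure from point 1. Everything else is an assumption made for illustration: 8 banks per DIMM, requests that land on banks uniformly and independently, and the function names themselves. Real controllers and real access patterns will behave differently.

```python
def added_latency_ns(dimms_per_channel, per_dimm_ns=5.0):
    """Extra latency per access relative to a one-DIMM channel.

    Uses the ~5 ns per added DIMM estimate from point 1.
    """
    if dimms_per_channel < 1:
        raise ValueError("need at least one DIMM per channel")
    return (dimms_per_channel - 1) * per_dimm_ns


def conflict_probability(requests, banks):
    """Chance that at least two of `requests` concurrent accesses hit
    the same bank, assuming uniform, independent bank selection
    (a birthday-problem estimate, not a measurement).
    """
    p_no_conflict = 1.0
    for i in range(requests):
        p_no_conflict *= max(banks - i, 0) / banks
    return 1.0 - p_no_conflict


if __name__ == "__main__":
    BANKS_PER_DIMM = 8  # assumed value, for illustration only
    OUTSTANDING = 4     # assumed number of concurrent requests
    for dimms in (1, 2, 4):
        banks = dimms * BANKS_PER_DIMM
        print(dimms, added_latency_ns(dimms),
              round(conflict_probability(OUTSTANDING, banks), 3))
```

Under these assumptions, each added DIMM costs a fixed latency increment, while the conflict probability for a given number of outstanding requests drops as the bank count grows. That matches the argument that lightly loaded systems favor one DIMM per channel and heavily loaded ones favor more.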



