Re: ECC (Was: building audio pc)




"Peter A. Stoll" <Lyn2Stoll_spamdel@xxxxxxxxx> wrote in message
news:Xns995EDABB2F26DHaifa10Kulim07Michel@xxxxxxxxxxxxxxxxx
"Soundhaspriority" <nowhere@xxxxxxxxxxx> wrote in
news:Ar-dndLAyPc7WxjbnZ2dnUVZ_oqmnZ2d@xxxxxxxxxxxx:

"Peter A. Stoll" <Lyn2Stoll_spamdel@xxxxxxxxx> wrote in message
news:Xns995EA1C08822DHaifa10Kulim07Michel@xxxxxxxxxxxxxxxxx
Laurence Payne <lpayne1NOSPAM@dslDOTpipexDOTcom> wrote in
news:atg883dohglenclp4ike5vek6465o662jc@xxxxxxx:

Peter, how do you feel about machines without ECC ram? I'm seriously
allergic to it. See http://news.com.com/8301-10784_3-9721344-7.html
The RAM makers claim that bit flips have decreased with every process
generation, but there is a physical basis for an increased rate as the
geometry shrinks.

Well, you can't know unless you know how conservatively they are
designing the cell capacitance in each generation, and how cold (that is,
how low in alpha emission) a package they've managed to arrange.

When they went from 130 nm to 90 nm, and from 2.5 V to 1.8 V, with DDR-->DDR2,
you're saying they could have nevertheless preserved the total stored charge?
It's not just a matter of capacitance, but also of voltage.
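To put rough numbers on the charge question, here is a minimal sketch using Q = C * V. It assumes, purely for illustration, that the cell sees the full supply voltage and that the cell capacitance is the 50 fF figure quoted elsewhere in this thread for the early 1980s. Neither assumption is claimed to hold for actual DDR or DDR2 parts, and the function names are made up for this example.

```python
def stored_charge_fC(capacitance_fF, voltage_v):
    """Charge on the cell capacitor, Q = C * V (fF times V gives fC)."""
    return capacitance_fF * voltage_v


def capacitance_to_preserve_charge_fF(old_c_fF, old_v, new_v):
    """Capacitance needed at new_v to hold the same charge as old_c_fF at old_v."""
    return old_c_fF * old_v / new_v


if __name__ == "__main__":
    c = 50.0  # fF, illustrative value only
    print(stored_charge_fC(c, 2.5))   # charge at the DDR-era supply voltage
    print(stored_charge_fC(c, 1.8))   # charge at the DDR2-era supply voltage
    print(capacitance_to_preserve_charge_fF(c, 2.5, 1.8))
```

Under those assumptions, dropping from 2.5 V to 1.8 V at constant capacitance loses 28% of the stored charge, so the capacitance would have to grow by a factor of 2.5/1.8 (about 1.39) to make up the difference.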


The dominant DRAM bit-error problem in healthy parts I know about comes
from alpha particle hits. Oddly enough, most of the offending alphas
come from the packaging materials themselves, so the vendor in principle
can have a pretty good idea of the likely rate.

Alpha was pretty well eliminated, but cosmic rays cannot be controlled. When
you go up to 6,000 feet and above (Doris County, CO, for example), the cosmic
ray flux increases drastically, and my laptop (coincidentally?) crashed.


The key design parameter affecting sensitivity to alpha hits (capture
cross section if you like the terminology) is: how much false charge
added to the cell does it take to flip its value? As the actual false
charge obtained from an interacting alpha is a fairly well-known
distribution, if you know the voltage margin you can choose what
capacitance just meets your design failure rate goal. In the early 1980s
the magic value was about 50 femtofarads, but that will have changed with
such features as epitaxial silicon (tends to reduce capture volume), lower
operating voltage, colder packaging materials, etc.
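The sizing argument above can be sketched in a few lines. The critical charge is Q_crit = C * dV, where dV is the voltage margin, and the goal is to pick C so that only an acceptable fraction of alpha hits deposit more than Q_crit. The real collected-charge distribution is not given in this post, so the sketch below substitutes an exponential tail as a placeholder; that distribution, the example numbers, and the function name are all assumptions.

```python
import math


def min_capacitance_fF(voltage_margin_v, mean_collected_fC, max_flip_fraction):
    """Smallest cell capacitance (fF) keeping the per-hit flip fraction at or
    below max_flip_fraction.

    Placeholder model: collected charge q is exponentially distributed with
    the given mean, so P(q > Q_crit) = exp(-Q_crit / mean).
    """
    if not 0.0 < max_flip_fraction < 1.0:
        raise ValueError("max_flip_fraction must be between 0 and 1")
    # Solve exp(-Q_crit / mean) = max_flip_fraction for Q_crit.
    q_crit_fC = -mean_collected_fC * math.log(max_flip_fraction)
    # Q_crit = C * dV, so C = Q_crit / dV (fC divided by V gives fF).
    return q_crit_fC / voltage_margin_v


if __name__ == "__main__":
    # Hypothetical inputs: 1.0 V margin, 10 fC mean collected charge,
    # and at most one hit in a thousand allowed to flip the cell.
    print(min_capacitance_fF(1.0, 10.0, 1e-3))
```

Whatever the true distribution is, the structure is the same: halving the voltage margin doubles the capacitance needed for the same critical charge, which is why a lower operating voltage pushes the design toward more capacitance per cell.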

For many generations now, DRAM processes have included bizarre features
solely included to greatly increase how much capacitance can be packed
into the cell area (which must be very, very small for decent cost).
Trenches many times deeper than the cell lateral dimension have been part
of the solution for generations.

So whether things get better or worse by generation is a business and
therefore a design decision, not just some inevitable consequence of
geometry.

DRAMs are, or at least were, blessed by having some very knowledgeable
customers who both care about and are able to measure their field failure
rates. (This is utterly untrue of microprocessors, which in consequence
probably fail at drastically higher rates.) My big fear would be that
DRAM manufacturers might spot a market opportunity to sell a drastically
worse random-error rate design to the consumer market. Heaven help us if
Microsoft is the outfit measuring our error rates!!!

I would guess that, with the overclocking madness and a general shift in the
use of consumer equipment toward multimedia and entertainment, this has
already happened.

Ever since HP screamed bloody murder (truthfully) about how much better
Japanese DRAMs were compared to U.S. DRAMs in the early or mid 1980s,
folks in that business have been acutely aware that they could not afford
to be much worse than their competitors, nor much worse than
expectations.

Yes, I remember that well.

Still, I'd prefer ECC, but I don't insist on it. In years of running one
system in ECC, I never saw a reported single-bit corrected fault.

I have never seen a corrected fault in my server system. It might not
report. It does appear that some kinds of corruption peculiar to the Athlon
write-back cache design are caused by an MMU design problem, and in the case
of Nvidia cards, it does indeed manifest as a parity error with certain
video operations, such as mpeg decoding. But I'm unaware of anything like
that with Intel CPUs.

Nevertheless, I black-screened to an alleged parity error half a dozen times.
Quite likely these were not parity errors at all, but software bugs which
trapped to that interrupt vector.

The most quoted benchmark value for cosmic ray induced bit flipping is
one flip per 256 megabytes per two weeks. Is this within a factor of 10 of
the actual figure?
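For scale, the quoted figure can be converted into other units with simple arithmetic. The sketch below takes the one-flip-per-256-MB-per-two-weeks number at face value (it is the value under question, not a measurement), treats a megabyte as 8 megabits, and uses function names invented for this example.

```python
HOURS_PER_TWO_WEEKS = 14 * 24  # 336 hours


def expected_flips(memory_mb, hours, ref_mb=256.0, ref_hours=HOURS_PER_TWO_WEEKS):
    """Expected bit flips, scaling the quoted rate linearly with size and time."""
    return (memory_mb / ref_mb) * (hours / ref_hours)


def fit_per_mbit(ref_mb=256.0, ref_hours=HOURS_PER_TWO_WEEKS):
    """The same rate expressed as FIT per megabit.

    One FIT is one failure per 1e9 device-hours.
    """
    megabits = ref_mb * 8.0
    return 1e9 / (megabits * ref_hours)


if __name__ == "__main__":
    print(fit_per_mbit())                     # quoted rate in FIT per Mbit
    print(expected_flips(2048.0, 365 * 24))   # 2 GB machine running for a year
```

Taken literally, the quoted rate works out to roughly 1,450 FIT per megabit, or about 200 flips a year on a 2 GB machine.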

Bob Morein
Dresher, PA
(215) 646-4894

