Re: ECC (Was: building audio pc)
- From: "Soundhaspriority" <nowhere@xxxxxxxxxxx>
- Date: Sat, 30 Jun 2007 00:38:33 -0400
"Peter A. Stoll" <Lyn2Stoll_spamdel@xxxxxxxxx> wrote in message
news:Xns995EDABB2F26DHaifa10Kulim07Michel@xxxxxxxxxxxxxxxxx
"Soundhaspriority" <nowhere@xxxxxxxxxxx> wrote in
news:Ar-dndLAyPc7WxjbnZ2dnUVZ_oqmnZ2d@xxxxxxxxxxxx:
"Peter A. Stoll" <Lyn2Stoll_spamdel@xxxxxxxxx> wrote in message
news:Xns995EA1C08822DHaifa10Kulim07Michel@xxxxxxxxxxxxxxxxx
Laurence Payne <lpayne1NOSPAM@dslDOTpipexDOTcom> wrote in
news:atg883dohglenclp4ike5vek6465o662jc@xxxxxxx:

Peter, how do you feel about machines without ECC ram? I'm seriously
allergic to it. See http://news.com.com/8301-10784_3-9721344-7.html
The ram makers claim that bit flips have decreased with every process
generation, but there is a physical basis for increased rate as the
geometry shrinks.

Well, you can't know unless you know how conservatively they are
designing the cell capacitance in each generation, and how cold the
packaging they've managed to arrange.

When they went from 130nm to 90nm, and 2.5v to 1.8v, with DDR-->DDR2, you're
saying they could have nevertheless preserved the total stored charge? It's
not just a matter of capacitance, but voltage.
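
To put rough numbers on the capacitance-versus-voltage point: stored charge is Q = C * V, so a drop in voltage has to be matched by a proportional rise in capacitance if the charge is to stay the same. Here is a minimal Python sketch using the 2.5 V and 1.8 V figures above. Treating the full supply voltage as the cell voltage is a simplifying assumption, and the function names are made up for illustration.

```python
# Stored charge in a DRAM cell capacitor: Q = C * V.
# 2.5 V (DDR) and 1.8 V (DDR2) are the supply figures quoted in the post.
# ASSUMPTION: the cell sees the full supply voltage, which is a simplification.

def capacitance_scale_to_keep_charge(v_old: float, v_new: float) -> float:
    """Factor by which C must grow so that Q = C * V is unchanged."""
    return v_old / v_new


def charge_fraction_at_fixed_capacitance(v_old: float, v_new: float) -> float:
    """Fraction of the old charge left if C stays the same and V drops."""
    return v_new / v_old


scale = capacitance_scale_to_keep_charge(2.5, 1.8)
fraction = charge_fraction_at_fixed_capacitance(2.5, 1.8)

print(f"capacitance must grow by a factor of {scale:.2f} to hold the same charge")
print(f"at unchanged capacitance the cell holds {fraction:.0%} of the old charge")
```

So under these assumptions the cell needs roughly 39% more capacitance to break even, or it ends up with about 72% of the charge it used to hold. Whether the real parts made up the difference is exactly the open question.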

The dominant DRAM bit-error problem in healthy parts I know about comes
from alpha particle hits. Oddly enough, most of the offending alphas
come from the packaging materials themselves, so the vendor in principle
can have a pretty good idea of the likely rate.

Alpha was pretty well eliminated, but cosmic rays cannot be controlled. When
you go up to 6,000 feet and above (Doris County, CO, for example) the cosmic ray
flux increases drastically, and my laptop (coincidentally?) crashed.

The key design parameter affecting sensitivity to alpha hits (capture
cross section if you like the terminology) is: how much false charge
added to the cell does it take to flip its value? As the actual false
charge obtained from an interacting alpha follows a fairly well-known
distribution, if you know the voltage margin you can choose what
capacitance just meets your design failure rate goal. In the early 1980s
the magic value was about 50 femtofarads, but that will have changed with
such features as epitaxial silicon (tends to reduce capture volume), lower
operating voltage, colder packaging materials, etc.
For many generations now, DRAM processes have included bizarre features
whose sole purpose is to greatly increase how much capacitance can be packed
into the cell area (which must be very, very small for decent cost).
Trenches many times deeper than the cell lateral dimension have been part
of the solution for generations.
So whether things get better or worse by generation is a business and
therefore a design decision, not just some inevitable consequence of
geometry.
DRAMs are, or at least were, blessed by having some very knowledgeable
customers who both care about and are able to measure their field failure
rates. (This is utterly untrue of microprocessors, which in consequence
probably fail at drastically higher rates.) My big fear would be that
DRAM manufacturers might spot a market opportunity to sell a drastically
worse random-error rate design to the consumer market. Heaven help us if
Microsoft is the outfit measuring our error rates!!!

I would guess that, with the overclocking madness and a general shift in the
use of consumer equipment toward multimedia and entertainment, this has
already happened.

Ever since HP screamed bloody murder (truthfully) about how much better
Japanese DRAMs were compared to U.S. DRAMs in the early or mid 1980s,
folks in that business have been acutely aware that they could not afford
to be much worse than their competitors, nor much worse than
expectations.

Yes, I remember that well.

Still, I'd prefer ECC, but I don't insist on it. In years of running one
system in ECC, I never saw a reported single-bit corrected fault.

I have never seen a corrected fault in my server system. It might not
report. It does appear that some kinds of corruption peculiar to the Athlon
write-back cache design are caused by an MMU design problem, and in the case
of Nvidia cards, it does indeed manifest as a parity error with certain
video operations, such as mpeg decoding. But I'm unaware of anything like
that with Intel CPUs.

Nevertheless I black-screened to an alleged parity error half a dozen times.
Quite likely these were not parity errors at all, but software bugs which
trapped to that interrupt vector.

The most quoted benchmark value for cosmic-ray-induced bit flipping is
one flip per 256 megabytes per two weeks. Is this within a power of 10 of
the actual figure?
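
For anyone who wants to compare that benchmark against published numbers, it helps to convert it to a per-bit rate and to FIT per megabit (FIT being failures per 10^9 device-hours). This is only an arithmetic sketch in Python. It takes the quoted figure at face value, counts a megabyte as 2**20 bytes, and uses a hypothetical 2 GB machine as the example, none of which is a measured result.

```python
# Convert "one flip per 256 MB per two weeks" into other units.
# ASSUMPTIONS: 1 MB = 2**20 bytes, and the quoted figure is taken at face value.

BITS_PER_MBIT = 2**20
HOURS_PER_WEEK = 7 * 24


def flips_per_bit_hour(megabytes: float = 256, weeks: float = 2.0,
                       flips: float = 1.0) -> float:
    """Per-bit, per-hour flip rate implied by `flips` in `megabytes` over `weeks`."""
    bits = megabytes * 8 * BITS_PER_MBIT
    hours = weeks * HOURS_PER_WEEK
    return flips / (bits * hours)


def expected_flips(rate_per_bit_hour: float, megabytes: float, hours: float) -> float:
    """Expected number of flips for a memory of `megabytes` over `hours`."""
    return rate_per_bit_hour * megabytes * 8 * BITS_PER_MBIT * hours


rate = flips_per_bit_hour()
fit_per_mbit = rate * BITS_PER_MBIT * 1e9  # failures per 1e9 hours, per Mbit

# Hypothetical example machine: 2 GB of RAM running for one year.
flips_per_year_2gb = expected_flips(rate, megabytes=2048, hours=365 * 24)

print(f"{rate:.3e} flips per bit-hour")
print(f"{fit_per_mbit:.0f} FIT per Mbit")
print(f"{flips_per_year_2gb:.0f} expected flips per year in 2 GB")
```

The quoted benchmark works out to roughly 1450 FIT per Mbit, which is the number to hold up against any vendor or field figure when asking whether it is within a factor of 10.
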
Bob Morein
Dresher, PA
(215) 646-4894