Re: PC Motherboard Chipsets and Parts Vendors



"Mike Rivers" wrote ...
Soundhaspriority wrote:
Mike, this reasoning is not valid. We're talking soft errors, which occur
when nothing is wrong with the memory module.

If there's nothing wrong with the memory module, then why is there an
error? If there's an error, there's something wrong with the memory
module, if only intermittently.

Mike, that's like saying that you have defective paint on your
car if it gets dinged by one of those baseball-size hailstones.
There's nothing wrong with the paint. You can't protect against
giant hailstones unless you want to drive an armored tank.

Cosmic "rays" (actually particles) come from outer space, and the
secondary particles they produce pass through most ordinary materials.
Physicists put detectors hundreds of metres underground to measure the
most penetrating ones. They can cause random "noise" in almost any chip
(not just memory). There is no practical way to block them, so you
can't "shield" your computer.
http://en.wikipedia.org/wiki/Cosmic_ray

Fortunately, the likelihood of a cosmic ray flipping a bit in your
computer memory is too low to make ECC worth the expense
unless you're running something high-stakes (life-safety systems,
servers for hundreds or thousands of users, etc.)

"The error rate in today's consumer-level memory is so low
so that for most everyday applications, adding ECC is pure
overkill. For standard DDR2 memory, the error rate is
something like 100 soft errors over 1 billion device hours.
If there are 16 memory devices or chips on a given module,
that translates to one soft error every 30 years. Even if you
only have two such DIMMs in a system, that's still less than
one error for more than the lifetime of the system as a whole."
http://searchwincomputing.techtarget.com/tip/0,289483,sid68_gci1251848,00.html
Thanks to "JulienBH" for this reference. There are more but
I'm not motivated to find them.
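
For anyone who wants to check the arithmetic in that quote, here is a
rough sketch in Python. It assumes soft errors arrive independently at a
constant rate; the function name and the 8766-hours-per-year figure are
my own choices, not anything from the article. Taken at face value, 100
errors per billion device-hours across 16 devices works out to roughly
one error every 71 years per module rather than 30, so the article may
have assumed more devices per module or a different rate. Either way the
answer is "decades", which is the point being made.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours, averaging in leap years

def years_between_soft_errors(errors_per_billion_device_hours, devices):
    """Mean years between soft errors for a group of memory devices.

    Assumes errors are independent and the rate is constant (an assumption
    made for illustration, not something the quoted article states).
    """
    errors_per_hour = errors_per_billion_device_hours / 1e9 * devices
    return 1.0 / errors_per_hour / HOURS_PER_YEAR

one_dimm = years_between_soft_errors(100, 16)   # one 16-chip module
two_dimms = years_between_soft_errors(100, 32)  # two such modules

print(f"one DIMM:  about {one_dimm:.0f} years between soft errors")
print(f"two DIMMs: about {two_dimms:.0f} years between soft errors")
```

Doubling the number of chips halves the expected time between errors,
which is why the same per-chip rate looks very different on a big server.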

The causes are cosmic rays and poorly modeled noise.

I don't know what "poorly modeled noise" is,

Perhaps Bob will explain.

but you're not going to convince me that cosmic rays cause computer errors
no matter how many web sites you find that say they do. Maybe in the lab,
but in my house? Inside a solid metal case? Should I be wearing a metal
helmet?

They CAN cause memory (and even CPU) errors, and
nobody disputes that. The issue is whether it happens
often enough to warrant spending extra $$$ on ECC to
detect and/or correct it. I'd pay $5-10 extra for ECC,
but not $50-100. It's just not worth it.

Case in point: Virtually none of us run computers with ECC
and it is unlikely that any of us have experienced a significant
problem from a cosmic "ray" causing an error in our computers.
There are dozens of different hazards to yourself, your property,
your computer, your media, your data, etc. that are much more
likely (and worth spending $$ to protect against or mitigate)
than cosmic rays causing soft errors.

Why would anyone sell a computer device that is allowed to make errors and
still considered to be working normally?
Isn't there a better terminology you can use?

There are cosmic "rays" passing through your roof and maybe
even through your brain right now as you read this. Fortunately,
our bodies and most of what we make and use are not affected
by this natural phenomenon. Things as microscopic and sensitive
as integrated circuits ARE susceptible, but both scientific data
and actual real-world experience suggest that it doesn't make
the top 10 list of things to worry about happening to your PC.

Cosmic "rays" cause a tiny fraction of the noise we hear in
any analog circuit, but it is a ~10th order effect compared to
the much more common things like Johnson-Nyquist noise
(i.e. "thermal noise").

This is not fringe stuff. Google has around a hundred thousand blade
servers in racks that cover acres, and all of them use ECC technology.

If I had Google's money and technology support resources, I'd do what they
tell me to do. But I'm just an occasional user and I don't work my
computers so hard that they crash. If I have soft errors, I don't know it.

Most servers use ECC because it is prudent (and not a significant
cost differential in "heavy-iron" computing). But most end-user
computers ("workstations") do not use ECC because the cost
significantly outweighs the benefit.

"To alleviate this problem, Intel has proposed a cosmic ray
detector which could be integrated into future high-density
microprocessors, allowing the processor to repeat the last
command following a cosmic ray event."
http://en.wikipedia.org/wiki/Cosmic_ray

"The risk from cosmic rays may not be thought of as a big
problem on a single computer with a single chip, as there is
the potential for error only perhaps every several years.
But Mr Hannah explained that on a supercomputer with
10,000 chips, there was the potential for 10 or 20 faults
a week.....
"He said that discussions are now under way within Intel
about how to build such a detector and see how it works.
"But he admitted that it will be hard to say when such a
device may become a practical reality. "
http://news.bbc.co.uk/2/hi/technology/7335322.stm
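
The BBC figures hang together if you run the numbers. This is only a
back-of-envelope sketch: it assumes every chip fails independently at the
same rate, and the function name is made up for illustration. Ten to
twenty faults a week spread over 10,000 chips means each chip sees one
fault roughly every 10 to 19 years, which is at the long end of "every
several years".

```python
WEEKS_PER_YEAR = 365.25 / 7

def years_per_fault_per_chip(faults_per_week, chips):
    """Mean years between faults on a single chip, given a fleet-wide rate.

    Assumes faults are spread evenly and independently over all chips
    (an illustrative assumption, not a claim from the BBC article).
    """
    faults_per_chip_per_week = faults_per_week / chips
    return 1.0 / faults_per_chip_per_week / WEEKS_PER_YEAR

for faults in (10, 20):
    years = years_per_fault_per_chip(faults, 10_000)
    print(f"{faults} faults/week over 10,000 chips -> "
          f"one per chip every {years:.1f} years")
```

The per-chip rate is tiny; it only becomes a weekly nuisance when
multiplied by thousands of chips.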

The only reason it isn't part of the consumer market is that it's so hard
for the consumer to understand why he should have it.

You'd think that if it were that critical more people would
perceive it. OTOH..., well, draw your own conclusions.

With explanations like "soft errors but working normally" and "cosmic
rays," and most important, that non-ECC is the most common type of memory,
it's no wonder the consumer doesn't understand why he needs it. While
you're at it, why not try to convince people that they should listen to
24-bit DVDs instead of MP3s?

An excellent point. If SER (soft error rate) were any kind of
significant issue for the average computer user, you can be sure
that the people who market ECC would jump at the chance to prove
to customers that the extra $$ was worth it. Instead, even
people who make ECC memory admit that it isn't really necessary
in your average PC.

