Re: Single-bit corrected errors



On Tue, 28 Feb 2006 23:37:07 -0500, Keith <krw@xxxxxxxxxx> put finger
to keyboard and composed:

On Wed, 01 Mar 2006 07:41:49 +1100, Franc Zabkar wrote:

On Tue, 28 Feb 2006 13:57:16 -0500, Keith <krw@xxxxxxxxxx> put finger
to keyboard and composed:

On Tue, 28 Feb 2006 18:46:54 +1100, Franc Zabkar wrote:

On Mon, 27 Feb 2006 21:58:47 -0500, Keith <krw@xxxxxxxxxx> put finger
to keyboard and composed:

On Sat, 25 Feb 2006 20:10:36 -0500, daytripper wrote:

Google "modified Hamming code"....

Sorry, Del. I know all about Hamming codes, but couldn't find this
particular syndrome, symptom described in the hundreds of pages...

http://www.google.com/search?rls=en&q=syndrome+ecc

Single-bit Error Correction Error Correcting Code (SEC ECC):
http://www.realworldtech.com/page.cfm?ArticleID=RWT121603153445&p=3

Single bit Error Correction, Double bit Error Detection, Or SECDED
ECC:
http://www.realworldtech.com/page.cfm?ArticleID=RWT121603153445&p=4

Oh, I've been doing Hamming codes since the early '70s (I emailed an Excel
spreadsheet that did various SEC/DED Hamming code examples to Dean Kent a
few years back).

...but I don't see where it tells me which DIMM caused my "000A7DA0 - 00D1". ;-)

I don't understand why you are having a hard time pinning down the
problem to a particular stick, if indeed that is your problem. Just
run with two sticks at a time until you find the faulty pair, then run
each stick separately. You don't need an understanding of ECC to do
this, just some rudimentary troubleshooting ability. If the problem
isn't in your DRAM, then AFAICS it has to be in the CPU.

Evidently you don't understand that I do *not* want to swap random sticks
in a working system. I've done more now than I really want to do. These
things aren't made to swap parts all day! Let me put it another way, if
this is the way you do business, I feel sorry for your customers.

If ever there was a thread that was a testament to your incompetence,
then this is it.

You claim to be an "engineer" who "qualified boards in a former life",
yet you approach the simplest of tasks with great trepidation. What is
it about swapping memory modules that fills you with apprehension?

Despite your unconvincing protestations to the contrary, the evidence
shows that you have no real understanding of ECC, otherwise you would
have known what "syndrome" bits were. I first encountered ECC in
16-bit memory boards some 20 years ago when I was doing chip level
repairs on minicomputer hardware, so it's hardly new technology.
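
For anyone following along, the "syndrome" is the small bit pattern an
ECC decoder computes from a stored word. Zero means no error was
detected; for a single-bit error, a nonzero value identifies the
failing bit position. The sketch below uses a toy Hamming(7,4) code to
show the idea. It is not the wider SEC/DED code a real memory
controller uses, and the function names encode and syndrome are made up
for this illustration.

```python
def syndrome(word):
    """XOR together the positions (1..7) of all set bits in the codeword.

    A valid codeword gives 0; a single flipped bit gives that bit's position.
    """
    s = 0
    for pos in range(1, 8):
        if word[pos]:
            s ^= pos
    return s


def encode(data_bits):
    """Build a Hamming(7,4) codeword from 4 data bits.

    The word is a list indexed 1..7 (index 0 is unused). Data bits go in
    positions 3, 5, 6, 7; check bits go in positions 1, 2, 4.
    """
    word = [0] * 8
    for pos, bit in zip((3, 5, 6, 7), data_bits):
        word[pos] = bit
    s = syndrome(word)  # check bits are still 0 here
    for k in range(3):
        # Set each check bit so the overall syndrome becomes 0.
        word[1 << k] = (s >> k) & 1
    return word


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    print("syndrome of clean word:", syndrome(word))
    word[5] ^= 1  # simulate a single-bit error at position 5
    print("syndrome after flipping bit 5:", syndrome(word))
```

A real controller maps its syndrome to a data or check bit through a
chipset-specific table, so a logged value can only be decoded with the
chipset's own documentation.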

I also find it inconceivable that you would not be aware of memory
testing software such as Memtest. I would have thought that this was a
standard diagnostic tool in *every* genuine PC technician's toolkit.

You've also admitted that you don't understand the BIOS error report,
so what's left for you to do except "swap parts all day", as you have
already done? And why is it that some *ten days* after your initial
post, after "swapping parts all day", you are still no nearer a
solution? In any case, why would you think that you *need* to "swap
parts all day"? Have you never heard of binary search, i.e. start with
4, then go down to 2, and then finally to 1? Quite simple, really.
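
The halving procedure can be sketched in a few lines of Python. This
assumes exactly one faulty module and an error that reproduces
reliably whenever that module is installed. The has_errors callback is
hypothetical: it stands in for "install only these sticks, run a memory
test, and report whether errors appeared". Boards that require modules
in matched pairs would have to halve by pairs instead.

```python
from typing import Callable, List, Sequence


def find_faulty_module(
    modules: Sequence[str],
    has_errors: Callable[[List[str]], bool],
) -> str:
    """Narrow a set of modules down to the single faulty one by halving."""
    candidates = list(modules)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Test with only the first half installed.
        if has_errors(half):
            candidates = half  # fault is in the half just tested
        else:
            candidates = candidates[len(half):]  # fault is in the rest
    return candidates[0]


if __name__ == "__main__":
    dimms = ["DIMM0", "DIMM1", "DIMM2", "DIMM3"]
    bad = "DIMM2"  # simulated fault, for demonstration only
    print(find_faulty_module(dimms, lambda installed: bad in installed))
```

With four modules this takes two test runs rather than one per stick.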

Finally, should you eventually determine that the SBE is not due to a
single faulty bit in motherboard RAM, then AFAICS your only remaining
culprit is the CPU cache.

- Franc Zabkar
--
Please remove one 'i' from my address when replying by email.