Re: kernel panics -> hardware issue?



D P Schreber wrote:
> I've had a 15" 2.16GHz MBP for about six weeks now, and in that time the
> machine has crashed four times with kernel panics. There is no
> third-party hardware in or attached to this machine and there are no
> third-party kernel extensions. The panic log is below.
>
> Up until now I've been thinking the panics were simply bugs in the intel
> kernel, which is after all still fairly new and not nearly as thoroughly
> field-tested as the ppc kernel. In my experience, hardware problems
> will show up everywhere, in application crashes and random weird
> behavior as well as kernel crashes. That's definitely not what I'm
> seeing.
>
> If software is the root cause of this, there's obviously no point in
> sending it back to Apple for repair; I'll just have to wait for a fix
> via Software Update. In the meantime I can continue using the machine
> every day, since it works fine 99% of the time. I would really prefer
> not to be without it if sending it to Apple won't fix anything anyway.
>
> So, I'm soliciting opinions, to get a sense of what other expert users
> think: keeping everything above in mind, are the panics below caused by
> hardware problems or software problems, and why do you think so?

This problem is almost certainly a firmware/driver issue. Similar backtraces are like a rash all over Apple Discussions, and the jury is still out on the exact cause, though there are strong indications that hardware is at least partly involved.

The only real hint we have is that it crashed at least once in com.apple.iokit.IOPCIFamily, and the kernel looked like it was doing something with the graphics module when it got a page fault. This is bad, and it means that software (or firmware) has done something that forced the hardware to raise a big red flag.

The interesting thing is that we have a variety of kernel panics here. The only commonality appears to be code that moves large amounts of data between devices and the kernel. In one case it was the graphics adapter, but all those mbuf panics are related to networking, I think.
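If you want to check that commonality a bit more systematically, one rough approach is to count which kernel extensions show up across all your saved panic logs. This is a minimal Python sketch, not an Apple tool; it assumes the logs are plain text files that name kexts by their com.apple.* bundle IDs (as with com.apple.iokit.IOPCIFamily above), and which files you feed it is up to you.

```python
import re
import sys
from collections import Counter

# Matches reverse-DNS bundle IDs such as "com.apple.iokit.IOPCIFamily".
# Assumption: the panic logs name kernel extensions this way; the final
# character class keeps a trailing period (end of a sentence) out of the match.
BUNDLE_ID = re.compile(r"\bcom\.apple\.[A-Za-z0-9_.]*[A-Za-z0-9_]")


def tally_kexts(log_text: str) -> Counter:
    """Count how often each com.apple.* bundle ID appears in one log's text."""
    return Counter(BUNDLE_ID.findall(log_text))


def tally_files(paths) -> Counter:
    """Sum the per-file counts over every log file given."""
    total = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            total += tally_kexts(f.read())
    return total


if __name__ == "__main__":
    # Print bundle IDs from most to least frequent across all files.
    for bundle_id, count in tally_files(sys.argv[1:]).most_common():
        print(f"{count:4d}  {bundle_id}")
```

Run it with the saved logs as arguments. A bundle ID that tops the list across several panics (networking vs. graphics, say) is a hint, not proof, since a panic can surface in code other than the code that actually caused the problem.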

Something you can try:

Run with Ethernet only for a while and see if the panics continue. There are indications that the AirPort device may be at fault, at least in some cases. There is some speculation that larger frames or more traffic over WPA on 802.11g might be the culprit. Some people have reported success replacing the AirPort card with one that has newer firmware, though others report that this didn't help and they needed a logic board exchange.
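How long is "a while"? A rough way to decide, assuming panics arrive independently at a steady rate (a Poisson model, which is a simplifying assumption, not something known about these machines): four panics in about six weeks is roughly 0.67 per week, and the chance of seeing zero panics in t weeks at that same rate is e^(-rate * t). The function names below are just for illustration.

```python
import math


def prob_no_panic(rate_per_week: float, weeks: float) -> float:
    """P(zero panics in `weeks`) under a Poisson model with the given rate."""
    return math.exp(-rate_per_week * weeks)


def weeks_needed(rate_per_week: float, alpha: float = 0.05) -> float:
    """Panic-free weeks needed before 'nothing changed' has probability <= alpha."""
    return -math.log(alpha) / rate_per_week


rate = 4 / 6  # four panics in roughly six weeks, from the original post
print(f"P(no panic in 2 weeks) = {prob_no_panic(rate, 2):.2f}")
print(f"weeks for 5% level     = {weeks_needed(rate):.1f}")
```

With these numbers, two clean weeks would still happen about 26% of the time even if nothing had changed, so you'd want something like four or five panic-free weeks on Ethernet before reading much into it.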

Can you get it to panic if you run the system test on it (I assume that these models come with a bootable system test disc)?

But this is a real problem, and it is systemic enough to clearly be Apple's responsibility. If the machine is still under warranty, I'd make it clear to them that this is not an isolated incident and that something appears to be wrong. I'd carefully reseat all the devices you can get at and reinstall the stock OS. If the panics continue, it is clearly not your fault, and not merely something to wait for them to fix.

If Apple isn't convinced that there is an edge case (rare or not) in one of their drivers or firmware releases, then no subsequent update will fix anything anyway. It looks like some models are hosed and Apple is quietly replacing logic boards.

Bottom line: if you paid for AppleCare or the unit is still under warranty, it sounds like you have done your due diligence and the hardware is busted. It looks like some small percentage of MBPros have a problem that makes them DOA, and Apple is replacing them.