Re: Weird kernel errors



On Thu, 01 Dec 2005 18:53:55 +0000, Alex Butcher
<alex.butcher.news1005@xxxxxxxxxxxxxx> wrote:

> On Thu, 01 Dec 2005 13:24:58 +0000, Chris Croughton wrote:
>
>> I'm using a Gentoo system, with kernel 2.6.10-gentoo-r6, on an Athlon
>> 1800XP system (Jetway motherboard). I'm getting a couple of oddities,
>> both annoying:
>>
>> The time is horribly variable. On occasions it suddenly loses 4-5
>> seconds. This may be related to the kernel error message:
>>
>> [kernel] Hangcheck: hangcheck value past margin!
>>
>> which as far as I can see means that a kernel timer has 'hung' for a
>> long time (when it occurs it seems to repeat about every 3-4 minutes).
>> This is worrying...
>
> I guess that could be down to buggy TSC implementation on your CPU (which
> is what hangcheck uses) or perhaps something is causing the machine to
> pause for >=180 seconds more than hangcheck is expecting.
>
> Is the BIOS flashed to the most recent version? (I'm thinking of ACPI bugs
> here)
>
> Is power management enabled in the BIOS?

Yes, the BIOS is current, and I can't see a power management option in
it. But oddly the behaviour seems to have disappeared, no sign of the
error message for several weeks now. And I didn't change anything.
Chrony is now happily keeping the time synchronised to within a few
milliseconds of the remote server (it was jumping about all over the
place).

>> The second is that my hard disks won't accept DMA enabling, so are slow.
>
> Some drivers/controllers won't let you enable DMA using hdparm, but will
> happily continue using it if it has already been setup by the BIOS before
> the kernel loads.

I've found it, eventually. It was setting up my new AMD64 machine that
revealed it: I ran hdparm -Tt while I was installing and it reported
nice speeds. Yesterday I put in a 250GB drive and ran hdparm on it -- no
DMA. No DMA on any of the drives. I wondered whether I'd broken
something. Then I thought to retry with the boot/install CD (not
something I can generally do with the fileserver) -- and DMA was enabled
and speeds were high again! So I looked at dmesg from both the LiveCD
and the normal boot, and found that although both said that they were
using the generic IDE drivers, the LiveCD one indicated that it was a VIA
82* controller and mine didn't. So I rebuilt the kernel with the VIA
82C* driver and now DMA enables happily on all the drives.
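
For anyone wanting to repeat that dmesg comparison without eyeballing two
long logs, here is a rough Python sketch. It assumes you have saved the
output of dmesg from each boot as text; the keyword list (ide, via, dma)
and the function names are my own choices for illustration, not anything
the kernel or hdparm defines.

```python
import re

# Optional "[   12.345678] " prefix that some kernels put on each line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Assumed search terms; adjust for whatever controller you are chasing.
KEYWORDS = re.compile(r"ide|via|dma", re.IGNORECASE)


def relevant_lines(dmesg_text):
    """Return the set of keyword-matching lines, with timestamps stripped."""
    lines = set()
    for line in dmesg_text.splitlines():
        line = TIMESTAMP.sub("", line).strip()
        if line and KEYWORDS.search(line):
            lines.add(line)
    return lines


def only_in_first(first_text, second_text):
    """Lines about IDE/VIA/DMA that appear in the first log but not the second."""
    return sorted(relevant_lines(first_text) - relevant_lines(second_text))
```

Calling only_in_first(livecd_log, normal_log) with the two saved logs as
strings lists what the LiveCD kernel reported that the installed kernel
did not, which is where a missing chipset driver would show up.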

So then I rebuilt the fileserver kernel with the same change, and
rebooted. Having forgotten to rerun LILO (the AMD64 uses GRUB), it then
hung and I had to get a LiveCD for the x86 to run LILO. And then it
happily enabled DMA and I get around 60MB/s.

(Weirdness: I set the acoustic management to "slow and quiet" (-M128)
and the drive seems to be faster!)

> What values does 'hdparm -tT' report? Have you tried booting the kernel
> with 'elevator=deadline'? What effect does that have on the figures
> reported by 'hdparm -tT'?

It was getting around 5MB/s, around 7MB/s if I forced 32 bit transfers
(-c1). Now I'm getting 30MB/s on the 30GB drive, 50MB/s on the 80GB
drive and 60MB/s on the 250GB drive. Which is rather better, and
actually saturates the NFS throughput rather than being the
bottleneck...

> So CurCHS/CurSects seems to be the 8GByte ATA limit. This doesn't matter,
> as Linux is using the LBA48 geometry to access the entire disc. LBAsects
> must reflect the 137.4GByte LBA28 limit.

Yes, doing -I gives

CHS current addressable sectors: 4128705
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 488397168
device size with M = 1024*1024: 238475 MBytes
device size with M = 1000*1000: 250059 MBytes (250 GB)
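
As a sanity check on those figures, the sizes follow directly from the
sector counts. A small Python sketch, assuming 512-byte sectors (usual for
ATA drives of this vintage, but an assumption here) and taking the LBA48
ceiling as 2**48 sectors; the function name is mine:

```python
SECTOR_BYTES = 512  # assumed sector size, not reported in the output above


def capacity(sectors, sector_bytes=SECTOR_BYTES):
    """Return (bytes, MB with M = 1000*1000, MB with M = 1024*1024), rounded down."""
    total = sectors * sector_bytes
    return total, total // (1000 * 1000), total // (1024 * 1024)


lba28 = capacity(268435455)   # "LBA user addressable sectors"
lba48 = capacity(488397168)   # "LBA48 user addressable sectors"
lba48_max = capacity(2 ** 48)  # ceiling of 48-bit sector addressing

print("LBA28 view: %.1f GB" % (lba28[0] / 1e9))
print("LBA48 view: %d MBytes (M = 1024*1024), %d MBytes (M = 1000*1000)"
      % (lba48[2], lba48[1]))
print("LBA48 ceiling: %.1f PB" % (lba48_max[0] / 1e15))
```

The LBA28 count works out to the 137.4GByte limit mentioned above, and the
LBA48 count reproduces both device sizes that hdparm -I printed.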

> Various BIOS LBA48 implementations, chipsets, kernels and partitioning
> tools have various bugs that mean that sometimes partitions aren't created
> on the boundaries that other partitioning tools (and, worse,
> filesystems/block device layers) think they should have. Some of these
> bugs can be fatal. Yes, it sucks, particularly if you're running more than
> one OS (e.g. Windows, which has been reported to write outside its
> allotted partitions if they aren't "right").

I've seen that. If I actually need to dual-boot I use Partition Magic
(I know, it's non-free (as in beer as well as freedom) and runs on
'doze, but it's the best I've found) to create the partitions and Linux
or 'doze to format them as appropriate, that way everyone is happy.

> Read more about LBA48 at <http://www.48bitlba.com/>

Thanks. "No one will ever need more than an 8/32/137GB disk..." I have
over two terabytes of disk now in my house (that's frightening!), but
most of it is in USB drives and the largest drive I have so far is
250GB. It may take me a while to reach the LBA48 limit...

Chris C