[9fans] Random SATA errors with SMP on a dual core machine.



Has anyone else seen this? I am experiencing random SATA errors when
I turn on SMP on a dual core machine.

After a several-year hiatus, I just got some new hardware to build a
Plan 9 network at home. My file server is a 1U rackmount machine with
the following hardware:

1. SuperMicro PDSML-LN2+ motherboard
- builtin ICH7R SATA controller
- builtin Intel 82573L Gigabit Ethernet adapter
2. 1.8GHz Dual-core Intel Core2 Duo processor
3. 2GB RAM
4. 2 x 750GB SATA drives
5. 1 x 2GB Compact Flash removable disk.

Note that this machine has neither a CD nor DVD drive. This is
because I misread the vendor's quote: they could not fit a slimline CD
or DVD drive into the 1U chassis along with two hard drives but I
didn't realize that until I pulled the machine out of the box. I got
around this by installing Plan 9 onto the compact flash card on
another machine that did have a CD drive, then bringing it up on this
machine.

The first problem I had was using the SATA drives; the SATA drivers in
the distributed kernel had problems, so I updated them to the latest
from Erik's directory on sources. Specifically:

% 9fs sources
% cd /n/sources/contrib/quanstro/root/sys/src/9/pc
% cp sdata.c sdiahci.c ahci.h /sys/src/9/pc
% cd ../port
% cp devsd.c sd.h sdloop.c /sys/src/9/port
% cd ../../libfis
% mkdir /sys/src/libfis
% cp fis.h mkfile /sys/src/libfis
% cd /sys/src/libfis
% mk install

I then edited the appropriate mkfile to refer to /386/lib/libfis.a and
built the 'pcf' kernel, copied it to 9fat (on the CF card) and
rebooted. I may have missed a step, but I was able
to fdisk, prep and flfmt the SATA drives and load the operating system
by running the (slightly edited) installation scripts from
/sys/lib/dist/pc/inst, choosing a fossil+venti configuration. Up to
this point I had only been using one core, since '*nomp=1' was set in
plan9.ini, and everything was still running as a terminal.

The problem I am seeing now is that, if I boot the machine with both
cores enabled, I get a relatively small amount of use out of the SATA
drives, then a (seemingly) random i/o error, after which all further
access to the drives fails. I am still booting from the CF disk, but
using the fossil on the SATA drives as the root. I was also having
problems with rio. On further investigation, I see that there are
known issues with VESA and MP, but even if I don't load the VGA
registers and stay in CGA mode, things still behave strangely (for
instance, my venti got corrupted and all of /sys/include disappeared).
However, if I set '*nomp=1' in plan9.ini, everything works fine.
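
For reference, the single-core workaround amounts to one line in
plan9.ini. This is only a minimal sketch: the rest of the file
(bootfile and so on) is omitted, and the comment line is illustrative
rather than copied from the actual file.

```
# illustrative fragment only; other plan9.ini lines omitted
# start a single processor instead of bringing up both cores
*nomp=1
```

Removing that line (or commenting it out) lets the kernel start both
cores, which is the configuration where the SATA errors appear.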

Has anyone seen this before?
Is this a known issue?
Even better, is there a fix?

Btw: my long term intention is to use the fs driver to mirror fossil
and venti across both of the SATA drives, keep a small fossil on the
CF card for emergencies, and keep a partition there for secstore data.
But I haven't gotten to that stage yet.

- Dan C.
