Re: Screen Resolution in Kubuntu



On 28 May 2007, Tony Houghton outgrape:

In <87646dnb6b.fsf@xxxxxxxxxxxxxxx>,
Nix <nix-razor-pit@xxxxxxxxxxxxx> wrote:

The periodic check definitely is necessary with journalling. Filesystem
corruption (or, more accurately, metadata corruption) can happen for
numerous reasons, among them:

OK, I'd better turn the checks back on. I think I'll base it only on
interval-between-checks though and not reenable max-mount-counts because
that's annoying on a machine that gets turned off regularly.

max-mount-counts is useful mostly iff mounting/unmounting is likely to
cause extra corruption. I've heard of some ancient Unix systems on which
this was true, but I suspect that parameter is mostly there out of
historical inertia now. (Note that mke2fs will try to use a different value
for this parameter in every filesystem to stop thundering herds of
fscks: you can't really avoid those with the age-based check.)
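
For the interval-only setup discussed above, here is a minimal sketch that builds the tune2fs command line. It assumes the usual e2fsprogs options: -c 0 to disregard the mount count and -i Nd for a check interval in days. The helper name tune2fs_args and the device /dev/sdXN are placeholders, not anything from this thread, and it only prints the command rather than running it.

```python
def tune2fs_args(device, interval_days):
    """Build a tune2fs command line that enables time-based checks only.

    Hypothetical helper: it assembles the argument list and never runs it.
    """
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    return [
        "tune2fs",
        "-c", "0",                  # 0 = disregard the mount count
        "-i", f"{interval_days}d",  # force a check after this many days
        device,
    ]


if __name__ == "__main__":
    # /dev/sdXN is a placeholder; substitute the real partition.
    print(" ".join(tune2fs_args("/dev/sdXN", 90)))
```

Running the printed command needs root, and it is worth reading the current settings back with tune2fs -l first.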

[Snip]
- controller or RAM problems (intermittent bitflips, often due to
alpha particle absorption from decay of radioisotopes in the chip
housing, but sometimes caused by solar activity and random
wandering neutrons from secondary radiation; these run at about two
bitflips per month per Gb at current densities)

Do they usually get fixed by simple parity checks and only cause a
problem if two or more occur together?

They can get *detected* by simple parity checks: ECC can fix them.
(Parity checking can tell you that a bit was flipped, but not which
one.)
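
To make the detect-versus-fix distinction concrete, here is a toy sketch. It uses a single even-parity bit next to a Hamming(7,4) code, which stands in for ECC here purely for illustration: real ECC memory uses wider codes over whole words, and none of these function names come from any particular library.

```python
def parity_bit(bits):
    """Even parity: the extra bit makes the total number of 1s even."""
    return sum(bits) % 2


def parity_ok(bits, p):
    """True if data plus parity bit still has an even number of 1s."""
    return (sum(bits) + p) % 2 == 0


def hamming74_encode(d):
    """Encode 4 data bits as 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(c):
    """Return (corrected codeword, 1-based position of the flip, or 0)."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # the syndrome spells out the bad position
    fixed = list(c)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed, pos


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    p = parity_bit(word)
    # Two different single-bit flips fail the parity check identically,
    # so parity alone cannot say which bit to repair.
    print(parity_ok([0, 0, 1, 1], p), parity_ok([1, 0, 1, 0], p))

    code = hamming74_encode(word)
    damaged = list(code)
    damaged[4] ^= 1                      # flip position 5
    print(hamming74_correct(damaged))    # locates and repairs the flip
```

Note that this code only handles one flipped bit per codeword: two flips in the same codeword would be "corrected" to the wrong value, which is why the same-word case below matters.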

Two or more are very unlikely to happen together in the same word
unless you're running inside a nuclear reactor ;)

Or do they go unnoticed because
they usually affect content rather than its structure?

They go unnoticed because most RAM is mostly not going to be read again
at any given point (e.g. it's cache which will be recycled) or because
it causes little problem (perhaps it's a text page: you might be unlucky
and have an app die: that happens often enough anyway...)

But as RAM densities go up, the error rate will rise sharply: and as
memory capacities go up, the error rate per machine will *also* rise...
this will become a much bigger problem than it is now.
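
A back-of-envelope sketch of the per-machine scaling, taking the two-flips-per-month-per-Gb figure quoted earlier in this thread at face value. It assumes flips are independent and evenly spread (a Poisson model, which is my assumption rather than anything measured), and it holds the per-Gb rate fixed, so it shows only the capacity effect and not the density one.

```python
import math

FLIPS_PER_GB_PER_MONTH = 2.0   # figure quoted earlier in the thread


def expected_flips(capacity_gb, months=1.0, rate=FLIPS_PER_GB_PER_MONTH):
    """Expected number of bitflips: grows linearly with capacity and time."""
    return rate * capacity_gb * months


def prob_at_least_one(capacity_gb, months=1.0, rate=FLIPS_PER_GB_PER_MONTH):
    """Chance of one or more flips, under the Poisson assumption."""
    return 1.0 - math.exp(-expected_flips(capacity_gb, months, rate))


if __name__ == "__main__":
    for gb in (0.3125, 1, 4, 16):   # 0.3125 Gb is a 320Mb machine
        print(f"{gb:>7} Gb: {expected_flips(gb):6.2f} flips/month, "
              f"P(at least one) = {prob_at_least_one(gb):.3f}")
```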

Basically I'm not sure what would be a suitable interval. Regardless of
drive capacity or usage, one month seems a tad on the inconveniently
frequent side, a year too long for safety. 3 months, do you think?

I use 90--180 days myself. I also do daily short SMART checks, weekly
long ones, and monthly RAID array checks, and have smartd and mdadm in
monitor mode running to spot major problems as they arise. These have
proved useful when drives started to die. (mdadm is especially critical
because the Linux md subsystem is good enough that otherwise you can
lose a disk and go into degraded mode on your RAID-5 array and not
notice for months, until *another* disk goes, and whoops now it's too
damn late, you're dead.)
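
To show what "degraded and not noticing" looks like, here is a rough sketch that scans /proc/mdstat-style text for arrays with missing members. It is not a replacement for mdadm in monitor mode. The function name and the sample text are made up for illustration, and the parsing assumes the usual "[n/m] [UU_]" status tail on each array's detail line.

```python
import re

# Start of an array entry, e.g. "md1 : active raid5 ..."
_ARRAY = re.compile(r"^(md\S*)\s*:")
# Status tail on the following detail line, e.g. "[3/2] [U_U]"
_STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Hand-written sample in the style of /proc/mdstat, not real output.
SAMPLE = """\
Personalities : [raid1] [raid5]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid5 sdc2[2] sda2[0]
      1953260544 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return names of arrays that report fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        m = _ARRAY.match(line)
        if m:
            current = m.group(1)
            continue
        s = _STATUS.search(line)
        if s and current is not None:
            total, active, flags = int(s.group(1)), int(s.group(2)), s.group(3)
            if active < total or "_" in flags:
                degraded.append(current)
            current = None   # only the first status line belongs to the array
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live system you would feed it open("/proc/mdstat").read() from cron and mail yourself anything it returns.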

I find it quite impressive that all this checking uses hardly any RAM or
system time on the low-end 320Mb-RAM PIII that's running my RAID arrays.
(<1Mb RSS and a bunch of background disk I/O for a few hours a month,
that's all.)


Of course, if you care about your disk contents, you'll use RAID and
back up sometimes (I'll admit that it's been too long since I backed up
because my backup scripts broke and I haven't fixed them, but I've got
the RAID :) )

--
`On a scale of one to ten of usefulness, BBC BASIC was several points ahead
of the competition, scoring a relatively respectable zero.' --- Peter Corlett