Re: Lack of bit field instructions in x86 instruction set because of patents ?



In article <okVul.15712$as4.1365@xxxxxxxxxxxxxxxxxxxx>,
Stephen Sprunk <stephen@xxxxxxxxxx> wrote:

> The more reliable a system is designed to be, the more complicated it
> must be -- and the bigger the problem when it does fail, as everything
> eventually does.

Absolutely NOT! You do not get genuine reliability by complexity;
you get it by designing the system (a) not to have certain classes
of errors in the first place, (b) to be able to detect all errors of
certain other classes and to have a reliable recovery method and, most
interestingly, (c) to ensure that other errors are self-correcting.
And, lastly, by being able to prove (or at least provide very strong
evidence for) the claim that ALL failures (anticipated and otherwise)
fall into one of those classes.

The only place that the actual programming comes in is in coding up
the design in such a way that you haven't introduced new error modes.
And THAT is where the POSIX/C threading model fails so badly.

> OTOH, systems designed to accept small-scale failure as a normal
> practice rarely if ever suffer massive failures (for a variety of
> reasons). Compare the Internet to the phone system: the Internet has
> never been entirely down in its entire history, while the phone system
> has -- but people think the phone system is more reliable because it
> only has a system-wide outage once a decade or so, while we experience
> tiny parts of the Internet being down every day...

I don't remember the telephone system ever having failed in toto in
any modernised country except when its government has gone bananas
and used centralised powers to turn the thing off (yes, Thatcher,
Blair etc., I am thinking of you)!

Also, that's not true at all. You are correct that the Internet was
carefully designed to be robust against most small-scale failures
becoming large ones and, by and large, its design works in that
respect. But it also provides an example of where accepting frequent
small-scale failures does NOT lead to reliability of the whole - the
DNS!

We have already seen local failures cause havoc on a Tier-1 server;
with the network probabilities involved, there is only one that can
continue normally without relying on any others - and then with a
fairly severe loss of function. There are actual failure modes
against which the Internet is not resilient.

More seriously, most systems that are designed to accept small-scale
failures (ATMs, telephones, most power grids, even the Internet) rely
critically on the Law of Large Numbers - and will fail catastrophically
when that fails. And it can fail: when a vast number of local failures
are triggered by a common cause.
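To make the Law of Large Numbers point concrete, here is a minimal sketch under purely illustrative assumptions: n identical components, each down independently with probability p, and an "outage" meaning more than k of them down at once. The helper names and the common-cause model (with probability q, a single event takes out every component) are hypothetical, chosen only to show the shape of the argument.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(n: int, i: int, p: float) -> float:
    """P(exactly i of n components fail), each failing independently with prob p.

    Computed in log space so large n does not overflow; assumes 0 < p < 1.
    """
    log_comb = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_comb + i * log(p) + (n - i) * log1p(-p))


def outage_prob_independent(n: int, p: float, k: int) -> float:
    """P(more than k of n components are down at once), failures independent."""
    return sum(binom_pmf(n, i, p) for i in range(k + 1, n + 1))


def outage_prob_common_cause(n: int, p: float, k: int, q: float) -> float:
    """Same, but with probability q a common-cause event takes out all n.

    Hypothetical model (assumes k < n): the shock fails every component;
    otherwise components fail independently as in the function above.
    """
    return q + (1 - q) * outage_prob_independent(n, p, k)


if __name__ == "__main__":
    n, p, k, q = 1000, 0.01, 100, 1e-6  # illustrative numbers, not measurements
    ind = outage_prob_independent(n, p, k)
    cc = outage_prob_common_cause(n, p, k, q)
    print(f"independent failures only: {ind:.3e}")
    print(f"with rare common cause:    {cc:.3e}")
```

With these made-up numbers the independent-failure outage probability is astronomically small (around 1e-65), so the common-cause term q dominates completely: the rare correlated event, not the everyday small failures, sets the rate of total outage.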


Regards,
Nick Maclaren.