Re: Software Optimization Guide for AMD Family 10h Processors




In article <1182037409.884017.141920@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
already5chosen@xxxxxxxxx writes:
|>
|> [O.T.] Last time I looked in SPARC v9 architecture manual there were
|> only three models. And I am 100% sure that all SPARC SMP systems ever
|> shipped by Sun or Fujitsu (don't know about more exotic vendors)
|> adhere to only one model, specifically TSO.

I may have miscounted, but I vaguely recollect that it looks as if there
are three, but one has two variants. Whatever. Call it three.

|> > PowerPC uses a very different
|> > one from the Intel x86, and most other CPUs are different yet again.
|>
|> [O.T.]Yes, x86 memory ordering rules in SMP systems are not very well
|> defined. Nevertheless they are well understood.

I wish :-( Yes, the simple cases are well-understood, but the subtleties
aren't. And, when writing robust or portable code, the subtleties matter.
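
As an illustration of the kind of subtlety meant here (a minimal sketch, not
something taken from the thread), consider the classic "store buffering"
litmus test. The helper name count_both_zero is hypothetical. With the default
sequentially consistent atomics used below, the C++ memory model forbids both
threads reading 0. If the stores and loads were weakened to
memory_order_relaxed, or written as plain x86 stores and loads with no fence,
that outcome would be permitted, even on x86, because each CPU's store may
still sit in its write buffer when the other CPU loads.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `iterations`
// times and returns how often BOTH threads read 0.
// Assumption: default memory_order_seq_cst on every access, under which
// the both-zero outcome is forbidden, so the expected result is 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        // Thread 1: store to x, then load y.
        std::thread t1([&] { x.store(1); r1 = y.load(); });
        // Thread 2: store to y, then load x.
        std::thread t2([&] { y.store(1); r2 = x.load(); });

        t1.join();
        t2.join();

        // r1 and r2 are only read after join(), so there is no data race.
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero(1000) should return 0. Spawning fresh threads on every
iteration makes this a poor tool for actually provoking the relaxed outcome; it
is meant only to show which guarantee the code is relying on.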

|> You try to make a simple matter complex.
|> All IBM and Unisys should assure in their 32-way boxen is as much
|> cache consistency as in two-way x86 box with shared bus. So they do.

Ah. The difference between theory and practice is less in theory than
it is in practice. I should put a lot more trust in a clear statement
by the relevant vendor than in your claim that they do it because, in
your opinion, they should.

|> Otherwise how would these boxes run off-the-shelf Windows/Linux?

All of the information that I have received from people who have tried
to get high-communication, medium-sized SMP applications to work on
those systems is that they don't. So far, I haven't found anyone who
has tracked down why, but the problem is that (a) such machines are
rare, (b) they are often used only for applications that don't stress
the cache coherence and, most of all, (c) there are DAMN few people with
the skill, obstinacy and time to track down the causes.

|> And BTW I don't understand how interrupts belong here.

Precisely. Drink deep or taste not the Pierian spring.

A large number of operations are completed by interrupt; this includes
many floating-point ones (except perhaps on POWER), but also includes
many memory access ones. Nowadays, TLB misses are almost always handled
'early' (because of physical caches), but ECC obviously can't be. Now,
most architectures rely on their pipeline constraints (e.g. write buffers)
to maintain their SMP invariants, but obviously this doesn't apply if the
pipeline is interrupted.
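
To make "SMP invariants" concrete, here is a minimal sketch of the sort of
ordering guarantee that software typically assumes (an illustration under
stated assumptions, not anything from the thread). The helper name
publish_and_read is hypothetical. A producer writes plain data and then sets a
flag with release ordering; a consumer that sees the flag with acquire ordering
is promised, by the language model, to see the data. Whether a given machine
upholds that promise across an inconvenient interrupt is exactly the question
at issue, and this code cannot answer it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: one producer publishes `value` through a flag,
// one consumer waits for the flag and returns the data it then reads.
int publish_and_read(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // publication flag
    int seen = -1;

    std::thread producer([&] {
        payload = value;                               // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });

    std::thread consumer([&] {
        // Spin until the flag is observed with acquire ordering.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // the release/acquire pair orders this read
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Under the C++ memory model publish_and_read(42) returns 42; the invariant being
relied on is "data before flag", which the hardware is expected to preserve.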

At a naive level, there is a single pipeline that is stopped, dead,
while the interrupt is handled. Well, that hasn't been true for years
(and wasn't true on all systems even in the 1960s). In theory, the
first-level interrupt handler (FLIH) is supposed to ensure that
invariants are preserved but, in practice, it
often doesn't or even can't do so. So you get a failure of the SMP model
if you get an inconvenient interrupt.

This can also happen for certain high-priority asynchronous interrupts,
too, and can lead to occasional ordering problems "that can't occur".
In particular, on a large SMP, one CPU often needs another to do something
URGENTLY, and so interrupts it at very high priority. I have seen an
impossible memory model failure on an SGI Origin due to that, with
positive identification.

|> You seem to agree that Intel does compete "in the medium to large SMP
|> arena" with Itanium. For your information, Intel-made Itanium chipsets
|> don't extend themselves beyond 8-ways in theory, and 4-ways in
|> practice. Even these chipsets are now mostly obsolete and soon EOLed.
|> All larger Itanium systems are based on HP, Fujitsu, SGI, NEC, Hitachi
|> and Unisys chipsets to none of which Intel has any rights.
|> I don't see how situation with IBM and Unisys chipsets for XeonMP is
|> any different.

You clearly don't know anything about those Itanium agreements. I know
very little, but I can tell you from direct, CONTRACTUAL statements from
several Tier 1 vendors that they are VERY different from the x86 range
ones. In particular, Intel DOES have SOME rights and powers over the
chipsets developed for use with it - I know a little of what they are,
but that was under NDA, so I obviously can't say more.

If you take a look at the online comics' articles of the late 1990s,
you will see references to just such rights. I doubt that they knew any
more than me, but it is confirmation of my statement.


Regards,
Nick Maclaren.