Re: Cluster computing drawbacks



In article <dcbf12$m3i$1@xxxxxxxxxxxx>, Randy <joe@xxxxxxxxxxxxxxx> wrote:
>
>A good example lies with data mining. Non cache-coherent SMPs are
>trivial to program for such tasks. Also Monte Carlo sims. Also any
>other embarrassingly parallel task (at which clusters shine). It's only
>the tasks at which clusters suck that it'll be hard to program
>noncoherent SMPs. Effectively, that should put them at parity with
>clusters. Except that you *can* program them with less explicit memory
>movement primitives, or at least without handshakes.
>
>In the case of MPICH2's asynchronous memory movement primitives,
>noncoherent SMPs may well outshine all comers. And they scale just as
>well as clusters...

Yes. But that sort of code is a right b*gger to debug.
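
By contrast, the embarrassingly parallel cases mentioned above need
no communication at all until the final reduction, which is why they
are easy on clusters and noncoherent SMPs alike. A minimal sketch of
a Monte Carlo estimate of pi, under stated assumptions: the names
count_hits and estimate_pi are made up for illustration, the
"workers" run one after another rather than on separate CPUs, and
seeding each worker with its index is only a stand-in for a proper
parallel random number scheme.

```python
import random


def count_hits(seed, samples):
    """One worker's share: count random points inside the unit quarter circle."""
    rng = random.Random(seed)  # private generator, so workers share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(workers=4, samples_per_worker=50_000):
    # Each call is independent of the others. On a real machine these would
    # run on separate CPUs or nodes; here they run in sequence (assumption
    # made to keep the sketch short and deterministic).
    partial = [count_hits(seed, samples_per_worker) for seed in range(workers)]
    # The only "communication" is this final reduction of the partial counts.
    return 4.0 * sum(partial) / (workers * samples_per_worker)


if __name__ == "__main__":
    print(estimate_pi())
```

Nothing in count_hits touches shared data, so there is nothing for
a coherence protocol (or the lack of one) to get wrong.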

>Speeding up parallel programs on CC-SMPs is *entirely* about managing
>cache line locality (and access interference).

Er, not quite. But I agree that is a very high proportion of the task,
and especially the nastiest bits.

>Yup. But scaling to hundreds of processes is not the only or even the
>primary measure of success in HPC. Doing more science per unit of time
>is. If the tradeoff between scalability, system cost, programmer time,
>dusty deck reuse, and shortening the programmer learning curve changes
>as system architectures evolve, then the prepared mind is going to RUN
>the hell away from MPI. ASAP. IMHO.

True. And that is the main reason that so many users have run away
from OpenMP back to MPI :-)

>I'm just saying that there are other fish to fry, and it's possible that
>the time to explore alternative HPC programming models may be upon us.

Agreed. BSP. Dataflow. Something even more radical :-)

>SMP has been a much abused term for a long time. NUMA vs CC-NUMA
>illustrates a comparable historical hiccough, since most folks assume
>NUMA to imply cache coherence, which it does not.

Symmetric Multi-Processing, anyone?

>Better yet, let's reexamine the programming languages while we're at it.
>C/C++/Fortran/HPF suck almost as much as MPI.

The parallel versions (including HPF) considerably more so. MPI is
actually quite a good standard, has relatively few ambiguities,
allows efficient and portable code, and enables practical debugging.
Yes, it is very low level. Sad.

>Unlike some who can't abide the notion of parallel programming without
>using MPI, I'm intrigued by the prospects of newer alternatives which
>probably *can* be explored by slapping a few shmgets/shmputs onto
>equivalent examples of a 1) serial C/Fortran code, 2) CC-SMP code, and
>3) MPI code to see how the implementations compare. I'd love to get a
>feel for the effort needed to A) compose such programs from scratch and
>B) evolve a dusty deck serial program to 2 and 3. I suspect there's
>money in them thar hills.

I have dabbled with that. Stick to MPI, my lad ....

It is dead easy to convert a clean but dusty deck to OpenMP. Oh,
you want it to run FASTER than the serial version? How very
unreasonable of you.

>After we've shown competitive performance potential, it seems like
>adding a smart compiler to the mix would be a natural progression,
>perhaps delving the data transparency I implied earlier.

God help me, NO!!! This has been tried and failed more times than
I care to think. The first requirement is a language that is
designed for parallelisation - Fortran is dire, C++ is indescribably
worse, and the English language contains no curses foul enough to
describe how C interacts with this.

>This must have been done once upon a time in the days of T3E, and probably
>before. Probably it was, but since everything in CS has to be
>reinvented every decade anyway, maybe it's time to revisit the cost
>model of non-cache-coherent shared-memory programming.

That is getting back to sanity, in the sense that our world model is
now a stack of turtles rather than something indescribably less
structured.

The opportunity that is being missed is incoherent SMP as a system
model - i.e. not as an application model. This would be a very
good basis for implementing a shared file cache, message passing
(MPI and SHMEM, if you must), efficient FIFOs between CPUs and so
on. It could even be used by consenting adults in private, but I
really don't want to have to explain to the average kiddy how to
use it.
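
To make the FIFO case concrete, here is a minimal sketch of a
single-producer, single-consumer ring buffer. It is an illustration
under stated assumptions, not a real implementation: the class name
SpscRing is made up, Python gives no control over caches, and the
points where non-coherent hardware would need an explicit write-back
or invalidate are marked only as comments. What it does show is the
property that makes this structure suit such hardware: each index
has exactly one writer.

```python
class SpscRing:
    """Single-producer, single-consumer ring buffer (index logic only)."""

    def __init__(self, capacity):
        # One slot stays empty so that head == tail always means "empty".
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def push(self, item):
        """Producer side. Returns False if the ring is full."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False
        self._slots[self._tail] = item
        # Assumption: on non-coherent hardware the slot would be written
        # back to memory here, before the new tail is published.
        self._tail = nxt
        return True

    def pop(self):
        """Consumer side. Returns None if the ring is empty."""
        if self._head == self._tail:
            return None
        # Assumption: on non-coherent hardware the cached copy of the slot
        # would be invalidated here, before it is read.
        item = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return item
```

Because pop uses None to signal "empty", this sketch cannot carry
None as a payload. The producer never writes the head and the
consumer never writes the tail, so no location has two writers.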


Regards,
Nick Maclaren.