Re: Cluster computing drawbacks
- From: nmm1@xxxxxxxxxxxxx (Nick Maclaren)
- Date: 28 Jul 2005 21:21:39 GMT
In article <dcbf12$m3i$1@xxxxxxxxxxxx>, Randy <joe@xxxxxxxxxxxxxxx> wrote:
>
>A good example lies with data mining. Non cache-coherent SMPs are
>trivial to program for such tasks. Also Monte Carlo sims. Also any
>other embarrassingly parallel task (at which clusters shine). It's only
>the tasks at which clusters suck that it'll be hard to program
>noncoherent SMPs. Effectively, that should put them at parity with
>clusters. Except that you *can* program them with less explicit memory
>movement primitives, or at least without handshakes.
>
>In the case of MPICH2's asynchronous memory movement primitives,
>noncoherent SMPs may well outshine all comers. And they scale just as
>well as clusters...
Yes. But that sort of code is a right b*gger to debug.
>Speeding up parallel programs on CC-SMPs is *entirely* about managing
>cache line locality (and access interference).
Er, not quite. But I agree that is a very high proportion of the task,
and especially the nastiest bits.
>Yup. But scaling to hundreds of processes is not the only or even the
>primary measure of success in HPC. Doing more science per unit of time
>is. If the tradeoff between scalability, system cost, programmer time,
>dusty deck reuse, and shortening the programmer learning curve changes
>as system architectures evolve, then the prepared mind is going to RUN
>the hell away from MPI. ASAP. IMHO.
True. And that is the main reason that so many users have run away
from OpenMP back to MPI :-)
>I'm just saying that there are other fish to fry, and it's possible that
>the time to explore alternative HPC programming models may be upon us.
Agreed. BSP. Dataflow. Something even more radical :-)
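For readers who have not met BSP (Bulk Synchronous Parallel): a program runs as a sequence of supersteps, each made of local computation, message sends, and a barrier, and messages sent in one superstep only become visible in the next. The following is a minimal single-threaded simulation of that rule, not a real BSP library; the names superstep, gather, broadcast_sum, receive and bsp_allreduce_sum are invented for this illustration.

```python
def superstep(states, inboxes, step):
    """Run one simulated BSP superstep and return the next inboxes."""
    nprocs = len(states)
    outboxes = [[] for _ in range(nprocs)]
    for rank in range(nprocs):
        # Local computation: a process sees only its own state and the
        # messages delivered at the end of the previous superstep.
        states[rank], sends = step(rank, nprocs, states[rank], inboxes[rank])
        for dest, msg in sends:
            outboxes[dest].append(msg)
    # Barrier: nothing sent above is visible until every process is done.
    return outboxes


def gather(rank, nprocs, state, inbox):
    # Every process sends its value to process 0.
    return state, [(0, state)]


def broadcast_sum(rank, nprocs, state, inbox):
    # Process 0 adds up what it received and sends the total to the others.
    if rank == 0:
        total = sum(inbox)
        return total, [(dest, total) for dest in range(1, nprocs)]
    return state, []


def receive(rank, nprocs, state, inbox):
    # The other processes adopt the total; process 0 already holds it.
    return (inbox[0] if inbox else state), []


def bsp_allreduce_sum(values):
    """Sum one value per simulated process in three supersteps."""
    states = list(values)
    inboxes = [[] for _ in states]
    for step in (gather, broadcast_sum, receive):
        inboxes = superstep(states, inboxes, step)
    return states


result = bsp_allreduce_sum([1, 2, 3, 4])
```

The point of the model is that all communication is confined to the barrier, so there is only one place per superstep where processes can interfere with each other.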
>SMP has been a much abused term for a long time. NUMA vs CC-NUMA
>illustrates a comparable historical hiccough, since most folks assume
>NUMA to imply cache coherence, which it does not.
Symmetric Multi-Processing, anyone?
>Better yet, let's reexamine the programming languages while we're at it.
>C/C++/Fortran/HPF suck almost as much as MPI.
The parallel versions (including HPF) considerably more so. MPI is
actually quite a good standard, has relatively few ambiguities,
allows efficient and portable code, and enables practical debugging.
Yes, it is very low level. Sad.
>Unlike some who can't abide the notion of parallel programming without
>using MPI, I'm intrigued by the prospects of newer alternatives which
>probably *can* be explored by slapping a few shmgets/shmputs onto
>equivalent examples of a 1) serial C/Fortran code, 2) CC-SMP code, and
>3) MPI code to see how the implementations compare. I'd love to get a
>feel for the effort needed to A) compose such programs from scratch and
>B) evolve a dusty deck serial program to 2 and 3. I suspect there's
>money in them thar hills.
I have dabbled with that. Stick to MPI, my lad ....
It is dead easy to convert a clean but dusty deck to OpenMP. Oh,
you want it to run FASTER than the serial version? How very
unreasonable of you.
>After we've shown competitive performance potential, it seems like
>adding a smart compiler to the mix would be a natural progression,
>perhaps delving into the data transparency I implied earlier.
God help me, NO!!! This has been tried and failed more times than
I care to think. The first requirement is a language that is
designed for parallelisation - Fortran is dire, C++ is indescribably
worse, and the English language contains no curses foul enough to
describe how C interacts with this.
>This must have been done once upon a time in the days of T3E, and probably
>before. Probably it was, but since everything in CS has to be
>reinvented every decade anyway, maybe it's time to revisit the cost
>model of non-cache-coherent shared-memory programming.
That is getting back to sanity, in the sense that our world model is
now a stack of turtles rather than something indescribably less
structured.
The opportunity that is being missed is incoherent SMP as a system
model - i.e. not as an application model. This would be a very
good basis for implementing a shared file cache, message passing
(MPI and SHMEM, if you must), efficient FIFOs between CPUs and so
on. It could even be used by consenting adults in private, but I
really don't want to have to explain to the average kiddy how to
use it.
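As an illustration of the "efficient FIFOs between CPUs" idea, here is a minimal sketch of a single-producer, single-consumer ring buffer. It is an assumption about what such a FIFO could look like, not a description of any existing system: the class name SpscRing and its methods are invented, and Python cannot express the cache write-back and invalidate operations that real incoherent hardware would need, so those steps appear only as comments.

```python
class SpscRing:
    """Single-producer, single-consumer FIFO over a fixed-size buffer."""

    def __init__(self, capacity):
        # One slot stays unused so that head == tail always means "empty".
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_push(self, item):
        """Producer side: return True if the item was queued."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False  # full
        self._slots[self._tail] = item
        # Incoherent hardware (assumption): write the slot back to memory
        # here, before the new tail is published.
        self._tail = nxt
        return True

    def try_pop(self):
        """Consumer side: return (True, item), or (False, None) if empty."""
        if self._head == self._tail:
            return (False, None)
        # Incoherent hardware (assumption): invalidate the cached copy of
        # the slot here, before reading the data.
        item = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return (True, item)


ring = SpscRing(2)
pushed = [ring.try_push(x) for x in ("a", "b", "c")]
popped = [ring.try_pop() for _ in range(3)]
```

Each index has exactly one writer, which is why the structure needs no lock; the only ordering requirement is that the data reaches memory before the index that publishes it.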
Regards,
Nick Maclaren.