Re: [9fans] GCC/G++: some stress testing
- From: plalonde@xxxxxxxxx (Paul Lalonde)
- Date: Sun, 2 Mar 2008 20:36:02 GMT
CSP doesn't scale very well to hundreds of simultaneously executing threads (my claim; as far as I've found so far, nobody else's). It is very well suited to a small number of threads that need to communicate, and as a model of concurrency for tasks with few points of contact. For performance, the channel locks become a bottleneck as the number of cores scales up. As for expressiveness, there are still issues with composability and correctness as the number of interacting threads increases. Yes, you at least get local stacks, but the work seems to get exponentially harder as the number of systems in the simulation (um, game engine) increases.
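To make the "channel locks" point concrete, here is a minimal sketch of a channel built on a single mutex. It is an illustration only, not Plan 9's libthread implementation: the Channel class and its send/recv names are hypothetical, and it is a buffered queue rather than a true unbuffered CSP rendezvous.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical channel for illustration: one mutex guards the queue,
// so every send and every recv from every thread passes through it.
template <typename T>
class Channel {
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> guard(lock_);  // all senders serialize here
            queue_.push_back(std::move(value));
        }
        ready_.notify_one();  // wake one waiting receiver, if any
    }

    T recv() {
        std::unique_lock<std::mutex> guard(lock_);     // receivers take the same lock
        ready_.wait(guard, [this] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop_front();
        return value;
    }

private:
    std::mutex lock_;
    std::condition_variable ready_;
    std::deque<T> queue_;
};
```

With a handful of threads the lock is rarely contended. With hundreds of threads sharing a few channels, each send and recv also moves the lock's cache line between cores, which is the kind of bottleneck described above.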
By shared cache I mean any number of caches that are kept coherent at the hardware level without serializing instructions.
Programming the memory hierarchy is really a specific instance of programming for masking latency. This goes well beyond inserting prefetches in an optimization stage: it shows up as problem decompositions that keep the current working set in cache (at L3, L2, or L1 granularity, depending), while simultaneously avoiding having multiple processors chewing on the same data (which leads to vast amounts of cache synchronization bus traffic). Successful algorithms in this space work on small bundles of data that either get flushed back to memory uncached (to keep more cache for streaming in) or get passed from compute kernel to compute kernel cheaply. Having language structures to help with these decompositions and caching decisions is a great help - that's one of the reasons why functional programming keeps rearing its head in this space. Without aliasing and global (serializing) state it's much easier to analyze the program and choose how to break up the computation into kernels that can be streamed, pipelined, or otherwise separated to allow better cache utilization and parallelism.
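A minimal sketch of the "small bundles passed from kernel to kernel" idea, under stated assumptions: the two kernels, the run_pipeline function, and the bundle size of 4096 floats (16 KiB, picked to fit a typical L1 data cache) are all hypothetical. Each bundle goes through both kernels while it is still cache-resident, instead of each kernel sweeping the whole array in turn.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical compute kernels: each works in place on one small bundle.
inline void scale_kernel(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}

inline void offset_kernel(float* data, std::size_t n, float bias) {
    for (std::size_t i = 0; i < n; ++i) data[i] += bias;
}

// Assumed bundle size: 4096 floats = 16 KiB. A real value would be
// tuned to the cache level being targeted.
constexpr std::size_t kBundle = 4096;

// Run both kernels over one bundle before moving on to the next, so the
// second kernel finds its input in cache rather than in main memory.
void run_pipeline(std::vector<float>& data, float factor, float bias) {
    for (std::size_t start = 0; start < data.size(); start += kBundle) {
        const std::size_t n = std::min(kBundle, data.size() - start);
        scale_kernel(data.data() + start, n, factor);
        offset_kernel(data.data() + start, n, bias);
    }
}
```

The result is the same as running scale_kernel over the whole vector and then offset_kernel over the whole vector; only the traversal order changes. Because the kernels touch no global state and their bundles do not alias, different bundles could also be handed to different cores without sharing data.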
Currently, the best-performing programs I know of for exploiting the memory hierarchy are C and C++ programs written in a "kernel composition" kind of model that the language supports poorly. You can do it, but it feels more like coding in assembly than like expressing your algorithms. Much of the template metaprogramming involved is about measuring cache sizes and data sets and generating code statically tuned to those sizes. There's a huge opportunity for a JIT language to allow better adaptability to changing data sets and numbers of active threads.
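As a rough illustration of code statically tuned to a cache size, the sketch below derives a tile edge at compile time from a cache size given as a template parameter and uses it in a tiled matrix transpose. The names tile_edge and transpose_tiled, the choice of transpose as the workload, and the "two tiles must fit" sizing rule are assumptions made for the example; it needs C++14 for the loop in the constexpr function.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Largest power-of-two tile edge such that two edge x edge tiles of T
// (one source, one destination) fit in CacheBytes. Evaluated at compile time.
template <typename T, std::size_t CacheBytes>
constexpr std::size_t tile_edge() {
    std::size_t edge = 1;
    while (2 * (2 * edge) * (2 * edge) * sizeof(T) <= CacheBytes) edge *= 2;
    return edge;
}

// Transpose a row-major rows x cols matrix into dst (which must already
// hold rows * cols elements), one cache-sized tile at a time.
template <typename T, std::size_t CacheBytes>
void transpose_tiled(const std::vector<T>& src, std::vector<T>& dst,
                     std::size_t rows, std::size_t cols) {
    constexpr std::size_t tile = tile_edge<T, CacheBytes>();
    for (std::size_t r0 = 0; r0 < rows; r0 += tile) {
        for (std::size_t c0 = 0; c0 < cols; c0 += tile) {
            const std::size_t r1 = std::min(r0 + tile, rows);
            const std::size_t c1 = std::min(c0 + tile, cols);
            for (std::size_t r = r0; r < r1; ++r)
                for (std::size_t c = c0; c < c1; ++c)
                    dst[c * rows + r] = src[r * cols + c];
        }
    }
}

// With an assumed 32 KiB cache and 4-byte floats the tile edge is 64.
static_assert(tile_edge<float, 32 * 1024>() == 64, "tile sizing rule");
```

The tile size is fixed when the program is compiled, which is exactly the limitation noted above: a different cache size or data set needs a different instantiation, whereas a JIT could pick the tile at run time.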
Paul
On 2-Mar-08, at 10:59 AM, erik quanstrom wrote:
>> Almost certainly. And so is C. Programming many-core shared-cache
>> machines in languages with global state and aliasing is just plain
>> wrong, in the same way that programming in assembly instead of C is
>> wrong. Add a highly heterogeneous real-time task mix on top of that,
>> and you're in for a world of poor cache performance and deadlocks,
>> which could be avoided by better choices of implementation language.
>
> i don't understand this argument. are you saying that csp doesn't work
> in c? or are you saying that csp has caching problems that some other
> languages solve?
>
> also, could you define what you mean by "shared cache" a bit more.
> would you consider an intel quad core cpu to be a "shared cache"
> machine, since the two l2 caches sit on the same fsb?
>
>> Programming for the memory hierarchy is way more important than
>> optimizing CPU clocks anymore (though that winds up still having a
>> place in some compute kernels). I wish our programming languages
>> reflected that change in perspective.
>
> what do you mean by "programming for the memory hierarchy"?
>
> - erik