Re: Superstitious learning in Computer Architecture



John and Andrew,

Even without going to logarithmic ALUs or wafer scale integration, there
is STILL an easy order of magnitude left to be collected by abandoning
the scalar-only architecture of the Pentium.

That was abandoned years ago. All modern CPUs do vector arithmetic.

Oh, _that_ kind of vector arithmetic. AN/FSQ-7, TX-2, AN/FSQ-32 style
vector arithmetic. I don't think he was talking about *that* kind of
vector arithmetic.

MMX or AltiVec, that sort of thing is just a step above scalars.

No, no, he means *real* vector arithmetic.

**YES**

Where you have one instruction, and it plows through three arrays in
memory... doing one floating-point multiply per cycle in the pipelines
for just about as long as you want.

With the Cyber 200 series computers, a single instruction could take up to four arrays in and one out:
1. Two arrays whose elements are multiplied or divided.
2. An array whose elements are added to the product (or quotient) from item 1.
3. A bit vector used as a control mask: results whose corresponding bit is zero are discarded (not stored). This is what makes IF statements in vectorized loops work without having to actually code a scalar loop.
4. The output array.
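
For anyone who has not used one of these machines, here is a rough software model of what that one instruction does. It is only a sketch of the semantics described in the list above, not the Cyber 200 instruction set: the function name `masked_multiply_add` and the choice to leave masked-off output elements untouched are my assumptions for illustration.

```python
def masked_multiply_add(a, b, c, mask, out):
    """Model of one vector instruction: out[i] = a[i] * b[i] + c[i],
    stored only where mask[i] is nonzero (hypothetical name and layout)."""
    n = len(out)
    if not (len(a) == len(b) == len(c) == len(mask) == n):
        raise ValueError("all vectors must have the same length")
    for i in range(n):
        # A zero control bit means the result is discarded (not stored),
        # so out[i] keeps whatever value it already had.
        if mask[i]:
            out[i] = a[i] * b[i] + c[i]
    return out


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 10.0, 10.0, 10.0]
c = [0.5, 0.5, 0.5, 0.5]
mask = [1, 0, 1, 0]      # e.g. the result of an earlier vector compare
out = [-1.0] * 4         # prior contents of the output array

masked_multiply_add(a, b, c, mask, out)
print(out)  # [10.5, -1.0, 30.5, -1.0]
```

The Python loop is just bookkeeping. On the real hardware the whole thing is one instruction, and the mask is what stands in for the IF inside the loop body.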

If you have an architecture that can pull *this* off without the
vectors having to be in the cache, you're talking about stuff like the
SX-6 from NEC. And its CPU is said not to require more transistors than
a modern Pentium.

My point exactly: by the time you have spent millions of transistors on anticipatory (speculative) execution and cache, you would be MUCH better off (by about an order of magnitude in speed) changing architectures to something like the Cyber 200 or the NEC SX-6.

Of course, you need to spend a lot on memory for one of those chips...
2,048-way interleaving means you can't just put in *one* memory stick;
let's see, now, 1,024 memory sticks at about $50 a pop... no wonder a
single-CPU SX-6r costs $180,000 since the memory is probably about half
of that!
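
To see why the interleaving matters for streaming vectors straight from memory, here is a small sketch of how addresses spread over banks. It assumes simple low-order interleaving (bank = word address modulo the number of banks); the actual SX-6 mapping is not given in this thread, and the names `bank_of` and `banks_touched` are made up for the example.

```python
def bank_of(word_address, n_banks=2048):
    # Assumed low-order interleaving: consecutive words go to consecutive banks.
    return word_address % n_banks


def banks_touched(start, stride, count, n_banks=2048):
    # Number of distinct banks hit by `count` accesses at a fixed stride.
    return len({bank_of(start + i * stride, n_banks) for i in range(count)})


print(banks_touched(0, 1, 2048))     # 2048: unit stride keeps every bank busy
print(banks_touched(0, 2048, 2048))  # 1: this stride hammers a single bank
```

Under that assumption, a unit-stride vector gives each bank 2,048 accesses' worth of time to recover before it is asked again, which is how slow memory can still deliver one operand per cycle. A stride equal to the bank count is the worst case.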

That is about the price of the 1954-vintage IBM 650 mentioned in my article, with its 2K words of rotating drum memory.

And, given inflation, that's no more expensive than the original PDP-8!

I think that with a little work, they can make this more reasonable.
After all, there was a style of memory module that only had 16 data
lines, but yet kept up with conventional ones with 64 data lines... and
current conventional memory modules at least do two-way interleaving
these days.

The trick, of course, is to do everything on the surface of a single wafer, where pins are essentially free: thousands of pin-equivalents would be absolutely no problem. Depending on the diameter and the number of processing steps, wafers now cost ~$1,000 each if you don't bother chopping them up into little chips, mounting the chips, testing everything, labeling the chips, etc.

With adequate funding, the first wafers could be out in a year or so.

I envision slightly thicker laptops with flat metal bottoms, designed to be used on flat metal surfaces that can dissipate lots of heat. These would also work without the heat dissipation, but could then run at maximum speed only at a very low duty cycle, which should be OK for most casual use.

Steve Richfie1d