OoO VAX (was: Code density and performance?)



"John Mashey" <old_systems_guy@xxxxxxxxx> writes:
>I'm surprised any long-time participant in comp.arch would ask, since
>this topic has been discussed here numerous times, including the topic
>of architectural issues that made the (elegant) VAX and (fairly
>elegant 68020) more difficult to do cost-effective fast
>implementations for than the (simpler) 68010 and the (inelegant X86).

However, IIRC those discussions were in the context of relatively
simple pipelined implementations (like R3000 or 486), not in the
context of out-of-order implementations (like R10000, P6, K7).

IIRC some of the issues you mentioned wrt the VAX were:

- Instructions can update multiple registers (and memory), and also
have multiple opportunities for trapping, making it hard to implement
precise interrupts. In an OoO engine I see no big problem here: the
instruction and its results wait in the retirement stage until all
results are in and the instruction is guaranteed not to trap; then it
is committed.

- Instructions can access many different memory locations, potentially
requiring many pages in memory, and maybe many TLB entries.
This may be a nightmare for the OS writers, but I don't see that it is
a problem for the hardware.
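
To make the first point above concrete, here is a minimal sketch of
in-order commit in Python. It is a toy model, not a description of any
real implementation: the names Uop, Instr and ReorderBuffer, and the
instruction names in the usage note, are made up for illustration. An
instruction with several results sits at the head of the buffer until
all of its micro-ops are done; if any of them faulted, it and everything
younger is discarded before any architectural state has changed.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    dest: str            # hypothetical destination: register name or memory tag
    value: int = 0
    done: bool = False   # result is available
    fault: bool = False  # this micro-op detected a trap condition


@dataclass
class Instr:
    name: str
    uops: list           # one architectural instruction, possibly many results


class ReorderBuffer:
    def __init__(self):
        self.entries = []  # instructions in program order
        self.state = {}    # committed (architectural) state

    def issue(self, instr):
        self.entries.append(instr)

    def retire(self):
        """Commit from the head in program order.

        Returns the name of a faulting instruction, or None.
        """
        while self.entries:
            head = self.entries[0]
            if not all(u.done for u in head.uops):
                return None  # head not finished; nothing younger may commit
            if any(u.fault for u in head.uops):
                # Squash the faulting instruction and all younger ones.
                # None of their results has reached self.state yet.
                self.entries.clear()
                return head.name
            for u in head.uops:  # all results in, no trap: commit together
                self.state[u.dest] = u.value
            self.entries.pop(0)
        return None
```

For example, issue a one-result move, then a hypothetical two-result
autoincrement add, then another move that has already finished. The
first retire() call commits only the first move; once the second
instruction's memory micro-op completes with a fault, retire() reports
that instruction and the committed state still holds only the first
move's result.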

What architectural feature of the VAX would be a problem in the
context of current implementation techniques?

>But, the bottom line is that the engineers who *knew* the VAX best, and
>who implemented it many times, and many of whom were *really, really
>good* engineers, came to believe that they simply could not keep
>implementing competitive CPUs that were VAX ISA. Some of them were
>already starting to think that in the mid-1980s, but lots more thought
>so a few years later, and so did certain DEC sales managers, who knew
>that if the customer wanted VMS, they won, but if the customer wanted
>some UNIX, they just couldn't compete.

Sure, in 1990-1995 probably most people (including me) believed that
RISCs gave a significant performance advantage over VAX and over the
386 architecture, too (and at the time, they did). So DEC went to
Alpha, and Intel and HP started the IA-64 effort. But that was before
OoO implementations of the 386 architecture (the P6, which was
released in late 1995).

>In the *real world* of (very competent) designers who made their living
>doing VAXen, they just couldn't figure out how to keep doing it
>competitively.

Sure, in 1990 they couldn't; IIRC OoO execution with precise
exceptions was just research papers at the time. That does not mean
that a few years later they still could not, if they had tried.

>Anyone who has
>the opinion that it was reasonable to be designing new VAXen in 1996,
>expecting them to be competitive, has to believe these guys are
>clueless idiots.

It was not reasonable, because the decision for the Alpha and against
the VAX had been taken several years earlier. That does not mean that
it was impossible to design a technically competitive VAX in 1996, and
it does not say anything about the clue and intelligence of these people.

>NOTE: that doesn't mean that I am claiming "so, they had to do Alpha,
>and do it the way they did it", as they were other options.

You are thinking about MIPS CPUs, right? Do you think that they
would have fared better if DEC had continued to use them?

>> Looking at what Intel and AMD did with the 386 architecture, I am
>> convinced that it is technically possible to design competitive VAXen;
>> I don't see any additional challenges that the VAX poses over the 386
>> that cannot be addressed with known techniques; out-of-order execution
>> of micro-instructions with in-order commit seems to solve most of the
>> problems that the VAX poses, and the decoding could be addressed
>> either with pre-decode bits (as used in various 386 implementations),
>> or with a trace cache as in the Pentium 4.
>
>You're entitled to your opinion, which was shared by the VAX9000
>implementors.

And the VAX 9000 was the basis for the NVAX, and according to
<http://research.compaq.com/wrl/DECarchives/DTJ/DTJ700/nv-foreword.txt>:

|From its initial shipment in October 1991 through today (a year later),
|NVAX was (and is) the fastest shipping CISC microprocessor in the world,
|whether measured by clock rate, SPECmarks, or transactions per second.

So apparently even in the time of simple pipelining, the architectural
features of the VAX did not stop it from outperforming the 386
implementation of the time (in the form of the 486), contrary to what
you imply. And I think it only got (relatively) easier to implement
VAXen with OoO implementations.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
anton@xxxxxxxxxxxxxxxxxxxxxxxxxx Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html