Re: Code density and performance?
- From: Eliot Miranda <eliotm@xxxxxxxxxxx>
- Date: Fri, 15 Jul 2005 17:35:06 GMT
Nick Maclaren wrote:
In article <1121333201.729594.240150@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
"jon@xxxxxxxxxxxx" <jon@xxxxxxxxxxxx> writes:
|> > Heidi Pan's "High Performance, Variable-Length Instruction
|> > Encodings" implies that a 25% reduction in code size relative to
|> > MIPS should be possible without making a superscalar
|> > implementation excessively difficult.
|>
|> You can get a 40% reduction with just a mix of 16 and 32-bit
|> instructions, without harming performance or making superscalar
|> execution more difficult than it otherwise would be. And yes, for
|> embedded apps, 40% reduction in code size is well worth it, that's why
|> we have MIPS-16, Thumb, ARCcompact etc.
In the past, I have estimated that you could get a 50-75% reduction by a two-level ISA, where the top level was designed for the high level language. Designing ISAs for ease of code generation was first proposed in the late 1960s, as far as I know, but was never done in a mainstream system, and hasn't been attempted at all in recent decades.
Debaere & Campenhout's Interpretation and Instruction Path Coprocessing
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=5164
is a fine variation on this theme and was done in the early '90s. Their idea was to speed up threaded code by tacking a "decode ROM" onto the instruction-fetch logic of a 680x0 processor. The decode ROM contained the native instruction sequence for each threaded-code token. Threaded code was fetched by external logic and looked up in the decode ROM, and the corresponding native instructions were fed to the processor.
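To make the mechanism concrete, here is a minimal software sketch of the decode-ROM idea. It is not Debaere & Campenhout's design: the token numbers, the tiny stack machine, and the names `DECODE_ROM`, `fetch_unit` and `run` are all assumptions made up for illustration, and the "native instructions" are just Python callables.

```python
from typing import Callable, Dict, Iterator, List

Stack = List[int]
NativeOp = Callable[[Stack], None]  # stand-in for one native instruction


def push(value: int) -> NativeOp:
    """Build a 'native instruction' that pushes a constant."""
    def op(stack: Stack) -> None:
        stack.append(value)
    return op


def dup(stack: Stack) -> None:
    stack.append(stack[-1])


def add(stack: Stack) -> None:
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


# Hypothetical decode ROM: threaded-code token -> native instruction sequence.
DECODE_ROM: Dict[int, List[NativeOp]] = {
    0: [push(1)],   # PUSH_ONE (made-up token)
    1: [dup, add],  # DOUBLE, expands to two native instructions
    2: [add],       # ADD
}


def fetch_unit(threaded_code: List[int]) -> Iterator[NativeOp]:
    """Plays the role of the external fetch logic: it reads tokens and
    feeds the processor the native sequence stored in the decode ROM."""
    for token in threaded_code:
        yield from DECODE_ROM[token]


def run(threaded_code: List[int]) -> Stack:
    """Plays the role of the unmodified processor: it only ever sees
    native instructions, never the threaded-code tokens."""
    stack: Stack = []
    for native_op in fetch_unit(threaded_code):
        native_op(stack)
    return stack


print(run([0, 1, 0, 2]))  # PUSH_ONE, DOUBLE, PUSH_ONE, ADD
```

The point of the split is that `run` never decodes a token itself; the dispatch step that a software interpreter would pay for on every threaded-code word is absorbed by the lookup in `fetch_unit`.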
Using a predesigned processor and tacking on some support logic for a higher-level language makes good economic sense. One saves much of the design effort of a ground-up design, and can hitch a ride on a commodity processor whose implementations follow Moore's law instead of being left behind by a small market niche as were Lisp machines.
I suspect this would have been a better way to implement e.g. picoJava.
It's also reminiscent of what one could do with configurable processors like Tensilica's, where you have a predefined RISC core (which incidentally has two instruction sets, either 16-bit or 24-bit) and the ability to define additional logic that interfaces to the core, using tools that compile C definitions of the logic.
[I don't know how Alpha PALcode is implemented, but it is perhaps a similar idea]
In order to do this, you need a microcode approach, possibly with a programmable microcode. In the heyday of the RISC dogma, this was stated to be incompatible with performance, but the Pentium 4 and Opteron have shown that that is now false, if it ever were true (which is very doubtful).
The above examples show you don't need to use microcode at all. You simply need to provide ways of interfacing additional logic to an existing core. Using FPGA technology, one can imagine e.g. adding associative memories to speed up specific language features such as OO method lookup.
[And of course whether these kinds of approaches can beat compilation technology is a separate question. Because compilation technology looks at much more of the problem than a single bytecode, it typically nets a much greater performance benefit.]
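As a rough illustration of what such an associative memory would buy, here is a software sketch of a small method-lookup cache keyed on (class, selector). Everything in it is assumed for the example: the `CLASSES` table, the `slow_lookup` hierarchy walk, the capacity, and the least-recently-used replacement policy. Real hardware would do the key match in parallel rather than through a dictionary.

```python
from collections import OrderedDict
from typing import Dict, Optional, Tuple

# Hypothetical class table: name -> (superclass, {selector: method}).
CLASSES: Dict[str, Tuple[Optional[str], Dict[str, str]]] = {
    "Object": (None, {"printString": "Object>>printString"}),
    "Point": ("Object", {"x": "Point>>x"}),
}


def slow_lookup(cls: Optional[str], selector: str) -> str:
    """Walk up the class hierarchy, as a plain interpreter would."""
    while cls is not None:
        superclass, methods = CLASSES[cls]
        if selector in methods:
            return methods[selector]
        cls = superclass
    raise LookupError(f"doesNotUnderstand: {selector}")


class AssociativeMethodCache:
    """Models a small associative memory keyed on (class, selector)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[Tuple[str, str], str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, cls: str, selector: str) -> str:
        key = (cls, selector)
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        method = slow_lookup(cls, selector)  # fall back to the full walk
        self.entries[key] = method
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return method


cache = AssociativeMethodCache()
first = cache.lookup("Point", "printString")   # miss: walks Point -> Object
second = cache.lookup("Point", "printString")  # hit: no hierarchy walk
print(first, cache.hits, cache.misses)
```

Only the miss path touches the class hierarchy, which is the part one would hope to take off the processor's critical path.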
-- _______________,,,^..^,,,____________________________ Eliot Miranda Smalltalk - Scene not herd