Code density and performance?



For a new ISA intended to be suitable for markets from
higher-performance embedded systems to server/workstation-class
systems (implying superscalar implementations and a 64-bit
architecture among other things), would it be worthwhile seeking
code density greater than a relatively 'pure' RISC like MIPS or
Alpha? Since all implementations would have an on-chip L2 cache,
it seems likely that code density would have only a moderate
impact on performance for common codes. OTOH, all else being
equal (which it would not) expending somewhat significant effort
for even a modest _architected_ performance gain seems worthwhile
in that the benefit would apply to all implementations.

Heidi Pan's "High Performance, Variable-Length Instruction
Encodings" implies that a 25% reduction in code size relative to
MIPS should be possible without making a superscalar
implementation excessively difficult. A more constrained
instruction encoding (with simpler decoding) might lose half of
that advantage, so the question might be: is a 10-15% reduction in
code size worth a signficant effort in architecture design and a
moderate additional effort in implementations? (Even before
reading that paper I had been thinking about a 64-bit instruction
word ISA that would encode more than two instructions on average
by exploiting unused bits in common RISC encodings with modestly
more variability in encodings but fixed locations for register
identifiers and parts of opcodes. The ISA might also have
somewhat more complex instructions than a pure RISC--e.g.,
load/store register pair--and highly implicit encodings for some
operations--e.g., a very short form for subroutine return with
the link register being implicit. Unlike a traditional LIW
architecture, it would be possible to jump into the middle of a
word; the entire instruction word would still be fetched but the
first operation would be converted to a no-op.)

In addition to decode complexity constraints, there is also the
question of whether such places an excessive burden on compiler
developers (at least in terms of generating dense code).
Unfortunately the standard methods for achieving greater code
density tend to increase hardware complexity, compiler
complexity, or both. (A smaller register count can make
CSE-optimizations more difficult to evaluate as well as generally
making optimized register allocation more complex. Implicit
arguments make instruction decoding more complex. Special
purpose registers [even a division as general as address and
integer] makes register allocation more complex. Modal
extensions of opcodes adds complexity to both hardware and
software. Variable length encodings increase hardware
complexity. Complex instructions add both hardware and software
complexity.)

(Since a new ISA in this target market has minimal chance of
being implemented [especially an ISA developed by a mere
technophile], this is a highly theoretical question.)

Paul A. Clayton
Not speaking for Big Sam's Anatomical Warehouse
http://home.earthlink.net/~paaronclayton/Big_Sams

.



Relevant Pages

  • Re: Fazing out x86
    ... important are fighting on old core. ... we have simple ISA, we can have simpler core, many of them, why not 2 ... x86 core and 4 new core and BIGGER caches!? ... but at the expense of complexity. ...
    (comp.arch)
  • Re: ISA Server or Firewall Appliance?
    ... ISA is a very flexible piece of software, ... Complexity is not ... in regards to an internal firewall). ... internal network is NOT a good idea, ...
    (Focus-Microsoft)
  • Re: Compilation Was: Is DEFCONSTANT broken?
    ... added complexity itself to duplicate effort in order to perform branch ... partially is due to tradeoffs between code complexity and instruction ... we'd keep programming in assembler. ... code from text or in-memory lists takes more work than running machine ...
    (comp.lang.lisp)
  • Re: ptrace single-stepping change breaks Wine
    ... the instruction will set TF, so we should not touch it afterwards. ... > go back to the old state, revert all the recent ptrace changes ... The _only_ real complexity is actually following the silly LDT ... "prefetch" check does exactly the same thing except it seems to get a few ...
    (Linux-Kernel)
  • Instruction Scheduling Complexity
    ... In the paper "Optimal Instruction Scheduling Using Integer ... "The complexity of local instruction scheduling can depend on the ... latency is defined to be the difference ...
    (comp.compilers)