Re: The coming death of all RISC chips.



In article
<2bd56b7f-c703-4917-9199-2d09c54544a9@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Jacko <jackokring@xxxxxxxxx> wrote:

Hi

Let's suppose for the sake of argument you may be right. Then there
will be no alternate thinking.

I am one of those few people who don't mind being wrong, because I will
learn something in the process. But I definitely did not expect to be
proved right so quickly. ARM is producing lots of Cortex designs that are
Thumb-2 only, and dumping their own old ARM instruction set.
The low end Cortex is only 30,000 transistors, so the cost of a 16+16
variable instruction set cannot be much. ARM is aiming at the 8 bit
chips, and the design scales up to PC levels.

http://www.arm.com/products/CPUs/archi-thumb2.html
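
To show how little a 16+16 variable format asks of the decoder, here is a
minimal sketch of the Thumb-2 length rule: a halfword whose top five bits
are 0b11101, 0b11110 or 0b11111 is the first half of a 32 bit instruction,
anything else is a complete 16 bit instruction. The function name and the
list-of-halfwords input are my own illustration, not anything from ARM.

```python
def instruction_lengths(halfwords):
    """Split a stream of 16-bit halfwords into instruction sizes in bytes."""
    lengths = []
    i = 0
    while i < len(halfwords):
        top5 = (halfwords[i] >> 11) & 0x1F
        # These three prefixes mark the first halfword of a 32-bit encoding.
        if top5 in (0b11101, 0b11110, 0b11111):
            if i + 1 >= len(halfwords):
                raise ValueError("truncated 32-bit instruction")
            lengths.append(4)
            i += 2
        else:
            lengths.append(2)
            i += 1
    return lengths


if __name__ == "__main__":
    # One 16-bit, one 32-bit (two halfwords), one 16-bit.
    print(instruction_lengths([0x4608, 0xF000, 0xF800, 0xBF00]))
```

The length decision needs only five bits of the first halfword, which is
why the extra decode logic can be small.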

I do not dispute the demand for higher floating/fixed point performance
in some application markets. But adding multiplexers and limiting the
number of execution units is counterproductive.

The FPU market is one of the few that needs more flops and has the
dollars to buy them. These people are moving to programmable graphics
chips where they can use the 800 vector processors on, say, an ATI 700
series.

If you have half the number of registers with two execution units,
performance could be twice as good, considering latency of instruction
fetch is small compared to execution latency. Cross coupling these
execution units would exact a minor speed penalty, unless alternate
register windows opened onto each set in alternate cycles (1 clock
delay for the other set).

The last DEC Alpha did this to win the clock rate war for a year or two.
Unfortunately, this chip was slower than the previous chip at getting
work done.

Now assuming each result calculated is placed at the head of the
register queue, then no target register need be specified, but the
compiler would have to be smarter. This shift register queue model
could be effective.
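
A toy model of that queue idea, as I read it: every result is pushed onto
the head of a fixed-depth queue, and operands are named by their distance
from the head, so no destination field is needed. The opcode names, the
tuple instruction format and the depth of 8 are all assumptions made up
for this sketch.

```python
from collections import deque
import operator

# Hypothetical opcode set for the illustration.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_queue_machine(program, depth=8):
    """Run a program on a shift-register queue; return the final queue, head first."""
    queue = deque([0] * depth, maxlen=depth)
    for instr in program:
        if instr[0] == "lit":
            result = instr[1]              # ("lit", value) loads a constant
        else:
            op, a, b = instr               # a, b are positions from the head (0 = newest)
            result = OPS[op](queue[a], queue[b])
        # No target register: the result always lands at the head, and with
        # maxlen set the oldest value falls off the tail.
        queue.appendleft(result)
    return list(queue)


if __name__ == "__main__":
    # (2 + 3) * 4: note how the positions shift after every result.
    prog = [("lit", 2), ("lit", 3), ("add", 0, 1), ("lit", 4), ("mul", 0, 1)]
    print(run_queue_machine(prog)[0])
```

Every instruction moves every live value one slot further from the head,
which is the bookkeeping the compiler would have to take over.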

Sounds like the four stack processor described here last year(s); I found
his web site to refresh my memory. For simplicity of implementation he
went with a 64 bit VLIW with lots of NOP subfields when a stack is not
used that cycle.
The 32 bit RISC chips went with fixed width instructions for the
same reasons, but it's not 1980 anymore; anyone can design a RISC chip
today. It's time to actually optimize the instruction path and get back
that 5% performance you lost by going VLIW.
Actually ARM is claiming 31% smaller code, and 38% faster code. Of
course the marketing department rigged those numbers. ;)
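
A rough way to put numbers on the NOP subfield cost, under assumptions
that are mine and not taken from that design: the 64 bit word is four
16 bit subfields, one per stack, and the hypothetical compact format
spends one mask bit per subfield plus 16 bits for each subfield actually
used.

```python
def encoded_bits(bundles, slot_bits=16, slots=4):
    """Return (fixed_bits, compact_bits) for a list of bundles.

    Each entry in `bundles` is the number of non-NOP subfields in that bundle.
    The compact format is hypothetical: a per-bundle mask of `slots` bits,
    followed by only the subfields that are in use.
    """
    fixed = 0
    compact = 0
    for active in bundles:
        if not 0 <= active <= slots:
            raise ValueError("active subfields must be between 0 and slots")
        fixed += slots * slot_bits              # NOPs are stored explicitly
        compact += slots + active * slot_bits   # mask bits + used subfields only
    return fixed, compact


if __name__ == "__main__":
    # Four bundles using 4, 2, 1 and 1 of their subfields.
    fixed, compact = encoded_bits([4, 2, 1, 1])
    print(fixed, compact)
```

How much this buys depends entirely on how many subfields real code leaves
idle, which the sketch takes as an input rather than measuring.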

In fact if more than one queue per execution unit was maintained, then
processing could be distributed between register cells, and execution
bypass (alternates) could be selected from operations down the queue
chain.

The concept of injecting NaN opcodes and its implication for exception
processing here is not beyond imagination.

The instruction format itself, beyond the constraints of containment
of operator set, is a secondary issue to the flow control strategy for
achieving maximal throughput and acceptable latency.

Everyone seems to have the same flow control strategy: a three cycle
pipeline for the low end cheap/simple/small/slow market, a five cycle
design for the mid/high/fast market with dual instruction execution, and
longer pipelines with wider execution for PC class designs.

The sweeping statement of instruction format standardization to 16 bit
is premature.

In a mere half decade or so you will no longer be able to buy a new chip
with the old ARM instruction set. MIPS will do the same if they still
exist. When IBM announces their variable width instruction set the fat
lady will have sung.

cheers jacko

Salutations, Brett