Re: Register-less CPU
- From: MitchAlsup <MitchAlsup@xxxxxxx>
- Date: Tue, 20 May 2008 09:05:40 -0700 (PDT)
On May 19, 12:53 pm, James Harris <james.harri...@xxxxxxxxxxxxxx> wrote:
On 19 May, 18:26, MitchAlsup <MitchAl...@xxxxxxx> wrote:
On May 18, 1:40 pm, Jeffrey Dutky <jeff.du...@xxxxxxxxx> wrote:
While it may seem like all decoder networks are the same depth (a
layer of inverters and a layer of and gates) the actual logic gates
built in silicon have a property called "fan out" which limits how
many gate inputs a gate output can drive. As the decoder network gets
wider you need to add trees of buffers to increase the fan out, which
slows down the decoding process (or you can build bigger, higher power
gates, but these tend to be slower as well).
In my experience, structures (i.e. register files) with 8 entries can
be accessed in 1/4 cycle, 16-32 (and occasionally 64) entries can be
accessed in 2/4 cycle at modern CPU speeds, 64-128 can be accessed in
3/4 of a cycle, and those that are larger in 4/4 cycles. Architects
blend these access times into pipelines to hide most of the grubby
details.
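As a rough illustration only, the access-time bands above can be written as a lookup. The function name `access_fraction` is hypothetical; sizes the post does not mention (9-15 entries) are assumed to fall in the next band up, and 64 entries is placed in the 3/4-cycle band even though it sometimes fits in 2/4.

```python
from fractions import Fraction

def access_fraction(entries: int) -> Fraction:
    """Rough access time, in clock cycles, for a structure with `entries` entries.

    Bands follow the figures in the post; treating them as hard cutoffs
    is an assumption made for illustration.
    """
    if entries <= 8:
        return Fraction(1, 4)
    if entries <= 32:      # 16-32 entries (9-15 assumed to land here too)
        return Fraction(2, 4)
    if entries <= 128:     # 64-128 entries (64 occasionally manages 2/4)
        return Fraction(3, 4)
    return Fraction(4, 4)  # anything larger takes a full cycle

for n in (8, 32, 128, 256):
    print(n, access_fraction(n))
```

The point is less the exact numbers than the shape: every 4x to 8x growth in entries costs roughly another quarter cycle.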
Given the 1/4 cycle access to one of 8, could you divide the registers
into eight banks of 8, possibly selected on their lower three bits (so
the first bank would be reg0, reg8, reg16, reg24 etc.), then select the
bank, also in 1/4 cycle, thus providing 2/4 cycle or better access to 64
regs?
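The banking scheme in the question can be sketched as a split of the 6-bit register number: the low three bits pick the bank and the high three bits pick the row inside every bank. The names `split_index` and `read` are hypothetical, and this is a behavioural model, not a circuit.

```python
NUM_BANKS = 8
BANK_BITS = 3  # log2(NUM_BANKS)

def split_index(reg: int) -> tuple[int, int]:
    """Map a register number 0..63 to (bank, row-within-bank)."""
    if not 0 <= reg < 64:
        raise ValueError("register number must be in 0..63")
    bank = reg & (NUM_BANKS - 1)  # low three bits: bank 0 holds reg0, reg8, reg16, ...
    row = reg >> BANK_BITS        # high three bits: row inside the chosen bank
    return bank, row

def read(banks: list[list[int]], reg: int) -> int:
    # In hardware every bank would decode `row` in parallel (a 1-of-8 decode),
    # and a second 1-of-8 selection on `bank` would pick the final value.
    bank, row = split_index(reg)
    return banks[bank][row]

# Fill each slot with its own register number so reads are easy to check.
banks = [[b + NUM_BANKS * r for r in range(8)] for b in range(NUM_BANKS)]
print(split_index(17), read(banks, 17))
```

Note what the model hides: the `row` bits still have to be driven into all eight decoders, and the eight bank outputs still have to be merged, which is exactly the cost raised in the reply.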
But then you have to pay a wire delay and a fanout delay to drive the
index bits into the 8 decoders, and then merge the individual outputs
into the final output, ending up as I stated before.
Secondly, there are three different fan-in/fan-out problems, and all
have to be carefully crafted for lowest latency.
The first is driving the index bits into the decoder. For 2 through 8
entries there is no additional delay (when the drivers into the decoder
array are properly sized); 16-32 entries cost about a gate of logic
delay and approximately 1 gate of wire delay; 64-256 entries cost 2-3
gates of delay and another 2 gates (min) of wire delay.
The second is taking the output of the decoder and making it big
enough to drive the select line that accesses the array. 1-8 bits are
free, 16-32 bits are 1 gate of delay, 64-128 bits are 2 gates of delay
and 1 gate of wire delay.
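The figures for the first two problems can be tallied in a small table. This is only bookkeeping of the numbers quoted above, with logic and wire delay summed as (low, high) gate delays. Assumptions not in the post: sizes between the quoted bands round up to the next band, "2 gates (min)" of wire delay is counted as exactly 2, and select-line drive beyond 128 entries is left undefined rather than guessed.

```python
# (max_entries, low, high): extra gate delays, logic + wire combined.
INDEX_DRIVE = [(8, 0, 0), (32, 2, 2), (256, 4, 5)]  # driving index bits into the decoder
SELECT_DRIVE = [(8, 0, 0), (32, 1, 1), (128, 3, 3)]  # driving the decoded select line

def lookup(table, entries):
    """Return (low, high) for the first band that covers `entries`."""
    for max_entries, low, high in table:
        if entries <= max_entries:
            return low, high
    raise ValueError(f"no figure quoted for {entries} entries")

def decode_overhead(entries):
    """(low, high) extra gate delays for index drive plus select-line drive."""
    i_lo, i_hi = lookup(INDEX_DRIVE, entries)
    s_lo, s_hi = lookup(SELECT_DRIVE, entries)
    return i_lo + s_lo, i_hi + s_hi

print(decode_overhead(8), decode_overhead(32), decode_overhead(128))
```

Going from 8 to 128 entries adds roughly 7-8 gate delays before any sensing has happened, which is consistent with the growing fractions of a cycle given earlier.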
The third is uniquely sensing the selected register bit, which competes
for the same wire to which the (2**n)-1 unselected register bits are
also connected. We used to use sense amps, but in deep submicron these
have become problematic, so we sense with an inverter in groups of 8-16
cells, and then use strong drive through an OR-tree to generate the
final readout result.
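A behavioural sketch of that read-out path: the cells on one bit column are split into local groups (16 here, an assumed value within the 8-16 range mentioned), each group yields one local result, and a tree of 2-input ORs merges the group results. Only the group holding the selected register can contribute a 1, so the OR recovers its bit. Signal polarity and the actual inverter sensing are ignored; the function name `read_bit` is hypothetical.

```python
GROUP = 16  # cells per local sense group (assumed; the post says 8-16)

def read_bit(column, selected):
    """Return (bit, or_tree_levels) for reading `column[selected]`."""
    # Local sense: each group reports the selected cell's bit if that cell is
    # in the group, otherwise 0.
    partial = []
    for start in range(0, len(column), GROUP):
        in_group = start <= selected < start + GROUP
        partial.append(column[selected] if in_group else 0)
    # OR tree of 2-input gates: merge pairs level by level, passing an odd
    # leftover straight up, and count levels as a proxy for delay.
    levels = 0
    while len(partial) > 1:
        merged = [a | b for a, b in zip(partial[0::2], partial[1::2])]
        if len(partial) % 2:
            merged.append(partial[-1])
        partial = merged
        levels += 1
    return partial[0], levels

column = [0] * 128
column[77] = 1
print(read_bit(column, 77), read_bit(column, 5))
```

With 16-cell groups, 128 entries need 8 local groups and therefore 3 OR levels; doubling the entries adds one more level.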
One does not add these trees of fan-out buffers to slow things down,
one adds these trees of fan-out buffers in order to speed things up.
If we did not add these buffers, the R*C components of the wires would
make the array even slower. The buffered path simply ends up taking
less time than it would without them.
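The "buffers make it faster" point can be illustrated with the textbook logical-effort model rather than with numbers from this post: driving a load F times an inverter's input capacitance through N equally loaded inverter stages costs about N * F**(1/N) + N * p delay units, with parasitic delay p = 1 assumed here. The same effort arithmetic applies to a buffer tree when its branching is counted in F. Names are hypothetical.

```python
P_INV = 1.0  # assumed parasitic delay of one inverter stage, in delay units

def chain_delay(fanout, stages):
    """Logical-effort delay of `stages` equal-effort inverter stages driving `fanout`."""
    return stages * fanout ** (1.0 / stages) + stages * P_INV

def best_stages(fanout, max_stages=10):
    """Stage count in 1..max_stages that minimizes chain_delay."""
    return min(range(1, max_stages + 1), key=lambda n: chain_delay(fanout, n))

for f in (4, 64, 256):
    n = best_stages(f)
    print(f"fanout {f}: 1 stage {chain_delay(f, 1):.1f}, "
          f"{n} stages {chain_delay(f, n):.1f}")
```

For a fanout of 64, a single driver costs about 65 units while three buffer stages cost about 15: the extra stages are added to cut latency, not to add it.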
Thanks for all the details. I'm surprised there are wire delays given
the shortness of the connections between components. Are they perhaps
the times for the outputs to drive their values against the input
capacitances of dependent gates (rather than delays of the 'wires'
themselves)?
Some of it (maybe 20%) is due to the driver, but most of it is R*C,
with a good deal of the C being the gates attached to the wire.
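To see how a driver term and a wire R*C term split up, here is a minimal Elmore-delay sketch of a wire with gates hanging off it: a driver with output resistance `r_drv` feeds `n` equal segments, each with resistance `r_seg` and wire capacitance `c_wire`, and each segment end carries one gate input `c_gate`. All parameter values are arbitrary normalized units chosen for illustration, not process data.

```python
def elmore_delay(r_drv, r_seg, c_wire, c_gate, n):
    """Return (driver_term, wire_term) of the Elmore delay to the far end."""
    c_node = c_wire + c_gate             # capacitance at each tap: wire plus one gate
    driver_term = r_drv * n * c_node     # the driver charges every node
    # Segment i (1-based) carries the charge for the n - i + 1 nodes beyond it,
    # so the wire term grows roughly with n**2.
    wire_term = sum(r_seg * (n - i + 1) * c_node for i in range(1, n + 1))
    return driver_term, wire_term

drv, wire = elmore_delay(r_drv=1.0, r_seg=0.5, c_wire=0.5, c_gate=1.0, n=16)
print(f"driver share {drv / (drv + wire):.0%}, "
      f"gate share of node C {1.0 / (0.5 + 1.0):.0%}")
```

The quadratic growth of `wire_term` with the number of taps is the R*C effect that the fan-out buffers above are there to break up.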
Are the above considerations applicable to more than one technology?
They have held fairly true from about 0.25µ through 0.12µ, were not
quite so bad between 0.8µ and 0.35µ, and were largely ignored at larger
feature sizes. This is a direct consequence of wires getting worse and
transistors getting more leaky as technology scales. {Note: leaky, not
slow}.
Mitch