Re: Bulldozer details + bobcat




"Brett Davis" <ggtgp@xxxxxxxxx> wrote in message
news:ggtgp-191C1C.01014603122009@xxxxxxxxxxxxxxxxxxxxxx
In article <4B01BAD2.6080002@xxxxxxxxxxxxxxx>,
"Andy \"Krazy\" Glew" <ag-news@xxxxxxxxxxxxxxx> wrote:

Andy "Krazy" Glew wrote:
Brett Davis wrote:
For SpMT I looked at Alpha style paired integer pipelines ...

You haven't said enough about the physical layout to talk about those
clustering effects.

The physical layout matters a lot, and hence has its own terminology.

8-wide on a bit-interleaved datapath is pushing the envelope.


8-wide as two 4-wide bit-interleaved datapaths is not so bad, although
you will pay a lot of area for the wire turns to get to side-by-side
datapaths. So I might call this an 8-wide cluster composed of two
adjacent 4-wide bit-interleaved clusters, with whatever bypassing you
choose.
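
A toy way to see why bit interleaving matters: number the routing tracks
across the datapath and count how far a bypass wire for one bit must run
between two ALUs. This is a sketch under simplifying assumptions (one
track per bit per ALU, all tracks in a single row, no multi-layer
stacking); the function names are hypothetical.

```python
from typing import Callable

# Toy layout model (an assumption, not a real floorplan): every ALU needs
# one track per bit, and all tracks sit side by side in a single row.
Layout = Callable[[int, int], int]


def track_side_by_side(alu: int, bit: int, width: int = 64, n_alus: int = 4) -> int:
    """Each ALU owns a contiguous block of `width` tracks."""
    return alu * width + bit


def track_interleaved(alu: int, bit: int, width: int = 64, n_alus: int = 4) -> int:
    """Bit i of every ALU sits next to bit i of the other ALUs."""
    return bit * n_alus + alu


def bypass_distance(layout: Layout, src_alu: int, dst_alu: int, bit: int) -> int:
    """Tracks a bypass wire for one bit travels from src_alu to dst_alu."""
    return abs(layout(dst_alu, bit) - layout(src_alu, bit))


if __name__ == "__main__":
    for layout in (track_side_by_side, track_interleaved):
        worst = max(bypass_distance(layout, 0, 3, b) for b in range(64))
        print(f"{layout.__name__}: ALU0 -> ALU3 spans {worst} tracks")
```

In the interleaved layout the bypass run is only a few tracks no matter
which bit it carries, but each bit slice is now n_alus tracks wide; in
this toy model that is one reason a single very wide bit-interleaved
cluster gets expensive.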

As I mentioned earlier, there is a sweet trick you can play with
datapaths that are paired and opposite, i.e. mirror reflections of each
other. You bit-interleave the input and output wires within, say, each
4-wide cluster, and then you bit-interleave the outputs (but not the
inputs) across the pair. The trouble with this trick is that it takes
away one of the ends of the datapath: where do you put the scheduler?
The data cache?
So I call this "opposed" clustering.

I am afraid I need a Transistor 101 refresher:

The speed of light in copper or aluminum wires in a modern fab process,
the resulting distance in mm covered in one cycle at 3 GHz, and how many
wires you can cross at a right angle over that distance.

The time a transistor takes to switch, and the resulting number of
transistors you can pass through in one 3 GHz cycle for a CPU stage.

The number of active inputs you can drive with one transistor's output.
The number of inactive inputs you can drive with one transistor's output.
(Does a hard limit of only two active listeners out of eight help?)
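
For the first question, a back-of-the-envelope sketch: the cycle time at
3 GHz and how far a signal could travel in that time at the speed of
light in the surrounding dielectric (for an ideal line the wave speed is
set by the dielectric, not the metal). The relative permittivity (about
3.9 for silicon dioxide) and the 200 nm wire pitch are assumptions to be
replaced with real process numbers. This is only a time-of-flight upper
bound: on-chip wires are dominated by RC delay, so the distance a signal
actually covers in one cycle is much shorter.

```python
import math

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def cycle_time_s(freq_hz: float) -> float:
    """Duration of one clock cycle in seconds."""
    return 1.0 / freq_hz


def flight_distance_mm(freq_hz: float, rel_permittivity: float) -> float:
    """Distance a wave travels in one cycle in a dielectric with the given
    relative permittivity (v = c / sqrt(eps_r)). Ignores RC effects."""
    velocity = C_VACUUM_M_PER_S / math.sqrt(rel_permittivity)
    return velocity * cycle_time_s(freq_hz) * 1e3


def tracks_crossed(distance_mm: float, pitch_nm: float) -> int:
    """Wires crossed at right angles over a distance, for an assumed pitch."""
    return int(distance_mm * 1e6 // pitch_nm)


if __name__ == "__main__":
    f = 3e9
    print(f"cycle time: {cycle_time_s(f) * 1e12:.1f} ps")
    d = flight_distance_mm(f, rel_permittivity=3.9)  # SiO2-like dielectric (assumed)
    print(f"time-of-flight bound: {d:.1f} mm")
    print(f"tracks at a hypothetical 200 nm pitch: {tracks_crossed(d, 200.0)}")
```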


I do not really like the Alpha 4+4 and its two-cycle delay to cross;
actually, I hate it. I can live with 0 cycles to the two closest ALUs,
with a one-cycle delay for every second ALU crossed. With 64-bit
registers that means crossing 127 wires. You have something like 15
wire layers, so there is a desire to stack the two-input, one-output
(2in1out) wires to reduce the distance covered by the bypass wires if
you do not interleave. (This ignores the big economic incentive to use
coarse, previous-generation patterning for the upper-level wires.)
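
One reading of the bypass rule above, sketched as a latency table: ALUs
sit in a row, an ALU's immediate neighbours on either side are reached
with 0 extra cycles, and the penalty grows by one cycle for every two
positions of distance. For comparison, a 4+4 split with the two-cycle
crossing penalty mentioned above. Both functions are hypothetical models
of extra forwarding latency only, not of any real core.

```python
def sliding_penalty(src: int, dst: int) -> int:
    """Extra bypass cycles under one reading of the rule: neighbours are
    free, +1 cycle for every two ALU positions of distance."""
    return abs(src - dst) // 2


def clustered_penalty(src: int, dst: int, cluster_size: int = 4,
                      cross_penalty: int = 2) -> int:
    """Extra bypass cycles for a 4+4 style split: free inside a cluster,
    a fixed penalty to cross between clusters."""
    same_cluster = src // cluster_size == dst // cluster_size
    return 0 if same_cluster else cross_penalty


if __name__ == "__main__":
    n_alus = 8
    for name, penalty in (("sliding", sliding_penalty), ("4+4", clustered_penalty)):
        row = [penalty(0, dst) for dst in range(n_alus)]
        print(f"{name:>7}: ALU0 -> ALU0..7 extra cycles {row}")
```

The sliding model has no hard cluster boundary (ALU3 reaches ALU4 for
free), but its worst case grows with width; which trade-off wins depends
on how often the scheduler places dependent operations far apart, which
this sketch does not model.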

How many layers are 45 nm on a 45 nm CPU process, versus 65 nm layers
and 90 nm layers? Most die cross-sections I have seen step up to coarse
resolution quickly as you go up the stack.

I am willing to skin this old cat in new ways to make an 8-way design
work. RRAM is going to force many of the upper layers to high resolution
anyway, so I will be catching a wave that enables high-density wire
stacking where it is needed. There are lots of breakthroughs and patents
to be had in this area...

Brett

In CMOS, the transistors don't really have a switching time. Rather,
there is a delay associated with charging or discharging the gate
(transistor gate, not logic gate) between the on voltage and the off
voltage (usually VDD and Gnd, respectively). Since both the gate
capacitance and the current capability of a transistor are proportional
to its width, this leads to the concept of gate-limited delay: the
minimum delay of a gate with no interconnect, driving an identical gate
and driven by an identical gate.
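
The width cancellation can be made concrete with a crude
constant-current model, delay = C * V / I, where both the load
capacitance and the drive current scale with transistor width. The
per-micron capacitance and current below are made-up illustrative
values, not numbers from any real process.

```python
def gate_limited_delay(width_um: float, vdd: float = 1.0,
                       c_gate_f_per_um: float = 1e-15,
                       i_on_a_per_um: float = 0.5e-3) -> float:
    """Delay (s) of a gate of the given width driving an identical gate,
    with no interconnect, using t = C * V / I (constant-current approximation).
    The default per-micron values are hypothetical."""
    c_load = c_gate_f_per_um * width_um  # input capacitance of the identical load gate
    i_drive = i_on_a_per_um * width_um   # on-current of the identical driving gate
    return c_load * vdd / i_drive


if __name__ == "__main__":
    for w in (0.5, 1.0, 4.0):
        print(f"width {w} um: {gate_limited_delay(w) * 1e12:.2f} ps")
```

Width drops out of the ratio, which is why, in this model, gate-limited
delay depends on the process and supply voltage rather than on how
large the transistors are drawn.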

To the gate delay one must add the effects of the interconnect and
fan-out.
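
Extending the same crude model with the load terms named above: a driver
now charges the input capacitance of every gate it fans out to plus the
wire capacitance, and dividing the cycle time by the resulting stage
delay gives a rough count of gate levels per cycle. All parameter values
are hypothetical; the model omits the driver's own diffusion
capacitance, and the count ignores flip-flop overhead and clock skew, so
it overstates what a real pipeline stage gets.

```python
def stage_delay(fanout: int, c_wire_f: float = 0.0, width_um: float = 1.0,
                vdd: float = 1.0, c_gate_f_per_um: float = 1e-15,
                i_on_a_per_um: float = 0.5e-3) -> float:
    """Delay (s) of one gate driving `fanout` identical gates plus a wire,
    using t = C * V / I. Default per-micron values are hypothetical."""
    c_load = fanout * c_gate_f_per_um * width_um + c_wire_f
    i_drive = i_on_a_per_um * width_um
    return c_load * vdd / i_drive


def gate_levels_per_cycle(freq_hz: float, fanout: int, c_wire_f: float = 0.0) -> int:
    """How many such stages fit in one clock period (no latch overhead or skew)."""
    return int((1.0 / freq_hz) // stage_delay(fanout, c_wire_f))


if __name__ == "__main__":
    for fo in (1, 2, 4, 8):
        print(f"fan-out {fo}: {stage_delay(fo) * 1e12:.1f} ps, "
              f"{gate_levels_per_cycle(3e9, fo)} levels per 3 GHz cycle")
```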

So far as I know, most of the layers in a 45 nm process are 45 nm,
although typically the last few metal layers are thicker and coarser,
for lower-resistance power distribution.

Unless things have changed recently I don't think anybody is doing 15
levels of interconnect.

del

