Re: Intel publishes Larrabee paper



In article <ggtgp-4EC5AC.17084105042009@xxxxxxxxxxxxxxxxxxx>,
Brett Davis <ggtgp@xxxxxxxxx> wrote:

In article <Vr6dnRcUBPijNUXUnZ2dnUVZ_sLinZ2d@xxxxxxxxxxxx>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:

Brett Davis wrote:
In article
<baa19365-e7f8-4c8a-bbca-319671d990dc@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
MitchAlsup <MitchAlsup@xxxxxxx> wrote:
As noted, here, about 2 years ago, I indicated that a company could
build an x86 CPU that would have 50% of an Opteron's performance in

That was a great post!

The natural conclusion that comes out of this is that Intel's next big
uber chip will have four big cores and 64-ish Larrabee cores.

This will go up against AMD's four big cores plus 800 ATI vector
processors. The ATI chip will also do graphics, removing a big, expensive
chip from the motherboard. Intel will claim the same for Larrabee, and
Intel is happy to sell chips with crappy graphics. ;)

I think your prediction is going to be pretty close.

Have you all read the DDJ article by Mike Abrash where he goes into the
low-level detail of the LRB cores?

http://www.ddj.com/architect/216402188

The Larrabee architecture did not quite make me barf. ;)
With x86 as the base, it is understandable that the third vector operand
can be a memory operand: the opcode bits are there, and you have a
limited register set. Make a virtue of necessity, etc.

There is a certain logic to Intel's design decisions for Larrabee if you
think four-way multithreading makes sense for hiding memory-latency
stalls on reads.

32 registers times 4 sets, plus some rename registers, and you are
looking at ~200 actual registers. If you try to do this with 256
visible registers, your actual register set becomes large; this might be
an issue, or not.

The die-size tradeoff of four-way multithreading is mostly about reducing
the relative cost of the die space spent on x86 decode, and of other die
costs, to bring the cost for the amount of work done down near ATI's.
For anyone besides Intel, if multithreading makes sense, the first thing
they would do is dump the x86 instruction set to free up that wasted area.

The future looks to be RRAM, in which case we will get 8 GB of
embedded RAM that is only two dozen cycles away or so. Simple prefetch
and a big register set can hide that latency. Multithreading dies,
except as a marketing gimmick.

Brett



Relevant Pages

  • Re: non load/store architecture?
    ... Programmers can write great code on a RISC ... Especially in a PC marketplace dominated by the x86. ... Lots of registers and an orthogonal instruction set are important - both have these. ... The lack of registers means much more memory IO, which causes stalls and requires complex scheduling. ...
    (comp.arch.embedded)
  • Re: Switch from SBCL to Erlang backend due to scalability issues(GC).
    ... to the SBCL developers saying that they don't use a precise GC on x86 ... should be more likely that their stack locations have to be touched ... than on architectires with more registers. ... and initializing all those stack locations would ...
    (comp.lang.lisp)
  • Re: What cpus to compile for (in order)?
    ... >>> I'm at the stage of looking at code generation options and would like to ... >> in my case, I aim for x86 and x86-64, and that is about it. ... >> as for ia64, I am not sure if it has much chance of really going anywhere ... structure, different treatment of registers, ...). ...
    (comp.lang.misc)
  • Re: Ommiting frame pointer
    ... which speeds things up on processors with few registers (in ... just because of extra indirections. ... I've read that some architectures have no performance penalty for PIC, but x86 and amd64 definitely do not fall in that group. ...
    (comp.lang.c)
  • Re: memory leak in <vector> STL
    ... >> loading an invalid address into registers can actually cause CPU to ... > I believe that you only have a problem on x86 if you dereference the ... Specifically, I'm referring to ...
    (microsoft.public.vc.stl)