Re: PART 3. Why it seems difficult to make an OOO VAX competitive (really long)



On Mon, 8 Aug 2005, Andi Kleen wrote:

Just modern x86s cannot predict them, so they're really slow.

I know they are slow. I was arguing against the assertion that calls are simple on IA-32.


Which was probably John's point. It can be implemented, but it's slow.

And lots of complicated VAX instructions can be implemented, but in a way that is likely to be slow. Mashey seems to think that this was a huge problem. I think it might not have been (as does Anton Ertl, as far as I can tell) if a sensible subset could have been made fast.


On the other hand the x86 designers probably didn't really care about
them very much because they are rarely used. So we don't know
if they couldn't have made them fast if they wanted.

Why not? What if the most often used call gate selectors were cached, the way segment descriptors can be cached?


Also I think John's "zero microcode" description was a bit misleading
since even modern RISCs seem to have microcode of some sort (e.g. POWER4
for cracking the more complex old POWER string instructions). The basic
concept of having a ROM somewhere in the CPU that contains cracked up
code for rarely used instructions is probably not that bad.

It seems like a very good idea. Works fine for IA-32, too.

Of course, they aren't used much these days...

Interrupts and system calls are very similar to call gates.

I know.

Having easy access to PC is important for position independent code.

But how much do you need? Isn't it enough to have a PC-relative addressing mode? Without the auto increment/decrement stuff?


Old x86 (before x86-64) always suffered from not having this
and requiring ugly workarounds like abusing call for this.

Yes, not having a way to do position independent code easily is bad. That is not the same thing as saying that it is good to make the PC a "GPR".


The funny thing is that the "obvious" implementation of getting
the PC on x86 which is

       call 1f
1:      pop  %reg

totally screws up performance on modern x86s which have a call return
stack because the next RET will be mispredicted. A lot of code generated
by older compilers suffered from this badly.

Something like this might be faster:

	call 1f
  1:	pop  %reg
	add  $(2f-1f), %reg  # would that be five?
	push %reg
	ret
  2:

It depends on whether POP adjusts the internal return stack or not.

SP is also quite frequently referenced for accessing stack local
variables without a frame pointer or modifying the stack frame.
It would have been a much worse ISA if the only way to do that
would have been the hyper complicated CALL instructions.

SP-relative addressing is very nice, too, but do you need to shift and rotate the SP? Arithmetically and logically? Xor it? When was the last time you even compared it with anything?


x86 has half the same mistake - SP is a "general purpose register" but
at least the PC isn't.

It was a long standing mistake that x86-64 finally fixed.

But not by making the PC a GPR.

-Peter