Re: PART 3. Why it seems difficult to make an OOO VAX competitive (really long)
- From: "Peter \"Firefly\" Lund" <firefly@xxxxxxx>
- Date: Tue, 9 Aug 2005 00:17:47 +0200
On Mon, 8 Aug 2005, Andi Kleen wrote:
Just modern x86s cannot predict them, so they're really slow.
I know they are slow. I was arguing against the assertion that calls are simple on IA-32.
Which was probably John's point. It can be implemented, but it's slow.
And lots of complicated VAX instructions can be implemented, but only in a way that is likely to be slow. Mashey seems to think that was a huge problem. I think it might not have been (as does Anton Ertl, as far as I can tell) if a sensible subset could have been made fast.
On the other hand the x86 designers probably didn't really care about them very much because they are rarely used. So we don't know if they couldn't have made them fast if they wanted.
Why not? If we cache the most often used call gate selectors the way segment descriptors can be cached?
Also I think John's "zero microcode" description was a bit misleading since even modern RISCs seem to have microcode of some sort (e.g. POWER4 for cracking the more complex old POWER string instructions). The basic concept of having a ROM somewhere in the CPU that contains cracked up code for rarely used instructions is probably not that bad.
It seems like a very good idea. Works fine for IA-32, too.
Of course, they aren't used much these days...
Interrupts and system calls are very similar to call gates.
I know.
Having easy access to PC is important for position independent code.
But how much do you need? Isn't it enough to have a PC-relative addressing mode? Without the auto increment/decrement stuff?
Old x86 (before x86-64) always suffered from not having this and requiring ugly workarounds like abusing call for this.
Yes, not having a way to do position independent code easily is bad. That is not the same thing as saying that it is good to make the PC a "GPR".
The funny thing is that the "obvious" implementation of getting the PC on x86 which is
        call 1f
    1:  pop %reg
totally screws up performance on modern x86s which have a call return stack because the next RET will be mispredicted. A lot of code generated by older compilers suffered from this badly.
Something like this might be faster:
        call 1f
    1:  pop %reg
        add $(2f-1f), %reg    # would that be five?
        push %reg
        ret
    2:
It depends on whether POP adjusts the internal return stack or not.
SP is also quite frequently referenced for accessing stack local variables without a frame pointer or modifying the stack frame. It would have been a much worse ISA if the only way to do that would have been the hyper complicated CALL instructions.
SP-relative addressing is very nice, too, but do you need to shift and rotate the SP? Arithmetically and logically? Xor it? When was the last time you even compared it with anything?
x86 has half the same mistake - SP is a "general purpose register" but at least the PC isn't.
It was a long-standing mistake that x86-64 finally fixed.
But not by making the PC a GPR.
-Peter