Re: Is a RISC chip more expensive?
- From: "James Van Buskirk" <not_valid@xxxxxxxxxxx>
- Date: Sat, 1 Sep 2007 23:27:51 -0600
"Wilco Dijkstra" <Wilco_dot_Dijkstra@xxxxxxxxxxxx> wrote in message
news:XWiCi.47131$ie3.21895@xxxxxxxxxxxxxxxxxxxxxxx
> I thought the GEM compilers were pretty good at the time. I do recall
> the instruction grouping rules on the 21164 were complicated. What
> you're saying sounds like a broken instruction scheduler (or one
> defaulting to a different core). Schedulers work in a similar way as
> described, but they don't have the intuition humans have in quickly
> finding a good solution. Trying out all possibilities is too slow, so they
> use heuristics, and these are only as good as the compiler writer...
> Humans are also good at bypassing typical constraints a compiler
> would have like reordering memory accesses or moving instructions
> past branches.

The GEM compilers were good enough to win the SPEC sweepstakes at
the time, but good enough doesn't mean they were good. If you
looked at the generated code you would see a lot of garbage in
there. Complex arithmetic wouldn't get simple optimizations: for
example, mixed complex-real arithmetic would convert the real
number to complex first, then carry out the arithmetic operation.
Shifts are problematic in Fortran, as the ISHFT intrinsic can shift
either way. As I recall, left shifts would be generated by the
compiler but not right shifts.
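
To illustrate the ISHFT point, here is a minimal sketch of why the
intrinsic is awkward to compile: the sign of the shift count picks the
direction, so with a variable count the direction is only known at run
time. The module and function names are made up for the illustration,
and nothing here is meant to show what GEM actually emitted.

```fortran
module shift_demo
  implicit none
contains
  ! Hypothetical wrapper: ISHFT shifts left for a positive count and
  ! right (logical, zero fill) for a negative count.
  elemental function shift_either(x, n) result(r)
    integer, intent(in) :: x, n
    integer :: r
    r = ishft(x, n)
  end function shift_either
end module shift_demo

program ishft_example
  use shift_demo
  implicit none
  print *, shift_either(1, 3)    ! left shift by 3:  8
  print *, shift_either(16, -2)  ! right shift by 2: 4
end program ishft_example
```

With a literal count the compiler can see the direction at compile
time; with a variable count it has to handle both cases.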
What makes you think there even was a scheduler? When you write
out some assembly code, you normally calculate or estimate the
latency of the code sequence and compare it with other code
sequences and perhaps some ideal. It's not clear that GEM or
its successor at Intel would do such a thing. It's not a
question of trying out all possibilities; just working out the
latencies of a couple of possible candidates via emulation would
have been a good thing had it ever happened.
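
As a rough sketch of what "working out the latencies of a couple of
candidates" could look like, the following compares two ways of adding
four numbers on an idealized machine. Everything in it is an
assumption made for illustration: the names, the 4-cycle add latency,
and the model itself (unlimited issue width, no grouping rules). It is
not a description of any real scheduler.

```fortran
module latency_demo
  implicit none
contains
  ! Hypothetical model: instruction i can start once both of its inputs
  ! have finished. dep1(i) and dep2(i) index earlier instructions in
  ! the sequence; 0 means "no dependence".
  pure function critical_path(lat, dep1, dep2) result(total)
    integer, intent(in) :: lat(:), dep1(:), dep2(:)
    integer :: total
    integer :: finish(0:size(lat))
    integer :: i
    finish = 0
    do i = 1, size(lat)
      finish(i) = max(finish(dep1(i)), finish(dep2(i))) + lat(i)
    end do
    total = maxval(finish)
  end function critical_path
end module latency_demo

program latency_example
  use latency_demo
  implicit none
  ! Candidate 1: ((a+b)+c)+d, a serial chain of three adds.
  print *, critical_path([4, 4, 4], [0, 1, 2], [0, 0, 0])   ! 12
  ! Candidate 2: (a+b)+(c+d), two independent adds, then one more.
  print *, critical_path([4, 4, 4], [0, 0, 1], [0, 0, 2])   ! 8
end program latency_example
```

Even a crude estimate like this separates the two candidates without
trying every possible ordering.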
Moving instructions past branches or out of loops is a double-edged
sword. Consider a loop like:

    a = iand(a,not(b))

In a similar situation the GEM compiler moved the logical negation
of the scalar b outside the loop that modified the vector a. The
problem with this is Alpha already has an instruction (BIC) that
ANDs an element of a with the logical negation of b, so moving the
negation outside the loop created an unnecessary extra step! You
may think that's not a problem because the extra stuff happens
outside the inner loop, but the code was being run in a context
where it was used until the setup time for the loop was greater
than would be the case for alternative code. The extra work
involved in setup changed the crossover point and increased the
overall execution time appreciably. Alphas could knock out the
inner loops so efficiently that optimizing middle loops made a
difference, and GEM wouldn't optimize middle loops very well.
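
For concreteness, here is a source-level sketch of the two forms
discussed above. The function names are invented for the illustration,
and the hoisted version only mimics at source level what the compiler
reportedly did in generated code. Both produce the same result; the
difference is the extra NOT done during loop setup in the hoisted
form, where a machine with an AND-NOT instruction could use b
directly.

```fortran
module andnot_demo
  implicit none
contains
  ! Negation kept inside the loop body: each element is one AND-NOT.
  function clear_bits_fused(a, b) result(r)
    integer, intent(in) :: a(:), b
    integer :: r(size(a))
    integer :: i
    do i = 1, size(a)
      r(i) = iand(a(i), not(b))
    end do
  end function clear_bits_fused

  ! Negation hoisted: one extra NOT before the loop, plain AND inside.
  function clear_bits_hoisted(a, b) result(r)
    integer, intent(in) :: a(:), b
    integer :: r(size(a))
    integer :: i, nb
    nb = not(b)
    do i = 1, size(a)
      r(i) = iand(a(i), nb)
    end do
  end function clear_bits_hoisted
end module andnot_demo

program andnot_example
  use andnot_demo
  implicit none
  integer :: a(4)
  a = [15, 12, 10, 255]
  print *, clear_bits_fused(a, 6)                                ! 9 8 8 249
  print *, all(clear_bits_fused(a, 6) == clear_bits_hoisted(a, 6))  ! T
end program andnot_example
```

When the vector is short and the loop is entered many times from a
middle loop, that one extra setup operation is paid on every entry.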
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end