Re: Future Risc




"Nick Maclaren" <nmm1@xxxxxxxxxxxxx> wrote in message news:fsihhm$in8$1@xxxxxxxxxxxxxxxxxxxxxxx

In article <us3Hj.12719$uX5.7929@xxxxxxxxxxxxxxxxxxxx>,
"Wilco Dijkstra" <Wilco_dot_Dijkstra@xxxxxxxxxxxx> writes:
|>
|> [ IEEE 754 decoding ]
|>
|> That's how you do it indeed. Add one to the exponent field before
|> even decoding it, then test for exponent is 0 or 1 using a simple mask.
|> This catches all special cases in one simple test+branch. I.e. you never
|> have any problems with the implicit leading bit.
|>
|> The same is true for hardware - special cases use different paths.
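
A minimal sketch of the exponent trick described in the quoted text, assuming IEEE 754 binary32 (8-bit exponent field in bits 23..30). The names float_bits, is_special_bits and is_special are made up for illustration and do not come from any of the posters' code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32. */
static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");

/* Reinterpret the bits of a float as a 32-bit integer. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Returns 1 for zero, denormal, infinity and NaN; 0 for normal numbers.
   Adding 1 to the exponent field maps 0x00 to 0x01 and 0xFF to 0x00
   (the carry goes into the sign position and is masked away), so
   bits 1..7 of the incremented field are all zero exactly for the
   two special exponent values. */
static int is_special_bits(uint32_t bits) {
    return (((bits >> 23) + 1u) & 0xFEu) == 0;
}

static int is_special(float x) {
    return is_special_bits(float_bits(x));
}
```

A single mask-and-compare replaces separate tests for exponent 0 and exponent 0xFF; the special path can then sort out which case it actually is.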

Grrk. Yes, but it means that you have to do all that grobble BEFORE
starting the operation. With a simpler format, you can start the
actual operation in parallel with checking for special cases.

No, you don't have to check first. Hardware just does the operation assuming
the operands are normals. If they turn out to be special (you know that after
the first cycle), it either uses the results from the special paths or takes other
action. Denormals may result in traps or replays, or are just flushed to zero on
the special path. So denormals don't add to the latency of the basic operation,
at the cost of being handled more slowly, of course.

More than one hardware architect has said to me that handling IEEE
denorms cost them a cycle on latency, which is why their default
for HPC was not to support them. IBM was the only HPC RISC vendor
that I know of that supported them by default for HPC and, even then,
operations using them (on the Power3/POWER4) were much slower than
ones using normal operands. I have some measurements for various
architectures somewhere.

That's right. It's unavoidable given that you need to normalise them before and
denormalise after each operation. It's much better to keep the common
case fast.
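
A rough sketch of the "normalise before" step in software, assuming binary32. The unpacked struct and unpack_finite are hypothetical names, and the shift loop stands in for what hardware would do with a leading-zero count and a barrel shifter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked form: value = sig * 2^(exp - 23),
   with bit 23 of sig set once the number is normalised. */
typedef struct {
    uint32_t sig;
    int32_t  exp;
} unpacked;

/* Unpack a nonzero, finite binary32 bit pattern (sign ignored). */
static unpacked unpack_finite(uint32_t bits) {
    uint32_t frac = bits & 0x7FFFFFu;
    int32_t  e    = (int32_t)((bits >> 23) & 0xFFu);
    unpacked u;
    if (e != 0) {
        u.sig = frac | 0x800000u;   /* normal: implicit leading 1 */
        u.exp = e - 127;
    } else {
        assert(frac != 0);          /* zero is handled elsewhere */
        u.sig = frac;               /* denormal: no implicit bit */
        u.exp = -126;
        /* Data-dependent shift: this is the extra work that normal
           operands never need. The exponent goes below -126, so the
           internal exponent range must be wider than the format's. */
        while (!(u.sig & 0x800000u)) {
            u.sig <<= 1;
            u.exp--;
        }
    }
    return u;
}
```

The mirror-image shift is needed on the way out whenever the result lands in the denormal range, which is the second half of the cost.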

It's very obvious when you write the code to emulate the instruction.
With IEEE 754, there are data dependencies between decoding denormals
and the actual operation. With a simpler format (even including NaNs,
missing values, infinities, infinitesimals and several forms of zero),
there aren't. No existing, practical language allows me to put that
in my code, but I can see how to do it if there were one.

Are you talking about removing denormals altogether? That would be
great indeed. I don't believe they are very useful, and there doesn't
appear to be a good way to make denormals fast. Pre-normalizing them
in registers might work if you use a bigger exponent internally, but we
all know how you can screw up floating point completely that way. Think
x87 :-)

Logically, thread A does the actual operation assuming ordinary
numbers, thread B deals with the special cases, and thread B cancels
thread A if it gets a hit. It doesn't matter if thread A produces
nonsense for the special values, as its result isn't used in that
case. Dead easy to parallelise.
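
The thread A / thread B structure can be sketched in C for a binary32 multiply. This is only an illustration under stated assumptions: all function names are made up, the fast path truncates instead of rounding, and the special path simply falls back to the native multiply. C cannot express the parallelism itself; the point is that the fast path has no data dependency on the check, and the check only selects the result at the end.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");

static uint32_t to_bits(float f)      { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* "Thread A": multiply as if both operands and the result were normal.
   May produce nonsense for special inputs; that result is then discarded.
   Truncates the product (no rounding) to keep the sketch short. */
static uint32_t mul_assume_normal(uint32_t a, uint32_t b, int32_t *exp_out) {
    uint64_t sa = (a & 0x7FFFFFu) | 0x800000u;   /* insert implicit bit */
    uint64_t sb = (b & 0x7FFFFFu) | 0x800000u;
    uint64_t p  = sa * sb;                       /* 47 or 48 bits wide */
    int32_t  e  = (int32_t)((a >> 23) & 0xFFu)
                + (int32_t)((b >> 23) & 0xFFu) - 127;
    if (p & (1ull << 47)) { p >>= 24; e++; } else { p >>= 23; }
    *exp_out = e;
    return ((a ^ b) & 0x80000000u) | ((uint32_t)e << 23)
         | ((uint32_t)p & 0x7FFFFFu);
}

/* "Thread B": exponent field all zeros or all ones in either operand. */
static int needs_special_path(uint32_t a, uint32_t b) {
    return ((((a >> 23) + 1u) & 0xFEu) == 0)
         | ((((b >> 23) + 1u) & 0xFEu) == 0);
}

static float mul_speculative(float x, float y) {
    uint32_t a = to_bits(x), b = to_bits(y);
    int32_t  e;
    uint32_t fast    = mul_assume_normal(a, b, &e);  /* independent of the check */
    int      special = needs_special_path(a, b) | (e <= 0) | (e >= 255);
    return special ? x * y : from_bits(fast);        /* B cancels A on a hit */
}
```

Result overflow and underflow are folded into the special check here, since the fast path cannot represent them.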

That's what you could do in hardware using IEEE. Just add the same
hardware on the special path including the extra shifts. This would
increase the result latency by ~2 cycles on denormals. But few would
consider this effective use of transistors... I think this might be what
some of the Power CPUs do.

Wilco

