Re: Denesting



<bobjaffray@xxxxxxxx> wrote in message news:1120745487.324600.270040@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

bj: Every time I post to comp.prog.forth@xxxxxxxxxxxxxxxx the post bounces. I can only copy and paste online to the website. Maybe someone can explain why sending a regular email doesn't work.

That's probably because it's comp.lang.forth, not comp.prog.forth.

...
bj: OK. Enlighten me on the difference. Straight STC I would
think would simply include the call opcode in the high-level
code instead of only parameter field addresses. Inlining
the primitives goes a step further. That is as far as I have
gotten in a simplistic conception of it. What next after that?

Did you miss Stephen Pelc's excellent example above? I repeat it here:

---------------
Most optimised Forths produce
very similar code on a PC for this definition, which uses
eleven Forth source tokens to produce five instructions
and 16 bytes of compiled code. The example is on VFX Forth.

: n>char  \ n -- char
 dup 9 >
 if  7 +  then
 [char] 0 +
;  ok
dis n>char
N>CHAR
( 0049EB70    83FB09 )                CMP     EBX, 09
( 0049EB73    0F8E03000000 )          JLE/NG  0049EB7C
( 0049EB79    83C307 )                ADD     EBX, 07
( 0049EB7C    83C330 )                ADD     EBX, 30
( 0049EB7F    C3 )                    NEXT,
( 16 bytes, 5 instructions )
ok
----------------
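
As a quick sanity check of what that definition does, here is a minimal sketch that should run on any Standard Forth. The test values 9, 10 and 15 are my own and are not part of Stephen's example.

```forth
\ Convert a digit value 0..35 to its ASCII character (0-9, then A-Z).
: n>char  ( n -- char )
  dup 9 > if  7 +  then    \ skip the 7 characters between '9' and 'A'
  [char] 0 + ;             \ add the ASCII code of '0' (hex 30)

9  n>char emit    \ prints 9
10 n>char emit    \ prints A
15 n>char emit    \ prints F
```

The 7 and the hex 30 are the same constants that show up as the two ADD instructions in the disassembly.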

The answer to "what next" is optimizing the resulting code, so the result is both shorter and faster than literally "rolled out" contents of definitions. To take a really simple example, in un-optimized Forth the sequence

       DUP 9 >  IF

above would copy the top stack item, push 9 on the stack, do a compare yielding a truth flag, and then evaluate the flag and jump or not. The optimized version takes advantage of three facts: 9 fits into a short immediate literal; the top stack item need not be copied and then discarded (which > would normally do to its arguments), because it simply stays in a register (EBX here); and the processor flags set by the compare can drive the conditional branch directly. The result: 4 Forth words (which would take 6 cells on a conventional ITC Forth) shrink to two machine instructions. Both shorter *and* faster, by a considerable margin.
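
To make the un-optimized case concrete, here is a rough sketch of that sequence taken one word at a time, with 12 as an arbitrary input. The names LIT and ?BRANCH are the conventional ones for the run-time literal and conditional-branch words; they vary from system to system and are assumptions here, as is the exact format of the .S output.

```forth
\ Word-by-word view of   DUP 9 > IF   on a conventional ITC Forth:
\
\   word    stack after    compiled cells
\   DUP     12 12          1
\   9       12 12 9        2  (LIT, 9)
\   >       12 -1          1
\   IF      12             2  (?BRANCH, target)
\                          total: 6 cells
\
\ The same steps, typed interactively. IF only works inside a
\ definition, so DROP stands in for its consuming the flag.
12 dup .s      \ stack: 12 12
9 .s           \ stack: 12 12 9
> .s           \ stack: 12 -1   (true flag)
drop .s        \ stack: 12
drop           \ leave the stack empty
```

Every one of those intermediate stack states is real memory traffic in the un-optimized system; the CMP/JLE pair in the disassembly does none of it.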

In such a system, you really end up using calls & returns only where the statistics are much in your favor (that is, the called subroutine is long enough that the call/return "overhead" is trivial). I'd be amazed if your very complicated scheme could improve on this by even 1%; we're in the territory where it simply isn't cost-effective to try harder.

Add to that the fact that you probably spend 80% of your CPU time in 20% of your code. If you have some particular function that's really, really time-critical, you can always hand-optimize that, maybe. Designing complicated co-processor schemes really won't pay off.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather   (US & Canada)   800-55-FORTH
FORTH Inc.                         +1 310-491-3356
5155 W. Rosecrans Ave. #1018  Fax: +1 310-978-9454
Hawthorne, CA 90250
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

