Re: Denesting
- From: "Elizabeth D Rather" <eratherXXX@xxxxxxxxx>
- Date: Fri, 8 Jul 2005 19:52:21 -0700
<bobjaffray@xxxxxxxx> wrote in message news:1120745487.324600.270040@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
bj: Every time I post to comp.prog.forth@xxxxxxxxxxxxxxxx the post bounces. I can only copy and paste online to the website. Maybe someone can explain why sending a regular email doesn't work.
That's probably because it's comp.lang.forth, not comp.prog.forth.
... bj: OK. Enlighten me on the difference. Straight STC I would think would simply include the call opcode in the high-level code instead of only parameter field addresses. Inlining the primitives goes a step further. That is as far as I have gotten in a simplistic conception of it. What next after that?
Did you miss Stephen Pelc's excellent example above? I repeat it here:
---------------
Most optimised Forths produce very similar code on a PC for this definition, which uses eleven Forth source tokens to produce five instructions and 16 bytes of compiled code. The example is on VFX Forth.

: n>char \ n -- char
  dup 9 > if 7 + then [char] 0 + ; ok
dis n>char
N>CHAR
( 0049EB70 83FB09 ) CMP EBX, 09
( 0049EB73 0F8E03000000 ) JLE/NG 0049EB7C
( 0049EB79 83C307 ) ADD EBX, 07
( 0049EB7C 83C330 ) ADD EBX, 30
( 0049EB7F C3 ) NEXT,
( 16 bytes, 5 instructions )
ok
----------------
The answer to "what next" is optimizing the resulting code, so the result is both shorter and faster than literally "rolled out" contents of definitions. To take a really simple example, in un-optimized Forth the sequence
DUP 9 > IF
above would copy the top stack item, push 9 on the stack, do a compare yielding a truth flag, and then evaluate the flag and jump or not. The optimized version takes advantage of three facts: that 9 fits into a short literal, that it can *not* discard the top stack item (which > would normally do, hence the DUP), and that it can use the processor flags directly for the conditional branch. The result: 4 Forth words (which would take 6 cells on a conventional ITC Forth) shrink to two machine instructions. Both shorter *and* faster, by a considerable margin.
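For anyone following along, the step-by-step behaviour described above can be traced with stack comments. This is only a sketch: it is the same n>char definition from the quoted example, annotated by hand. The word show-digits is a hypothetical helper added purely for illustration, and the cell names LIT and 0BRANCH in the comments are the conventional ones, which vary from system to system.

```forth
\ Literal, unoptimized reading of  DUP 9 > IF
\ Stack pictures show the data stack after each word, top on the right.
\ On a conventional ITC system the four source words occupy six cells:
\   DUP   LIT 9   >   0BRANCH <target>     (names are assumptions)

: n>char ( n -- char )
   dup            \ n n      copy the top item
   9              \ n n 9    push the literal
   >              \ n flag   compare, consuming both operands
   if             \ n        consume the flag, skip to THEN if false
      7 +         \ n+7      only reached for n > 9
   then
   [char] 0 + ;   \ char     add the character code of "0"

\ Hypothetical helper: emit the character for each value 0..15.
: show-digits ( -- )  16 0 do  i n>char emit  loop ;

show-digits
```

On a standard system show-digits should display 0123456789ABCDEF, whichever way the compiler chooses to implement the definition. The optimizer is free to drop the copy, the pushed literal and the flag because none of them is visible in the stack effect.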
In such a system, you really end up using calls & returns only where the statistics are much in your favor (that is, the called subroutine is long enough that the "overhead" is trivial). I'd be amazed if your very complicated scheme could improve on this by even 1%; we're in the territory where it simply isn't cost-effective to try harder.
Add to that the fact that you probably spend 80% of your CPU time in 20% of your code. If you have some particular function that's really, really time-critical, you can always hand-optimize that, maybe. Designing complicated co-processor schemes really won't pay off.
Cheers, Elizabeth
--
==================================================
Elizabeth D. Rather   (US & Canada)   800-55-FORTH
FORTH Inc.                         +1 310-491-3356
5155 W. Rosecrans Ave. #1018  Fax: +1 310-978-9454
Hawthorne, CA 90250
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================