Re: Build your own Forth for Microchip PIC (Episode 837)

Byron Jeff wrote:
In article <137rks1o4s6mi1b@xxxxxxxxxxxxxxxxxx>,
Elizabeth D Rather <eratherXXX@xxxxxxxxx> wrote:
Byron Jeff wrote:
I'm not necessarily opposed to the optimized-machine-code approach. It's
just that with the PIC-based Forths I've seen so far, there's no
incremental way to do it. So it reverts back to the traditional
compile-the-whole-app, download-the-whole-app, test cycle associated with
traditional HLLs. And the target in this instance cannot be programmed
at wire speed, so downloading an entire program to test takes a while.
It's not like a PC, where compiling is virtually instantaneous.
Well, we're able to do it incrementally on all the targets we support so far. The actual compilation is on the host, but the download of the result is immediate. We have a switch setting that either compiles the whole thing and downloads it, or downloads individual definitions as soon as they're done. We use the first mode to send the kernel, if necessary, and then switch to the second mode for interactive development.

OK. So we're getting somewhere. Now I happen to still be stuck due to
the unreasonably small hardware stack size for my particular target.
I presume that the optimized compiled code is based on STC with

Yes, but it doesn't matter. The point I'm trying to make is that the choice of implementation model doesn't preclude incremental compilation.

Well, your choice of the PIC16 clearly has some benefits, which you've outlined, but it appears as though it's really handcuffing you in terms of designing a workable development cycle.

I don't think so. Execution efficiency isn't my primary goal, and that's
what will be handcuffed. I was working with a bytecode interpreter that
probably averages 40 instructions per bytecode. At the time I started
working on it years ago, my bytecode memory was a bitbanged serial
EEPROM. Execution efficiency isn't a worry. It's the item I can most
afford to give up initially.

You are handcuffed in the sense that you would like to be able to download small amounts of code into ram and execute it. You don't seem to have enough ram to do this, not to mention not enough stack space, etc. Hence, a civilized development environment will be very much more difficult to arrange than on other platforms.

I don't see how the issue of incremental compilation and downloading is dependent on the execution model.
Regardless of what your compiled stuff looks like (machine instructions, addresses, or tokens) you still have to be able to download little bits of it and execute it, right?

The little bits are the key. PicForth and Mary are organized as one-shot
compilation environments: you compile the entire system and download it
in one go. I want pretty much the opposite. The current tools available
to me are not organized that way.

Right. But the problem is that the tools are designed for the limitations of the platform. A less limited platform can more easily support the kind of tools you're seeking.

Unless (I'm really pretty ignorant of PIC architecture) you have a Harvard architecture and it's code space you have no access to.

It's a Harvard architecture that's difficult to access and slow to
update. So transferring little bits at a time is a highly desirable

Yes. But you need somewhere to transfer them to that's a little more accessible than that. PIC doesn't seem to support ram development, which is the best way to do incremental testing.

In that case, either addresses or tokens could work (which is why we used tokens on the AVR 8515).

Tokens are the winner in my case too. I'm still a bit concerned about
whether I'm constrained in my token size (or whether it really matters).

Not really.

But in my self-chosen constrained target environment this approach fails
on several levels given the goals I hope to achieve:

1. The PIC's hardware stack is limited. Subroutine calls are simply not
an option, because the stack overflows after only 8 levels of calls. This
is controllable in an assembly environment. But with Forth specifically
designed around making calls, it's a guaranteed path to doom.
Well, it's somewhat limiting, but shouldn't be fatal. Most Forth apps aren't really nested very deeply. We've run some pretty hairy apps on 8051's with on-chip stacks of limited size.

That's encouraging. I just worry about reliability, because if you
overflow the hardware stack, your application is guaranteed to crash eventually.
And that stack is completely unmapped in memory. It's probably the thing
about the part that drives me the most crazy, because you can't implement
any effective context switching without access to that stack.

Among the many reasons we avoid using PICs.

Exactly how limited were those 8051 stacks?

64 bytes (32 cells) as I recall. Could have been 48 bytes. It's been a while.

2. Taking this route commits you to compiling your entire application
because once you do away with the inner interpreter, then everything on
the target must be compiled.
No, it doesn't. Changing to direct code compilation didn't affect our development cycle at all. Unless you're saying this based on another PIC-specific obstacle we haven't heard about yet.

Not sure. I think I'm reaching the boundaries of my understanding. If
point #1 above is taken off the table, then I think I can see it because
implementing optimized STC eliminates the interpreter yet facilitates incremental additions to the codebase on the target. But if that
hardware stack is out of bounds, I'm lost as to how you could implement
ITC, DTC, or TTC without elements of the address interpreter.

The execution model doesn't affect whether you can or cannot do incremental compilation. What determines that is whether you have a place to put the downloaded definitions to test them. You make it sound like the address interpreter is a big deal. It isn't.

But it gets back to the question of how much kernel you have to
download to get started. At the very least, in bootstrapping the kernel,
a distributed model would have to be in play.

You start with a couple dozen bytes and use that to load the rest. We find that a few K is enough to provide a useful set of primitives.


I guess my question is: what is the structure of an optimized compiled
code word, then? What I cannot visualize are the linkages between the code

It's code in code space. Small primitives or optimized code sequences are expanded in place. Larger words are called. Linkage is normal call/return.

I think that structurally I can easily see how to compile a definition
into a collection of addresses or tokens. However, compiling native code
is a different animal.

Simpler, really.

Microcontrollers also have mechanisms for dealing with time-critical
stuff. Another reason I love using PICs is the wide variety of hardware
peripherals they come packaged with. UARTs, multiple timers, PWM, ADC, and
the like are really set-it-and-forget-it, autopilot types of tools.
Interrupts can be used to buffer really time-sensitive stuff.
So do most modern microcontrollers. Take another look at some of the alternatives. There are some pretty nice parts out there.

I'm aware. Remember, I got here because I was looking at the Propeller.
I'm already having to get up to speed with a new language and a new
tool. Compounding it by starting from scratch with a new architecture is
too much to tackle.

It's a lot easier to come up to speed with a new language if you have thoroughly tested, well-documented tools at hand. Trying to develop tools for a language that's new to you is a much worse challenge.

Plus I feel that if I can pull this off in my constrained little box,
then moving the port to a roomier chip (like the Propeller, which of
course, thanks to Cliff, I don't need to do) should be no problem.

Isn't that like saying, if I can learn to ride a unicycle, a tricycle will be easy?

If all else fails after developing the application, simply run it all
through an optimizing compiler, removing the inner interpreter altogether
along with the other connecting tissue between words.
That shouldn't have to be an extra step. An optimizing compiler isn't a post-processor, it's an *alternative* to another kind of compiler (such as ITC).

An incremental optimizing Forth compiler for the PIC 16F platform
doesn't exist AFAICT. It needs to be built. My experience with language
tool building and with PICs tells me that the optimizing compiler is the
much tougher road to travel to get to an incremental development target.

True, but even developing an interactive, incremental ITC or token-based development system for this beast will be extremely difficult.

A non-incremental optimizing compiler does exist. But it doesn't suit my
development needs.

It was developed by a very clever guy. I'm sure if he could have built an incremental, interactive compiler he would have.

Put the two together, and the answer that pops out is to implement a
non-optimizing, token-based compiler. I'm in my wheelhouse there, because
I already have a token-based, stack-oriented 16F kernel that's already
tested and can be quickly adapted to the task.

While I do have fun building tools, they do have the purpose of building
other stuff. I prefer building the simplest foolproof tools I can, then
using them to bootstrap up.

Implementing an address interpreter with NEXT, ENTER, and EXIT "words"
will cost me an afternoon and about 25-30 lines of assembly. Then I'll
have a tool that I can use to put Forth on my target.

I'm not worried about slow. I'm worried about getting done and having
the right result when I get done.

Well, it sounds as though you have months of fun ahead of you.

Only one piece of the puzzle is missing at this point. Elizabeth
explains in her post above that the XTL transfers the stack between the
host and the target. In short, it implements a form of distributed
execution where you marshal the stacks for RPC. In doing so, one can run
an application with a set of words distributed between the host and the
Well, it only models the data stack on the host. The return stack stays on the target. And target words only execute on the target. There's no attempt to simulate execution of target words on the host.

Ah. I see. So that means that your XTL had to be significantly developed
before you could start using it. The appeal of Frank's paper was that
once you implemented his three-instruction kernel, you could immediately
start developing applications with it, without needing to flesh out an
entire kernel just to get started. This leads back to the point I made
in my initial post that a small (albeit slow) set of primitives would be
good to implement. And the 48 that I've seen for MAF don't qualify as a
small set.

The target side of the XTL is very small and simple, probably not significantly different from Frank's concept. The host is rather more complex. And the concepts have been developed over about 20 years. There are a lot of advantages to be found in "standing on the shoulders of giants".

I see the distributed model sort of as a breakpoint. The host already
has everything (primitives, core words, core extensions) implemented.
Why not use it as a remote process server, in addition to the text
interpreter, wordlist coordinator, and the target's cross


A classic Forth has a text interpreter, which processes text from a user or disk and generates ("compiles") executable definitions (which might be actual code, strings of addresses of words to be executed, tokens, or some other internal form). That which used to be called an "inner" interpreter, more accurately "address" interpreter, processes the strings of addresses in the "compiled" form of a definition if that's the model being used. It's usually only 1-3 machine instructions per address, although on some processors it's more.

Right. The point of running Forth on the host is to get a complete
environment without having to burden the target with it. This is the
tethered model. But no one seems to be addressing the possibility of
distributed computing between the host and the target. The host is
simply a repository for a set of services (text interpreter, cross
compiler, wordset dictionaries) without helping the target run any
actual Forth code. The way I see it, since the host is a full Forth
environment, it can emulate a full Forth environment for the target.

Yes, the host provides all those services. What it doesn't do well is execute target code. Therefore, we make no attempt to execute target code on the host, but transparently exercise it on the target.

In any case, I'll repeat once again: the internal form of the definition has no impact on the development cycle. These are orthogonal issues. There can be excellent or terrible development tools with any internal Forth model.

I believe that now. You've broken the connection between optimized
native code definitions and linkage technique in my mind. Thanks for


So given that, how does one go about building an optimized code compiler
that functions in an incremental fashion for a target that doesn't yet
have such a beast?

Don't bother for now. A token-based implementation should work fine, so long as you're stuck with this PIC.

Our systems have a text interpreter on the host, which parses your command line or source file. Definitions are compiled (and whether the compiled form is actual code, addresses, or tokens doesn't matter) and downloaded to the target, either incrementally or in a batch depending on a switch setting. If you type a target command on the host, the host's set of dictionary heads for the target is searched, and the target address of the executable code is found. Then the target is directed to execute it. The target does no interpreting.

But you can't have it both ways. I'm not talking about text
interpretation at all, only address interpretation. Unless I missed
something, the only two ways to compile definitions without an address
interpreter are STC or inlining the code. If the compiled form is
addresses or tokens, then the target by definition needs to have an
address interpreter to interpret those addresses or tokens.

Yes. But that has no impact on the development style.


BTW, I realized that I'm still trying to figure out how in the heck Forth
compiles a number into a definition. What is the xt for a number?

It's the address for (or call to) a word that pushes the actual value on the stack, followed by the value:

... [xt of LIT] [value] ...

There may be different versions for 8, 16, or 32-bit literals. It has to advance the interpreter pointer or PC beyond the value, in addition to pushing the value on the stack.

I'll take a look. But frankly, I won't get the warm fuzzies about it
until I'm sure that it in fact offers the type of environment I'm hoping
to run. It's also complicated by the fact that SwiftX is a Windows
product (and justifiably so) and I'm a Linux guy (also justifiably so).
Well, IMO the only way to find out if this is the type of environment you're looking for is to try it. As for Windows vs. Linux, we don't necessarily love Windows, but we need to make a living, and that's where 95% of the market is.

I know. That's why I said justifiably so. My small aside on that subject
is that if tool developers could find a way to develop cross-platform
tools without expending too much additional effort, then maybe a more
equitable distribution of market share would follow.

It's not just that the platforms are different. Windows users have certain expectations of their development system (e.g. pull-down menus, toolbars, much more) that a simple command-line Forth like gForth doesn't support. If you don't provide them, your system won't sell and all your effort is wasted. And actually, they're pretty nifty. Since we use our tools to develop very complex applications, we are continually improving them to make them easier to use.


Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310-491-3356
5155 W. Rosecrans Ave. #1018 Fax: +1 310-978-9454
Hawthorne, CA 90250

"Forth-based products and Services for real-time
applications since 1973."