Build your own Forth for Microchip PIC (Episode 838): Threading



If you read my introductory episode (#837, an arbitrarily chosen number
reflecting how long people have been asking for a Forth for the
Microchip PIC 16F), you'll know that I think I need to implement NEXT,
ENTER, and EXIT in order to run Forth words under Frank Sergeant's
3-instruction Forth microkernel. Threading is the core of this issue, so
let's examine the possibilities.

As I said in the last missive, the PIC is quirky, and its quirks rear
their ugly little heads right off the bat in several ways:

1. RAM is precious, and there's no way to run code in it anyway:
instructions only execute from program memory.

2. Program and data flash are both available, but reading them
programmatically is very costly. In addition, only 14 bits are available
from each word of program memory.

3. The hardware stack is limited in both size (8 levels deep) and
visibility (as in invisible: software can't read or manipulate it).

Confined to this environment, we need to figure out which threading
model to implement. Brad Rodriguez outlines several techniques in his
"Moving Forth" article: http://www.zetetics.com/bj/papers/moving1.htm

Here's a brief recap:

1. Indirect threading (ITC): code is a list of addresses that point to
addresses of the target code to execute.

2. Direct threading (DTC): code is a list of addresses that point
directly to machine code, which for high-level words jumps to or calls
the target code to execute. It's about the same as ITC but removes one
level of indirection.

3. Subroutine threading (STC): code is a sequence of subroutine calls.
It uses the hardware stack to manage nesting and returns.

4. Token threading (TTC): code is a list of tokens. Each token is used
to look up the address of the code to execute.

Let's evaluate the efficacy of each relative to the PIC's constraints,
ordered from worst to best:

worst: STC. The PIC simply doesn't have the hardware stack to support
it: 8 invisible levels can't hold the nesting of real Forth words.

ok: ITC. Extracting an address from the PIC's program memory is costly.
Having to do it on two levels of indirection is even more so.

better: DTC. Halves the cost of fetching addresses. A jump (GOTO)
instruction fits in one cell and executes quickly.

best: TTC. Tokens can be fetched quickly, and the PIC supports fast
jump tables. Far more efficient than ITC or DTC.

It's interesting that Brad calls TTC the slowest of the bunch. But on a
PIC, where fetching program memory programmatically costs close to 20
instruction cycles, TTC runs rings around both DTC and ITC.

I already have a similar scheme implemented for fetching bytecodes in my
NPCI virtual machine interpreter. Fundamentally, each byte is encoded as
a PIC RETLW instruction. Using PCLATH/PCL, it's possible to do a
computed goto to any instruction in the address space.

So it looks like TTC with jump tables is the winner.

BAJ


