Re: Build your own Forth for Microchip PIC (Episode 837)



In article <137sb0raq9lv503@xxxxxxxxxxxxxxxxxx>,
Elizabeth D Rather <eratherXXX@xxxxxxxxx> wrote:
Byron Jeff wrote:
In article <137rks1o4s6mi1b@xxxxxxxxxxxxxxxxxx>,
Elizabeth D Rather <eratherXXX@xxxxxxxxx> wrote:
Byron Jeff wrote:
...
Well, your choice of the PIC16 clearly has some benefits, which you've
outlined, but it appears as though it's really handcuffing you in terms
of designing a workable development cycle.

I don't think so. Execution efficiency isn't my primary goal, and that's
what will be handcuffed. I was working with a bytecode interpreter that
probably averages 40 instructions per bytecode. At the
time I started working on it years ago, my bytecode memory was a
bitbanged serial EEPROM. Execution efficiency isn't a worry. It's the item
I can most afford to give up initially.

You are handcuffed in the sense that you would like to be able to
download small amounts of code into ram and execute it. You don't seem
to have enough ram to do this, not to mention not enough stack space,
etc. Hence, a civilized development environment will be very much more
difficult to arrange than on other platforms.

I see that I missed explaining something. Code isn't going into RAM. The
code is burned into the flash of the PIC. There's 4-8K of flash on a
typical 16F part now. And the only parts I'll even consider are ones
that can programmatically program their own flash.

There's more than enough space for code. Even if I tokenize, I wouldn't
put those tokens in RAM. RAM is strictly for stacks and variables.

The flash is the reason that I'm harping on the incremental development
model. It's slow to program, much faster to read. The flash is the
Harvard architecture part of the PIC. The Harvard architecture also
dictates that native code cannot be run from RAM anyway.

I don't see how the issue of incremental compilation and downloading is
dependent on the execution model.
Regardless of what your compiled
stuff looks like (machine instructions, addresses, or tokens) you still
have to be able to download little bits of it and execute it, right?

The little bits is the key. PicForth and Mary are organized as a one
shot compilation environment. You compile the entire system and download
it at one go. I want pretty much the opposite. The current tools I have
available to me are not organized that way.

Right. But the problem is that the tools are designed for the
limitations of the platform. A less limited platform can more easily
support the kind of tools you're seeking.

We've established that. And I would agree with you if I were really
trying to run tokens out of RAM. But the PIC is much less constrained
than I think you thought it was.

Unless (I'm really pretty ignorant of PIC architecture) you have a
Harvard architecture and it's code space you have no access to.

It's a Harvard architecture that's difficult to access and slow to
update. So transferring little bits at a time is a highly desirable
trait.

Yes. But you need somewhere to transfer them to that's a little more
accessible than that. PIC doesn't seem to support ram development,
which is the best way to do incremental testing.

This is the reason I'm wanting to use the host as a remote execution
environment. The model you see is:

1. Compile words on the host
2. Transfer compiled code words to RAM.
3. Test/Debug code on target recompiling and reloading as necessary.
4. When finished transfer code to more permanent memory on target.

A perfectly workable model.

The model I envision:

1. Prototype/test/debug words on host using tether to manipulate I/O
2. Compile words for target substituting local I/O access for remote ones.
3. Transfer words to target flash.
4. Retest word on target.

The model that I don't want:

1. Compile all words on host
2. Transfer all compiled words to flash on target
3. Debug/test on target.
4. Rinse and repeat.

I can work that model perfectly in assembly, Jal, C, NPCI, Basic, or
any number of traditional languages/development models.

In that
case, either addresses or tokens could work (which is why we used tokens
on the AVR 8515).

Tokens are the winner in my case too. Still a bit concerned about whether
I'm constrained in my token size (or if it really matters).

Not really.

From what I was reading in Brad's article it seems to be possible to
implement the token interpreter so that variable length tokens are
doable. It's a model that I had already implemented for my bytecode
interpreter, so I have no problem with it. So for now that's what I'll
plan to do reserving some of the 8 bit token space for extended word
tokens. It'll make my life on the host a bit more difficult because I'll
need to essentially do some Huffman-style encoding of words to make sure
that the most frequently used words are in the smallest token space. But
again that's an optimization issue, not an implementation one.
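A minimal sketch of the variable-length token idea, written as host-side
Python rather than PIC code. The split point EXT_BASE, the two-byte layout,
and the function names are assumptions made for illustration; the thread
does not specify them. It is also a simple two-tier scheme, not a true
frequency-optimal code: words are only ranked by use count so that the
busiest ones land in the one-byte space.

```python
EXT_BASE = 0xF0  # assumption: first bytes 0xF0..0xFF introduce a two-byte token

def assign_tokens(usage_counts):
    """Give the most frequently used words the lowest token numbers."""
    ranked = sorted(usage_counts, key=lambda w: (-usage_counts[w], w))
    return {word: number for number, word in enumerate(ranked)}

def encode(number):
    """Encode a token number as one byte, or two if it needs the extended space."""
    if number < EXT_BASE:
        return bytes([number])
    hi, lo = divmod(number - EXT_BASE, 256)
    if hi > 0xFF - EXT_BASE:
        raise ValueError("token number out of range")
    return bytes([EXT_BASE + hi, lo])

def decode(stream, pos):
    """Return (token number, next position) for the token starting at pos."""
    first = stream[pos]
    if first < EXT_BASE:
        return first, pos + 1
    return EXT_BASE + ((first - EXT_BASE) << 8) + stream[pos + 1], pos + 2

tokens = assign_tokens({"DUP": 50, "SWAP": 20, "RARE-WORD": 1})
print(tokens["DUP"], encode(tokens["DUP"]), encode(1000))
```

With these assumed numbers there are 240 one-byte tokens and 4096 two-byte
ones, and the interpreter only pays for the second fetch on the rarely used
words.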


...
But in my self-chosen constrained target environment this approach fails
on several levels given the goals I hope to achieve:

1. The pic's hardware stack is limited. Subroutine calls are simply not
an option because the stack overflows after only 8 levels of calls.

Well, it's somewhat limiting, but shouldn't be fatal. Most Forth apps
aren't really nested very deeply. We've run some pretty hairy apps on
8051's with on-chip stacks of limited size.

That's encouraging. I just worry about reliability because if you
overflow the hardware stack, your application is guaranteed to crash
eventually.

And that stack is completely unmapped in memory. It's probably the thing
about the part that drives me the most crazy because you can't implement
any effective context switching without access to that stack.

Among the many reasons we avoid using PICs.

And probably one of the reasons that no one has taken on the challenge
of implementing anything other than a batch forth compiler for it.

I think that you may have convinced me to take a stab at an STC
implementation. It does eliminate the need for the address interpreter
and will speed up execution of the most critical part of the threading
mechanism. It'll save me the code space of a token table. It would be an
absolute no brainer of a decision for an 18F part (32 level addressable
stack). I figure that if I don't implement recursion it should be fine.

And if not, I know that I have tokens as a backup.
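One way to make the "no recursion, so it should be fine" bet checkable is
to have the host compute the deepest call chain before anything is burned
to flash. The sketch below is a host-side illustration only; the call-graph
format, the function names, and the `reserved` allowance (for interrupts,
say) are assumptions, and the limit of 8 is the figure quoted above for
the 16F hardware stack.

```python
HW_STACK_DEPTH = 8  # return-stack levels on a 16F part, per the discussion

def max_call_depth(calls, word, _active=()):
    """Deepest chain of nested calls started by calling `word`.

    `calls` maps each word to the words it calls; primitives may be absent.
    The call into `word` itself counts as one level.
    """
    if word in _active:
        raise ValueError("recursion through " + word)
    deepest = 0
    for callee in calls.get(word, ()):
        deepest = max(deepest, max_call_depth(calls, callee, _active + (word,)))
    return 1 + deepest

def fits_hw_stack(calls, entry, reserved=0):
    """True if the worst-case nesting plus reserved levels fits the stack."""
    return max_call_depth(calls, entry) + reserved <= HW_STACK_DEPTH

calls = {
    "MAIN": ["BLINK", "DELAY"],
    "BLINK": ["TOGGLE", "DELAY"],
    "DELAY": ["MS"],
}
print(max_call_depth(calls, "MAIN"), fits_hw_stack(calls, "MAIN", reserved=1))
```

A recursive definition is rejected outright, since its depth cannot be
bounded statically.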

Exactly how limited were those 8051 stacks?

64 bytes (32 cells) as I recall. Could have been 48 bytes. It's been a
while.

Now we are talking about the hardware subroutine stack right? Just
making sure.

2. Taking this route commits you to compiling your entire application
because once you do away with the inner interpreter, then everything on
the target must be compiled.

No, it doesn't. Changing to direct code compilation didn't affect our
development cycle at all. Unless you're saying this based on another
PIC-specific obstacle we haven't heard about yet.

Not sure. I think I'm reaching the boundaries of my understanding. If
point #1 above is taken off the table, then I think I can see it because
implementing optimized STC eliminates the interpreter yet facilitates
incremental additions to the codebase on the target. But if that
hardware stack is out of bounds, I'm lost as to how you could implement
ITC, DTC, or TTC without elements of the address interpreter.

The execution model doesn't affect whether you can or cannot do
incremental compilation. What determines that is whether you have a
place to put the downloaded definitions to test them. You make it sound
like the address interpreter is a big deal. It isn't.

From my reading (primarily Brad's articles) it's the core concept of
interpreted Forth. Without it you're back to the traditional compilation
model of inline expansion of native code. I already have a compiler like
that languishing on my hard disk. Not real interested in writing
another.


...
But it gets back to the question of how much kernel do you have to
download to get started? At the very least in bootstrapping the kernel a
distributed model would have to be in play.

You start with a couple dozen bytes and use that to load the rest. We
find a few K to be capable of providing a useful set of primitives.

...

I guess my question is what is the structure of an optimized compiled
code word then? What I cannot visualize is the linkages between the code
fragments.

It's code in code space. Small primitives or optimized code sequences
are expanded in place. Larger words are called. Linkage is normal
call/return.

So it's STC with inline expansion of smaller fragments. Got it.
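A rough host-side sketch of that compilation rule: short words are copied
in place, longer ones become a call. The INLINE_LIMIT threshold, the word
table layout, and the placeholder instruction names are all assumptions
for illustration, not real PIC opcodes or anyone's actual compiler.

```python
INLINE_LIMIT = 2  # assumption: words this short are expanded in place

def compile_stc(definition, words):
    """Compile a list of word names into a flat instruction list.

    `words` maps a name to (address, body), where body is the word's
    instruction list without its final return.
    """
    out = []
    for name in definition:
        address, body = words[name]
        if len(body) <= INLINE_LIMIT:
            out.extend(body)                  # small word: inline expansion
        else:
            out.append(("CALL", address))     # larger word: normal call
    out.append(("RETURN",))
    return out

words = {
    "DROP": (0x100, [("drop_1",)]),
    "DUP": (0x104, [("dup_1",), ("dup_2",)]),
    "UM*": (0x120, [("mul_%d" % i,) for i in range(12)]),
}
print(compile_stc(["DUP", "UM*", "DROP"], words))
```

Only the called words consume a level of the hardware return stack; the
inlined ones cost code space instead.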

I think that structurally I can easily see how to compile a definition
into a collection of addresses or tokens. However compiling native code
is a different animal.

Simpler, really.

If it's straight STC you're right it's simpler. Still a bit worried
about the PIC hardware call/return stack. If it overflows, your
application goes into the weeds.


...
Microcontrollers also have mechanisms for dealing with time critical
stuff. Another reason I love using PICs is the wide variety of hardware
peripherals they come packaged with. UARTs, multiple timers, PWM, ADC, and the
like are really set/autopilot types of tools. Interrupts can be used to
buffer really time sensitive stuff.

So do most modern microcontrollers. Take another look at some of the
alternatives. There are some pretty nice parts out there.

I'm aware. Remember I got here because I was looking at the propeller.
I'm already having to get up to speed with a new language and a new
tool. Compounding it by starting from scratch with a new architecture is
too much to tackle.

It's a lot easier to come up to speed with a new language if you have
thoroughly tested, well-documented tools at hand. Trying to develop
tools for a language that's new to you is a much worse challenge.

It's new only to a point. If I were starting at ground zero I might
agree. But having implemented a stack based bytecode interpreter for the
target already, I'm way ahead of the game. I fundamentally already have
the XTL executive in place. It's simply a matter of linking in higher
level forth words into it. I feel I'm much less likely to get lost
starting bottom up with the foundation I know than restarting top down
to an unfamiliar base. In my current spot, forth is the only unknown,
and these discussions with you and others are clearing that up fairly
quickly.


Plus I feel if I can pull this off in my constrained little box, that
moving the port to a roomier chip (like the propeller, which of course
due to Cliff I don't need to do) should be no problem.

Isn't that like saying, if I can learn to ride a unicycle, a tricycle
will be easy?

No. It's saying I already know how to ride a unicycle, so the trike is
trivial.

If all else fails after developing the application, simply run it all
through an optimizing compiler removing the inner interpreter altogether
along with other connective tissue between words.

That shouldn't have to be an extra step. An optimizing compiler isn't a
post-processor, it's an *alternative* to another kind of compiler (such
as ITC).

An incremental optimizing Forth compiler for the PIC 16F platform
doesn't exist AFAICT. It needs to be built. My experience with language
tool building and with PICs tells me that the optimizing compiler is the
much tougher road to travel to get to an incremental development target.

True, but even developing an interactive, incremental ITC or token-based
development system for this beast will be extremely difficult.

The token based executive is already done. See my other post for the
description of the bytecode interpreter with my comments.

Bruce threw together about a dozen primitives based on a simple one
register model along with about 7 primitives in a post. Forth doesn't
need much of a base to get rolling.

A non incremental optimizing compiler does exist. But it doesn't suit my
development needs.

It was developed by a very clever guy. I'm sure if he could have built
an incremental, interactive compiler he would have.

People develop tools based on their perceptions. Actually I read on his
site that he had started down the path I'm on now and decided that it
would be quicker to implement a cross compiler. He also had a particular
target in mind (model train stuff).

All of my perceptions are colored by my previous experiences. For me
projects go better when I can noodle with stuff. Most times I spiral from
the middle of a specification in both directions. It works for me.

So I want to build the tools that fit that development style. In
addition I want tools that fit into my current toolset. Meeting forth in
the last couple of weeks has again changed my perception of what that
tool needs to be.

Put the two together and the answer that pops out is to implement a non
optimized token based compiler. I'm in my wheelhouse there because I
already have a token based, stack implemented 16F kernel that's already
tested and can be quickly adapted to the task.

While I do have fun building tools, they do have the purpose of building
other stuff. I prefer building the simplest foolproof tools I can build
then using them to bootstrap up.

Implementing an address interpreter with NEXT, ENTER, and EXIT "words"
will cost me an afternoon and about 25-30 lines of assembly. Then I'll
have a tool that I can use to put forth on my target.
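To show how little machinery NEXT, ENTER, and EXIT involve, here is a toy
model in Python rather than the 25-30 lines of PIC assembly mentioned
above. The token representation (strings for primitives, integers for
addresses of colon definitions) and the function name `run` are
assumptions chosen for the sketch. Note that nesting uses a software
return stack, so it does not touch the 8-level hardware stack.

```python
def run(code, primitives, entry, stack=None):
    """Execute the token list starting at index `entry` in `code`.

    A string token names a primitive; an int token is the address of a
    colon definition; "EXIT" ends the current definition.
    """
    stack = [] if stack is None else stack
    rstack = []              # software return stack, kept in RAM
    ip = entry               # interpreter pointer
    while True:
        token = code[ip]     # NEXT: fetch the token and advance
        ip += 1
        if token == "EXIT":  # EXIT: unnest, or stop at the outermost level
            if not rstack:
                return stack
            ip = rstack.pop()
        elif isinstance(token, int):  # ENTER: nest into a colon definition
            rstack.append(ip)
            ip = token
        else:
            primitives[token](stack)

primitives = {
    "DUP": lambda s: s.append(s[-1]),
    "*": lambda s: s.append(s.pop() * s.pop()),
}
# SQUARE lives at 0: DUP * EXIT.  FOURTH lives at 3: SQUARE SQUARE EXIT.
code = ["DUP", "*", "EXIT", 0, 0, "EXIT"]
print(run(code, primitives, 3, [3]))
```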

I'm not worried about slow. I'm worried about getting done and having
the right result when I get done.

Well, it sounds as though you have months of fun ahead of you.

Months? What makes you think that? There are at least a half dozen forths
out here that are predicated on operating on top of a simplified kernel.
The kernel already exists. All that's required is a simple mapping
between the primitives required by a particular forth and the existing
kernel.

Is there some magic incantation that I'm missing here?

...
Only one piece of the puzzle is missing at this point. Elizabeth
discusses in her post above that the XTL transfers the stack between the
host and the target. In short it implements a form of distributed
execution where you marshal the stacks for RPC. In doing so one can run an
application with a set of words distributed between the host and the
target.

Well, it only models the data stack on the host. The return stack stays
on the target. And target words only execute on the target. There's no
attempt to simulate execution of target words on the host.

Ah. I see. So that means that your XTL had to be significantly developed
before you could start using it. The appeal of Frank's paper was that
essentially once you implemented his three-instruction kernel, you
could immediately start developing applications with it without needing
to flesh out an entire kernel just to get started. This leads back to
the point I made in my initial post that a good (albeit slow) small set
of primitives would be good to implement. And the 48 that I've seen for
MAF doesn't qualify as a small set.
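For readers who have not seen a three-instruction kernel, here is a toy
model of the target side. It assumes the three instructions are "fetch a
byte", "store a byte", and "call an address"; the thread does not list
them, so treat that, the command codes, and the class and method names as
assumptions. Python callables stand in for machine code on the target.

```python
FETCH, STORE, CALL = 1, 2, 3  # hypothetical command codes for the tether

class Target:
    """Toy stand-in for the target side of the tether."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.routines = {}  # address -> callable standing in for machine code

    def serve(self, cmd, addr, value=0):
        """Handle one command from the host and return any result byte."""
        if cmd == FETCH:
            return self.mem[addr]
        if cmd == STORE:
            self.mem[addr] = value & 0xFF
            return None
        if cmd == CALL:
            self.routines[addr](self.mem)
            return None
        raise ValueError("unknown command")

def add_bytes(mem):
    """Pretend target routine: mem[2] = mem[0] + mem[1], truncated to a byte."""
    mem[2] = (mem[0] + mem[1]) & 0xFF

target = Target()
target.routines[0x20] = add_bytes
target.serve(STORE, 0, 7)    # host pokes the arguments into target memory
target.serve(STORE, 1, 5)
target.serve(CALL, 0x20)     # host asks the target to run the routine
result = target.serve(FETCH, 2)
print(result)
```

Everything else (text interpreter, dictionary, compiler) would live on
the host and drive the target through these three operations.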

The target side of the XTL is very small and simple, probably not
significantly different from Frank's concept. The host is rather more
complex. And the concepts have been developed over about 20 years.
There's a lot of advantages to be found in "standing on the shoulders of
giants".

OK I'll take you on with this. Here are my requirements:

1. I need an environment that runs natively on my Linux box.
2. I need the specification of the XTL and the host wire interface.
3. I need complete documentation of the host environment.

Can you provide that? How much will it cost me?

I see the distributed model sort of as a breakpoint. The host already
has everything (primitives, core words, core extensions) already
implemented. Why not use it as a remote process server in addition to
the text interpreter, wordlist coordinator, and the target's cross
compiler?

Yes.

You said no previously. Something to the effect that all words ran on
the target and that the stack was not transferred between the host and
the target.

So which is it?


...
A classic Forth
has a text interpreter, which processes text from a user or disk and
generates ("compiles") executable definitions (which might be actual
code, strings of addresses of words to be executed, tokens, or some
other internal form). That which used to be called an "inner"
interpreter, more accurately "address" interpreter, processes the
strings of addresses in the "compiled" form of a definition if that's
the model being used. It's usually only 1-3 machine instructions per
address, although on some processors it's more.

Right. The point of running Forth on the host is to get a complete
environment without having to burden the target with it. This is the
tethered model. But no one seems to be addressing the possibility of
distributed computing between the host and the target. The host is
simply a repository for a set of services (text interpreter, cross
compiler, wordset dictionaries) without helping the target run any
actual forth code. The way I see it since the host is a full forth
environment, it can emulate a full forth environment for the target.

Yes, the host provides all those services. What it doesn't do well is
execute target code. Therefore, we make no attempt to execute target
code on the host, but transparently exercise it on the target.

I understand. We don't need the host to execute target code. The host is
running forth, the target is running forth. I do realize that they may
be implemented differently (different cell sizes and the like). But
forth is forth is forth. So if there's a word on the host that the
target can use, why is it necessary to compile that word for the target,
transfer the word to the target and execute the word on the target? An
RPC model where the host can execute a word (or set of words) is also a viable
model during development.

[snippage]

BTW I realized that I'm still trying to figure out how in the heck forth
compiles a number into a definition. What is the xt for a number?

It's the address for (or call to) a word that pushes the actual value on
the stack, followed by the value:

... [xt of LIT] [value] ...

So the number is inlined into the code.


There may be different versions for 8, 16, or 32-bit literals. It has
to advance the interpreter pointer or PC beyond the value, in addition
to pushing the value on the stack.
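A small Python model of that layout, for illustration only. The string
"LIT" stands in for the xt of LIT, and `compile_tokens` and `execute` are
hypothetical names; a real system would store the value in 8, 16, or 32
bits as noted above.

```python
LIT = "LIT"  # stand-in for the xt of LIT

def compile_tokens(source_words):
    """Turn a list of source words into a compiled token list."""
    body = []
    for word in source_words:
        try:
            value = int(word)
        except ValueError:
            body.append(word)           # ordinary word: compile its token
        else:
            body.extend([LIT, value])   # number: [xt of LIT] [value]
    return body

def execute(body, primitives):
    """Run a compiled body and return the data stack."""
    stack, ip = [], 0
    while ip < len(body):
        token = body[ip]
        ip += 1
        if token == LIT:
            stack.append(body[ip])      # push the inlined value...
            ip += 1                     # ...and step the pointer past it
        else:
            primitives[token](stack)
    return stack

body = compile_tokens(["2", "3", "+"])
print(body, execute(body, {"+": lambda s: s.append(s.pop() + s.pop())}))
```

The inlined value is never fetched as a token because LIT always moves
the pointer beyond it first.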

That's problematic for an STC that doesn't have access to the hardware
return stack. It'll have to be inlined.

BAJ