Re: Build your own Forth for Microchip PIC (Episode 837)



In article <137rks1o4s6mi1b@xxxxxxxxxxxxxxxxxx>,
Elizabeth D Rather <eratherXXX@xxxxxxxxx> wrote:
Byron Jeff wrote:
In article <137quf979qhcr75@xxxxxxxxxxxxxxxxxx>,
Elizabeth D Rather <eratherXXX@xxxxxxxxx> wrote:
...
That said, improvements do occur. By hosting the actual compilation on
a powerful desktop, you can do things that are both difficult and
inappropriate on a limited target, such as compiling optimized target
machine code (which we do). The compile-to-optimized-machine-code
approach not only generates much faster code than the traditional
indirect-threaded approach, it's comparable in size on even small
targets such as the 8051, and significantly smaller on larger ones such
as the 68K family.

I'm not necessarily opposed to the optimized-machine-code approach. It's
just that with the PIC based forths I've seen so far, there's no
incremental way to do it. So it reverts back to the traditional compile
the whole app, download the whole app, test cycle that's associated with
traditional HLLs. And the target in this instance cannot be programmed
at wire speed. So downloading an entire program to test takes a while.
It's not like a PC where compiling is virtually instantaneous.

Well, we're able to do it incrementally on all the targets we support so
far. The actual compilation is on the host, but the download of the
result is immediate. We have a switch setting that either compiles the
whole thing and downloads it, or downloads individual definitions as
soon as they're done. We use the first mode to send the kernel, if
necessary, and then switch to the second mode for interactive development.
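
A rough sketch of the two download modes described above, in Python. The names (Tether, incremental, define, flush) are hypothetical and are not taken from any real product; the "link" is just a list standing in for the wire to the target.

```python
class Tether:
    """Host-side model of a tethered compiler with a batch/incremental switch."""

    def __init__(self, incremental=False):
        self.incremental = incremental  # the "switch setting"
        self.pending = []               # compiled definitions not yet sent
        self.link = []                  # stands in for the wire to the target

    def define(self, name, code):
        """Record a compiled definition; send it at once in incremental mode."""
        self.pending.append((name, bytes(code)))
        if self.incremental:
            self.flush()

    def flush(self):
        """Send everything still pending as one transfer."""
        if self.pending:
            self.link.append(list(self.pending))
            self.pending.clear()


t = Tether(incremental=False)
t.define("DUP", [0x01])
t.define("DROP", [0x02])
t.flush()                         # batch mode: the kernel goes over in one transfer
t.incremental = True
t.define("BLINK", [0x10, 0x11])   # interactive mode: sent as soon as it is done
```

Here t.link ends up holding two transfers: one with the two kernel words, one with BLINK alone.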

OK. So we're getting somewhere. Now I happen to still be stuck due to
the unreasonably small hardware stack size for my particular target.
I presume that the optimized compiled code is based on STC with
optimizations?

...
As you point out especially in embedded systems development there are
segments where really really fast is critical and where fast enough is
often good enough. The tethered approach gives you a rapid prototyping
platform for your target by using the host to implement your words.
That's a great idea. I'm struggling with the migration of those words to
the target. I'm trying to find a way out of what I perceive as a "gotta
compile the whole shebang" trap.

Well, your choice of the PIC16 clearly has some benefits, which you've
outlined, but it appears as though it's really handcuffing you in terms
of designing a workable development cycle.

I don't think so. Execution efficiency isn't my primary goal, and that's
what will be handcuffed. I was working with a bytecode interpreter that
is probably averaging 40 instructions per bytecode. At the
time I started working on it years ago, my bytecode memory was a
bitbanged serial EEPROM. Execution efficiency isn't a worry. It's the item
I can most afford to give up initially.

So the way I see it, even with a collection of optimized code words, in
order to gain the interactivity on the target I crave, I'd still have to
implement an inner interpreter to string the collection of optimized and
non optimized words together, right?

Efficiency of execution isn't my primary goal. If I wanted to achieve
that then working on optimizing the PicForth compiler would be a better
use of my time. The tool I'm seeking is a fluid, interactive development
environment where at the end everything ends up on the target so I can
untether it and it works fast enough to get the specified task done.

I don't see how the issue of incremental compilation and downloading is
dependent on the execution model. Regardless of what your compiled
stuff looks like (machine instructions, addresses, or tokens) you still
have to be able to download little bits of it and execute it, right?

The little bits is the key. PicForth and Mary are organized as a one
shot compilation environment. You compile the entire system and download
it at one go. I want pretty much the opposite. The current tools I have
available to me are not organized that way.

Unless (I'm really pretty ignorant of PIC architecture) you have a
Harvard architecture and it's code space you have no access to.

It's a Harvard architecture that's difficult to access and slow to
update. So transferring little bits at a time is a highly desirable
trait.

In that
case, either addresses or tokens could work (which is why we used tokens
on the AVR 8515).

Tokens are the winner in my case too. Still a bit concerned about whether
I'm constrained in my token size (or if it really matters).


...

But in my self-chosen constrained target environment this approach fails
on several levels given the goals I hope to achieve:

1. The pic's hardware stack is limited. Subroutine calls are simply not
an option because the stack overflows after only 8 levels of calls. This
is controllable in an assembly environment. But with Forth specifically
designed around making calls, it's a guaranteed path to doom.
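
A small Python model of why the ninth nested call is the dangerous one. It assumes the 8-entry hardware stack wraps around silently and overwrites the oldest return address, which is how the mid-range PIC datasheets describe it; the class name and the return addresses are made up for illustration.

```python
class HardwareStack:
    """8-entry return stack that wraps silently, with no overflow flag."""
    DEPTH = 8

    def __init__(self):
        self.slots = [None] * self.DEPTH
        self.ptr = 0

    def call(self, return_addr):
        # The push wraps around once all eight slots are in use.
        self.slots[self.ptr % self.DEPTH] = return_addr
        self.ptr += 1

    def ret(self):
        self.ptr -= 1
        return self.slots[self.ptr % self.DEPTH]


hw = HardwareStack()
for depth in range(9):            # nest nine calls deep
    hw.call(0x100 + depth)        # return address for each level

returns = [hw.ret() for _ in range(9)]
```

The first eight returns come back correctly, but the outermost one lands on 0x108 instead of 0x100, so the failure shows up late and far from its cause.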

Well, it's somewhat limiting, but shouldn't be fatal. Most Forth apps
aren't really nested very deeply. We've run some pretty hairy apps on
8051's with on-chip stacks of limited size.

That's encouraging. I just worry about reliability because if you
overflow the hardware stack, your application is guaranteed to crash eventually.
And that stack is completely unmapped in memory. It's probably the thing
about the part that drives me the most crazy because you can't implement
any effective context switching without access to that stack.

Exactly how limited were those 8051 stacks?

2. Taking this route commits you to compiling your entire application
because once you do away with the inner interpreter, then everything on
the target must be compiled.

No, it doesn't. Changing to direct code compilation didn't affect our
development cycle at all. Unless you're saying this based on another
PIC-specific obstacle we haven't heard about yet.

Not sure. I think I'm reaching the boundaries of my understanding. If
point #1 above is taken off the table, then I think I can see it because
implementing optimized STC eliminates the interpreter yet facilitates
incremental additions to the codebase on the target. But if that
hardware stack is out of bounds, I'm lost as to how you could implement
ITC, DTC, or TTC without elements of the address interpreter.

Let me outline how I envision using the target to give you a sense of
why I'm looking for a blended approach.

1. Starting out on a new project. Grab a part and use the traditional
programmer to dump the core executive on the part. Put traditional
programmer away until the next project because I absolutely detest
having to have a special programmer just to dump code on the chip.
Another advantage to Frank's kernel is that if it can write program
memory then it can serve as a bootloader for the chip even if I wanted
to dump something non Forth into it. I've tasked one of my summer interns
with writing a PIC16F bootloader in Forth combined with a picoforth
kernel that can program the PIC's program memory.

Yep, once your kernel (Frank's, ours, whatever) is downloaded, it should
be able to accept more stuff from the host provided your hardware gives
you access to a place to put it and run it.

That's a good start.

2. Wire up the project with the serial (or USB) interface and hook up to
the PC. Fire up gforth on the host and load the standard port
definitions and whatever words I have from previous projects that I
often use for embedded systems projects. Don't compile or download to
the target yet simply because I don't necessarily know what words I'll
actually need for the project.

That's possible assuming you really can make gForth look like your
target. Often there are issues. For example, gForth has a 32-bit cell
size, and your PIC model may find that too much overhead. If you're
running with 32-bit cells on your host and 16-bit cells on the target,
there may be some numbers that won't fit in 16 bits, so your gForth code
won't run on the target. That's just an example, but there are snakes
in those woods.
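
One of those snakes, sketched in Python: the same arithmetic gives different answers once it is squeezed into a 16-bit cell. The helper to_cell is a hypothetical name, and two's-complement cells are assumed.

```python
def to_cell(value, bits):
    """Wrap an integer into a signed two's-complement cell of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


host = to_cell(200 * 200, 32)     # 40000 fits in a 32-bit host cell
target = to_cell(200 * 200, 16)   # wraps to -25536 in a 16-bit target cell
```

A host-side prototype could mask every result this way to mimic the target, at the cost of wrapping the host's own arithmetic words.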

Worth rooting them out because I can get instant gratification using
gforth to develop.

3. As of now the target only has the microexecutive on it. But it's enough
to get started. I wire up whatever I/O I need for my application and
either test prewritten words on that I/O or write up new words necessary
to exercise it. All the debugging is done on the PC in gforth initially
until I'm happy with the result. I now start migration. When I get a
word I'm sure I'm going to need on the target, I move that word to the
target. Depending on the speed requirements, this may be a compiled word
which essentially functions as CODE, or a high level definition if speed
isn't critical. I retest the word on the target to make sure it works as
expected. Once that's done the word is added to the target wordset on
gforth and any further usage of that word will be remotely called.

How does gForth access the I/O that's wired to your target? If you wire
it to the PC for testing, that's unlikely to be totally transparent.

I/O is memory accessed. Remote access is via Frank's 3 instruction
implementation. Local access is via @ and !. The word compiler to the
target that runs under gforth can transform remote memory accesses into
local ones. So it'll be transparent to the developer when the code is
migrated to the target.
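
A minimal sketch of that split, in Python. It assumes the three target operations are "fetch a byte", "store a byte" and "call an address"; the Target class, the register address and the routine at 0x80 are all illustrative stand-ins, with a bytearray playing the part of target memory.

```python
class Target:
    """Simulated target that understands only three requests."""

    def __init__(self, size=256):
        self.mem = bytearray(size)
        self.routines = {}          # address -> callable standing in for code

    def fetch(self, addr):
        return self.mem[addr]

    def store(self, addr, value):
        self.mem[addr] = value & 0xFF

    def call(self, addr):
        self.routines[addr](self)


PORTB = 0x06                        # illustrative register address


def toggle_led(t):
    # Runs "on the target": local memory access, no tether traffic.
    t.mem[PORTB] ^= 0x01


target = Target()
target.routines[0x80] = toggle_led

# Host-side prototype: every access is a remote request over the tether.
target.store(PORTB, target.fetch(PORTB) | 0x02)
# A migrated word is reached with a single remote call instead.
target.call(0x80)
```

The host code never needs to know whether a word touches memory remotely or locally; only the compiler that moves the word across has to rewrite the accesses.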

We follow essentially this model, except all actual testing is done on
the target. This actually helps us debug the hardware interface as well
as the code.

But it gets back to the question of how much kernel do you have to
download to get started? At the very least in bootstrapping the kernel a
distributed model would have to be in play.

4. Continue the process of building the application and incrementally
moving needed words to the target. Eventually the application will be
complete and well tested and all the words moved to the target and
nothing other than a GO command being run on the host.

5. Untether the target board, put it into service. Rinse and repeat with
the next project adding any interesting new words generated and tested
for this application to the hopefully growing library of useful words
that have been developed over previous projects.

Yes, that's certainly the preferred strategy.

Now in my view if an inner interpreter doesn't exist on the target then
activities 3 and 4 cannot be done. The inner interpreter is critical in
order to have both incremental compilation/movement of words to the
target and to facilitate the distributed execution of the application
between the host and target.

Did I miss something?

Yes. Whether there's an address interpreter (a term that's much more
descriptive than "inner" because it's clearer what's going on)

Will switch...

or not
has no impact on the development cycle. A Harvard architecture part
requires somewhat different internal support for actual code vs. other
implementation strategies such as tokens or addresses, but the
development cycle can be made to look just the same.

I guess my question is what is the structure of an optimized compiled
code word then? What I cannot visualize is the linkages between the code
fragments.

I think that structurally I can easily see how to compile a definition
into a collection of addresses or tokens. However compiling native code
is a different animal.


...
Microcontrollers also have mechanisms for dealing with time critical
stuff. Another reason I love using PICs is the wide variety of hardware
peripherals they come packaged with. UARTs, multiple timers, PWM, ADC, and the
like are really set/autopilot types of tools. Interrupts can be used to
buffer really time sensitive stuff.

So do most modern microcontrollers. Take another look at some of the
alternatives. There are some pretty nice parts out there.

I'm aware. Remember I got here because I was looking at the propeller.
I'm already having to get up to speed with a new language and a new
tool. Compounding it by starting from scratch with a new architecture is
too much to tackle.

Plus I feel if I can pull this off in my constrained little box, that
moving the port to a roomier chip (like the propeller, which of course
due to Cliff I don't need to do) should be no problem.

If all else fails after developing the application, simply run it all
through an optimizing compiler removing the inner interpreter altogether
along with other connecting tissue between words.

That shouldn't have to be an extra step. An optimizing compiler isn't a
post-processor, it's an *alternative* to another kind of compiler (such
as ITC).

An incremental optimizing Forth compiler for the PIC 16F platform
doesn't exist AFAICT. It needs to be built. My experience with language
tool building and with pics tells me that the optimizing compiler is the
much tougher road to travel to get to an incremental development target.

A non incremental optimizing compiler does exist. But it doesn't suit my
development needs.

Put the two together and the answer that pops out is to implement a non
optimized token based compiler. I'm in my wheelhouse there because I
already have a token based, stack implemented 16F kernel that's already
tested and can be quickly adapted to the task.

While I do have fun building tools, they do have the purpose of building
other stuff. I prefer building the simplest foolproof tools I can build
then using them to bootstrap up.

Implementing an address interpreter with NEXT, ENTER, and EXIT "words"
will cost me an afternoon and about 25-30 lines of assembly. Then I'll
have a tool that I can use to put forth on my target.
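
For a sense of how little there is to it, here is a token-threaded address interpreter modelled in Python rather than PIC assembly. Everything in it is a simplification for illustration: integer tokens are taken to be addresses of colon definitions, string tokens name primitives, and the return stack is an ordinary list in RAM, so nesting depth is not tied to the 8-level hardware stack.

```python
def run(tokens, start, stack):
    """Execute the colon definition at `start`; return the final data stack."""
    data = list(stack)
    rstack = [None]                 # software return stack; None marks "stop"
    ip = start                      # interpreter pointer into token memory

    def dup():
        data.append(data[-1])

    def add():
        data.append(data.pop() + data.pop())

    primitives = {"DUP": dup, "+": add}

    while ip is not None:
        tok = tokens[ip]            # NEXT: fetch the token at ip...
        ip += 1                     # ...and step past it
        if tok == "EXIT":           # EXIT: resume the caller
            ip = rstack.pop()
        elif isinstance(tok, int):  # ENTER: token is the address of a colon word
            rstack.append(ip)
            ip = tok
        else:                       # primitive: stands in for machine code
            primitives[tok]()
    return data


tokens = [
    "DUP", "+", "EXIT",             # address 0: DOUBLE
    0, 0, "EXIT",                   # address 3: QUAD = DOUBLE DOUBLE
]
result = run(tokens, 3, [5])
```

Running QUAD on 5 leaves 20 on the data stack. Adding a new word to the target is just appending more tokens, which is what makes the incremental download cheap.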

I'm not worried about slow. I'm worried about getting done and having
the right result when I get done.

My time on a project is spent developing it. You (that would be
Elizabeth) pointed out in several posts over the years that developing
in a full fledged forth environment is a good thing. I agree. I firmly
believe that environment includes interactive and incremental
development. I'll sacrifice performance to get the project working.
"Make it work, then make it fast (only if necessary)".

What you seem to be doing is sacrificing the development cycle of your
dreams to stay with your beloved PICs. I think you'll find it really
hard to develop or support a good development system on the PIC16, from
all you've said.

Every other development cycle has its costs too. There's the cost of
learning new architectures, the cost of new programming tools and
software, the cost of sacrificing prototypability. For example the TI
MSP430 only comes in 3.3V or lower versions, with no 5V tolerant I/O and
in quad flat pack packaging only. Major shift.

And PICs are not the only tool in my box. Being a Linux guy (and yes
that's actually non negotiable) means I have to be extremely picky and
choosy about the tools I put in my box.

But I have the luxury of doing so as an academic and hobbyist. No
constraints, project deadlines, sales targets projections, or budget
concerns to worry about.

Only one piece of the puzzle is missing at this point. Elizabeth
discusses in her post above that the XTL transfers the stack between the
host and the target. In short it implements a form of distributed
execution where you muster the stacks for RPC. In doing so one can run an
application with a set of words distributed between the host and the
target.

Well, it only models the data stack on the host. The return stack stays
on the target. And target words only execute on the target. There's no
attempt to simulate execution of target words on the host.

Ah. I see. So that means that your XTL had to be significantly developed
before you could start using it. The appeal of Frank's paper was that
essentially once you implemented his three-instruction kernel, you
could immediately start developing applications with it without needing
to flesh out an entire kernel just to get started. This leads back to
the point I made in my initial post that a good (albeit slow) small set
of primitives would be good to implement. And the 48 that I've seen for
MAF doesn't qualify as a small set.

I see the distributed model sort of as a breakpoint. The host already
has everything (primitives, core words, core extensions) already
implemented. Why not use it as a remote process server in addition to
the text interpreter, wordlist coordinator, and the target's cross
compiler?


...
It's a
lot easier to just go with a fully functional XTL and do all your
testing on the target. Among other benefits, that means you can use
Forth to debug your target hardware, which is wonderful.

What does a fully functional XTL offer? Right now it's kind of a
black box to me. Please enlighten me.

The ability to have the "look and feel" of a full Forth on the target,
except that the actual target isn't burdened with dictionary heads &
searches, any kind of compiler or assembler, user interface, etc. All
these services are provided transparently by the host. I really urge
you to try SwiftX (or at least read its docs) to get a clearer picture.

I may take a read of the docs.

There are still details that need to be worked out such as how to
differentiate between local words and remote words in both systems and
how to facilitate transferring the stacks between the two. In both cases
solutions should be geared towards simplifying the target.

That differentiation is typically done with wordsets (formerly known as
vocabularies). The draft standard identifies "scopes" of words for the
host, cross-compiler, and target; they are usually implemented with
wordsets, although the draft standard doesn't mandate any particular
implementation strategy.

I read that in one of the later chapters of Steven's book. Still a bit
fuzzy as to whether there's a concept of a local interpreter and a
remote interpreter though.

I'm beginning to think you're confusing interpreters.

Nope. I have it straight.

A classic Forth
has a text interpreter, which processes text from a user or disk and
generates ("compiles") executable definitions (which might be actual
code, strings of addresses of words to be executed, tokens, or some
other internal form). That which used to be called an "inner"
interpreter, more accurately "address" interpreter, processes the
strings of addresses in the "compiled" form of a definition if that's
the model being used. It's usually only 1-3 machine instructions per
address, although on some processors it's more.

Right. The point of running Forth on the host is to get a complete
environment without having to burden the target with it. This is the
tethered model. But no one seems to be addressing the possibility of
distributed computing between the host and the target. The host is
simply a repository for a set of services (text interpreter, cross
compiler, wordset dictionaries) without helping the target run any
actual forth code. The way I see it since the host is a full forth
environment, it can emulate a full forth environment for the target.

In any case, I'll
repeat once again: the internal form of the definition has no impact on
the development cycle. These are orthogonal issues. There can be
excellent or terrible development tools with any internal Forth model.

I believe that now. You've broken the connection between optimized
native code definitions and linkage technique in my mind. Thanks for
that.

So given that how does one go about building an optimized code compiler
that functions in an incremental fashion for a target that doesn't yet
have such a beast?

Our systems have a text interpreter on the host, which parses your
command line or source file. Definitions are compiled (and whether the
compiled form is actual code, addresses, or tokens doesn't matter) and
downloaded to the target, either incrementally or in a batch depending on
switch setting. If you type a target command on the host, the host's
set of dictionary heads for the target is searched, and the target
address of the executable code is found. Then the target is directed to
execute it. The target does no interpreting.
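
That division of labour can be sketched in a few lines of Python. The class and function names and the target addresses are hypothetical; the point is only that the heads live on the host and the target receives nothing but an address to run.

```python
class HostDictionary:
    """Host-side heads for target words: name -> target address."""

    def __init__(self):
        self.heads = {}

    def add(self, name, target_addr):
        self.heads[name.upper()] = target_addr

    def find(self, name):
        return self.heads.get(name.upper())


def interpret(line, heads, execute_on_target):
    """Host text interpreter: look each word up, then tell the target to run it."""
    for word in line.split():
        addr = heads.find(word)
        if addr is None:
            raise LookupError(f"{word} ?")
        execute_on_target(addr)


heads = HostDictionary()
heads.add("INIT", 0x0040)
heads.add("BLINK", 0x0052)

sent = []                           # addresses the target was asked to run
interpret("init blink blink", heads, sent.append)
```

Here sent.append stands in for whatever request the tether uses to make the target execute an address; no names or text ever cross the wire.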

But you can't have it both ways. I'm not talking about text
interpretation at all. Only address interpretation. Unless I missed
something the only two ways to compile definitions without an address
interpreter are STC or by inlining the code. If the compiled form is
addresses or tokens, then the target by definition needs to have an
address interpreter to interpret those addresses or tokens.

The beauty though is that the address interpreter words are the only
words required to execute forth words on the target presuming that those
definitions are compiled into addresses or tokens. And as you implied
above, host compilation in that instance is nothing more than looking up
those addresses/tokens in the dictionary heads and emitting a collection of
tokens or addresses corresponding to what is found in the dictionary.

BTW I realized that I'm still trying to figure out how in the heck forth
compiles a number into a definition. What is the xt for a number?
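
One common answer, sketched in Python: in a threaded Forth the number has no xt of its own. The compiler lays down the xt of a runtime word (often called LIT or similar) followed by the value in-line, and at run time that word fetches the following cell, pushes it, and steps over it. The function names below are hypothetical and the token format is simplified.

```python
def compile_word(source, dictionary):
    """Turn source text into a token list; numbers become LIT plus the value."""
    out = []
    for word in source.split():
        if word in dictionary:
            out.append(word)
        else:
            out.extend(["LIT", int(word)])   # no xt for the number itself
    out.append("EXIT")
    return out


def execute(tokens):
    """Minimal runner showing how LIT consumes the cell that follows it."""
    stack, ip = [], 0
    while tokens[ip] != "EXIT":
        tok = tokens[ip]
        ip += 1
        if tok == "LIT":
            stack.append(tokens[ip])         # fetch the in-line value
            ip += 1                          # and step over it
        elif tok == "+":
            stack.append(stack.pop() + stack.pop())
    return stack


body = compile_word("2 3 +", {"+"})
```

The compiled body is LIT 2 LIT 3 + EXIT, and executing it leaves 5 on the stack.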


...

It's a chicken and egg problem. Anyone who has a real project with real
deadline will most likely either choose an existing development
environment for the target or choose a chip that is better supported by
Forth. I have the luxury of being an academic and a hobbyist. It also
helps to have a virtually unlimited supply of interns. So I can throw
resources at a project like this because it interests me, not because of
a deadline. There's of course a catch 22 to that too, which is that
since it isn't deadline driven development tends to be bursty.

Well, the real issue is where you want to throw those resources: at tool
development or on the actual project. It's really easy to get
distracted into a lengthy tool design/development project instead of
actually working on the real one. Are your interns there to learn how
to write cross compilers, or do projects with microcontrollers?

Both. They are not doing any of this tool work because it's
still getting specified here in this thread. They'll use picforth to
compile their applications. But they'll be stuck in the edit, compile,
download, test cycle because no other application environment currently
exists for them to do anything else.

That's why I'm here having this discussion.


I look forward to hearing your comments on these thoughts.

I think it would be a good investment of your time to take a hard look
at existing mature Forth cross-compilers. You can get a CD with
extensive docs and links to free evaluation versions of our SwiftX
cross-compilers for many chips (8051, 68HCS08, 68HC11, MSP430, AVR, ARM,
68HC12, 68K family, Coldfire, more) for only $15. You can get supported
boards for most of these processors very inexpensively. The evaluation
compilers are limited only in the size of the target app you can
develop, so you can exercise them and learn a lot. For more info go to
http://www.forth.com/embedded/index.html.

I'll take a look. But frankly I won't get the warm fuzzies about it
until I'm sure that it in fact offers the type of environment I'm hoping
to run. It's also a complication that SwiftX is a Windows product (and
justifiably so) and I'm a Linux guy (also justifiably so).

Well, IMO the only way to find out if this is the type of environment
you're looking for is to try it. As for Windows vs. Linux, we don't
necessarily love Windows, but we need to make a living, and that's where
95% of the market is.

I know. That's why I said justifiably so. My small aside on that subject
is that if tool developers could find a way to develop cross platform
tools without expending too much additional effort, then maybe a more
equitable distribution of market share would follow.

BAJ


