Re: If you got to choose the syntax, what would you do?




Jeff Fox wrote:
Dennis Ruffer wrote:
Ok, you hooked me enough to stop me from sleeping, which is getting
pretty common these days, so I should put in a few thoughts. I didn't
get involved too much on the front end design of OTA, but I had many
"discussions" about this in Open Firmware (OF). There we had a 1 Meg
ROM, which was always full, and I developed many alternatives to keep
all of Apple's "new world" systems so we could rebuild them from our
common source. I'm not sure how long it will be before I can talk
about their proprietary stuff, so I'll try to stick with generic
concepts.

First, I've never seen much difference between OTA and OF. Both are
tokenizing systems and both have the ability to execute the token
streams directly, without translating them into some internal form.

ok. Do they execute the tokens serially from the stream or do they
collect them in memory first?
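To make the two alternatives in that question concrete, here is a minimal Python sketch. The token values and the tiny stack machine are hypothetical; they are not the actual OF or OTA token assignments. The first function executes each token as it arrives, the second collects the whole stream in memory before running it.

```python
# Hypothetical one-number tokens for a tiny stack machine
# (illustrative only, not real OF or OTA token values).
LIT, ADD, MUL, DUP = 0, 1, 2, 3

def execute_stream(stream):
    """Execute tokens serially as they arrive; nothing is buffered."""
    stack = []
    it = iter(stream)
    for tok in it:
        if tok == LIT:
            stack.append(next(it))      # the operand follows inline
        elif tok == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif tok == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif tok == DUP:
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown token {tok}")
    return stack

def execute_buffered(stream):
    """Collect the whole stream in memory first, then execute it."""
    program = list(stream)              # costs memory, allows re-running
    return execute_stream(program)
```

The serial form needs no program storage but cannot loop back over tokens it has already consumed; the buffered form can, at the cost of memory.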

However, in both cases, we did de-tokenize to gain run-time speed.

I am not sure what you mean by de-tokenize in this context. I think
it could mean using normal source instead, or it could mean uncompressing
tokens after they arrive. I am a little confused about that.

I think the primary difference between the two revolves around the fact
that OF carries the definition names, allowing run-time interpretation,
but in OTA we could also optimize the stream by having the tokenizer
create new definitions out of common sequences.
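One way to picture a tokenizer that creates new definitions out of common sequences is the pair-replacement step below. This is only a sketch of the general idea, not the OTA tokenizer; the function name and the rule "factor the most frequent adjacent pair" are assumptions made for illustration.

```python
from collections import Counter

def factor_common_pair(tokens, next_token):
    """Replace the most frequent adjacent token pair with one new token.

    Returns (new_tokens, definition), where definition is
    (next_token, pair), or None if no pair occurs at least twice.
    """
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return list(tokens), None
    pair, count = pairs.most_common(1)[0]
    if count < 2:
        return list(tokens), None       # nothing worth factoring
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(next_token)      # the new definition stands in
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out, (next_token, pair)
```

Repeating the step with fresh token numbers factors longer sequences. A real tokenizer would also have to avoid factoring across control-flow boundaries, which this sketch ignores.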

It may take me some time to figure that one out.

Given the same
incentive of squeezing the maximum amount of code into the minimum
space, I think the same could be done with OF, but the run-time,
interpretive decompile might be interesting.

Anyway, you end up with a few predefined tokens for kernel words,
hopefully using the smallest tokens, and new definitions are allocated
sequential tokens by the tokenizer. Technically, this is the same
process that simple compression techniques do, with the added benefit
of removing white space and comments, and this is where the token
approach will win, every time. The compression approach does end up
reducing the most common element to a single bit, but in OF, we also
compressed the token stream, which negates that advantage. We still
had to play some other games to get each system to fit in a 1 Meg image
(with redundancy), but no one (I heard of) was ever able to do better.
I didn't invent this, but even I could not come up with a way to
improve it easily. The "games" I played involved eliminating code that
didn't belong on some configurations by making "driver" modules. This
kept us going to the end, but it was starting to get tight again.

Interesting stuff.

So, tokenize to remove irrelevant source (white space, comments and
configuration choices) and then compress to optimize common sequences.
This produces a small stream with minimal overhead, if that is your
priority.
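As a rough illustration of the tokenizing half of that summary, the Python sketch below drops white space and comments and hands out sequential token numbers to new words. It is a simplification under stated assumptions: the kernel table is hypothetical, numbers are treated like any other word, and only single-line `( ... )` and `\` comments are recognized.

```python
def tokenize(source, kernel):
    """Turn Forth-like source text into a list of token numbers.

    kernel maps predefined words to small token numbers (hypothetical
    values); words not seen before get the next sequential number.
    """
    table = dict(kernel)
    next_tok = max(table.values(), default=-1) + 1
    out = []
    for line in source.splitlines():
        words = line.split()            # white space disappears here
        i = 0
        while i < len(words):
            w = words[i]
            if w == "\\":               # backslash: comment to end of line
                break
            if w == "(":                # paren comment: skip to closing ")"
                while i < len(words) and not words[i].endswith(")"):
                    i += 1
                i += 1
                continue
            if w not in table:
                table[w] = next_tok     # allocate a new sequential token
                next_tok += 1
            out.append(table[w])
            i += 1
    return out, table
```

A compression pass over the resulting token list would then be a separate second step, as described above.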

Well, we have 5-bit opcodes that can be considered tokens and that
represent about 80% of programs, and factoring code into short words
results in very small code, as Forth itself is a form of compression.
It is pretty hard to compress the object code further, and it is
directly executable by the processor from the stream; one transfer can
send up to four opcodes that execute immediately. But this is just one
of that system's features.
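For readers wondering how four 5-bit opcodes could travel in one transfer, the following Python sketch packs them into a single word. The layout is an assumption, not something stated in this post: an 18-bit word with three full 5-bit slots and a final 3-bit slot that carries only the top three bits of an opcode. The opcode numbers and the padding value are hypothetical, and any further encoding the real hardware applies is not reproduced.

```python
SLOT_SHIFTS = (13, 8, 3)    # assumed positions of the three full slots

def pack_word(opcodes, pad=0b11100):
    """Pack one to four 5-bit opcodes into an assumed 18-bit word.

    pad is a hypothetical filler opcode for unused slots. The last
    slot keeps only 3 bits, so its opcode must have low bits 00.
    """
    ops = list(opcodes)
    if not 1 <= len(ops) <= 4:
        raise ValueError("need one to four opcodes")
    ops += [pad] * (4 - len(ops))
    if any(not 0 <= op < 32 for op in ops):
        raise ValueError("opcodes must fit in 5 bits")
    if ops[3] & 0b11:
        raise ValueError("last slot holds only the top 3 bits")
    word = 0
    for op, shift in zip(ops, SLOT_SHIFTS):
        word |= op << shift
    return word | (ops[3] >> 2)

def unpack_word(word):
    """Recover the four opcodes from a word built by pack_word."""
    return [(word >> 13) & 31, (word >> 8) & 31,
            (word >> 3) & 31, (word & 7) << 2]
```

Under that assumed layout, the receiving side can begin executing the opcodes as soon as the single word arrives, which matches the "execute immediately" description above.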

You do have compile time and run time overhead, but both can
be 1 time events. In a SEAforth environment, or any time you have
multiple instances of the run time image, you replicate this overhead,
but the decompress/detokenization techniques are well known and pretty
easy to optimize.

Well I hadn't intended to talk about this here in this thread, but
I don't see this heading toward the syntax I asked for.

On SEAforth24A we have 64 words of RAM.

I have been told that I should put a routine in there to fetch, store
and execute tokens from the outside world, or put a Forth compiler in
there and send it source code. Ok. That's going to be a pretty small
Forth compiler, I hope.

I am confused. Why would they ask you to embed a Forth compiler on
a SEAforth24A if the processor itself IS what would be considered the
traditional Forth virtual machine? Isn't the point of having a Forth
chip to implement in hardware what we usually use in software? What are
you referring to when you say Forth compiler in this context? The text
interpreter?

But it probably isn't going to leave much of that 64 words for
application code after the compilation.

The source code, even tokenized, is probably going to be larger
than compiled code when most of it is tiny tokens after compilation.

There is going to be time required to compile an incoming message.

There have been some guesses in this newsgroup and posts about
Forthlets being about the kind of tokenization and compilation that
you describe, but they aren't. But I do appreciate your educational
explanation of this stuff.

Now, there is some room for improvement here. It would be nice to make
the tokenizer use bits for tokens, rather than byte multiples. It
would also be nice if the tokenizer were able to assign the smallest
tokens to the most commonly used sequences. You'd have the overhead of
a token table to deal with, but then you have probably done the major
portions of the compression step, so it could be eliminated entirely.
Tokenization and compression are really the same process at that point
and the detokenization speed would probably not be affected near as
much as the time it takes to do a separate decompress step.
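The wish for bit-sized tokens, with the smallest tokens going to the most commonly used items, is what a Huffman code provides. The Python sketch below names that technique explicitly; it is one possible way to do it, not something either tokenizer discussed here is known to implement, and the function name is made up for the example.

```python
import heapq
from collections import Counter

def bit_codes(tokens):
    """Assign each distinct token a bit string via Huffman coding.

    More frequent tokens receive shorter codes, and no code is a
    prefix of another, so the bit stream decodes unambiguously.
    """
    freq = Counter(tokens)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, {token: code_so_far}).
    heap = [(n, i, {tok: ""}) for i, (tok, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)     # two least frequent groups
        n2, _, c2 = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in c1.items()}
        merged.update({t: "1" + c for t, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]
```

The returned dictionary is the token table whose overhead is mentioned above: the receiver needs it (or enough information to rebuild it) to turn bits back into tokens.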

Yes. I have always wondered about an instruction set that was
compressed the same way.

I'm not going to get into measuring code or message stream sizes.
There are way too many variables and the coding of my wish list is
non-trivial. I'm also not going to guess timings, for the same
reasons. All I will say at this point is that OF did have a pretty
good history. Intel's EFI marketing is winning the war now, but I
think IBM is still holding out. There's plenty of prior art to select
from and a few programmers who know it well. Some of whom are even
available at the moment. ;)

Thanks.

If you have two programs running in parallel on two different computers
they can synchronize programs on a token exchange. This is
the minimal function of a communication channel.

And perhaps each computer in a multiprocessor has its own
mass storage devices from which it loads its copy of its program,
but perhaps not. In embedded systems, or even in workstation
farms, many nodes may not have mass storage. So they
have to load their programs from the network and that requires
software, and that requires syntax.

Now I understand that Forth is small enough that one can put
a Forth compiler on a pretty small computer and send it programs
in source form and have it compile them on top in a normal
Forth manner pretty fast, or interpret them pretty fast directly from
the stream using software.

This is part of the picture. We think it is useful to be able to load
programs into memory and also to execute some directly from
the stream. Unless that Forth compiler or tethered interface driver
or application code are in ROM they are going to be in FLASH or
RAM and have to be loaded at least once before they run.

If the nodes are too small to host a compiler and still have room for
efficient apps, then loading compiled programs into memory or
executing them directly from a stream is the remaining option.

Syntax? I type the name of a program on the command line to load it
and run it. Yes but what is the syntax in your Forth programs to do
this?

Syntax? What if I have 24 processors but don't have 24 monitors and
keyboards? I could have 24 windows on my screen and go to
24 command lines and type the name of program like I would in DOS?
Having 24 or more nodes all doing this sort of thing asynchronously
poses several problems. When the embedded application is
running I am not going to be there with the development system. A
program is going to have to send or remotely execute another program
on another node.

1. The most basic form is expressed as one computer telling another to
execute some code directly from the stream.

2. The next level would be to tell it to load a program into memory.

3. The next level would be to tell it to run that program.

4. Then the next and next level, and on to more complex problems.

But you start with the simplest thing when trying to explain how it
works or ask how other people do #1.
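The first three levels in that list can be sketched as three message kinds handled by a receiving node. Everything here is hypothetical: the message names, the tuple format, and the use of Python callables to stand in for compiled opcodes. It shows only the shape of the problem, not how any Forth system or the SEAforth tools actually express it.

```python
EXEC, LOAD, RUN = 1, 2, 3       # hypothetical message kinds

class Node:
    """A toy node with a small RAM that reacts to incoming messages."""

    def __init__(self, ram_words=64):
        self.ram = [None] * ram_words
        self.log = []               # side effects of executed code land here

    def _execute(self, code):
        for op in code:             # each op is a callable taking the node
            op(self)

    def receive(self, message):
        kind, *body = message
        if kind == EXEC:            # level 1: execute straight from the stream
            (code,) = body
            self._execute(code)
        elif kind == LOAD:          # level 2: store code in memory, do not run
            addr, code = body
            if addr < 0 or addr + len(code) > len(self.ram):
                raise MemoryError("program does not fit in RAM")
            self.ram[addr:addr + len(code)] = code
        elif kind == RUN:           # level 3: run code that was loaded earlier
            addr, length = body
            self._execute(self.ram[addr:addr + length])
        else:
            raise ValueError(f"unknown message kind {kind}")
```

The open question in this thread is what source-level syntax on the sending side should produce messages like these.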

I have tried to focus on what syntax people use for #1.

I don't know the answer. I understand that the source gets sent
and that the source gets interpreted and that is common practice.

I suppose the syntax to do that is not appropriate for sending
a compiled program to have it executed from the stream.

So perhaps no one who has been sending compiled code, doing
code overlays, or executing compiled code from streams out
there cares about what Forth syntax other people use to do it.

I am not getting a strong sense of preference for how other people
like to do that in Forth now.

Regards
Jean-Francois Michaud
