Re: If you got to choose the syntax, what would you do?



On 2006-06-26 04:00:02 -0400, "Jeff Fox" <fox@xxxxxxxxxxxxxxxxxxx> said:

Dennis Ruffer wrote:
Ok, you hooked me enough to stop me from sleeping, which is getting
pretty common these days, so I should put in a few thoughts. I didn't
get involved too much on the front end design of OTA, but I had many
"discussions" about this in Open Firmware (OF). There we had a 1 Meg
ROM, which was always full, and I developed many alternatives to keep
all of Apple's "new world" systems so we could rebuild them from our
common source. I'm not sure how long it will be before I can talk
about their proprietary stuff, so I'll try to stick with generic
concepts.

First, I've never seen much difference between OTA and OF. Both are
tokenizing systems and both have the ability to execute the token
streams directly, without translating them into some internal form.

ok. Do they execute the tokens serially from the stream or do they
collect them in memory first?

You can execute the stream serially, but there has to be some local storage to hold the largest loop sequence. I haven't seen this done much, due to the speed issue. However, if storage is a priority, it can be accomplished with a limit on the size of loops.
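As a minimal sketch of that idea (not OTA or OF code), the loop below executes tokens as they arrive and buffers only a loop body, up to a fixed limit. The token names (`times`, `end`, `dup`, `+`) and the `MAX_LOOP` limit are assumptions made up for illustration, and nested loops are not handled.

```python
from typing import Iterable, Iterator, List

MAX_LOOP = 8  # hypothetical limit on a buffered loop body, in tokens


def run_stream(stream: Iterable[object]) -> List[int]:
    """Execute tokens one at a time; only loop bodies are ever stored."""
    stack: List[int] = []
    it: Iterator[object] = iter(stream)

    def execute(tok: object) -> None:
        if isinstance(tok, int):
            stack.append(tok)                 # literal
        elif tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif tok == "dup":
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown token {tok!r}")

    for tok in it:
        if tok == "times":                    # ( n -- ) repeat body n times
            count = stack.pop()
            body: List[object] = []
            for inner in it:                  # keep reading the same stream
                if inner == "end":
                    break
                if len(body) >= MAX_LOOP:
                    raise MemoryError("loop body exceeds local storage")
                body.append(inner)
            for _ in range(count):
                for inner in body:
                    execute(inner)
        else:
            execute(tok)
    return stack
```

For example, `run_stream([1, 3, "times", "dup", "+", "end"])` doubles the value 1 three times while holding only two tokens in the buffer.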

However, in both cases, we did de-tokenize to gain run-time speed.

I am not sure what you mean by de-tokenize in this context. I think
it could mean using normal source instead, or it could mean
uncompressing tokens after they arrive. I am a little confused about
that.

This assumes you have enough storage to decompress/detokenize. You trade off stream size with local storage and run-time speed. It also assumes that the message is going to be executed more than once.

I think the primary difference between the two revolves around the fact
that OF carries the definition names, allowing run-time interpretation,
but in OTA we could also optimize the stream by having the tokenizer
create new definitions out of common sequences.

It may take me some time to figure that one out.

It took a little while to write the optimizer too.
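To make "new definitions out of common sequences" concrete, here is a rough sketch, assuming the simplest possible strategy: repeatedly replace the most frequent adjacent pair of tokens with a freshly allocated token. This is not the OTA optimizer; the function names and the pairwise approach are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def factor_pairs(tokens: List[int], first_new: int, rounds: int
                 ) -> Tuple[List[int], Dict[int, Tuple[int, int]]]:
    """Replace the most common adjacent pair with a new token, repeatedly."""
    defs: Dict[int, Tuple[int, int]] = {}
    next_tok = first_new
    for _ in range(rounds):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:                      # nothing worth a new definition
            break
        defs[next_tok] = pair
        out: List[int] = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                out.append(next_tok)       # reference the new definition
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
        next_tok += 1
    return tokens, defs


def expand(tokens: List[int], defs: Dict[int, Tuple[int, int]]) -> List[int]:
    """Undo factor_pairs by expanding every defined token recursively."""
    out: List[int] = []
    for t in tokens:
        if t in defs:
            out.extend(expand(list(defs[t]), defs))
        else:
            out.append(t)
    return out
```

The definitions table has to travel with the stream, so the saving only pays off when sequences repeat often enough to cover that overhead.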

Given the same
incentive of squeezing the maximum amount of code into the minimum
space, I think the same could be done with OF, but the run-time,
interpretive decompile might be interesting.

Anyway, you end up with a few predefined tokens for kernel words,
hopefully using the smallest tokens, and new definitions are allocated
sequential tokens by the tokenizer. Technically, this is the same
process that simple compression techniques do, with the added benefit
of removing white space and comments, and this is where the token
approach will win, every time. The compression approach does end up
reducing the most common element to a single bit, but in OF, we also
compressed the token stream, which negates that advantage. We still
had to play some other games to get each system to fit in a 1 Meg image
(with redundancy), but no one (I heard of) was ever able to do better.
I didn't invent this, but even I could not come up with a way to
improve it easily. The "games" I played involved eliminating code that
didn't belong on some configurations by making "driver" modules. This
kept us going to the end, but it was starting to get tight again.

Interesting stuff.

So, tokenize to remove irrelevant source (white space, comments and
configuration choices) and then compress to optimize common sequences.
This produces a small stream with minimal overhead, if that is your
priority.
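The first step can be sketched roughly as follows, assuming a made-up kernel table and treating every unknown word as a new definition that receives the next sequential token. Numeric literals and string words are ignored here; a real tokenizer has to handle them.

```python
from typing import Dict, List, Tuple

# Hypothetical kernel table: a few predefined words get the smallest tokens.
KERNEL: Dict[str, int] = {"dup": 1, "+": 2, "*": 3, ":": 4, ";": 5}


def tokenize(source: str, kernel: Dict[str, int]
             ) -> Tuple[List[int], Dict[str, int]]:
    """Drop white space and comments; map each word to a token number."""
    table = dict(kernel)
    next_tok = max(table.values()) + 1
    tokens: List[int] = []
    for line in source.splitlines():
        words = line.split()
        i = 0
        while i < len(words):
            w = words[i]
            if w == "\\":                  # backslash comment: skip rest of line
                break
            if w == "(":                   # paren comment: skip through ")"
                while i < len(words) and not words[i].endswith(")"):
                    i += 1
                i += 1
                continue
            if w not in table:             # new word: allocate the next token
                table[w] = next_tok
                next_tok += 1
            tokens.append(table[w])
            i += 1
    return tokens, table
```

With this table, a line such as `: square ( n -- n*n ) dup * ; \ squares a number` shrinks to five small integers.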

Well, we have 5-bit opcodes that can be considered tokens and that
represent about 80% of programs. Factoring code into short words
results in very small code, as Forth itself is a form of compression.
It is pretty hard to compress the object code further, and it is
directly executable by the processor from the stream: one transfer can
send up to four opcodes that execute immediately. But this is just one
of that system's features.

If your messages are small, as in this demo, there's not much to gain by doing more elaborate compression.

You do have compile time and run time overhead, but both can
be 1 time events. In a SEAforth environment, or any time you have
multiple instances of the run time image, you replicate this overhead,
but the decompress/detokenization techniques are well known and pretty
easy to optimize.

Well I hadn't intended to talk about this here in this thread, but
I don't see this heading for the syntax I asked for.

On SEAforth24A we have 64 words of RAM.

I have been told that I should put a routine in there to fetch, store,
and execute tokens from the outside world, or put a Forth compiler in
there and send it source code. Ok. That's going to be a pretty small
Forth compiler, I hope.

But it probably isn't going to leave much of that 64 words for
application code after the compilation.

The source code, even tokenized, is probably going to be larger
than compiled code when most of it is tiny tokens after compilation.

There is going to be time required to compile an incoming message.

There have been some guesses in this newsgroup and posts about
Forthlets being about the kind of tokenization and compilation that
you describe, but they aren't. But I do appreciate your educational
explanation of this stuff.

I'm not sure I see the difference. SEAforth simply executes the tokens directly, or am I missing something?

Now, there is some room for improvement here. It would be nice to make
the tokenizer use bits for tokens, rather than byte multiples. It
would also be nice if the tokenizer were able to assign the smallest
tokens to the most commonly used sequences. You'd have the overhead of
a token table to deal with, but then you have probably done the major
portions of the compression step, so it could be eliminated entirely.
Tokenization and compression are really the same process at that point
and the detokenization speed would probably not be affected near as
much as the time it takes to do a separate decompress step.
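One way to picture bit-sized tokens assigned by frequency is a Huffman code over the token stream. Huffman coding is my choice of illustration here, not something used in OF or OTA as far as this thread says; the function name is hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, List


def huffman_codes(tokens: List[str]) -> Dict[str, str]:
    """Give the most frequent tokens the shortest bit strings."""
    freq = Counter(tokens)
    if not freq:
        return {}
    if len(freq) == 1:                     # a lone token still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie-breaker, {token: code-so-far}).
    heap = [(n, i, {tok: ""}) for i, (tok, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)    # two least frequent groups
        n2, _, c2 = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in c1.items()}
        merged.update({t: "1" + c for t, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]
```

The resulting code table is the "token table" overhead mentioned above: it has to be shipped or agreed on in advance, but decoding is then a single pass over the bits.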

Yes. I have always wondered about an instruction set that was
compressed the same way.

The problem here would be one of changing the instruction set with every image. It's a WISC on steroids. ;)

I'm not going to get into measuring code or message stream sizes.
There are way too many variables and the coding of my wish list is
non-trivial. I'm also not going to guess timings, for the same
reasons. All I will say at this point is that OF did have a pretty
good history. Intel's EFI marketing is winning the war now, but I
think IBM is still holding out. There's plenty of prior art to select
from and a few programmers who know it well. Some of whom are even
available at the moment. ;)

Thanks.

If you have two programs running in parallel on two different computers
they can synchronize programs on a token exchange. This is
the minimal function of a communication channel.

And perhaps each computer in a multiprocessor has its own
mass storage devices from which it loads its copy of its program,
but perhaps not. In embedded systems, or even in workstation
farms, many nodes may not have mass storage. So they
have to load their programs from the network and that requires
software, and that requires syntax.

Now I understand that Forth is small enough that one can put
a Forth compiler on a pretty small computer and send it programs
in source form and have it compile them on top in a normal
Forth manner pretty fast, or interpret them pretty fast directly from
the stream using software.

Remember that in a tokenized concept, the detokenization step is only a part of the normal compiler. The interpreter step is reduced to a table lookup.
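A minimal sketch of "interpretation as a table lookup", assuming a three-word kernel invented for this example: each token is simply an index into a table of routines, so there is no dictionary search and no text parsing at run time.

```python
from typing import Callable, List

Stack = List[int]


def op_dup(s: Stack) -> None:
    s.append(s[-1])


def op_add(s: Stack) -> None:
    b, a = s.pop(), s.pop()
    s.append(a + b)


def op_mul(s: Stack) -> None:
    b, a = s.pop(), s.pop()
    s.append(a * b)


# Hypothetical token assignment: token N selects TABLE[N].
TABLE: List[Callable[[Stack], None]] = [op_dup, op_add, op_mul]


def interpret(tokens: List[int], stack: Stack) -> Stack:
    """Run a token stream: one indexed lookup and one call per token."""
    for tok in tokens:
        TABLE[tok](stack)
    return stack
```

For instance, `interpret([0, 2], [7])` runs `dup *` on 7.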

This is part of the picture. We think it is useful to be able to load
programs into memory and also to execute some directly from
the stream. Unless that Forth compiler or tethered interface driver
or application code are in ROM they are going to be in FLASH or
RAM and have to be loaded at least once before they run.

If the nodes are too small to host a compiler and still have room for
efficient apps, then loading compiled programs into memory or
executing them directly from a stream are the remaining options.

I can see the dilemma in SEAforth, where there appears to be a de-emphasis of shared memory. This is in conflict with creating common tasks that each node must execute. It doesn't make sense to reproduce this shared code in a memory constrained environment.

Syntax? I type the name of a program on the command line to load it
and run it. Yes, but what is the syntax in your Forth programs to do
this?

Syntax? What if I have 24 processors but don't have 24 monitors and
keyboards? I could have 24 windows on my screen and go to
24 command lines and type the name of program like I would in DOS?
Having 24 or more nodes all doing this sort of thing asynchronously
raises several problems. When the embedded application is running I am
not going to be there with the development system. A program is going
to have to send or remotely execute another program on another node.

1. The most basic form is expressed as one computer telling another to
execute some code directly from the stream.

2. The next level would be to tell it to load a program into memory.

3. The next level would be to tell it to run that program.

4. Then the next and next level, and on to more complex problems.

But you start with the simplest thing when trying to explain how it
works or ask how other people do #1.

I have tried to focus on what syntax people use for #1.

Moving beyond the message as command concept requires that you have a module store. Then your tokens can reference those modules and you've just scaled up one level. However, you have to architect a storage solution to get there.
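The first three levels, plus a module store, can be sketched as a node that accepts three kinds of message. Everything here is an assumption for illustration: the message kinds `EXEC`, `LOAD`, and `RUN`, the `Node` class, and the tiny token set stand in for whatever a real system would use, and the "code" is a Python list rather than compiled machine code.

```python
from typing import Dict, List

EXEC, LOAD, RUN = range(3)  # hypothetical message kinds for levels 1 to 3


class Node:
    """A receiving node with a data stack and a named module store."""

    def __init__(self) -> None:
        self.stack: List[int] = []
        self.modules: Dict[str, List[object]] = {}

    def _execute(self, code: List[object]) -> None:
        for tok in code:
            if isinstance(tok, int):
                self.stack.append(tok)          # literal
            elif tok == "+":
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a + b)
            elif tok == "*":
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a * b)
            else:
                raise ValueError(f"unknown token {tok!r}")

    def receive(self, kind: int, payload: List[object], name: str = "") -> None:
        if kind == EXEC:                         # 1. execute from the stream
            self._execute(payload)
        elif kind == LOAD:                       # 2. store code under a name
            self.modules[name] = list(payload)
        elif kind == RUN:                        # 3. run a stored module
            self._execute(self.modules[name])
        else:
            raise ValueError(f"unknown message kind {kind}")
```

Usage: `node.receive(EXEC, [2, 3, "+"])` runs immediately and stores nothing, while `LOAD` followed by `RUN` needs the module store, which is the storage question raised above.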

I don't know the answer. I understand that the source gets sent
and that the source gets interpreted and that is common practice.

I suppose the syntax to do that is not appropriate for sending
a compiled program to have it executed from the stream.

So perhaps no one who has been sending compiled code, doing
code overlays, or executing compiled code from streams out
there cares about what Forth syntax other people use to do it.

I am not getting a strong sense of preference for how other people
like to do that in Forth now.

While the tokenization approaches can be considered to simply be a source conversion technique, I do think they are sufficiently different to be called an alternative to source, and there is enough history and prior art to indicate a significant preference.

What are you looking for to give you this sense?

DaR
