Re: The standard I/O problems
- From: Andreas <akk@xxxxxxxxxx>
- Date: Wed, 08 Apr 2009 18:02:47 +0200
Long statement, short answer: IMO these are not "problems" because
You either have an OS or your Forth is the OS
or
you either have a terminal (or communication line) or your Forth communicates with the bare hardware
If you have OS and/or terminal you adapt to what they need/provide
Otherwise you are on your own, but hopefully can copy&modify somebody's code
Next time someone will come up with a standard proposal for a non-returning return stack. ;-))
Salud
Andreas
Jonah Thomas wrote:
I want to try to put together a clear description of several of the
problems that are being worked on for 200x. I don't particularly have
solutions but I hope that laying out the problems will be useful.
## Historical context ##
In the old days, source code was in seven-bit characters.
Later we got a system where 0-31 were control characters and some of
those had meanings and resulted in immediate actions. 32-127 were
characters. 128-255 were not standardised by the computing community and
were handled in various incompatible ways.
Forth-94 codified existing practice. 32-127 were characters. 0-31 could
all be interpreted as space except the ones that become EOL, or the
Forth system could do something with them that would be invisible to
Forth programs. 128-255 didn't belong in names of standard programs. A
specific Forth could use other characters but for source code they had
to translate to 32-127 plus an EOL sequence and for files maybe an EOF
sequence.
Forth-94 also provided an EKEY word for Forth programs to use to get
extra stuff from the input stream, and this did not actually provide
much in the way of standardised behavior. The EKEY word was kind of like
the CODE word. It said basically "Here's a standard word that tells you
something unportable is about to happen."
## Newer issues ##
The way I see it, there are basically four new behaviors that people want
to standardise. These behaviors overlap and interact, but I want to
present them separately.
1. Eddies in the input stream: The input stream, particularly from the
keyboard, might include not just control characters but complicated
extra information that tells the system to do special things even when
the final effect is just a stream of ASCII characters. There's
backspace, there's move-cursor, there's delete, insert, and many many
more. The kind I described affect which source code characters the Forth
interpreter will see, and in what order.
Also there are various extended character sets for input. These can come
in source code or as text to display. Extra information in source code
may be an important part of names or possibly numbers, or it may be
discarded with no bad effect. Things that may be discarded include
character case, UTF-8, literate programming source, Unicode, etc. All of
this could be massaged away from a file before the file is INCLUDEd. We
could say that in source code names should only include characters that
translate directly to ASCII 32-127 plus EOL, or we could allow programs
to use the extra information, a la ColorForth etc. But source code that
depends on extra information can't be used on Forths that don't know how
to parse that, unless the files are first somehow massaged.
As far as the input stream is concerned, extra information that will
only affect display can be just carried along until it's time to display
it. But you might want to carry along character sequences that would
otherwise be used to affect the input -- backspaces, returns, tabs, etc
etc. It has to be decided which sequences to respond to and which to
store.
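
To make the respond-or-store choice concrete, here is a minimal sketch in Python rather than Forth. It is not anything proposed for 200x. The name `cook_line` and the policy it applies (act on backspace and DEL, store tab, drop other control codes) are assumptions picked only for illustration.

```python
BS, TAB, DEL = 0x08, 0x09, 0x7F

def cook_line(raw: bytes) -> bytes:
    """Turn a raw key stream into the text an interpreter would see.

    Assumed policy: backspace and DEL are acted on (they erase the
    previous byte), tab is stored unchanged, and every other control
    code is dropped.
    """
    line = bytearray()
    for code in raw:
        if code in (BS, DEL):
            if line:
                line.pop()        # respond: erase the last stored byte
        elif code == TAB or code >= 0x20:
            line.append(code)     # store: carry it along for later
        # any other control code is silently dropped
    return bytes(line)

print(cook_line(b"DUPP\x08 SWAP\tDROP"))   # b'DUP SWAP\tDROP'
```

Note that erasing "one byte" per backspace is only right while every character is one byte long, which is where the next item comes in.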
2. Turbulence in the output stream: Text that is intended to be
displayed can contain special sequences that affect the display and can
contain multiple-length characters. If all you do with this is accept
strings, MOVE whole strings from place to place, and TYPE strings, then
all the extra information can be ignored by Forth programs. All you need
to know is how long the string is, and TYPE or the OS's display code
does the rest. But if you need to manipulate the string -- maybe to do
something with the special characters, like translate them from one form
to another -- then you have problems. If there are variable-length
characters then you can't just search for the sequences you're
interested in -- you might get the last part of one character mixed with
the first part of the next. Frameshift errors. It's a mess. EMIT doesn't
work any better than KEY, CHAR+ doesn't either. Nor does SEARCH. A
32-bit Forth can keep most variable-length characters as a single stack
item but a 16-bit Forth can't.
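
A small Python sketch of the kind of trouble meant here. It shows a related case rather than the exact one described: in Shift-JIS the second byte of a two-byte character can equal a plain ASCII byte, so a byte-level search reports a character that is not in the text. The helper name `byte_search_hits` is made up for this example, and the choice of Shift-JIS and UTF-8 is mine, not something the discussion above specifies.

```python
def byte_search_hits(needle: bytes, text: str, encoding: str) -> bool:
    """Search the encoded bytes, the way a byte-oriented SEARCH would."""
    return needle in text.encode(encoding)

# KATAKANA LETTER SO (U+30BD) is the two bytes 0x83 0x5C in Shift-JIS,
# and 0x5C on its own is the ASCII backslash.
katakana_so = "\u30bd"
print("\\" in katakana_so)                                # False
print(byte_search_hits(b"\\", katakana_so, "shift_jis"))  # True

# Byte length and character count diverge in UTF-8.
word = "na\u00efve"
print(len(word), len(word.encode("utf-8")))               # 5 6

# One code point can be too large for a 16-bit cell.
clef = ord("\U0001d11e")
print(clef > 0xFFFF)                                      # True
```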
I think variable-length characters are enough to give you all the
output-stream problems. Solve it for those and you've probably solved
the rest? Or not? Probably a whole new string-manipulation wordset is
needed.
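
As a rough idea of the most basic operation such a wordset would have to supply, here is a sketch in Python of stepping through a buffer one character at a time instead of one byte at a time. It assumes the buffer is well-formed UTF-8, which is only one of the possible encodings. The names `utf8_width` and `split_chars` are hypothetical and do not come from any Forth standard or proposal.

```python
def utf8_width(lead: int) -> int:
    """Number of bytes in the UTF-8 character that starts with `lead`."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    # 0x80-0xBF are continuation bytes, so we are not at a character start.
    raise ValueError(f"not a UTF-8 lead byte: {lead:#x}")

def split_chars(buf: bytes) -> list:
    """Cut a well-formed UTF-8 buffer into one byte string per character."""
    chars = []
    i = 0
    while i < len(buf):
        width = utf8_width(buf[i])
        chars.append(buf[i:i + width])
        i += width            # the character-sized analogue of CHAR+
    return chars

print(split_chars("a\u00e9\u20ac".encode("utf-8")))
# [b'a', b'\xc3\xa9', b'\xe2\x82\xac']
```

Everything else (search, compare, truncate, reverse) would have to be built on a step like this rather than on address arithmetic.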
3. Localisation: We'd like a good obvious way to translate I/O strings
from one human language to another. Looking through many thousands of
lines of source code for the strings is not it, particularly when the
strings might sometimes get massaged in ways that aren't obvious. And
ideally the new strings would be displayed in an esthetic way. If the
user display area is sixteen 32-character lines and you want to be sure
the massaged translated message fits into that with no ugly line
breaks.... There are complicated issues here but surely some of them can
be standardised, and some of those depend heavily on how #2 above gets
solved.
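
One common way to keep strings out of the code is a message table keyed by an identifier, with a separate layout check. The sketch below is in Python and only illustrates the idea. The catalogue contents, the names `CATALOG` and `layout`, and the 32 by 16 display are assumptions taken from the example above, not a proposal for a standard interface.

```python
import textwrap

# Hypothetical message catalogue: one table per language, keyed by message id.
CATALOG = {
    "en": {"disk_full": "The disk is full. Delete some files and try again."},
    "de": {"disk_full": "Der Datenträger ist voll. Löschen Sie einige "
                        "Dateien und versuchen Sie es erneut."},
}

COLS, ROWS = 32, 16   # sixteen 32-character lines

def layout(msg_id: str, lang: str) -> list:
    """Look up a message and wrap it for the display, or raise if it won't fit."""
    text = CATALOG[lang][msg_id]
    # break_long_words=False: never split a word, so an overlong word
    # shows up as an overlong line and is rejected below.
    lines = textwrap.wrap(text, width=COLS, break_long_words=False)
    if len(lines) > ROWS or any(len(line) > COLS for line in lines):
        raise ValueError("message does not fit the display")
    return lines

for line in layout("disk_full", "de"):
    print(line)
```

`len` here counts code points, not display columns or bytes, so even this check leans on the variable-length character question from #2.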
4. Input devices: Forth gets used to get input from lots of specialised
devices, and there's no standard way to say you're doing that. It's
tempting to redirect the input stream and use EKEY for that. But there's
nothing standard about what you get or what you do with it. Is there any
value in specifying that there might be more than one input device and
giving some sort of standard name for the things you do to read from
them and send control data to them? Special output devices too? With
output devices you can use standard words to create the bit-strings you
send them, and then one nonstandard word can send a string of chars or a
cell or whatever. You can do the same with special input devices. You
can write a device driver in Forth if you need to, or your Forth might
provide it, and there's no porting the result to any system which lacks
that particular device.
Is it even worthwhile to redirect special I/O to KEY EMIT etc? Probably
not, if you still have a user who will need his own I/O. If it isn't
worth doing that way then #4 is an unrelated issue.
## Conclusion ##
The basic issues are:
sequences that massage the input stream
sequences that massage the output stream
interaction among those sequences
manipulating those sequences with standard Forth
localisation
source code apart from strings that will be output unchanged
The source code issue looks the easiest to me -- if the source can't be
massaged into upper-case ASCII with spaces and EOLs, then that source is
not completely portable. .( ." sequences etc may not display correctly
on all Forths if they contain characters that can't be massaged into
ASCII. If you expect KEY to return something that you can't compare to
CHAR ASCII-CHAR then that might not be portable either.
I think the others are so hard that the traditional words cannot be
salvaged. They need one or more new wordsets.