aus and chars (was: CMOVE wrong?)
- From: anton@xxxxxxxxxxxxxxxxxxxxxxxxxx (Anton Ertl)
- Date: Sat, 03 Mar 2007 14:10:07 GMT
"J Thomas" <jethomas5@xxxxxxxxx> writes:
> What good are raw address units? I remember asking somebody who was on
> the standards committee in the early days about that, and he mumbled
> something about somebody who'd been on the committee for a while who
> had a system that was nibble-addressed, who insisted address units
> shouldn't be bytes. But after they catered to him he dropped out.
A more relevant example in the early-90s timeframe is 16-bit
characters on byte-addressed machines, and Jack Woehr even implemented
such a Forth system to demonstrate that the Forth-94 approach works in
that setting (I don't think this system got much use, though).
> If you can address individual nibbles I don't see much you can do with
> those addresses.
You can use the native addresses of the hardware.
An alternative would be to use a different, non-native representation
for addresses, such that 1 chars = 1. The downside here is that there
would have to be conversions between the Forth addresses and the
native addresses in a number of places, in particular in all
memory-access primitives, and when communicating with non-Forth
software.
Given Forth's character as a language that is close to the metal, I
feel that such a non-native representation for addresses is somewhat
against the spirit of the language.
In any case, the Win32Forth users have some experience with using a
non-native address representation, as once upon a time they used
addresses relative to some base (and conversion words like REL>ABS
and ABS>REL) instead of the native addresses in their system.
A language that took this approach is BCPL: it uses word addressing,
with consecutive words having consecutive addresses, and it does not
perform type checking; therefore it cannot use native addresses
(except on word-addressed machines). AmigaOS was partly written in
BCPL, and from what I read about it, the address conversion necessary
in many places was a significant pain.
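As a rough illustration of where the conversions end up, here is a minimal sketch of base-relative addressing in standard Forth. The names ARENA, BASE-ADDR, REL@ and REL! are made up for this example, and the definitions of REL>ABS and ABS>REL are only modelled on the word names mentioned above; they are not Win32Forth's actual code.

```forth
\ Hypothetical sketch: "Forth" addresses are offsets from a base,
\ so every memory access has to convert to a native address first.
create arena 64 cells allot          \ stand-in for the Forth memory image
arena constant base-addr             \ assumed base of that image

: rel>abs ( rel -- abs )  base-addr + ;   \ relative -> native address
: abs>rel ( abs -- rel )  base-addr - ;   \ native -> relative address

: rel@ ( rel -- x )  rel>abs @ ;     \ fetch through a relative address
: rel! ( x rel -- )  rel>abs ! ;     \ store through a relative address

42 0 rel!                            \ store 42 at relative address 0
0 rel@ .                             \ prints 42
```

Every address that crosses over to non-Forth software would need the same REL>ABS or ABS>REL step, which is where the pain comes from.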
> So it seems to me that if you happen to have a system where an address
> unit is smaller than a char, you could do:
> : ALLOT CHARS ALLOT ;
> : MOVE CHARS MOVE ;
> : ERASE CHARS ERASE ;
> : UNUSED CHARS UNUSED ;
> : ALLOCATE CHARS ALLOCATE ;
> : RESIZE CHARS RESIZE ;
> : DUMP CHARS DUMP ;
> : CHARS ; IMMEDIATE
> and all your standard programs will work just as before
create a 5 cells allot
5 a 4 cells + !
a 4 cells erase
a 4 cells + @ .
would produce the wrong result on such a system. Ok, that would be
fixable by redefining CELLS (and FLOATS etc), but here's the killer:
s" abcdef" drop 3 chars + c@ emit
Here your system with the redefined CHARS would try to access the char
starting at the fourth au instead of at the fourth char of the string.
You cannot just redefine + in Forth to work as you would need. You
could do it in StrongForth.
Basically, what you are thinking of is something similar to the
approach that C took: automatic scaling of address arithmetic by type
size. That's appropriate for C with its type-aware compiler (and
even there I think it's not the greatest idea), but not for Forth.
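For contrast, a small sketch of how the scaling stays explicit in standard Forth: + only ever adds address units, and the program says CELLS or CHARS itself. The helper names CELL[] and CHAR[], the buffer name BUF and the word FOURTH-CHAR are hypothetical and exist only for this example.

```forth
\ Sketch: the programmer scales offsets; + itself never does.
create buf 5 cells allot                     \ room for 5 cells

: cell[] ( a-addr n -- a-addr' )  cells + ;  \ address of the n-th cell
: char[] ( c-addr n -- c-addr' )  chars + ;  \ address of the n-th char

5 buf 4 cell[] !                             \ store 5 in the fifth cell
buf 4 cell[] @ .                             \ prints 5

: fourth-char ( -- )                         \ S" kept inside a definition
  s" abcdef" drop 3 char[] c@ emit ;
fourth-char                                  \ prints d
```

If CHARS were turned into a no-op on a system where a char spans several address units, CHAR[] would add 3 address units rather than 3 chars, which is the failure described above.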
However, I think that 1 CHARS = 1 is something that all current and
future standard systems (apart from that demonstration system by Jack
Woehr) will support, for the following reasons:
- There is a large amount of code around that assumes that 1 CHARS = 1.
- While there is hardware where an au is smaller than a byte, probably
nobody will implement a standard Forth system on such hardware (too
few resources).
- In the early 1990s the eventual transition to 16-bit fixed-width
Unicode characters (UCS-2) looked likely. However, UCS-2 was
superseded by the variable-width UTF-16 and the fixed-width
UCS-4/UTF-32, and apart from a few niches no transition to UCS-2 or
UCS-4 has happened. Instead, the transition to Unicode that I see
is mainly towards variable-width UTF-8 with 8-bit granularity; other
popular encodings like GB18030 (or GB2312 and GBK that GB18030 is
based on) are also variable-width with byte granularity.
The Forth-94 approach does not fit variable-width encodings, but
there is the xchars proposal for dealing with variable-width
encodings, and it is compatible with 1 CHARS = 1 for byte- or
word-addressed machines (especially if the encoding has byte
granularity).
So, there is no need to drop 1 CHARS = 1 to support Unicode.
- And even if somebody were to implement a standard system on a
nibble-addressed machine, or a system with UCS-4 characters on a
byte-addressed machine, they would probably choose the BCPL-style
approach; while that is painful in places, being able to run the
code mentioned above without having to find all the places where
CHARS has been forgotten etc. is worth the pain.
If I am right about the support of 1 CHARS = 1 by all significant
systems, then you can just rely on that support; we might also
formally standardize this extension in Forth200x.
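A program that relies on this can at least state the dependency up front. The following is only a sketch; the word name CHECK-CHARS and the message text are my own choice, not part of any standard or proposal.

```forth
\ Sketch: refuse to run on a system where 1 CHARS is not 1.
: check-chars ( -- )
  1 chars 1 <> abort" this program assumes 1 CHARS = 1" ;

check-chars    \ silent if the assumption holds, aborts with a message if not
```

Putting such a check at the start of a source file documents the environmental dependency and costs nothing on systems that satisfy it.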
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2007: http://www.complang.tuwien.ac.at/anton/euroforth2007/