Re: RfD - Escaped Strings (long)



On 21 Aug 2006 15:10:45 -0700, "Alex McDonald"
<alex_mcd@xxxxxxxxxxxxxxx> wrote:

Way back in IBM360 BAL land, "abcd""def" was the answer; a double
doublequote collapsed to a single doublequote, and parsing of the string
continued. Could S" be extended to accept "" without breaking existing
code?

Is there any common practice for it in Forth?
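
For readers who have not met the doubled-quote convention, here is a
minimal sketch of it in Python (not Forth). The function name
parse_doubled_quote and its return shape are hypothetical, chosen only
for illustration; this is not how any particular Forth implements S".

```python
def parse_doubled_quote(source, start=0):
    """Parse a string body beginning at `start`, just after the opening quote.

    A doubled quote ("") yields one literal quote and parsing continues;
    a single quote ends the string.
    Returns (text, index just past the closing quote).
    """
    out = []
    i = start
    while i < len(source):
        ch = source[i]
        if ch == '"':
            # Look ahead: a second quote means "literal quote, keep going".
            if i + 1 < len(source) and source[i + 1] == '"':
                out.append('"')
                i += 2
                continue
            return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated string")
```

Under this rule, any existing source in which a closing quote is
immediately followed by another quote character would parse
differently, which is where the compatibility question bites.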

Why do we need two representations, both of variable length?
This proposal selects the hexadecimal representation, requiring
two hex digits. A consequence of this is that xchars must be
represented as a sequence of pchars. Although initially seen as a
problem by some people, it avoids the endian problems involved
in storing an xchar.
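
A rough sketch of what a fixed two-digit hex escape buys you, written
in Python rather than Forth. The \x prefix, the name
decode_hex_escapes and the restriction to 8-bit input characters are
assumptions made for the example, not text from the proposal.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_hex_escapes(text):
    """Turn text containing \\xHH escapes into a sequence of octets.

    Each escape is exactly two hex digits, so one escape is one octet.
    Ordinary characters are assumed to fit in 8 bits.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        if text.startswith("\\x", i):
            digits = text[i + 2:i + 4]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("\\x needs exactly two hex digits")
            out.append(int(digits, 16))
            i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

# U+20AC written as its three UTF-8 octets; no cell ever holds the
# whole character, so no byte order is involved.
euro = decode_hex_escapes(r"A\xE2\x82\xACB")
```

Because the width is fixed, \x4100 is unambiguous: it is the octet
41 followed by the two characters "00".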


Here I would propose

\unnnn
and
\Unnnnnnnn

for UTF-16 and UTF-32 support. Python, IIRC, supports this construct. It
avoids any ambiguity over endianness.

What terminates it? If you want, say, '00' immediately after
\Uxxxxxx, do you write \Uxxxxxx00? I believe that to be
ambiguous. Variable-length escapes without a terminator
are dangerous!
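
To make the ambiguity concrete, here is a small Python sketch that
reads the same text under two hypothetical rules: a variable-length
rule where \U swallows every hex digit that follows, and a
fixed-width rule. The function names and the six-digit width are
assumptions for illustration only.

```python
import re

def split_greedy(text):
    """Variable-length rule: \\U takes every hex digit that follows."""
    m = re.match(r"\\U([0-9A-Fa-f]+)(.*)", text, re.S)
    if m is None:
        raise ValueError("no \\U escape at start of text")
    return int(m.group(1), 16), m.group(2)

def split_fixed(text, width):
    """Fixed-width rule: \\U takes exactly `width` hex digits."""
    m = re.match(r"\\U([0-9A-Fa-f]{%d})(.*)" % width, text, re.S)
    if m is None:
        raise ValueError("expected %d hex digits after \\U" % width)
    return int(m.group(1), 16), m.group(2)

source = r"\U0020AC00"
greedy_value, greedy_rest = split_greedy(source)    # digits "0020AC00", nothing left
fixed_value, fixed_rest = split_fixed(source, 6)    # digits "0020AC", then "00"
```

The greedy reader absorbs the trailing 00 into the number; only a
fixed width (or an explicit terminator) leaves it as text.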

The use of hex characters is not just to provide wide
character support, but also to allow insertion of control
codes into comms channels, e.g. Telnet IAC handling.

Anton is pushing hard for UTF-8 support. I argue that separate
octets support UTF-8/16/32 without any required changes.
Another advantage of the octet approach is that it enables
16-bit embedded systems to support characters of any size
wider than a cell. With UTF-8 this is required even on a
32-bit Forth.
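
As an illustration of the octet argument, this Python sketch writes
one character as a run of two-digit hex escapes under several
encodings. The helper name as_octet_escapes and the \x notation are
assumptions for the example; the encoding names are Python's standard
codec names.

```python
def as_octet_escapes(ch, encoding):
    """Spell out the encoded form of `ch` one octet at a time."""
    return "".join("\\x%02X" % b for b in ch.encode(encoding))

euro = "\u20ac"  # U+20AC EURO SIGN

utf8 = as_octet_escapes(euro, "utf-8")          # three octets
utf16_be = as_octet_escapes(euro, "utf-16-be")  # two octets, high first
utf16_le = as_octet_escapes(euro, "utf-16-le")  # same two octets, low first
utf32_be = as_octet_escapes(euro, "utf-32-be")  # four octets
```

The byte order is whatever the programmer writes down, so the string
parser never has to hold the whole character in a cell or decide how
to store it.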

Stephen

--
Stephen Pelc, stephenXXX@xxxxxxxxxxxx
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads