Re: RfD - Escaped Strings (long)
- From: "Alex McDonald" <alex_mcd@xxxxxxxxxxxxxxx>
- Date: 21 Aug 2006 16:17:41 -0700
Stephen Pelc wrote:
On 21 Aug 2006 15:10:45 -0700, "Alex McDonald"
<alex_mcd@xxxxxxxxxxxxxxx> wrote:
Way back in IBM 360 BAL land, "abcd""def" was the answer; a doubled
double quote collapsed to a single double quote, and parsing of the string
continued. Could S" be extended to accept "" without breaking existing
code?
Has it any common practice in Forth?
Err... no. But it would be useful as it's a common case, and could be
trivially implemented.
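
As a rough illustration of the doubled-quote rule described above, here is a minimal sketch in Python rather than Forth. The function name parse_quoted and its return convention are assumptions made for this example; they are not part of the RfD or of any Forth system.

```python
def parse_quoted(src, start=0):
    """Parse a string body that begins just after the opening quote.

    A doubled quote ("") is collapsed to one literal quote and parsing
    continues; a lone quote terminates the string.
    Returns (text, index just past the closing quote).
    Hypothetical helper, for illustration only.
    """
    out = []
    i = start
    while i < len(src):
        ch = src[i]
        if ch == '"':
            # Look ahead one character: "" means a literal quote.
            if i + 1 < len(src) and src[i + 1] == '"':
                out.append('"')
                i += 2
                continue
            # A single quote closes the string.
            return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated string")
```

Note that under this rule, any existing source with a quote character placed directly after the closing quote would parse differently, which is the compatibility question raised above.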
Why do we need two representations, both of variable length?
This proposal selects the hexadecimal representation, requiring
two hex digits. A consequence of this is that xchars must be
represented as a sequence of pchars. Although initially seen as a
problem by some people, it avoids the endian problems involved
in storing an xchar.
Here I would propose
\unnnn
and
\Unnnnnnnn
for UTF-16 and UTF-32 support. Python, IIRC, supports this construct. It
avoids any ambiguity over endianness problems.
What terminates it? If you want say '00' immediately after
\Uxxxxxx do you write \Uxxxxxx00 which I believe to be
ambiguous. Variable length extensions without a terminator
are dangerous!
They're fixed length; \u has 4 digits, \U has 8.
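
To show why fixed widths remove the ambiguity, here is a minimal Python sketch of a decoder that consumes exactly 2 hex digits after \x, 4 after \u and 8 after \U. The function name decode_escapes and the choice to return a list of integer values are assumptions made for this example, not part of the proposal.

```python
import string


def decode_escapes(s):
    """Decode \\xHH, \\uHHHH and \\UHHHHHHHH with fixed digit counts.

    Returns a list of integer values, one per character or escape.
    Hypothetical helper, for illustration only.
    """
    widths = {"x": 2, "u": 4, "U": 8}  # digits consumed per escape
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in widths:
            n = widths[s[i + 1]]
            digits = s[i + 2 : i + 2 + n]
            if len(digits) != n or any(c not in string.hexdigits for c in digits):
                raise ValueError("escape needs exactly %d hex digits" % n)
            out.append(int(digits, 16))
            i += 2 + n  # skip backslash, letter and the fixed digits
        else:
            out.append(ord(s[i]))
            i += 1
    return out
```

With this rule, \U00000041 followed by 00 decodes as the value 0x41 and then two ordinary '0' characters, because the escape stops after its eighth digit.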
The use of hex characters is not just to provide wide
character support, but also allow insertion of control
codes into comms channels, e.g. Telnet IAC handling.
Anton is pushing hard for UTF-8 support. I argue that separate
octets support UTF-8/16/32 without any required changes.
\x12\x34 has a specific storage order, as does \x1234, if I understand
the details of the proposal correctly. They're endian sensitive. \u and \U
don't have that problem; they're stored as required by the endianness
of the target, not in the order specified.
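
The difference can be sketched in Python. The helper names store_octets and store_code_unit are assumptions made for this example; the point is only that an octet sequence is laid down in the order written, while a 16-bit value is laid down in whatever byte order the target uses.

```python
def store_octets(*octets):
    """Store octets exactly in the order they were written (\\x12\\x34 style)."""
    return bytes(octets)


def store_code_unit(value, byteorder):
    """Store one 16-bit value (\\u1234 style) in the target's byte order.

    byteorder is "little" or "big"; hypothetical helper for illustration.
    """
    return value.to_bytes(2, byteorder)


# The octet form is identical on every target.
octet_form = store_octets(0x12, 0x34)

# The 16-bit form depends on the target's endianness.
little_form = store_code_unit(0x1234, "little")
big_form = store_code_unit(0x1234, "big")
```

Here octet_form matches big_form but not little_form, so source written as separate octets only yields a correct UTF-16 code unit on a target whose byte order happens to match the order written.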
Another advantage of the octet approach is that it enables
16 bit embedded systems to support characters of any size
wider than a cell. With UTF-8 this is required even on a
32 bit Forth.
I wasn't considering UTF-8, just UTF-16 and UTF-32 support.
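
For reference, the octet approach applied to UTF-8 can be sketched in Python as follows. The helper names are assumptions made for this example. The euro sign U+20AC is used because its UTF-8 encoding is three octets, which is wider than a 16-bit cell.

```python
def octets_to_text(octets):
    """Interpret a sequence of octets as UTF-8 text."""
    return bytes(octets).decode("utf-8")


def fits_in_cell(octets, cell_bits):
    """Report whether the encoded character fits in one cell of the given width."""
    return len(octets) * 8 <= cell_bits


# \xE2\x82\xAC written as separate octets: the UTF-8 encoding of U+20AC.
euro = [0xE2, 0x82, 0xAC]
euro_text = octets_to_text(euro)
euro_fits_16 = fits_in_cell(euro, 16)
```

The three octets are stored one after another as written, so the character never has to be held in a single cell.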
Stephen
--
Stephen Pelc, stephenXXX@xxxxxxxxxxxx
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
--
Regards
Alex McDonald