Re: RfD: XCHAR wordset (for UTF-8 and alike)
- From: stephenXXX@xxxxxxxxxxxx (Stephen Pelc)
- Date: Mon, 26 Sep 2005 10:17:02 GMT
On Mon, 26 Sep 2005 00:16:25 +0200, Bernd Paysan <bernd.paysan@xxxxxx>
wrote:
>ASCII is only appropriate for the English language. Most western languages
>however fit somewhat into the Forth frame, since a byte is sufficient to
>encode the few special characters in each (though not always the same
>encoding can be used; latin-1 is most widely used, though). For other
>languages, different char-sets have to be used, several of them
>variable-width. Most prominent representant is UTF-8. Let's call these
>extended characters XCHARs. Since ANS Forth specifies ASCII encoding, only
>ASCII-compatible encodings may be used.
How does this fit in with the wide character and internationalisation
proposals at www.mpeforth.com/arena/
(i18n.propose.v7.PDF and i18n.widechar.v7.PDF)?
These proposals/RFCs are from the application developer's point of
view. There's a sample implementation in the file
LIB\INTERNATIONAL.FTH in the VFX Forth distribution. The file
is derived from 15+ years of experience. From the file header:
"You are free to use this code in any way, as long as the MPE
copyright notice in this section is retained.
This code is an implementation of the draft ANS internationalisation
specification available from the download area of the MPE web site.
The implementation provides more functionality than is required by
the ANS draft standard and provides enough hooks to be the basis of
a practical system."
>XCHAR- ( xc_addr1 -- xc_addr2 )
>Goes backward from xc_addr1 until it finds an XCHAR so that the size of this
>XCHAR added to xc_addr2 gives xc_addr1. Note: XCHAR- isn't guaranteed to
>work for every possible encoding.
IMHO standardising a word that can't be guaranteed to work is not
beneficial. If you must step back through a string, extend the
definition of /STRING to form /-STRING or some such, such that
the start of the string must be at the start of a character.
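
To make the backward step concrete, here is a minimal sketch for UTF-8
only. It is an illustration, not code from either proposal, and the
name xchar_minus is hypothetical. It relies on a property of UTF-8:
every continuation byte has the bit pattern 10xxxxxx, so the start of a
character can be found by skipping back over such bytes. An encoding in
which trailing bytes cannot be told apart from leading bytes offers no
such test, which is why the word cannot be guaranteed to work
everywhere. The sketch assumes well-formed input and a starting
position on a character boundary.

```python
def xchar_minus(buf: bytes, pos: int) -> int:
    """Return the index where the UTF-8 character ending just before pos starts.

    Assumptions: buf is well-formed UTF-8 and pos is on a character boundary.
    """
    if pos <= 0:
        raise ValueError("no character before the start of the buffer")
    pos -= 1
    # Continuation bytes look like 10xxxxxx; skip back until a lead byte.
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


# "a", e-acute and the euro sign: 1-, 2- and 3-byte characters.
text = "a\u00e9\u20ac".encode("utf-8")
end = len(text)
p3 = xchar_minus(text, end)   # start of the euro sign
p2 = xchar_minus(text, p3)    # start of e-acute
p1 = xchar_minus(text, p2)    # start of "a"
```

Each call needs only the buffer and a boundary position, which matches
the requirement above that the string start at the start of a
character.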
IMHO your approach is from the implementor's perspective, which is
valuable. But all our (Willem, Nick, Peter, Stephen) reviews showed
that what *applications* do with strings is at a *much* higher level
than implementors' issues.
Can we merge the application developer issues with the kernel
issues? These include cleaning up the meaning of character,
byte/octet access, file words and so on.
I look forward to discussing these issues at EuroForth 2005.
Stephen
--
Stephen Pelc, stephenXXX@xxxxxxxxxxxx
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads