Re: RfD: XCHAR wordset (for UTF-8 and alike)



On Wed, 28 Sep 2005 17:49:50 GMT, anton@xxxxxxxxxxxxxxxxxxxxxxxxxx
(Anton Ertl) wrote:

>>The key word is "should". However, reality intervenes. There are
>>apps out there that use multiple encodings. A standard formalises
>>current practice - it is *not* a design for the future.
>
>It makes no sense to standardize a current practice that has no
>future.

Yes it does! It encourages take-up of current best practice after
the first port. Application developers simply will not discard a
large and proven code base just because you say they should.

We have been involved in two ports of large commercial Forth
applications, FigForth -> Forth83 and Forth83 -> ANS94. The
final application generates 10-16 MB of binary. Even the first-stage
build requires compiling 250,000 lines of code. Until
you understand the mindset of these developers and the
management issues of large applications, you will not
understand why I'm taking this approach.

In essence you want to go from A -> B directly. I'm saying that
acceptance of B requires some people to go A -> C -> B. The
end point is not in dispute; it's the journey that counts.

>I have read enough statements from Forth vendors that it's impossible
>to write substantial apps in ANS Forth, so supposedly the programmers
>of those substantial apps are ignoring the standard already.

I for one do not subscribe to that point of view. What many/some
vendors have said is:
a) the standard does not cover enough
b) we were out of time to do more
c) we welcome your taking up the challenge.

>>The preferred route, I suggest, is to provide GET-ENCODING and
>>SET-ENCODING.
>
>That's the worst possible design; or maybe having an ENCODING variable
>would be even worse.
>
>In general, the global-state approach is always causing problems,
>whether it's STATE or BASE or something else.

That's why GET-ENCODING and SET-ENCODING are suggested - they hide
the implementation of the storage.
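
As a minimal sketch of that idea, assuming a single cell is enough to
identify an encoding: the encoding identifiers, the private variable
(encoding) and the helper WITH-ENCODING below are hypothetical names
chosen for illustration and are not part of the RfD. Only the two
accessor names come from the suggestion above.

```forth
\ Hypothetical encoding identifiers (assumption, not from the RfD).
0 CONSTANT ENC-ASCII
1 CONSTANT ENC-UTF8

\ Private storage. Because callers only use the accessors, this could
\ later become a USER variable or a per-task field without changing them.
VARIABLE (encoding)   ENC-UTF8 (encoding) !

: GET-ENCODING ( -- enc )  (encoding) @ ;
: SET-ENCODING ( enc -- )  (encoding) ! ;

\ Hypothetical helper: execute xt under a temporary encoding and
\ restore the previous one afterwards, even if xt throws.
: WITH-ENCODING ( i*x xt enc -- j*x )
   GET-ENCODING >R  SET-ENCODING
   CATCH  R> SET-ENCODING  THROW ;
```

With these definitions, a phrase such as ' SHOW-MENU ENC-ASCII WITH-ENCODING
(SHOW-MENU being any application word) would run under the temporary
encoding and leave the caller's encoding untouched.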

>Xchars were designed for dealing with one encoding used throughout
>the Forth system. Several encodings are compatible with the
>requirements of xchars, and a Forth system might let you choose on
>startup which encoding to use, but you cannot switch around between
>encodings.

The implication of XCHARs is then that they cannot be used when
ACS <> DCS or OCS <> DCS. This breaks XCHARs for application
development on current Forths.

Stephen


--
Stephen Pelc, stephenXXX@xxxxxxxxxxxx
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads