Re: RfD: XCHAR wordset (Version 3)



Stephen Pelc wrote:

On Wed, 26 Nov 2008 10:26:40 +0100, Bernd Paysan <bernd.paysan@xxxxxx>
wrote:

Hm, what about "distributing an internationalized program as source code"?
How do you do that when you don't know what kind of charsets the systems
use? You might want to convert them, but it's a PITA (all the other
encodings have fewer characters, so you will end up mapping some characters
to "unknown"). Then imagine a program that starts up a dialog with the
user, and sets the language it uses in response to the language the user
types in "hello" at the start of the program. Completely impossible
outside Unicode.

Sorry, but it's daily practice already. I also believe that the UTF-8
assumption is just ignoring reality. It will take a long time (10+
years) for all the other encodings to disappear from developers'
consideration.

How long do these people take? Unicode has been supported for more than 10
years in the Windows world. Microsoft has been recommending against non-
Unicode code pages since then. All operating systems from that vendor that
required non-Unicode approaches to i18n were shelved years ago.
Unix/Linux has been using Unicode for quite a while, too, and truly
standardized on UTF-8 in the meantime.

These people don't use standardized approaches to deal with their encodings
today. Just let them continue to work their way, and ignore them.

That's no reason for the XCHAR proposal to disenfranchise them, nor do
I see any technical reason for it.

Of course there is. Offer a choice only if it's useful and still allows
writing portable programs. In the Forth-94 standard we gave implementors the
choice between a separate and a mixed stack for floating point. There's no
way to write a reasonably complex FP program that copes with both options.

I'm tending to believe that the
proposal should be split into two, XCHAR in one and XCHAR EXT in the
other. XCHAR EXT is the place where any assumptions about encodings
should go.

I'm ok with that. XCHAR EXT deals with xcs on the stack, XCHAR only deals
with variable width characters as an opaque type in memory. There could be a
third layer which deals with changing encodings, but that's left out of the
xchar proposal anyway. This third layer would allow people to work with
different encodings. Some things will have to be left undefined.
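
To make the three layers concrete, here is a minimal sketch in Python. It
assumes UTF-8 as the in-memory encoding. The function names lead_byte_size,
split_opaque, code_points and recode are hypothetical and are not words from
the xchar proposal; they only illustrate which layer needs to know what.

```python
# Sketch only: all names here are hypothetical, not part of the xchar proposal.
# Assumption: the in-memory encoding is UTF-8.

def lead_byte_size(lead):
    """Length in bytes of a UTF-8 sequence, judged from its first byte."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    raise ValueError("not a valid UTF-8 lead byte: %#x" % lead)

def split_opaque(buf):
    """Layer 1: step through memory one character at a time.

    The characters are never decoded; each one stays an opaque byte string.
    """
    chunks = []
    i = 0
    while i < len(buf):
        n = lead_byte_size(buf[i])
        chunks.append(buf[i:i + n])
        i += n
    return chunks

def code_points(buf):
    """Layer 2: characters as integers (Unicode code points), as on a stack."""
    return [ord(ch) for ch in buf.decode("utf-8")]

def recode(buf, target):
    """Layer 3: change the encoding; unmappable characters become '?'."""
    return buf.decode("utf-8").encode(target, errors="replace")

text = "Grüße €".encode("utf-8")
opaque = split_opaque(text)
points = code_points(text)
latin1 = recode(text, "latin-1")
```

Only the second and third functions need to know what the bytes mean. The
last call also shows the lossy mapping mentioned earlier in the thread: the
euro sign has no Latin-1 equivalent, so it ends up as "?".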

The EXT wordsets allow cherry-picking features, so if UTF-8 as file
encoding, UTF-8 as internal encoding, and Unicode code points are each a
separate XCHAR EXT feature, just pick what you want and leave the rest.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/