Re: RfD: XCHAR wordset (Version 3)
- From: Bernd Paysan <bernd.paysan@xxxxxx>
- Date: Wed, 26 Nov 2008 16:17:53 +0100
Stephen Pelc wrote:
On Wed, 26 Nov 2008 10:26:40 +0100, Bernd Paysan <bernd.paysan@xxxxxx>
wrote:
Hm, what about "distributing an internationalized program as source code"?
How do you do that when you don't know what kind of charsets the systems
use? You might want to convert them, but it's a PITA (all the other
encodings have fewer characters, so you will end up mapping some characters
to "unknown"). Then imagine a program that starts up a dialog with the
user, and sets the language it uses in response to the language the user
types in "hello" at the start of the program. Completely impossible
outside Unicode.
Sorry, but it's daily practice already. I also believe that the UTF-8
assumption is just ignoring reality. It will take a long time (10+
years) for all the other encodings to disappear from developers'
consideration.
How long do these people take? Unicode has been supported for more than 10
years in the Windows world. Microsoft has recommended against using
non-Unicode code pages since then. All operating systems from that vendor that
require non-Unicode approaches to i18n were shelved years ago.
Unix/Linux has been using Unicode for quite a while, too, and truly
standardized on UTF-8 in the meantime.
These people don't use standardized approaches to deal with their encodings
today. Just let them continue to work their way, and ignore them.
That's no reason for the XCHAR proposal to disenfranchise them, nor do
I see any technical reason for it.
Of course there is. Offer a choice only if it's useful and still allows
writing portable programs. In the Forth-94 standard we gave implementors the
choice between separate and mixed stacks for floating point. There's no way
to write a reasonably complex FP program that works with either choice.
I'm tending to believe that the
proposal should be split into two, XCHAR in one and XCHAR EXT in the
other. XCHAR EXT is the place where any assumptions about encodings
should go.
I'm ok with that. XCHAR EXT deals with xcs on the stack, XCHAR only deals
with variable width characters as an opaque type in memory. There could be a
third layer which deals with changing encodings, but that's left out of the
xchar proposal anyway. This third layer would allow people to work with
different encodings. Some things will have to be left undefined.
The EXT wordsets allow cherry-picking features, so if using UTF-8 as file
encoding, UTF-8 as internal encoding, and Unicode code points are each an
XCHAR EXT feature, just pick what you want and leave the rest.
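
To make the two layers concrete, here is a rough sketch in Python rather than Forth. It is not taken from the proposal: the function names are made up for illustration, and it assumes UTF-8 as the in-memory encoding. The "memory" layer only needs to know how many bytes a character occupies, so it can step over it without decoding; the "stack" layer additionally turns those bytes into a code point.

```python
def xchar_size(lead):
    """Byte length of the UTF-8 sequence starting with lead byte `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def xchar_next(buf, pos):
    """Memory layer: step over one character, treating it as opaque."""
    return pos + xchar_size(buf[pos])


def xchar_fetch(buf, pos):
    """Stack layer: return (next position, code point) for one character."""
    end = xchar_next(buf, pos)
    return end, ord(buf[pos:end].decode("utf-8"))


# 'a' (1 byte), e-acute (2 bytes), euro sign (3 bytes)
buf = "a\u00e9\u20ac".encode("utf-8")

pos = 0
while pos < len(buf):
    nxt, cp = xchar_fetch(buf, pos)
    print(pos, nxt - pos, hex(cp))
    pos = nxt
```

Code that only copies, compares, or skips characters can get by with xchar_next alone, which is why that layer does not have to promise anything about what the code points mean.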
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/