Re: RfD: XCHAR wordset



On Jul 16, 3:58 pm, Bernd Paysan <bernd.pay...@xxxxxx> wrote:
Anton Ertl wrote:
Windows is UTF-16, which is not ASCII compliant. Although Windows
provides APIs to translate from locale to locale, there is no method in
Win32Forth to automatically identify which parameters would be required
to be translated from XCHARs to UTF-16 and back; the programmer would be
responsible for coding the conversions.

I don't see that you are any worse off with xchars in this situation
than with chars.

It's somewhat worse, because Windows has "A" prototypes, which convert the
current code page (can be multibyte) into UTF-16 on the fly. The "W"
prototypes take UTF-16 directly. But there's some light: UTF-8 is one of
the code pages in Windows (number 65001), and you can at least use
MultiByteToWideChar to convert data.
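
A minimal sketch of what that conversion amounts to, using Python's codecs as a stand-in for MultiByteToWideChar with code page 65001 and its reverse WideCharToMultiByte. The function names are hypothetical and nothing here calls the Win32 API; it only shows the byte-level effect of going between UTF-8 and UTF-16.

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Stand-in for MultiByteToWideChar(65001, ...): UTF-8 in, UTF-16LE out."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(data: bytes) -> bytes:
    """Stand-in for WideCharToMultiByte(65001, ...): UTF-16LE in, UTF-8 out."""
    return data.decode("utf-16-le").encode("utf-8")


text = "Gr\u00fc\u00dfe"          # five characters, two of them non-ASCII
utf8 = text.encode("utf-8")
utf16 = utf8_to_utf16le(utf8)

print(len(utf8), len(utf16))      # 7 10
print(utf8_to_utf16le(b"A"))      # b'A\x00'
assert utf16le_to_utf8(utf16) == utf8   # the round trip is lossless
```

Note that the plain ASCII "A" stays a single byte in UTF-8 but becomes two bytes in UTF-16, which is why the "A" and "W" entry points cannot share a buffer layout.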

Actually, it might be possible to change the current code page to UTF-8, but
I didn't see a hint how to do that other than for console i/o (SetConsoleCP
and SetConsoleOutputCP).

It isn't possible, for reasons related to the A form of the functions;
they aren't designed to be used for anything other than byte=char code
pages. 65001, the codepage for UTF-8, isn't a valid code page for the
SetConsolexxx functions either. It can only be used by the
MultiByteToWideChar function and its reverse WideCharToMultiByte.

It's possible to build a UTF-8 Forth for Windows, but only if we know
where all the string parameters are in the A calls and trampoline them
to the W equivalents. Someone did this for Cygwin
(http://www.okisoft.co.jp/esc/utf8-cygwin/), but it appears that the
Cygwin maintainers rejected it (http://www.cygwin.com/ml/cygwin-patches/2006-q3/msg00014.html).

It's not that hard as there are a limited number of A form calls and
they aren't being added to or changed in any way. The problem is where
there are only W functions with no A equivalents. Then we're into
stupid territory trampolining functions by the gazillion.
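
The trampolining idea can be sketched roughly as follows. This is an illustration only, written in Python rather than Forth: `make_utf8_trampoline` and `fake_message_box_w` are hypothetical names, and the stub merely stands in for a real W entry point. The point it shows is that the wrapper needs to be told which parameter positions hold strings.

```python
from typing import Callable, Sequence


def make_utf8_trampoline(wide_fn: Callable, string_args: Sequence[int]) -> Callable:
    """Wrap a W-style function so callers can pass UTF-8 byte strings.

    string_args lists the positions of the string parameters; this is the
    per-call knowledge that has to come from somewhere.
    """
    positions = set(string_args)

    def trampoline(*args):
        converted = [
            arg.decode("utf-8").encode("utf-16-le") if i in positions else arg
            for i, arg in enumerate(args)
        ]
        return wide_fn(*converted)

    return trampoline


def fake_message_box_w(hwnd, text, caption, flags):
    """Hypothetical stand-in for a W function; it only reports what it received."""
    return (hwnd, text.decode("utf-16-le"), caption.decode("utf-16-le"), flags)


message_box_utf8 = make_utf8_trampoline(fake_message_box_w, string_args=[1, 2])
result = message_box_utf8(0, "Gr\u00fc\u00dfe".encode("utf-8"), b"Title", 0)
print(result)
```

One wrapper per call plus a table of string positions is manageable for a fixed set of functions; it stops being manageable when the set keeps growing.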

If we're to go UTF-8 in Windows, the Forth implementor can cover TYPE
and the like, but the rest is then the programmer's responsibility.
Ugly. But less ugly than a UTF-16 Forth with a 16-bit char and an 8-bit
au, where everyone else's code breaks as it's not ANS and they're
COUNTing and C@ing.
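
To make the breakage concrete, here is a small sketch, in Python rather than Forth, of a byte-at-a-time scan of the kind a C@ loop performs. The helper `bytewise_find` is hypothetical. It assumes existing code looks for an ASCII character one byte at a time, which is safe on UTF-8 data but not on UTF-16 data.

```python
def bytewise_find(data: bytes, ascii_char: str) -> int:
    """Return the index of the first byte equal to ascii_char, or -1.

    This mimics code that steps through a string one byte at a time.
    """
    target = ord(ascii_char)
    for i, b in enumerate(data):
        if b == target:
            return i
    return -1


text = "\u0141"  # LATIN CAPITAL LETTER L WITH STROKE; the text contains no "A"

print(bytewise_find(text.encode("utf-8"), "A"))      # -1
print(bytewise_find(text.encode("utf-16-le"), "A"))  # 0
```

In UTF-8 every byte of a multi-byte sequence has its high bit set, so an ASCII byte can never appear by accident. In UTF-16 the low byte of U+0141 is 0x41, so the byte-wise scan reports an "A" that is not there.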


I must honestly admit that I don't like the way the online
access to MSDN presents information. The internal search is horrible, and
it's one of the rare sites where even Google is confused.

I download it; it's large (over 1GB) but free.


We would need something like the proposal Anton made at EuroForth 2006
(http://dec.bournemouth.ac.uk/forth/euro/ef06/ertl06.pdf, A Portable C
Function Call Interface), with extensions to identify string pointers,
before implementing this.

For strings my approach in the C interface is that one needs to
convert explicitly. Even without Unicode, you already have the
problem of needing zero-termination in C and explicit length counts in
Forth. Hmm, maybe we need some support words for the conversion.

Windows strings are usually not C strings, but buffers with start address
and size (i.e. Forth strings).
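
As a rough idea of what such support words would have to do, here is a sketch in Python that models memory as a byte buffer. The names `forth_to_c` and `c_to_forth` are hypothetical and are not part of any proposal; the sketch only shows the two directions of conversion between an (address, length) pair and a zero-terminated string.

```python
from typing import Tuple


def forth_to_c(memory: bytes, addr: int, length: int) -> bytes:
    """Copy the (addr, length) string and append the terminating zero byte."""
    s = bytes(memory[addr:addr + length])
    if b"\x00" in s:
        # A counted string may contain zero bytes; a C string cannot.
        raise ValueError("embedded zero byte cannot be represented in a C string")
    return s + b"\x00"


def c_to_forth(memory: bytes, addr: int) -> Tuple[int, int]:
    """Scan for the terminating zero byte and return the (addr, length) pair."""
    end = memory.index(b"\x00", addr)  # raises ValueError if unterminated
    return addr, end - addr


memory = b"..hello\x00.."
print(forth_to_c(memory, 2, 5))   # b'hello\x00'
print(c_to_forth(memory, 2))      # (2, 5)
```

The first direction needs a copy because the zero byte has to go somewhere; the second only needs a scan, which is the cheaper case when the API hands back a buffer.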

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

