Re: Forth 200x, S\\\q



Bruce McFarling <agila61@xxxxxxxxxxxx> writes:
> So rather than saying that the \xHH representation assumes 1 char = 1
> byte = 1 au, it is more precise to say that it assumes that:
>
> 1 char >= 1 byte
>
> and that where characters are larger than 1 byte, each character may
> be unambiguously encoded as a sequence of bytes. Which, for Unicode,
> is the case.

There are a number of options for what \xHH means in a Unicode Forth:

- It means the Unicode code point HH, encoded according to the
system's encoding; e.g., with UTF-8, \xHH characters with HH of 80 or
above would be translated into two bytes.

- It means a (primitive) char (i.e., byte) with the value HH. What it
means as a Unicode character depends on the system's encoding and
possibly on the surrounding chars; you could construct a sequence of
chars with \x that would be illegal as UTF-8 encoded Unicode
characters. I think Stephen favours this option, because it allows
specifying, say, I/O control strings that should not be interpreted as
UTF-8 strings.
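
To make the difference between the two options concrete, here is a
small sketch. It is written in Python rather than Forth purely for
illustration, it assumes the system's encoding is UTF-8, and the
function names are made up for this example; it is not how any
particular Forth system implements S\".

```python
# Two possible readings of \xHH on a system whose encoding is UTF-8.
# All names here are hypothetical, chosen only for this illustration.

def as_code_point(hh: int) -> bytes:
    # Option 1: \xHH names Unicode code point HH, stored in UTF-8.
    return chr(hh).encode("utf-8")

def as_raw_byte(hh: int) -> bytes:
    # Option 2: \xHH names one (primitive) char, i.e. a byte of value HH.
    return bytes([hh])

def is_valid_utf8(data: bytes) -> bool:
    # True if the byte sequence decodes as well-formed UTF-8.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(as_code_point(0x41))                # b'A'  (below 80: one byte either way)
print(as_code_point(0xE9))                # b'\xc3\xa9'  (two bytes)
print(as_raw_byte(0xE9))                  # b'\xe9'  (one byte)
print(is_valid_utf8(as_raw_byte(0xE9)))   # False
```

For HH below 80 the two readings produce the same byte; they diverge
from 80 upwards, where the first yields a two-byte sequence and the
second yields a lone byte that is not valid UTF-8 by itself.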

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2008: http://www.complang.tuwien.ac.at/anton/euroforth/ef08.html