Re: Chinese and Russian versions of the website



Smike <antispamcop@xxxxxxxx> wrote:

> I do not know about Chinese, but Cyrillic (Russian) version is
> suggested to be done in 16-bit code version.

Are you kidding? Suggested by whom? Surely not by the Internet Architecture Board, which clearly favors UTF-8.

> No charset specification is required,

You _are_ kidding, are you not?

> Cyrillic text will be visible in most modern web browsers
> immediately without Encoding selection.

Web browsers recognize the encoding from the HTTP headers (the charset parameter of Content-Type) and need no manual encoding selection, for any registered encoding they support.
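
To illustrate how an encoding is read from an HTTP header, here is a minimal sketch in Python. It assumes the header value is already available as a string; the function name charset_from_content_type is made up for this example, and real browsers apply further rules beyond this.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper for illustration only; it reuses the standard
    library's MIME header parsing rather than any browser's algorithm.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case,
    # or None when the header carries no charset parameter.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

With a declared charset the client has nothing to guess; when the parameter is missing, the function returns None and a browser is left to its own heuristics.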

> View source of this example:

Huh? Why should we view source, in an issue like this? There's no sensible way of viewing source without knowing or guessing the encoding, so what could it possibly demonstrate?

> http://bratok.prison.se/heros.htm
> http://bratok.prison.se/nv.htm
> http://bratok.prison.se/cp.htm

These all appear to be documents that
a) lack any declaration about character encoding, which is a protocol error and leaves it to browsers to make their guesses
b) contain just octets < 128, so any reasonable guess, such as US-ASCII or ISO-8859-1 or ISO-8859-5 or ISO-8859-6 or UTF-8, will do
c) represent non-ASCII characters as character references, which is of course possible but rather inefficient, and hopelessly obscure unless you have an editing tool that interprets the references; and if you have one, you could use it more cleverly and save the data as UTF-8
d) demonstrate nothing relevant to the topic.
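
Points b) and c) can be checked with a short Python sketch. The sample word and the helper name to_char_refs are assumptions made for this illustration; they are not taken from the pages above.

```python
import html


def to_char_refs(text):
    """Replace every non-ASCII character with a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


word = "Привет"              # sample Cyrillic text, chosen for this sketch
refs = to_char_refs(word)    # markup that contains ASCII characters only
octets = refs.encode("ascii")

# b) Octets < 128 decode to the same string under each of these guesses.
for guess in ("us-ascii", "iso-8859-1", "iso-8859-5", "iso-8859-6", "utf-8"):
    assert octets.decode(guess) == refs

# c) The references still stand for the original characters...
assert html.unescape(refs) == word

# ...but cost far more octets than the same text stored as UTF-8.
print(len(octets), len(word.encode("utf-8")))
```

For this six-letter word the references take 42 octets against 12 in UTF-8, and the source is unreadable without a tool that interprets the references.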

Long ago, a "conservative approach" as described in c) made sense, but it's hardly fruitful these days for authoring in a language that uses a non-Latin script. Besides, it has absolutely nothing to do with using some "16 bit version".

--
Yucca, http://www.cs.tut.fi/~jkorpela/
