Re: Strange encoding in a hotmail message.



On Thu, 25 Aug 2005, nonmais wrote:
I figured it out.  It's an EUC encoding of one of the East Asian character
sets: either JIS X 0208, GB 2312, or GB 12345.
Thanks, that was it: GB2312. I'm trying to find an explanation of why it
displays Cyrillic characters. I've only found information on East Asian
characters with these encodings.

GB 2312 and GB 12345 both include codepoints for Cyrillic in their 7th row and Latin with Western European diacriticals (including umlaut-u) in their 8th row.
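If you want to see this for yourself, here is a small sketch using Python's built-in "gb2312" codec. It assumes that codec is a fair stand-in for the EUC form of GB 2312 seen in the message; the sample strings are made up for illustration and are not taken from the actual mail.

```python
# Sketch only: uses Python's standard "gb2312" codec, not anything from the MUA.
cyrillic = "Привет"                      # made-up Cyrillic sample
encoded = cyrillic.encode("gb2312")      # two bytes per Cyrillic letter

# In the EUC form, the lead byte is 0xA0 plus the row number,
# so characters from row 7 should all start with 0xA7.
cyrillic_lead_bytes = {encoded[i] for i in range(0, len(encoded), 2)}

# Umlaut-u lives in row 8, so its lead byte should be 0xA8.
u_umlaut = "ü".encode("gb2312")

# The bytes decode back to the same text when read as GB 2312.
round_trip = encoded.decode("gb2312")

print(sorted(hex(b) for b in cyrillic_lead_bytes), hex(u_umlaut[0]), round_trip)
```

A reader that decodes those bytes as GB 2312 shows Cyrillic, which matches what you saw.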


Apparently, the MUA saw that the message could not be represented in KOI8-R and promoted it to GB 2312, which can represent it. I don't know why it didn't promote it to UTF-8 (Unicode) instead. My guess is that the MUA is configured with Russian as its primary environment and Chinese as a secondary one, and all it cared about in making that selection was that the Chinese character set could represent the text.
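To make that guess concrete, here is a rough sketch of the kind of selection I have in mind. It is not the MUA's actual code: the function name pick_charset and the preference list are hypothetical, and it simply takes the first configured charset that can encode the whole message.

```python
# Hypothetical sketch of "first charset that fits" selection.
# pick_charset and the preference order are assumptions for illustration.
def pick_charset(text, preferences):
    """Return the first charset in `preferences` that can encode all of `text`."""
    for charset in preferences:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text; try the next one
        return charset
    return "utf-8"    # fallback that can represent any text

# Russian first, Chinese second, as in the guess above.
prefs = ["koi8_r", "gb2312"]

plain_russian = pick_charset("Привет", prefs)
with_umlaut = pick_charset("Привет, Müller", prefs)
print(plain_russian, with_umlaut)
```

With that ordering, pure Russian text stays in KOI8-R, but a single umlaut-u pushes the whole message into GB 2312, since KOI8-R has no such character and GB 2312 has both.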

Even so, it should have tagged the message with the proper character set. However, it's not easy to pinpoint blame; it's possible that the message left the MUA in proper form, but some intermediate MTA damaged it.

I misspoke about JIS X 0208. JIS X 0208 has Cyrillic in its 7th row, but has box-drawing characters in its 8th row. JIS X 0208 doesn't have Latin with Western European diacriticals.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.