Re: Spaced-out Unicode Cyrillic text



Alice Faber wrote:

T: п≈п?п?я?п?я?п? п?я?я?я?п?п?
T: ь?ь?ы?ь?ь? ь?ь?ь?ь?ь?

When I open the file in Firefox or Opera, the Russian title looks fine, but
when I open it in Safari, the Russian title has the extra spaces. And so
on for other apps.

Any clues?

Well, there are extra characters in there. I looked at various encodings and can't get back to how it was originally posted, with the extra-wide spaces. This is Cyrillic. Note the ? between characters. I suspect that in some encodings this is a dead-key diacritic (in Central European, there are cedilla-like things *under* the "real" characters) and in others it's a padding space (some of my phonetics fonts used to have this as a way to fine-tune character spacing).

Nope; this is Russian. There ain't no diacritics. Well, ok; the fourth letter
is a "yo", which has what looks like an umlaut, but that's all. The two-byte
pattern is because it's UTF-8 Unicode, and the first byte gives the code
block that the next byte is in (to over-simplify a bit). What's really
there is two-byte characters.
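
The two-byte layout described above can be checked directly. This is only a minimal Python sketch: the sample word "привет" is a made-up stand-in (the original title did not survive), and the idea that the garbled lines came from reading the UTF-8 bytes as KOI8-R is an assumption, not something established in this thread.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of one character as a list of byte values."""
    return list(ch.encode("utf-8"))

# Basic Russian letters encode to exactly two bytes: a lead byte
# (0xD0 or 0xD1) followed by one continuation byte.
for ch in "пяё":
    lead, cont = utf8_bytes(ch)
    print(f"U+{ord(ch):04X}: lead=0x{lead:02X} cont=0x{cont:02X}")

# ASSUMPTION: the garbling came from reading UTF-8 bytes as KOI8-R.
# Under that reading, every lead byte turns into the same two letters,
# because KOI8-R maps 0xD0 and 0xD1 to Cyrillic letters of its own.
sample = "привет"  # hypothetical sample, not the original title
garbled = sample.encode("utf-8").decode("koi8_r")
lead_letters = set(garbled[::2])  # every other character is a lead byte
print(lead_letters)
```

If that assumption holds, the repeating pair of letters in the quoted title would simply be the two lead bytes, with each continuation byte shown as "?".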

The funny thing is that most of the Mac apps obviously get the UTF-8 part
right, because they in fact display the proper Russian characters (which
obviously didn't make it through to the above message). They couldn't
do this unless they were properly recognizing and decoding the UTF-8
charset. Yet they still draw those funny spaces.

You could be partly right. The rendering routines could be decoding the
UTF-8 Russian letters, but using the byte count to position the character
glyphs. If so, they've got a serious rendering bug.

I wonder if we could somehow test this conjecture ...
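
One way to think about a test, sketched in Python. The function `advance_positions` is hypothetical and only models the conjectured bug; it is not how Safari or any real renderer is known to work. If glyphs really are positioned by byte count, plain ASCII should look normal, Russian letters (two bytes each) should be spaced double, and a three-byte character such as the euro sign should leave an even wider gap.

```python
def advance_positions(text, by_bytes):
    """Return the column at which each glyph would be drawn.

    by_bytes=False models correct layout (one column per character).
    by_bytes=True models the conjectured bug (one column per UTF-8 byte).
    """
    positions = []
    column = 0
    for ch in text:
        positions.append(column)
        column += len(ch.encode("utf-8")) if by_bytes else 1
    return positions

# ASCII (1 byte each), Cyrillic (2 bytes each), and a 3-byte character.
for sample in ("abc", "ёж", "a€b"):
    print(sample,
          advance_positions(sample, by_bytes=False),
          advance_positions(sample, by_bytes=True))
```

A test file mixing those three kinds of characters, opened in the affected apps, would show whether the gaps actually grow with the byte count.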


