Re: Unicode and composition mappings
- From: "Jukka K. Korpela" <jkorpela@xxxxxxxxx>
- Date: Fri, 29 Feb 2008 23:15:11 +0200
Scripsit Andreas Prilop:
Is there simple-to-use software available that does normalizations?
Internet Explorer 7
I don't think it does.
Software for normalization can be found e.g. via
http://www.unicode.org/onlinedat/products.html
Look at
http://www.unics.uni-hannover.de/nhtcapri/combining-marks.html
with Internet Explorer 7 and Firefox 2.
It may look like normalization, but it's an illusion.
If you e.g. cut the part that looks like "À = À" and paste it into WordPad, click on the location after the first "À", and press Alt+X (on a sufficiently new version of WordPad), then it magically transforms to "A300", because the program converts the combining grave accent U+0300 to its hex code value. Nothing like that happens for the second occurrence of "À": using Alt+X, you turn it into C0.
This illustrates that the two occurrences of "À" are really different beasts: the first one is "A" followed by U+0300, whereas the second one is the single letter "À", U+00C0. The browser has _not_ normalized anything.
Displaying the two things in identical ways is correct and appropriate, but it takes place at the formatting level, not at the character level. And normalization is a character-level operation. Combining a letter and a diacritic in visual presentation might even take place at the _glyph_ level (i.e., the rendering engine might render such a combination using a single glyph from a font), but even that wouldn't be a character-level issue.
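The character-level difference can also be checked programmatically instead of with Alt+X. The following is only a minimal sketch, assuming Python 3 and its standard unicodedata module; the names decomposed, precomposed and code_points are illustrative and do not come from any of the tools mentioned in this thread.

```python
import unicodedata

# Two strings that are normally rendered identically.
decomposed = "A\u0300"   # "A" followed by COMBINING GRAVE ACCENT
precomposed = "\u00C0"   # the single letter LATIN CAPITAL LETTER A WITH GRAVE


def code_points(s):
    """Return the code points of s in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in s]


print(code_points(decomposed))    # ['U+0041', 'U+0300']
print(code_points(precomposed))   # ['U+00C0']
print(decomposed == precomposed)  # False: different character sequences

# Normalization is what actually changes the characters.
nfc = unicodedata.normalize("NFC", decomposed)   # compose
nfd = unicodedata.normalize("NFD", precomposed)  # decompose
print(nfc == precomposed)  # True
print(nfd == decomposed)   # True
```

The comparison fails before normalization and succeeds after it, which is the kind of change that rendering alone never makes to the data.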
Your page is a nice utility for testing _rendering_ level issues. The results naturally depend on the browser and on the fonts available. For example, though I see no difference in rendering (of the decomposed form and the precomposed form) for many characters, I see a big difference for Z with circumflex. Many things can happen; a simplistic implementation just takes a glyph for the base character and a glyph for the diacritic and does an "overprint", and it might even use glyphs from different fonts, since many fonts don't have glyphs for many combining diacritics. That's bad, but it's a quality-of-implementation issue.
--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/