Re: Unicode and composition mappings



Scripsit Hibou57 (Yannick Duchêne):

Exploring the Unicode database, I've found in the file
UnicodeData.txt, a field named Decomposition_Mapping (which is
explained in the Unicode reference), but I cannot see any
Composition_Mapping field.

There is no such mapping defined in the Unicode standard. (Note that UnicodeData.txt contains only a small part of the properties of characters, so you need to check the other files in the database for such things. But I guess you already found this out.)

Does it mean that it has to be derived from the Decomposition_Mapping
field, by building a reverse associative array?

If you want such a mapping, yes.
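
For what it's worth, here is a minimal sketch of such a reverse table in Python. It assumes the standard library's unicodedata module, which exposes the Decomposition_Mapping field for whatever Unicode version your Python ships with, instead of parsing UnicodeData.txt directly. The function name build_reverse_decomposition_map is made up for illustration. Each key is a decomposition string, and each value lists the characters that decompose to it, together with the compatibility tag, if any:

```python
import sys
import unicodedata
from collections import defaultdict


def build_reverse_decomposition_map():
    """Map each decomposition string to the characters that decompose to it."""
    reverse = defaultdict(list)
    for code_point in range(sys.maxunicode + 1):
        char = chr(code_point)
        # Raw Decomposition_Mapping field, e.g. "0041 030A" or "<super> 0031";
        # empty when the character has no explicit mapping.
        field = unicodedata.decomposition(char)
        if not field:
            continue
        parts = field.split()
        # A leading "<tag>" marks a compatibility mapping; no tag means canonical.
        tag = parts.pop(0) if parts[0].startswith("<") else None
        target = "".join(chr(int(p, 16)) for p in parts)
        reverse[target].append((char, tag))
    return reverse
```

With this table, the entry for "A" followed by U+030A contains U+00C5 with no tag, and the entry for U+00C5 itself contains the angstrom sign U+212B. The values are lists on purpose, since nothing guarantees that a decomposition belongs to only one character.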

Anyway, I am surprised, because there are plenty of derived property
files in the Unicode database, so it is strange that there is no
Composition_Mapping derivation anywhere.

Not really. A general composition mapping would hardly make much sense. A large number of characters with decomposition mappings are compatibility characters that should normally not be used in new data. They have been included perhaps only because some older standard, possibly quite obsolete by now, has defined a distinction (say, between the Latin letter A with ring above, Å, and the angstrom symbol, which looks exactly the same and is for all relevant purposes just that letter _used_ for a particular meaning), and Unicode lets you retain such a distinction in Unicode-encoded data _if desired_.

So general composition would make little sense, and it is generally impossible anyway, because many characters share the same decomposition: how could a program decide what to do with a character or string that might be the decomposition of several different characters?
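
To make the ambiguity concrete, here is a rough Python sketch, assuming the standard unicodedata module; the helper name characters_decomposing_to is hypothetical. It collects every character whose Decomposition_Mapping field, with any compatibility tag stripped, equals a given string:

```python
import sys
import unicodedata


def characters_decomposing_to(target):
    """Return all characters whose one-step decomposition equals target."""
    # The field lists code points as uppercase hex with at least four digits.
    wanted = [format(ord(c), "04X") for c in target]
    matches = []
    for code_point in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(code_point)).split()
        # Drop the "<tag>" that marks a compatibility mapping, if present.
        if parts and parts[0].startswith("<"):
            parts = parts[1:]
        if parts and parts == wanted:
            matches.append(chr(code_point))
    return matches
```

For the digit "1" this returns, among others, SUPERSCRIPT ONE (U+00B9), SUBSCRIPT ONE (U+2081) and CIRCLED DIGIT ONE (U+2460), so a "composition" of a plain "1" would have no single right answer.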

In a restricted sense, general-purpose composition is possible, namely _canonical_ composition. It's routinely used when performing normalization; see "Normalization Forms" in the Unicode standard. In particular, certain normalization forms involve things like mapping a character followed by a combining diacritic mark into a precomposed character.
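
As an illustration only, canonical composition can be tried out in Python with unicodedata.normalize from the standard library, using Normalization Form C. The wrapper name compose is just a label for this sketch:

```python
import unicodedata


def compose(text):
    """Apply canonical composition (Normalization Form C) to text."""
    return unicodedata.normalize("NFC", text)


# "A" followed by COMBINING RING ABOVE becomes the precomposed U+00C5.
assert compose("A\u030A") == "\u00C5"
# ANGSTROM SIGN (U+212B) is canonically equivalent to U+00C5, so NFC replaces it.
assert compose("\u212B") == "\u00C5"
# NFC leaves compatibility characters alone: SUPERSCRIPT ONE stays as it is.
assert compose("\u00B9") == "\u00B9"
# A plain "1" is never turned into some compatibility character, even by NFKC.
assert unicodedata.normalize("NFKC", "1") == "1"
```

Note that normalization never produces the angstrom sign or a superscript digit from their decompositions; composition goes only toward the ordinary precomposed letters.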

Wish a nice time to all of you boys'n girls :)

And what about all the rest of us? "On the Internet, nobody knows you're a dog."

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
