Re: Soupçon of cedilles and aperçus
- From: Peter Moylan <peter@xxxxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 30 Mar 2007 14:12:51 +1000
Leslie Danks wrote:
> Martin Ambuhl wrote:
>> Mike Lyle wrote:
>>> Serious question: isn't this ASCII? alt-0199: Ç . . . alt-0231: ç.
>> No. All ASCII codes are in the range 0 ... 127 (decimal).
> Above that are the so-called "extended ASCII codes", which you can read
> about at:
> <http://en.wikipedia.org/wiki/Extended_ASCII>

For the benefit of those who didn't read that wiki article, it should be
underlined that there isn't just one extended ASCII set. There are many,
all mutually incompatible. (Or partially incompatible: some characters
turn out by good fortune to have the same encoding in two or more
different codes.) An example of the incompatibility can be seen by
comparing the postings of Martin Ambuhl and blmblm (who both quoted
Mike's question) in this thread: same text, different character codes,
and therefore different end results.
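
A minimal Python sketch of that incompatibility, assuming the two encodings involved are iso-8859-1 and ibm-850 (the thread does not say which ones the two posters actually used). It encodes Mike's two characters both ways, then reads one set of bytes back under the other assumption. The helper name `is_ascii` is made up for the illustration.

```python
# Mike's two characters; his Alt codes 0199 and 0231 are their iso-8859-1 values.
text = "Çç"

def is_ascii(s: str) -> bool:
    """True if every character has a code in the ASCII range 0 ... 127."""
    return all(ord(ch) < 128 for ch in s)

latin1_bytes = text.encode("iso-8859-1")   # one byte per character
cp850_bytes = text.encode("cp850")         # Python's name for ibm-850

print(is_ascii(text))          # False
print(list(latin1_bytes))      # [199, 231]
print(list(cp850_bytes))       # [128, 135]

# Same bytes, read back under the wrong assumption: different characters appear.
misread = latin1_bytes.decode("cp850")
print(misread == text)         # False
```

Neither character is ASCII, and the two extended sets disagree about which byte stands for which letter, so a reader who guesses the wrong set sees something other than what was typed.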
The most popular encodings used these days, at least in English-speaking
countries, are as follows. Note that most of these support only ASCII
plus a few accented characters, so are unsuitable in many
non-English-speaking countries. Those countries use different extensions
of ASCII.
1. ibm-850, which was for a long time the most common form of "extended
ASCII". Its main disadvantage is that it's based on the "code page"
concept in the IBM-PC, so it might not be supported by non-IBM machines.
2. iso-8859-1, which I'm using in this response and which handles most
of the characters of a good collection of Western European languages.
Unfortunately there are in practice two different versions of
iso-8859-1: the original, designed by the ISO (International
Organization for Standardization), and a modified version designed by
Microsoft. Since both go by the same name (which would seem to be a
copyright violation, but IANAL), the only way to tell the difference is
by seeing weird substitute characters appearing on your screen. Another
disadvantage of iso-8859-1 is that it doesn't support the Euro symbol,
which was introduced after iso-8859-1 was designed. iso-8859-1 is often
also called Latin-1.
3. iso-8859-15, also known as Latin-9. This is almost identical to
Latin-1, but it makes some (minor) concessions to French and Finnish
and it does include the Euro symbol. It also has the advantage that
Microsoft has not yet, as far as I know, come out with its own
incompatible version of iso-8859-15.
4. Windows-1252, now rare, but still in use by some dinosaurs who are
still unaware that non-Windows operating systems exist.
5. UTF-8, which is one encoding of the huge character set called
Unicode. Unicode supports, in principle, the characters of every written
language in the world, including languages like Chinese that have a huge
character set of their own. Ideally Unicode should supersede all the
others, since it avoids the need to use different encodings in different
countries. The catch is that not all newsreaders are new enough to
support it. In fact, it doesn't always work on newsreaders that do
support it, because it depends on having some huge font files installed,
and many implementations of those font files omit a lot of languages. I
myself have only one Unicode font installed, and it's so ugly that I use
it only when absolutely necessary.
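
The Euro point in items 2 to 5 can be checked with a short Python sketch. The codec names are the ones Python's standard library uses (cp1252 for Windows-1252); the helper `encode_or_none` is hypothetical and exists only for this illustration.

```python
def encode_or_none(ch: str, codec: str):
    """Return the encoded bytes, or None if the codec has no code for ch."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

euro = "\u20ac"  # the Euro sign

for codec in ("iso-8859-1", "iso-8859-15", "cp1252", "utf-8"):
    print(codec, encode_or_none(euro, codec))
# iso-8859-1 has no Euro sign; the other three each encode it differently:
#   iso-8859-15 -> one byte, 0xA4
#   cp1252      -> one byte, 0x80
#   utf-8       -> three bytes, 0xE2 0x82 0xAC
```

So even among the encodings that do have a Euro sign, no two of these agree on its byte value, which is why the receiver needs to be told which one is in use.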
So that the receiver can know which code the sender is using, an
indication of the code is included in the MIME header lines of the
message. Unfortunately Mike does not have MIME enabled in his software,
which means that he can only send the 128 ASCII characters reliably. He
can have the impression of being able to send some extended characters
as well, but what the reader sees might or might not be garbled.
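
As an illustration of that charset indication, here is a sketch using Python's standard email package. It is not what any particular newsreader does internally; the subject and body text are made up, and iso-8859-1 is only an example choice.

```python
from email.message import EmailMessage

body = "Soupçon of aperçus"

msg = EmailMessage()
msg["Subject"] = "cedilla test"
# Declaring the charset makes the library add the MIME headers for us.
msg.set_content(body, charset="iso-8859-1")

print(msg["MIME-Version"])            # the header that marks the message as MIME
print(msg["Content-Type"])            # carries the charset parameter
print(msg.get_content_charset())      # iso-8859-1

# A receiver uses the declared charset to turn the bytes back into text.
print(msg.get_content().strip())
```

Without those header lines the receiving software has nothing to go on and must guess, which is how the garbling described above comes about.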
The present state of the art is that almost all modern newsreaders
support MIME - although Outlook Express, for some reason, has it
disabled by default - and something like half of them support some form
of Unicode. We must, however, also take into account those who cling to
older newsreaders such as tin and slrn. These have very limited
character set support, but remain popular because in most other respects
they are superior to the modern newsreaders.
--
Peter Moylan http://www.pmoylan.org
Please note the changed e-mail and web addresses. The domain
eepjm.newcastle.edu.au no longer exists, and I can no longer
receive mail at my newcastle.edu.au addresses. The optusnet
address could disappear at any time.