Re: Translating foreign text into html code - help



On Thu, 27 Oct 2005, Stan Brown wrote, quoting me:

> > there's a range of what purport to be browser compatibility
> > options. As you'd expect, there's no option there to ask for
> > W3C-compatible results :-( , and the only non-MS browsers which
> > get any kind of mention are Netscape 4 or earlier.
>
> Apparently clicking different "what browser this will be viewed on"
> choices only changes which items are checked in the five
> compatibility options below.

I'm not sure that's the whole story. Certainly there are some
interactions between the browser compatibility bar and the option
boxes, but certain choices can be made independently. And I suspect
there may be other compatibility features behind the scenes which
aren't evident from those few option boxes.

> > OK, I just tried setting the "encoding" to what it calls "Western
> > European (ISO)" (which seems to be a dumbed-down way of saying
> > iso-8859-1), and saving as filtered HTML. I can report that all the
> > Windows-1252-specific characters which were included in my sample got
> > saved as their &#bignumber; Unicode representations.
>
> I tried something with curly quotes, a degree sign, and Greek capital
> Alpha and Omega. The curly quotes came out as &#8200-odd, much to my
> amazement, and the Greek as &#900-odd. The degree sign was written to
> the file as a single character, which out of habit I never do; but
> I'm pretty sure it's legal since degree is character 176 in my
> selected encoding.

That all looks fine to me.
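
For anyone who wants to check this kind of output without Word, here is
a minimal Python sketch. It assumes Python 3's standard codecs and the
built-in "xmlcharrefreplace" error handler; the helper name
to_html_bytes and the sample string are made up for illustration, and
this is not a claim about what Word does internally.

```python
def to_html_bytes(text, encoding="iso-8859-1"):
    """Encode text for an HTML file in the given encoding.

    Characters the encoding cannot represent are written as
    decimal numeric character references (&#N;).
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


# Hypothetical sample: curly quotes, a degree sign, Greek Alpha and Omega.
sample = "\u201cquoted\u201d 25\u00b0 \u0391\u03a9"
encoded = to_html_bytes(sample)
print(encoded)
```

The curly quotes and the Greek letters are outside iso-8859-1, so they
come out as references; the degree sign is character 176 in iso-8859-1
and is written as the single byte 0xB0.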

> BTW, the character encoding in the <meta> tag was ISO-8859-15, not
> ISO-8859-1.

Ouch! I got iso-8859-1. So maybe this is another effect of the
browser compatibility options? Or is it?

No - hang on - their pull-down menu has "Latin 9 (ISO)" (which is
another way of saying iso-8859-15) as a separate selection from
"Western European (ISO)", but there's no entry that says explicitly
"Latin 1 (ISO)". How odd. So we seem to have an open question here.
"Western European (ISO)" definitely emitted iso-8859-1 for me.

> AFAIK they're the same except that -15 has a euro symbol.

My euro character in my test document turned into &#8364; - which is
as it should be in iso-8859-1, since that encoding has no euro
character of its own.
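
A quick way to see the difference between the two encodings is the
following Python sketch, again assuming Python 3's standard codecs (the
variable names are only illustrative). It also lists every byte position
where the two encodings disagree, since the euro is not the only one.

```python
euro = "\u20ac"

# iso-8859-1 has no euro, so it has to become a character reference.
in_latin1 = euro.encode("iso-8859-1", errors="xmlcharrefreplace")

# iso-8859-15 assigns the euro to a single byte.
in_latin9 = euro.encode("iso-8859-15")

# Byte values in the upper half that decode differently in the two encodings.
differing = [
    b for b in range(0xA0, 0x100)
    if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-15")
]

print(in_latin1, in_latin9)
print([hex(b) for b in differing])
```

So a byte 0xA4 means the euro only if the document really is labelled
iso-8859-15; under iso-8859-1 the same byte is the generic currency
sign.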

Earlier peer-reviewed advice on character encoding had concluded
that there was no point at all in using iso-8859-15 for an HTML
document: by the time that browsers had started supporting -15, they
already had adequate support for utf-8, and so the more-compatible
thing to do, if Windows-1252 was considered rude, was to use utf-8.

Use of 8859-15 for a plain-text document is a different matter
altogether, of course. But we're not concerned with that here.

cheers