Charsets on multi-language website



I recently discovered that the web server I use has started to specify
Latin-1 as the default charset, with the result that my Greek, Russian,
Persian, etc. pages failed to display properly. I had previously used
the deprecated <META ... charset ...> tags, which worked for a
time -- presumably because the server didn't originally specify a
default charset.

My learning curve over the last few days has been quite steep: thank
you, Alan, Jukka et al (how are things, Al?) for your useful & clearly
expressed postings on this topic.

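I had assumed -- erroneously -- that charset/encoding instructions
acted something like CSS, with specifications on a webpage overriding
any centrally-specified default. In fact it is the other way round: a
charset sent by the server in the HTTP Content-Type header takes
precedence over anything declared in the page itself.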

FWIW, & in the hope that it may be useful for someone in the same
position, here is the (Apache) .htaccess file I finally came up with:

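# Default: serve .htm files as UTF-8
AddCharset UTF-8 .htm
# Greek: file names starting "greek", "gs" or "gc"
<Files ~ "^g(reek|s|c).+\.htm$">
AddCharset Windows-1253 .htm
</Files>
# Romanian: file names starting "ro"
<Files ~ "^ro.+\.htm$">
AddCharset Windows-1250 .htm
</Files>
# Russian: file names starting "rs" or "rus"
<Files ~ "^ru?s.+\.htm$">
AddCharset Windows-1251 .htm
</Files>
# Turkish: file names starting "tur" or "ts"
<Files ~ "^t(ur|s).+\.htm$">
AddCharset Windows-1254 .htm
</Files>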

It looks a bit messy, & if I were starting from scratch I would have
organized the files into language folders. But the file may be of
interest as a sort of template. Briefly, for the benefit of anyone
unfamiliar with the format:

1. I start by making UTF-8 the default encoding.

2. I specify the encodings for Greek, Romanian, Russian and Turkish, in
that order.

3. I use regular expressions to cover the file names for each language
(of course these should have been rationalized, but I didn't want to
have to rewrite hundreds of links!).
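
As a rough sketch of the folder-based layout mentioned above (the
folder names greek/ and russian/ are made up for illustration, and this
assumes the host allows AddCharset in .htaccess, i.e. AllowOverride
FileInfo), each language folder could carry its own one-line .htaccess:

# /.htaccess (site root): UTF-8 unless a subfolder says otherwise
AddCharset UTF-8 .htm

# /greek/.htaccess (hypothetical folder)
AddCharset Windows-1253 .htm

# /russian/.htaccess (hypothetical folder)
AddCharset Windows-1251 .htm

No regular expressions should be needed this way, since a folder's
.htaccess applies to every file beneath it.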

HTH someone ...

Nigel

--
ScriptMaster language resources (Chinese/Modern & Classical
Greek/IPA/Persian/Russian/Turkish):
http://www.elgin.free-online.co.uk
