Re: page encoding question



"Tony Vella" <tony.vella@xxxxxxxxxx> wrote:

> I am preparing a series of philatelic html pages (lots of text and a few
> scans of stamps) which will include alpha-characters (accents) in Italian,
> French, Spanish, Portuguese and Danish.

They are all covered by the ISO-8859-1 encoding, except for some punctuation
marks and letters like the oe ligature. If you use windows-1252, you get the
punctuation marks and the ligature, too.
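
A quick way to check this for any given character is sketched below in Python 3, using only its standard codecs. The helper name `encodable` and the sample characters are my own choices for illustration, not anything from your pages.

```python
# Hypothetical helper: can this character be stored in the given encoding?
def encodable(ch: str, charset: str) -> bool:
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Accented letters from the languages mentioned, plus the oe ligature
# and two "smart" punctuation marks (right single quote, left double quote).
samples = ["é", "ñ", "ã", "ø", "œ", "\u2019", "\u201c"]

for ch in samples:
    print(repr(ch),
          "iso-8859-1:", encodable(ch, "iso-8859-1"),
          "windows-1252:", encodable(ch, "windows-1252"))
```

The accented letters should pass for both encodings, while the ligature and the curly quotes pass only for windows-1252.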

> The pages I have finished in draft
> form so far I have encoded UTF-8 but I have just been told that 99% of the
> world will not be able to read them

Nonsense. More likely, 99 % of WWW users _are_ able to read them. Well,
let's say 97.6 %. After all, 96.3 % of all percentages have just been made
up, and the remaining 4.7 % have been miscalculated.

> and that I should go through all the
> pages and re-encode them "western european - windows (1252)".

I wouldn't do that at this point, unless you have good tools that do such
things for you with minimal effort.
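
If you did want to script it, the conversion itself is small. The following is only a sketch in Python 3 under my own assumptions: the function names are hypothetical, and the choice to turn characters that windows-1252 cannot hold into numeric character references (Python's `xmlcharrefreplace` error handler) is one option among several. Any charset declaration in the pages or in the HTTP headers would have to be changed separately.

```python
from pathlib import Path

# Hypothetical helper: re-encode raw bytes from one charset to another.
# Characters missing from the target charset become references like &#937;,
# which HTML browsers still display correctly.
def reencode_bytes(data: bytes, src: str = "utf-8",
                   dst: str = "windows-1252") -> bytes:
    text = data.decode(src)
    return text.encode(dst, errors="xmlcharrefreplace")

# Hypothetical helper: rewrite one file in place (keep a backup first).
def reencode_file(path: Path, src: str = "utf-8",
                  dst: str = "windows-1252") -> None:
    path.write_bytes(reencode_bytes(path.read_bytes(), src, dst))
```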

> I guess what
> I would like to know is what encoding would be most effective for these
> particular languages.

If you were just about to start the project, I would recommend ISO-8859-1 (or
windows-1252 if you need those extras) - not because of wider browser
coverage (though there is a _small_ improvement to be gained there) but
because those encodings are somewhat more efficient (one byte per character,
whereas UTF-8 uses two bytes for some of the characters you'd use).
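
To put a rough number on that difference, here is a small Python 3 sketch; the sample string is made up for illustration. For mostly-ASCII text with the occasional accented letter, the overhead is one extra byte per accented letter.

```python
# Made-up sample with accented letters from the languages in question.
text = "Città, français, señor, coração, smørrebrød"

latin1 = text.encode("iso-8859-1")  # one byte per character
utf8 = text.encode("utf-8")         # two bytes for each accented letter here

print("characters:      ", len(text))
print("ISO-8859-1 bytes:", len(latin1))
print("UTF-8 bytes:     ", len(utf8))
print("extra bytes:     ", len(utf8) - len(latin1))
```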

UTF-8 will certainly be simpler in the future if you ever need to add
characters from other languages.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

