Re: Unicode and html - help for simple web site



On Mon, 29 Aug 2005, Alan J. Flavell wrote:

>> Interestingly, Mozilla identifies the encoding (charset) of
>> [ Warning: Very slow! Read without images! ]
>> http://www.apple.com.ge/contacts.html
>> as Windows-1251 because of the Russian *comments*.
>
> So it does!

Google, which is broken in many ways, thinks it is ISO-8859-5
and puts a <meta ... charset=ISO-8859-5> into the cached version:
http://google.com/search?q=cache:www.apple.com.ge/contacts.html&strip=1

> Worryingly, when auto charset recognition was turned off, the encoding
> was reported as utf-8: but surely these strings of cp1251 bytes could
> not be valid utf-8? !!

UTF-8 is _your_ default, it seems.
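
On the question of whether such bytes could be valid UTF-8 at all: a quick check suggests not. Below is a minimal Python sketch, not anything the browsers actually run. The sample word and the helper name `is_valid_utf8` are my own assumptions for illustration; the word is not taken from the page in question. It encodes a Russian word as Windows-1251 and tries a strict UTF-8 decode of the result.

```python
# Sketch: do Windows-1251 bytes for Cyrillic text form valid UTF-8?
# The sample word and the helper name are hypothetical, for illustration only.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

word = "Контакты"                    # assumed sample, not from the page
cp1251_bytes = word.encode("cp1251")  # one byte per Cyrillic letter
utf8_bytes = word.encode("utf-8")     # two bytes per Cyrillic letter

print(is_valid_utf8(cp1251_bytes))    # strict UTF-8 check on the cp1251 bytes
print(is_valid_utf8(utf8_bytes))      # same check on genuine UTF-8

# ISO-8859-5 assigns a character to every byte used here, so decoding
# never fails; it simply yields different (wrong) Cyrillic letters.
misread = cp1251_bytes.decode("iso8859_5")
print(misread != word)
```

The first byte of the cp1251 form is 0xCA, which in UTF-8 would have to be followed by a continuation byte in the range 0x80-0xBF; the next byte is 0xEE, so the strict decode fails. A browser reporting "utf-8" with auto-detection off is therefore stating its configured default, not a verdict on the bytes. ISO-8859-5, by contrast, accepts any such byte sequence, which is why a wrong guess of that charset cannot be ruled out by validity alone.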
