Re: [OT] character sets.



On 06/11/2008 Colin E. wrote:
Daniele Caselli wrote:

We can't read this one because JugglingDB.com uses the ISO-8859-1
charset, not Unicode (ISO-8859-1 contains all the Latin characters,
but not the Cyrillic ones), so it was badly interpreted...

This is not strictly true. The IJDb serves pages in a character set
appropriate to the language being rendered for the majority of the
site...

http://www.jugglingdb.com/index.php?lang=ru -> *koi8-r*

If so, there would have been the same problem even if PHP Headliner
had added that charset to Vasili's headers, because the two charsets
are incompatible.

The problem is that PHP Headliner does not honor the charset within
the Content-Type header for NNTP messages.

That's strange: it's a really common header, and it's really useful
for getting a newsreader to interpret a message correctly. Couldn't you
add it in the future?
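
For what it's worth, honoring the header could look roughly like the
Python sketch below. It only uses the standard email package; the
decode_body name and the ISO-8859-1 fallback are my own illustrative
assumptions, not a description of what PHP Headliner actually does.

```python
from email import message_from_bytes


def decode_body(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode a message body using the charset named in Content-Type.

    The fallback charset is an assumption made for this sketch; it is
    used only when the header carries no charset parameter.
    """
    msg = message_from_bytes(raw)
    # get_content_charset() returns the charset parameter in lower case,
    # or None when the header does not declare one.
    charset = msg.get_content_charset() or fallback
    # decode=True undoes the transfer encoding and returns raw bytes.
    payload = msg.get_payload(decode=True)
    return payload.decode(charset, errors="replace")


# A made-up article whose body is Russian text encoded as koi8-r.
raw = (
    b"Content-Type: text/plain; charset=koi8-r\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    + "Привет".encode("koi8-r")
    + b"\n"
)

# The same bytes without any Content-Type header.
raw_without_header = b"\n" + "Привет".encode("koi8-r") + b"\n"

print(decode_body(raw))                 # Cyrillic comes back intact
print(decode_body(raw_without_header))  # garbage: read as ISO-8859-1
```

The second call shows the failure we are discussing: the bytes are the
same, but without the charset the reader can only guess.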

What happens when you reply to a message, should yours use
the same character set?

When I reply to a message, MesNews adds the necessary charset
declaration to the Content-Type header, and by "necessary" I mean "the
one that fits all the characters used in my post".

But if I reply to one of the JDB posts, I obviously don't see all the
Unicode characters used by JDB posters properly, because their
Content-Type header lacks a matching charset declaration.
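
To make "the one that fits all the characters" concrete, here is a
small Python sketch of that kind of selection. I don't know how MesNews
really does it; the pick_charset name and the candidate list (and its
order) are assumptions chosen only for illustration.

```python
# Candidate charsets, tried from narrowest to widest. The list and its
# order are assumptions for this sketch, not MesNews behavior.
CANDIDATES = ("us-ascii", "iso-8859-1", "koi8-r", "utf-8")


def pick_charset(text: str, candidates=CANDIDATES) -> str:
    """Return the first candidate charset that can encode every character."""
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # some character does not fit, try the next one
        return name
    raise ValueError("no candidate charset can encode this text")


for sample in ("hello", "Tristo è quel discepolo", "Привет", "è Привет"):
    print(sample, "->", pick_charset(sample))
```

A post mixing accented Latin and Cyrillic letters fits neither
ISO-8859-1 nor koi8-r, so in this sketch it ends up as utf-8.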

It looks like other newsreaders ignore this issue too; for example,
LP's messages seem to indicate that SLRN does not add a charset to
the headers.

Really strange for a geek-oriented newsreader!!!!

Maybe RFC822 has something to say on the subject...

What do you mean? Do you think it's not a recommended header? Thanks,

Daniele

--

«Tristo è quel discepolo che non avanza il suo maestro.»
Leonardo da Vinci, Codice Forster III.

