About charset setting and replacing



Hi there,
I am writing a program that loads HTML from a file and sends it to IE
directly, and I've run into a problem with the charset setting. Most of
the HTML files declare the charset "us-ascii", but for various reasons
some Unicode text gets inserted into the HTML before it is sent to IE.
My questions are:

1) Can I specify a separate charset for one part of the page, e.g.
<span charset="UTF-8"> SOME UNICODE HERE</span>

2) If the answer to 1) is "no", is there any way to change the charset
of the original HTML? Because I don't have an HTML parser handy, I can
only search and replace the charset programmatically. I've checked
several of the HTML files and found the charset declared like this:

<META http-equiv=Content-Type content="text/html; charset=us-ascii">

To make the program replace the right value, I search for the keyword
"charset=" to get its position, then search for the next double
quotation mark, and finally replace the substring between them with
UTF-8. Everything seems to work. However, I am worried that there might
be exceptions. Could any of these, for example, occur?

<META http-equiv=Content-Type content="text/html;" charset="us-ascii">

OR

<META http-equiv=Content-Type content='text/html;' charset='us-ascii'>

OR

<META http-equiv=Content-Type content='text/html; charset=us-ascii'>
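The search-and-replace idea can be made tolerant of all three variants
by looking only inside <meta> tags and letting the quote around the
value be double, single, or absent. The Python below is a minimal
sketch under those assumptions; the function name `set_meta_charset` is
hypothetical, and a regular expression is still no substitute for a
real HTML parser (it would be fooled by, say, a ">" inside an attribute
value).

```python
import re

# Matches the charset parameter inside a <meta> tag, in any of these forms:
#   content="text/html; charset=us-ascii"
#   charset="us-ascii"   or   charset='us-ascii'
# Group 1 is "charset=" plus any opening quote; group 2 is the old value.
# The closing quote (if any) is left untouched because it is not matched.
CHARSET_RE = re.compile(
    r"""(charset\s*=\s*["']?)([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def set_meta_charset(html: str, new_charset: str = "UTF-8") -> str:
    """Replace the charset value in the first <meta> tag that declares one.

    Only text inside <meta ...> tags is examined, so a stray "charset="
    in the page body is never rewritten. (Sketch: assumes no ">" appears
    inside the meta tag's attribute values.)
    """
    for tag in re.finditer(r"<meta\b[^>]*>", html, re.IGNORECASE):
        original = tag.group(0)
        # A function replacement avoids backslash-escaping issues in new_charset.
        patched, count = CHARSET_RE.subn(
            lambda m: m.group(1) + new_charset, original, count=1
        )
        if count:
            return html[: tag.start()] + patched + html[tag.end():]
    return html  # no charset declaration found; leave the document unchanged
```

Since every us-ascii character is also valid UTF-8, relabelling the page
this way is safe, provided the combined page (original plus inserted
text) is then actually written out as UTF-8 bytes, for example with
`page.encode("utf-8")`.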


Any better approach for my problem?
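As far as I know, HTML has no way to give a single element its own
encoding: the whole document is decoded with one charset. A common
workaround, offered here as an alternative rather than something from
the post, is to leave the page as us-ascii and write each inserted
non-ASCII character as a numeric character reference such as &#233;,
which survives in any charset. A minimal Python sketch; the helper name
`insertable_ascii` is hypothetical, and it assumes the inserted text is
plain text (so "&" and "<" must be escaped) unless `is_markup` is set.

```python
import html


def insertable_ascii(text: str, is_markup: bool = False) -> str:
    """Return text that is pure ASCII and safe to splice into a us-ascii page.

    Non-ASCII characters become decimal numeric character references
    (e.g. "é" -> "&#233;"), which the browser turns back into the
    original characters regardless of the page's declared charset.
    """
    if not is_markup:
        # Plain text: escape &, < and > first, so the references added
        # below are not themselves double-escaped.
        text = html.escape(text, quote=False)
    # Python's "xmlcharrefreplace" error handler emits &#NNNN; for each
    # character the target encoding (ASCII) cannot represent.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

The trade-off is size: each non-ASCII character costs several bytes, but
the original page and its "us-ascii" declaration can stay exactly as
they are.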

p.s. Someone suggested sending the original code to IE and then calling
IE's charset-setting function to change the charset. I tried that, but
after the charset was changed, my Unicode text turned into meaningless
garbage characters!
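The post doesn't say which charset IE was switched to or how the
inserted text was encoded, but this symptom is what happens whenever
bytes written in one encoding are decoded as another. The Python sketch
below reproduces it; windows-1252 ("cp1252", a Western European
encoding) is chosen only for illustration, and `show_mismatch` is a
hypothetical helper.

```python
def show_mismatch(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one charset, then decode the bytes with another,
    mimicking a browser that is told the wrong charset for the page."""
    raw = text.encode(written_as)
    # errors="replace" keeps the demo from raising on bytes that the
    # reading charset does not define.
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    sample = "café 中文"
    print(show_mismatch(sample, "utf-8", "cp1252"))  # wrong label: garbled
    print(show_mismatch(sample, "utf-8", "utf-8"))   # matching label: intact
```

In other words, switching the charset after the page has loaded only
changes how the existing bytes are read; the text stays readable only
if the declared charset matches the encoding actually used to write it.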

Thanks in advance.
