Re: UTF-8 JavaScript files



Thomas 'PointedEars' Lahn wrote:

In contrast to HTML, XML (and therefore XHTML, as an application of XML,
when served as application/xhtml+xml, application/xml, or text/xml) defines
two default character encodings, which therefore do not need to be
declared: UTF-8 and UTF-16LE. The X(HT)ML Document Character Set is the
same as in HTML, though: UCS.

Correction: The default is not limited to UTF-8 and UTF-16LE. At least
UTF-16BE must be supported, too.

,-<http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding>
|
| [...]
| Each external parsed entity in an XML document may use a different
| encoding for its characters. All XML processors MUST be able to read
| entities in both the UTF-8 and UTF-16 encodings. The terms "UTF-8" and
| "UTF-16" in this specification do not apply to related character
| encodings, including but not limited to UTF-16BE, UTF-16LE, or CESU-8.
|
| Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin
| with the Byte Order Mark described by Annex H of [ISO/IEC 10646:2000],
| section 16.8 of [Unicode] (the ZERO WIDTH NO-BREAK SPACE character,
| #xFEFF). This is an encoding signature, not part of either the markup or
| the character data of the XML document. XML processors MUST be able to
| use this character to differentiate between UTF-8 and UTF-16 encoded
| documents.
| [...]
| In the absence of information provided by an external transport protocol
| (e.g. HTTP or MIME), it is a fatal error for an entity including an
| encoding declaration to be presented to the XML processor in an encoding
| other than that named in the declaration, or for an entity which begins
| with neither a Byte Order Mark nor an encoding declaration to use an
| encoding other than UTF-8. Note that since ASCII is a subset of UTF-8,
| ordinary ASCII entities do not strictly need an encoding declaration.
| [...]
| Unless an encoding is determined by a higher-level protocol, it is also a
| fatal error if an XML entity contains no encoding declaration and its
| content is not legal UTF-8 or UTF-16.

I could not find a normative definition of what "including but not limited
to" refers to. Appendix F (non-normative) mentions some possibilities, but
they should probably not be relied upon.
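
For illustration, the Byte Order Mark rule quoted above could be sketched
in JavaScript roughly as follows. This is only a minimal sketch:
sniffXmlEncoding is a hypothetical helper name, not part of any library
or of the specification. It assumes that no transport-level information
(HTTP, MIME) and no encoding declaration are available, and it
deliberately ignores the further byte patterns (e.g. for UCS-4) that the
non-normative Appendix F describes.

```javascript
// Hypothetical helper (an assumption for illustration only): guesses the
// encoding of an XML entity from its leading bytes, using only the BOM
// rule from section 4.3.3 of the XML 1.0 Recommendation.
function sniffXmlEncoding(bytes) {
  // UTF-8 encoded U+FEFF is EF BB BF; this BOM is optional for UTF-8.
  if (bytes.length >= 3 &&
      bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "UTF-8";
  }

  // UTF-16 entities must begin with U+FEFF; the byte order in which it
  // appears tells the processor how to read the rest of the entity.
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return "UTF-16, big-endian";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return "UTF-16, little-endian";
  }

  // Neither a BOM nor (by assumption) an encoding declaration: anything
  // other than UTF-8 would be a fatal error, so UTF-8 is the only
  // conforming answer here.
  return "UTF-8";
}
```

A call such as sniffXmlEncoding(new Uint8Array([0xFE, 0xFF, 0x00, 0x3C]))
would then report big-endian UTF-16, whereas plain ASCII input without a
BOM falls through to UTF-8. A real XML processor must additionally honor
the encoding declaration and any encoding supplied by a higher-level
protocol, which this sketch does not attempt.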


PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm> (404-comp.)