Re: What is better encoding method?
- From: "Richard Cornford" <Richard@xxxxxxxxxxxxxxxxxxx>
- Date: 12 Jul 2006 07:39:41 -0700
Bart Van der Donck wrote:
Lasse Reichstein Nielsen wrote:
"Bart Van der Donck" <bart@xxxxxxxxxx> writes:
Yes, but those code points do not necessarily represent the same
character in the \x80-\x9F range. My test seems to show that even
MSIE prefers ISO-8859-1 instead of the expected Windows-1252 there.
A quick test shows that if n is a number between 128 and 255, and
hh is a hex representation of it, then the following gives the same
result:
String.fromCharCode(n)
"\xhh"
"\u00hh"
unescape("%hh")
unescape("%u00hh")
(which is a string with .charCodeAt(0)==n, however much sense that
makes).
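
The quick test described above might be reproduced along these lines. This is only a sketch: the helper names toHex2, fiveWays and findMismatches are made up for illustration, and eval is used purely so that the "\xhh" and "\u00hh" literals can be built for each value of n.

```javascript
// Two-digit upper-case hex for a code between 0x80 and 0xFF.
function toHex2(n) {
  var h = n.toString(16).toUpperCase();
  return h.length < 2 ? "0" + h : h;
}

// Build the one-character string in each of the five ways listed above.
function fiveWays(n) {
  var hh = toHex2(n);
  return [
    String.fromCharCode(n),
    eval('"\\x' + hh + '"'),    // evaluates the literal "\xhh"
    eval('"\\u00' + hh + '"'),  // evaluates the literal "\u00hh"
    unescape("%" + hh),
    unescape("%u00" + hh)
  ];
}

// Collect every n for which some form is not a single code unit equal to n.
function findMismatches() {
  var mismatches = [];
  for (var n = 128; n <= 255; n++) {
    var forms = fiveWays(n);
    for (var i = 0; i < forms.length; i++) {
      if (forms[i].length !== 1 || forms[i].charCodeAt(0) !== n) {
        mismatches.push(n);
        break;
      }
    }
  }
  return mismatches;
}

console.log(findMismatches().length); // 0 when all five forms agree
```

An empty result means every form yields a one-character string whose charCodeAt(0) is n, which is the equivalence claimed in the quoted test.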
[...]
The code point table would probably be identical across all these
commands; it's probably decided by the js engine itself.
<quote cite="ECMA 262, 3rd Ed. Section 6">
6 Source Text
ECMAScript source text is represented as a sequence of characters in
the Unicode character encoding, version 2.1 or later, using the UTF-16
transformation format. The text is expected to have been normalised to
Unicode Normalised Form C (canonical composition), as described in
Unicode Technical Report #15. Conforming ECMAScript implementations
are not required to perform any normalisation of text, or behave as
though they were performing normalisation of text, themselves.
SourceCharacter ::
any Unicode character
ECMAScript source text can contain any of the Unicode characters. All
Unicode white space characters are treated as white space, and all
Unicode line/paragraph separators are treated as line separators.
Non-Latin Unicode characters are allowed in identifiers, string
literals, regular expression literals and comments.
</quote>
It doesn't look like the page's own charset has any influence.
The/a character set asserted by an HTTP content type header would
probably be employed in deciding how to translate incoming javascript
source into the "sequence of characters in the Unicode character
encoding" that is needed prior to the tokenisation of the code.
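
As a rough illustration of that translation step, the sketch below decodes the same character from two different byte sequences. It assumes a Node.js environment, where Buffer stands in for the bytes arriving over HTTP; a browser does this decoding internally and the variable names are invented for the example.

```javascript
// Node.js only: Buffer plays the part of the raw bytes of a script file.
var latin1Bytes = Buffer.from([0xE9]);       // "é" encoded as ISO-8859-1
var utf8Bytes = Buffer.from([0xC3, 0xA9]);   // "é" encoded as UTF-8

// Decoding each byte sequence with its own charset gives the same string.
var fromLatin1 = latin1Bytes.toString("latin1");
var fromUtf8 = utf8Bytes.toString("utf8");
console.log(fromLatin1 === fromUtf8);               // true
console.log(fromUtf8.charCodeAt(0).toString(16));   // "e9"

// Decoding with the wrong charset gives a different string altogether:
// each UTF-8 byte is taken as one ISO-8859-1 character.
var misdecoded = utf8Bytes.toString("latin1");
console.log(misdecoded.length);                     // 2
```

The charset matters only while bytes are being turned into characters; the strings that result carry no record of which charset was used.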
I didn't find a way <snip>
to force charCodeAt() to a specific code page either.
You wouldn't, as by the time you are dealing with javascript you are
past the point where the normalisation to Unicode has happened, and so
code pages are not an issue.
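
A small sketch of what that means in practice, using the euro sign, which Windows-1252 places at byte 0x80. The variable names are only illustrative, and the character is written as an escape so that the example does not depend on how this file itself is encoded.

```javascript
var euro = "\u20AC";                  // EURO SIGN, byte 0x80 in Windows-1252
var c1 = String.fromCharCode(0x80);   // U+0080, a C1 control character

// charCodeAt reports the UTF-16 code unit, never a code page byte.
console.log(euro.charCodeAt(0).toString(16)); // "20ac"
console.log(c1.charCodeAt(0).toString(16));   // "80"
console.log(c1 === euro);                     // false
```

So a value in the \x80-\x9F range always names the Unicode code point with that number, whichever code page the page or the platform uses.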
Richard.