Re: convert NCR to \u?
- From: Thomas 'PointedEars' Lahn <PointedEars@xxxxxx>
- Date: Tue, 07 Apr 2009 19:34:47 +0200
Matthias Reuter wrote:
> Ken Williams wrote:
>> Hi, I'm trying to convert text in "numeric character reference" (NCR)
>> format to the JavaScript escape (\u) format. For example,
>> &#49548;&#44060; should become \uC18C\uAC1C.
>
> That's a one-liner:
>
> "&#49548;&#44060;".replace(/&#(\d+);/g, function (search, match) { return
> "\\u" + parseInt(match, 10).toString(16); });
To be precise, at least a two-liner, for legibility :)

"&#49548;&#44060;".replace(/&#(\d+);/g, function (search, match) {
  return "\\u" + parseInt(match, 10).toString(16).toUpperCase(); });
It also matters that the `return' keyword and return value expression start
on the same line, else `undefined' is returned due to automatic semicolon
insertion.
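The semicolon insertion pitfall can be seen in a minimal sketch. Both function names are hypothetical and exist only for this illustration:

```javascript
// Hypothetical helpers, for illustration only.
function escapeBroken(codePoint) {
  // A line break directly after `return' makes the parser insert a
  // semicolon there, so the expression below is never evaluated.
  return
    "\\u" + codePoint.toString(16);
}

function escapeWorking(codePoint) {
  // The return value expression starts on the same line as `return'.
  return "\\u" +
    codePoint.toString(16);
}

var broken = escapeBroken(0xC18C);   // undefined
var working = escapeWorking(0xC18C); // the six characters \uc18c
```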
However, I would write it as a general-purpose function:
function charRefToUnicodeEscape(s)
{
  return String(s).replace(
    /&#(\d+);/g,
    function(m, p1) {
      /* uppercase only the hex digits; pad to the four digits \u requires */
      return "\\u" + parseInt(p1, 10).toString(16).toUpperCase().padStart(4, "0");
    });
}

var s = ...;
/* ... */
s = charRefToUnicodeEscape(s);
(Or make it a method of String.prototype.)
The issue remains that the HTML Document Character Set is UCS, which
supports code points beyond the Basic Multilingual Plane (U+10000 and
greater), while a single ECMAScript Unicode escape sequence does not:
\uFFFF is the specified maximum. So those characters cannot be represented
by one escape sequence in ECMAScript; each needs a UTF-16 surrogate pair,
i.e. two consecutive \u escapes.
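As a sketch of how such code points could still be written with \u escapes: UTF-16 encodes each of them as a surrogate pair, and each half fits into one \uXXXX escape. The name charRefToUnicodeEscapeUTF16 is made up for this example, and the code assumes an implementation that provides String.prototype.padStart:

```javascript
// Hypothetical variant of charRefToUnicodeEscape() that splits code points
// above U+FFFF into a UTF-16 surrogate pair (two \u escapes).
function charRefToUnicodeEscapeUTF16(s)
{
  function esc(codeUnit) {
    return "\\u" + codeUnit.toString(16).toUpperCase().padStart(4, "0");
  }

  return String(s).replace(
    /&#(\d+);/g,
    function(m, p1) {
      var cp = parseInt(p1, 10);

      // Not a Unicode code point: leave the reference untouched.
      if (cp > 0x10FFFF) return m;

      // Basic Multilingual Plane: one escape is enough.
      if (cp <= 0xFFFF) return esc(cp);

      // Supplementary planes: high (lead) and low (trail) surrogate.
      cp -= 0x10000;
      return esc(0xD800 + (cp >> 10)) + esc(0xDC00 + (cp & 0x3FF));
    });
}

var bmp = charRefToUnicodeEscapeUTF16("&#49548;&#44060;"); // \uC18C\uAC1C
var astral = charRefToUnicodeEscapeUTF16("&#128512;");     // \uD83D\uDE00
```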
However, the solution to that problem would be simple (and oft-mentioned
before):
Do not output or store character references, but output raw code units and
declare the proper character encoding (e.g. UTF-7, -8, -16 or -32).
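One possible way to get from existing character references to the raw characters is to decode them once, before the text is stored or served. A minimal sketch, with a hypothetical function name and assuming String.fromCodePoint (ECMAScript 2015) is available:

```javascript
// Hypothetical helper: replaces decimal character references by the
// characters they stand for.
function charRefToChar(s)
{
  return String(s).replace(
    /&#(\d+);/g,
    function(m, p1) {
      var cp = parseInt(p1, 10);

      // String.fromCodePoint() throws a RangeError above U+10FFFF,
      // so such references are left as they are.
      return (cp <= 0x10FFFF) ? String.fromCodePoint(cp) : m;
    });
}

var decoded = charRefToChar("&#49548;&#44060;"); // same string as "\uC18C\uAC1C"
```

The result then has to be delivered with a matching declaration, for example the HTTP header `Content-Type: text/html; charset=UTF-8'.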
PointedEars