Re: How many bytes does charCodeAt return?



Johannes Baagoe wrote:
> Lasse Reichstein Nielsen :
>> That's probably a problem with the interpretation of input (or it's
>> just a bug).
>
> It appears to be a bug in the interpretation of input.

Yes, as can be seen here. You posted a Unicode EURO SIGN character
(U+20AC) encoded as UTF-8. For some reason Lasse's client does not appear
to support UTF-8 (or was the wrong encoding chosen intentionally here?);
when he replied to your posting, his client encoded the text as ISO-8859-1
and declared it as such in the Content-Type header. As a result, the single
Unicode character was transformed, for display, into three ISO-8859-1
characters whose code points correspond to the values of the three UTF-8
code units (0xE2, 0x82, 0xAC) required to encode the character. That
accounts for the number 3 you are seeing in your JavaScript shell.
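
The effect can be reproduced with a short sketch. It assumes an environment
that provides TextEncoder (current browsers, Node.js 11 or later), which the
shell discussed in this thread may not; the variable names are illustrative
only. ISO-8859-1 maps every byte value to the code point of the same value,
so String.fromCharCode() is enough to model the wrong decoding.

```javascript
// One character, U+20AC EURO SIGN.
const euro = "\u20AC";

// Its UTF-8 encoding is three code units (bytes): 0xE2 0x82 0xAC.
const utf8Bytes = Array.from(new TextEncoder().encode(euro));

// Reading those bytes as ISO-8859-1 turns each byte into one character
// with the same code point, so the string now has three characters.
const misread = String.fromCharCode(...utf8Bytes);

console.log(euro.length);                             // 1
console.log(utf8Bytes.map((b) => b.toString(16)));    // [ 'e2', '82', 'ac' ]
console.log(misread.length);                          // 3
```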

See also <http://people.w3.org/rishida/tools/conversion>.

>> If you create the string directly from ASCII JavaScript, what does it
>> print? I.e.,
>>   var euroecu = String.fromCharCode(0x20ac, 0x20a0);
>>   print(euroecu.length); // expect 2

Thanks. I did not know that String.fromCharCode() supports an arbitrary
number of arguments; but ES3F section 15.5.3.2 agrees.
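
As a rough illustration of that point, and of the question in the subject
line: each argument to String.fromCharCode() becomes one 16-bit code unit,
and charCodeAt() returns that code unit as a number, not a byte sequence.
The sketch uses console.log, which is an assumption; a shell that only
offers print() would need that name substituted.

```javascript
// U+20AC EURO SIGN and U+20A0 EURO-CURRENCY SIGN, built from ASCII-only source.
const euroEcu = String.fromCharCode(0x20AC, 0x20A0);

console.log(euroEcu.length);                      // 2
console.log(euroEcu.charCodeAt(0).toString(16));  // 20ac
console.log(euroEcu.charCodeAt(1).toString(16));  // 20a0
```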

On a side note, though, JFTR: the ECU (European Currency Unit) was the
virtual currency (German: Buchgeld) of the European Community, later the
European Union, from 1979 CE through 1998. It is not to be confused
with the Euro, which is the name and sign of the official currency of the
European Economic and Monetary Union (EEMU) since 1999-01-01 (virtual) and
2002-01-01 (cash).


PointedEars
--
realism: HTML 4.01 Strict
evangelism: XHTML 1.0 Strict
madness: XHTML 1.1 as application/xhtml+xml
-- Bjoern Hoehrmann