Re: String "â€™" translated to apostrophe. Why?



On Jul 18, 10:47 am, "Jukka K. Korpela" <jkorp...@xxxxxxxxx> wrote:
Scripsit Richard:

http://whytheluckystiff.net/articles/seeingMetaclassesClearly.html,

The page declares UTF-8 encoding (in a meta tag only - not really ideal, but
it works), though it seems to use mostly just Ascii characters, representing
other characters using character references like &#8217;. Nothing wrong with
that really, but the author is not making the best possible use of UTF-8.

I particularly like the GUI the author created and want to emulate his
techniques.

What GUI? I see no Graphic User Interface there. Just a web page. If you
view it using a graphic browser, then you are using a GUI, but that's a
different issue.

In particular, he used the (three character) string â€™
(hex E2 80 99) which translated into ' (ASCII apostrophe) in both
Firefox 2 and HTML-Kit Version 1.0 (Build 292).

What? Where? I don't see anything like that on the page.

However, IE7 leaves it untranslated.

You're enigmatic.

I presume the author coded the apostrophe this way for
internationalization.

The page has apostrophes written as &#8217;, which is a correct reference,
and modern browsers render it well. They don't map it to ASCII apostrophe,
except perhaps if they need to work with an ASCII-only rendering situation.

But I don't see why this works in Firefox and
HTML-Kit.

I don't see what you mean by "this".

Can anyone explain why the following works in those two
browsers?

(HTML-Kit is an authoring tool, not a browser.)

<p>If youâ€™re new to metaprogramming in Ruby</p>

Well it doesn't. The string â€™ is rendered literally, as a mess of
characters. Maybe the actual file you used for testing contains something
completely different, though. (As usual, posting a URL...)

You have some confusion here. You have probably played with a program that
converts character references to UTF-8 encoded characters and later you have
interpreted the octets of the UTF-8 representation according to the Windows
Latin 1 (windows-1252) encoding.
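
The mix-up described above can be reproduced in a few lines. This is only a sketch in Python (nothing in the thread used Python); it assumes the character involved is U+2019, the one that &#8217; refers to, and it uses Python's "cp1252" codec as a stand-in for windows-1252.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the character that &#8217; refers to.
apostrophe = "\u2019"

# Step 1: a tool converts the character reference to UTF-8 octets.
utf8_octets = apostrophe.encode("utf-8")
print(utf8_octets.hex())        # e28099, i.e. hex E2 80 99

# Step 2: the same octets are later read as windows-1252.
# Each octet becomes one character, so one apostrophe turns into three.
misread = utf8_octets.decode("cp1252")
print(misread, len(misread))    # â€™ 3

# Reversing the wrong step recovers the original character.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == apostrophe)   # True
```

The round trip at the end only works because all three octets happen to be defined in windows-1252; it is not a general repair method.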

It's easy to get confused with character encodings, and difficult to help
people out from a confusion. It's probably best to stop here and start
afresh. What do you really want? To use a punctuation apostrophe (’) on a
web page? Then write &#8217;. Or &rsquo;, if that's easier to remember.
There are other ways too, but these methods work independently of character
encoding and don't make you confused and don't require any particular editor
or UTF-8 support in your authoring software.
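
As a rough check of the point about character references, the following Python sketch decodes both spellings with the standard library's html.unescape and confirms that they name the same character. The sample markup is adapted from the line quoted earlier in this thread; the variable names are only illustrative.

```python
import html

# Both references resolve to the same character, U+2019.
numeric = html.unescape("you&#8217;re")
named = html.unescape("you&rsquo;re")
print(numeric == named == "you\u2019re")   # True

# A page that writes the apostrophe as a reference stays pure ASCII,
# so it survives any ASCII-compatible encoding unchanged.
source = "<p>If you&#8217;re new to metaprogramming in Ruby</p>"
ascii_octets = source.encode("ascii")      # would raise if non-ASCII were present
print(ascii_octets.decode("utf-8") == ascii_octets.decode("cp1252"))  # True
```

Because the source contains only ASCII octets, decoding it as UTF-8 or as windows-1252 gives the same text, which is why this approach does not depend on the declared encoding.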

--
Jukka K. Korpela ("Yucca") http://www.cs.tut.fi/~jkorpela/


Hi Yucca,

Thanks for your response. Please take a look at my response to Harlan
confessing that the fundamental problem was mine: I took a screen-
capture of a web page that contained an HTML entity, and that's what
apparently introduced that weird three-letter string.

What GUI? I see no Graphic User Interface there. Just a web page.
That's true. What I meant is that I admired the presentation of this
tutorial. I'm a retired software developer who's done a lot of
teaching of computer technology, e.g., about a dozen years as a college
adjunct lecturer/professor. So I wanted to learn how he achieved some
of his visual effects or graphical effects or, in a sort of shorthand,
GUI.

Nothing wrong with
that really, but the author is not making the best possible use of UTF-8.

I'm interested in your assessment. I never really studied these
various encoding schemes. I just picked up a few cryptic lines from
W3C to stick on the top of my HTML without giving it much thought.
In this case, do you think the author should have used something other
than UTF-8 because his page was pure ASCII, save for the HTML
entity? Or do you think the author should have employed other
features supported by UTF-8?

In particular, he used the (three character) string â€™
(hex E2 80 99) which translated into ' (ASCII apostrophe) in both
Firefox 2 and HTML-Kit Version 1.0 (Build 292).

What? Where? I don't see anything like that on the page.

You're right. I was pretty clumsy here in trying to describe this
mess.

However, IE7 leaves it untranslated.

You're enigmatic.

Yep. I was wrong here, too. Actually, I got a curly apostrophe
using all three tools.

I presume the author coded the apostrophe this way for
internationalization.

The page has apostrophes written as &#8217;, which is a correct reference,
and modern browsers render it well. They don't map it to ASCII apostrophe,
except perhaps if they need to work with an ASCII-only rendering situation.

I now see that I misinterpreted the scenario. I wasn’t asking why
the browsers didn’t render that entity as an ASCII apostrophe (0x27).
Instead, I incorrectly thought the browsers rendered the entity as an
ASCII apostrophe, and I was asking why the author employed an entity
rather than merely using an ASCII apostrophe directly. But, as I said
to Harlan, the author wanted a closing single quote and that, in
fact, is what the Firefox and IE browsers rendered, as did HTML-Kit’s
(AFAIK, built-in) interpreter.

Maybe the actual file you used for testing contains something
completely different, though. (As usual, posting a URL...)

You have some confusion here. You have probably played with a program that
converts character references to UTF-8 encoded characters and later you have
interpreted the octets of the UTF-8 representation according to the Windows
Latin 1 (windows-1252) encoding.

You’re right. I was using the file resulting from screen-capture of
the rendered web page. Sorry about that. 

Again, thanks for taking all the trouble to figure out what I was
confused about.

Best wishes,
Richard






