Re: Standard character attributes for Hebrew?



Andreas Prilop wrote:
On Fri, 17 Mar 2006, Harlan Messinger wrote:


That is correct but somewhat misleading here. & # 1 2 3 are encoded
in US-ASCII.

Eh? Aren't they encoded in whatever encoding is being used for the file
containing them?

Yes. But they are only ASCII characters; so they are already encoded
in US-ASCII. I wrote it would be /misleading/ to take any superset
such as UTF-8 or Cyrillic Windows-1251. You /could/ say they are
encoded in Cyrillic Windows-1251 - but there are no genuine
characters from Windows-1251 other than US-ASCII here.

I think I understand what you're saying, but to me it's like describing the paths of the planets in terms of revolving around the earth with epicycles instead of in terms of revolving around the sun. Whatever encoding is being used to store character data in a file, that's the encoding being used for all the characters. For a subset of those characters the encoding might, by design, be the same as under US-ASCII, but it doesn't serve any useful purpose to say that those characters are being encoded using US-ASCII instead of the other encoding. (There's no reason to assume that the source text as a whole contains only these characters, just because these are the characters I was discussing.)

After all, the whole analysis would still hold if the source text were encoded in EBCDIC.
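
To make that concrete, here is a rough Python sketch of the point. The sample string "&#123;" and the choice of cp037 as the EBCDIC code page are my own assumptions for illustration; the codec names are the ones Python's standard library uses. It encodes the same six characters under each encoding and then decodes them again.

```python
# Hypothetical sample: a numeric character reference made only of
# characters that also exist in US-ASCII.
SAMPLE = "&#123;"

# Python codec names. cp037 is one common EBCDIC code page; which
# EBCDIC variant to use is an assumption made for this sketch.
CODECS = ["ascii", "utf-8", "cp1251", "cp037"]


def encode_all(text, codecs=CODECS):
    """Return a dict mapping each codec name to the encoded bytes."""
    return {name: text.encode(name) for name in codecs}


encoded = encode_all(SAMPLE)

for name, raw in encoded.items():
    # Show the bytes in hex, then confirm that decoding with the same
    # codec gives back the original characters.
    round_trip = raw.decode(name)
    print(f"{name:8} {raw.hex()}  ->  {round_trip}")
```

The three ASCII-compatible encodings should produce identical bytes, while the EBCDIC code page produces different ones, yet every one of them decodes back to the same characters. Whatever parses the character reference works on the decoded characters either way.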


