Re: Multiple coding systems, and filesystems



Scripsit gentsquash@xxxxxxxxx:

On some of my course pages, I quote (with attribution)
small sections of Wikipedia and the like. E.g., the top
of
http://en.wiktionary.org/wiki/entropy

has "entropia" in Greek font,

Technically, it has the word in Greek _characters_ (letters). This is the key issue; fonts are secondary. The page has a style sheet that makes special suggestions on the font of such words, in a most confusing and tricky way.

What is the correct --maybe "coding
system" is the term?-- so that I could quote all three of
these on the same HTML page?

The proper _character encoding_ is UTF-8 in such cases. As soon as you have Japanese, Greek, and Latin letters with umlauts on one page, that's definitely the best option. If there were just a few "special" characters, you could present them using entity references like &ouml; or character references like &#261;, but this gets clumsy (or requires suitable software for generating them) if you have full sentences that consist of "special" characters.
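
To make the two options concrete, here is a minimal Python sketch (not part of the original exchange). It shows that one UTF-8 byte stream can carry all three scripts, and how the "suitable software" for generating references might look. The sample strings and the helper names to_char_refs and to_entity_refs are made up for illustration.

```python
from html.entities import codepoint2name

# Illustrative sample mixing Greek, Japanese, and Latin letters with umlauts.
text = "εντροπία / エントロピー / Größe"

# One encoding covers all three scripts and round-trips losslessly.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text


def to_char_refs(s: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return s.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_entity_refs(s: str) -> str:
    """Use a named entity where HTML 4 defines one, else a numeric reference.

    ASCII characters, including '&' and '<', are passed through unescaped.
    """
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(to_char_refs("ö ą"))
print(to_entity_refs("ö ą"))
```

For a whole Greek or Japanese sentence the reference form becomes unreadable in the source, which is the clumsiness mentioned above.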

It's not possible (in practice on web pages) to switch the character encoding in the middle of an HTML document.

In the past I've cut&pasted
a snippet from, say, wiki/entropy, into an Emacs buffer,
adjoined a "From Wiktionary http://..."; and attempted to
save the buffer. Sometimes Emacs asked me for what coding
system to use --and I don't know how to placate it.

UTF-8, if Emacs can really produce it. The version of Emacs I've been using does not deal with "special" characters, but I recently looked at the newest version of Emacs for Windows, and it seems to have impressive support for "special" characters.
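
If you are unsure whether the editor really wrote UTF-8, one way to check is to read the saved file as raw bytes and try to decode it. The following is a minimal sketch; the function names are hypothetical and the file path is whatever you saved.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_file(path: str) -> bool:
    """Read a saved file as raw bytes and test whether it is valid UTF-8."""
    with open(path, "rb") as f:
        return is_valid_utf8(f.read())
```

Note that a pure ASCII file also passes this check, since ASCII is a subset of UTF-8. The test is informative only when the file actually contains "special" characters.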

Note that the server should be configured to send an appropriate HTTP header. You normally do this by adding something to your .htaccess file, and in practice you need to use the same encoding for all ".html" files in a directory (folder), though you could use, for example, ISO-8859-1 for ".html" and UTF-8 for ".htm" files.
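
What the server has to do amounts to choosing a charset parameter for the Content-Type header based on the file name extension. The sketch below only illustrates that mapping in Python; it is not server configuration. The table mirrors the ".html" versus ".htm" example above, and the names CHARSET_BY_EXTENSION and content_type_for are made up for illustration.

```python
import os

# Hypothetical per-extension policy mirroring the example in the text.
CHARSET_BY_EXTENSION = {
    ".html": "ISO-8859-1",
    ".htm": "UTF-8",
}


def content_type_for(filename: str) -> str:
    """Return the Content-Type header value a server might send for a file."""
    ext = os.path.splitext(filename)[1].lower()
    charset = CHARSET_BY_EXTENSION.get(ext)
    if charset is None:
        # Unknown extension: no HTML charset applies in this sketch.
        return "application/octet-stream"
    return f"text/html; charset={charset}"


print(content_type_for("entropy.htm"))
print(content_type_for("index.html"))
```

The point is that the declared charset follows the extension, not the individual file, which is why one encoding per extension per directory is the practical limit.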

If I'm using multiple coding systems on the same webpage,
do I have to save the different snippets in different files
stored with different coding systems, and then

<!--#include ... -->

each of them into one webpage?

No, it won't work that way, even if your server supports SSI includes. They result in a single document, which can have one encoding only. (I won't mention <iframe>, because it's really a poor hack for things like this, but it performs a sort of include where the included document is displayed "autonomously" inside the main canvas and may have a different encoding.)
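
A small Python sketch of why this fails, and of what works instead: bytes saved in different encodings cannot simply be joined, but each snippet can be decoded with its own encoding and the result re-encoded as one UTF-8 document. The sample words, the choice of ISO-8859-7 and ISO-8859-1, and the helper name combine_as_utf8 are assumptions for illustration.

```python
# Two snippets, each stored in a different legacy encoding.
snippets = [
    ("εντροπία".encode("iso-8859-7"), "iso-8859-7"),  # Greek
    ("Größe".encode("iso-8859-1"), "iso-8859-1"),     # Latin with umlauts
]

# Naive include: the raw bytes are simply concatenated.
naive = b"".join(raw for raw, _ in snippets)

# Read under a single encoding, the Greek part turns into wrong characters.
print(naive.decode("iso-8859-1"))


def combine_as_utf8(parts):
    """Decode each snippet with its own encoding, then emit one UTF-8 page."""
    text = " ".join(raw.decode(enc) for raw, enc in parts)
    return text.encode("utf-8")


page = combine_as_utf8(snippets)
print(page.decode("utf-8"))
```

In other words, the conversion has to happen before the pieces are put together; the combined document is then UTF-8 throughout.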

FWIW, my home OS is MacOSX and I need to upload my webpages
to school. The math dept. server is probably running
Unix; when I manipulate the html files (when at work), I'm
using Emacs running on a Solaris (unix) system.

A nice mess :-) but it should be manageable when using UTF-8. When uploading with FTP, use binary (not ASCII) mode, since no character conversion should be performed: the data is already in a system-independent encoding.
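
If you script the upload, the same advice applies. Here is a minimal sketch using Python's standard ftplib, where storbinary sends the file in binary mode. The helper name upload_binary and the host, login, and file names in the comments are hypothetical.

```python
from ftplib import FTP


def upload_binary(ftp, local_path: str, remote_name: str) -> None:
    """Upload a file byte for byte, with no line-ending or character conversion."""
    with open(local_path, "rb") as f:
        # storbinary switches the transfer to binary (image) mode.
        ftp.storbinary(f"STOR {remote_name}", f)


# Hypothetical usage; host, credentials, and file names are placeholders:
#
#   with FTP("ftp.example.edu") as ftp:
#       ftp.login("username", "password")
#       upload_binary(ftp, "entropy.html", "entropy.html")
```

The file is opened with "rb" as well, so nothing on the local side reinterprets the bytes either.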

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
