Re: Representing futuristic English



jamesd@xxxxxxxxxxx (James A . Donald) wrote on 26.08.05 in <pq5vg1h5ko4dclouoja768psdkavkr5hsp@xxxxxxx>:

> Wilson Heydt:
> > Not only is there a lot of material now that isn't in
> > HTML, but is in older, proprietary formats, but the
> > assumption going forward about ASCII is very likely
> > false given some general moves in the direction of
> > unicode.
>
> If I load up an ascii file in a unicode editor, it is
> usually near hundred percent readable, except for a few
> glitches which I can guess from context. (Unicode, when
> encoded as UTF-8 looks very much like ascii, and a
> unicode editor always tries to guess which of the four
> UTF encodings are in use. Since ascii looks much like
> UTF-8, it guesses UTF-8, which is incorrect, but close
> enough.)

Actually, ASCII is exactly a subset of UTF-8, so if this process produces
glitches, you didn't have ASCII to start with. (That was a design
principle of UTF-8 - i.e., this is not an accident.)
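
To make the subset claim concrete, here is a minimal sketch in Python (my choice of language, nothing from the thread). It decodes every 7-bit value both ways and compares the results; the variable names are made up for illustration.

```python
# All 128 ASCII byte values, 0x00 through 0x7F.
ascii_bytes = bytes(range(128))

# Decode the same bytes once as ASCII and once as UTF-8.
as_ascii = ascii_bytes.decode("ascii")
as_utf8 = ascii_bytes.decode("utf-8")

# Both decoders must agree on every character.
same_text = as_ascii == as_utf8

# Encoding back to UTF-8 must reproduce the original bytes exactly.
roundtrip = as_utf8.encode("utf-8") == ascii_bytes

print(same_text, roundtrip)
```

If either comparison came out false, the input was not pure ASCII.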

Maybe you are thinking of Latin-1, which was a different scheme to extend
ASCII and hence is compatible with UTF-8 only insofar as the text is in
the common subset, i.e., ASCII.
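
A small Python sketch of where the two part ways, assuming "café" as the sample word and a hypothetical helper decodes_as_utf8 (neither is from the thread). The accented letter is one byte in Latin-1 but two in UTF-8, and the lone Latin-1 byte is not valid UTF-8.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

text = "café"
latin1_bytes = text.encode("latin-1")  # e-acute becomes the single byte 0xE9
utf8_bytes = text.encode("utf-8")      # e-acute becomes the two bytes 0xC3 0xA9

# Pure ASCII text is byte-for-byte identical in both encodings.
plain_same = "cafe".encode("latin-1") == "cafe".encode("utf-8")

print(latin1_bytes, utf8_bytes, decodes_as_utf8(latin1_bytes), plain_same)
```

That would explain glitches confined to accented characters: the editor guessed UTF-8 and the file was really Latin-1.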

> If UTF-8 had vanished from use, he would see gibberish.
> He would then bring it up in a binary editor, and would
> at once *guess* the ascii encoding, after a few moments
> of thought, and in about an hour whip up an ascii to
> unicode translator.

Especially as the ASCII encodings would be quite familiar, just using fewer
00 bytes. This, incidentally, is true of Latin-1 as well, as *Unicode* was
designed to keep the numbering (but not the encoding) of everything in
Latin-1.
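
A rough Python illustration of both points, assuming big-endian UTF-16 as the 16-bit encoding the future reader would be used to (the thread does not name one). The first check confirms that Latin-1 byte value n is Unicode code point n; the second shows that ASCII text in UTF-16 is the same bytes with a 00 in front of each.

```python
# Each Latin-1 byte value n decodes to the Unicode code point n.
latin1_matches = all(
    bytes([n]).decode("latin-1") == chr(n) for n in range(256)
)

# ASCII text in big-endian UTF-16: a 00 byte precedes every ASCII byte.
utf16 = "Kai".encode("utf-16-be")

# Dropping the 00 bytes (every even offset) leaves the plain ASCII bytes.
stripped = utf16[1::2]

print(latin1_matches, utf16, stripped)
```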

Kai
--
http://www.westfalen.de/private/khms/
"... by God I *KNOW* what this network is for, and you can't have it."
- Russ Allbery (rra@xxxxxxxxxxxx)