Re: New Year's Resolution (was Re: cell phones, was: car help, was: Starving people refuse to eat food aid)
- From: cryptoguy <treifamily@xxxxxxxxx>
- Date: Mon, 28 Dec 2009 07:06:38 -0800 (PST)
On Dec 28, 12:26 am, Mike Ash <m...@xxxxxxxxxxx> wrote:
In article
<121e774e-6a58-40e1-8e08-0fe7ffd3c...@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
cryptoguy <treifam...@xxxxxxxxx> wrote:
Bare Unicode would double the size, assuming our texts stayed in the
BMP. It would never triple it, unless you started to post in Linear B
or cuneiform. UTF-8 crunches it down, so all the characters that match
US-ASCII stay as one byte. Fortunately, this is most characters used
in the mostly-English parts of the Internet.
There's no such thing as "bare Unicode". Unicode describes a mapping of
conceptual characters to code points, and a multitude of encodings which
map sequences of those code points to bytes. (It does more than this;
this summary is not meant to be a complete enumeration of Unicode.)
If something is "encoded in Unicode", it could be in any number of
encodings. There's no one default encoding. UTF-16 produces two bytes
per code point in the BMP, four bytes per code point outside of it.
UTF-8 produces 1-4 bytes depending. UTF-32 produces 4 bytes per code
point all the time.
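The per-encoding sizes listed above can be checked with a short Python sketch. The sample characters and the helper name encoded_sizes are chosen here for illustration and are not from the thread; the "-le" codec variants are used only so that no byte order mark is counted.

```python
# Byte counts per code point under UTF-8, UTF-16 and UTF-32.
# Sample characters are illustrative choices, not taken from the thread.
samples = {
    "A": "US-ASCII letter",
    "\u00e9": "Latin small e with acute, in the BMP",
    "\u20ac": "euro sign, in the BMP",
    "\U00010000": "Linear B syllable, outside the BMP",
}

def encoded_sizes(ch):
    """Return (utf8, utf16, utf32) byte lengths for a single code point."""
    # The -le codecs emit no byte order mark, so only the character is counted.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )

for ch, label in samples.items():
    u8, u16, u32 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X} ({label}): UTF-8={u8} UTF-16={u16} UTF-32={u32}")
```

Only the Linear B character needs four bytes in UTF-16, and only characters outside US-ASCII need more than one byte in UTF-8.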
I believe you understand this, but I find that wording in which
"Unicode" refers to some specific (but unspecified) encoding increases
the confusion on the topic.
As noted above, I am not a standards wonk. I know that some modern OSs
(Win Mobile is one that springs to mind) use two-byte encodings as the
default for characters in strings, internally. This is referred to
loosely as 'doublebyte' or 'Unicode'. For regular, English-language
strings, this maps to leaving one byte of each pair a null and the
other as the US-ASCII value (which of the two comes first depends on
byte order), which makes conversion easy.
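As a rough sketch of the conversion described above, the following Python pads each US-ASCII byte with a null to get a two-byte unit and strips it again. The function names widen_ascii and narrow_ascii are made up for this example, and the shortcut is only valid for pure US-ASCII text; anything else needs a real UTF-16 codec. Which byte of the pair holds the null depends on byte order, so both orders are shown.

```python
def widen_ascii(data: bytes, little_endian: bool = True) -> bytes:
    """Pad each US-ASCII byte with a null to form a two-byte unit."""
    if any(b > 0x7F for b in data):
        raise ValueError("input is not pure US-ASCII")
    out = bytearray()
    for b in data:
        # Little-endian puts the ASCII value first, big-endian puts the null first.
        out += bytes((b, 0)) if little_endian else bytes((0, b))
    return bytes(out)

def narrow_ascii(data: bytes, little_endian: bool = True) -> bytes:
    """Inverse of widen_ascii: keep only the byte carrying the ASCII value."""
    return data[0::2] if little_endian else data[1::2]

wide = widen_ascii(b"Hi")
# For ASCII input the padded form matches what the UTF-16 codecs produce.
print(wide == "Hi".encode("utf-16-le"))
print(widen_ascii(b"Hi", little_endian=False) == "Hi".encode("utf-16-be"))
print(narrow_ascii(wide))
```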
This is done for localization purposes; if all the canned strings of
an application are stored in a table, and data strings are all handled
as doublebyte (aka 'Unicode', loosely speaking), it becomes much
easier to produce multi-language versions of the app: you need only
translate the table and, in GUI apps, be aware of how differing
languages change the length of the displayed text.
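A minimal sketch of the string-table idea, assuming a simple in-memory table keyed by language code. The table contents and the names STRINGS, lookup and widest are hypothetical; real platforms ship their own resource formats.

```python
# Hypothetical string table: one sub-table per language, same keys in each.
STRINGS = {
    "en": {"save": "Save", "cancel": "Cancel"},
    "de": {"save": "Speichern", "cancel": "Abbrechen"},
}

def lookup(key: str, lang: str, fallback: str = "en") -> str:
    """Return the canned string for key, falling back to the default language."""
    table = STRINGS.get(lang, STRINGS[fallback])
    return table.get(key, STRINGS[fallback][key])

def widest(key: str) -> int:
    """Longest translation of key, in code points, as a crude layout hint."""
    return max(len(table[key]) for table in STRINGS.values() if key in table)

print(lookup("save", "de"))
print(lookup("save", "fr"))  # no French table, so the English string is used
print(widest("save"))
```

Adding a language then means adding one more sub-table, while the widest helper hints at how much room a GUI label has to leave for the longest translation.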
pt