Re: Japanese Chinese tea web sites



"Space Cowboy" <netstuff@xxxxxxxxxxxxx> writes:

> Lewis Perin wrote:
> > Warning: nerdy details abound here!
> >
> > "Space Cowboy" <netstuff@xxxxxxxxxxxxx> writes:
> > >
> > > Lewis Perin wrote:
> > >> [...why are there Chinese tea names that appear only in Japanese sites...]
> > >
> > > The charset=shift_jis of the webpage indicates Japanese. All 2
> > > character pairs are used for Japanese font sets. The characters you
> > > see are from the Japanese fonts and not Chinese. That character may
> > > very well exist in the Chinese font set and vice versa but the charset
> > > setting on the HTML page tells where to look. Basically non Roman
> > > languages take two characters for representation and a corresponding
> > > font set. For example the Cha character in Japanese JIS is 3567 and
> > > simplified Chinese GB 1872.
> >
> > Yes, but it's still the same Unicode code point (33590, or 8336 in
> > hex), which is why you get both .cn and .jp web sites if you Google
> > for it.
>
> Only if the Chinese or Japanese websites use Unicode codepoints such
> as 8336. There are plenty of Chinese and Japanese sites that use
> charset=UTF-8.

But UTF-8 *is* Unicode. More pedantically, it's an encoding of
Unicode. The codepoints exist at the abstract level of Unicode; the
encodings, like UTF-8, mediate between that level and what you see in
your browser. See

http://www.unicode.org/standard/principles.html

for an explanation.
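
To make the distinction concrete, here is a minimal Python sketch (my own illustration, not anything taken from the sites discussed; the variable names are made up for the example). It takes the Cha character, shows its code point as a plain number, and then shows the bytes UTF-8 uses to carry that number:

```python
# The code point is an abstract number; UTF-8 is one way to write it as bytes.
cha = "\u8336"  # the Cha (tea) character, given by its Unicode code point

code_point = ord(cha)             # the abstract Unicode code point
utf8_bytes = cha.encode("utf-8")  # the concrete bytes a UTF-8 page would contain

print(code_point, hex(code_point))  # 33590 0x8336
print(utf8_bytes.hex())             # e88cb6

# Decoding the bytes gets back to the same code point.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == cha)            # True
```

The three bytes are not the code point itself; they are just how UTF-8 spells it.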

> I'm not sure of the particulars but you can also mix language sets
> on a webpage. I use Unicode strings for Google searches. I could
> get additional hits if I used JIS or GB strings but I only track
> Unicode. On TaoBao I have to use GB strings. Ebay China uses
> Unicode.

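The JIS, GB, and Big5 character sets are all covered by Unicode: every
character they define has a Unicode code point, even though each set
uses its own byte encoding.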
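
A rough Python sketch of that relationship, assuming the standard codecs that ship with Python (`euc_jp`, `gb2312`, `shift_jis`, `big5`); the helper `row_cell` is a name I made up for this example. It recovers the JIS and GB table positions quoted above from the EUC forms of those sets, then checks that the bytes from each legacy encoding decode to one and the same code point:

```python
cha = "\u8336"  # the Cha (tea) character


def row_cell(char, euc_codec):
    """Return the (row, cell) position of char in a 94x94 character set.

    Hypothetical helper: it relies on the EUC convention that each byte
    is the row or cell number plus 0xA0.
    """
    high, low = char.encode(euc_codec)
    return high - 0xA0, low - 0xA0


jis_position = row_cell(cha, "euc_jp")  # position in the JIS table
gb_position = row_cell(cha, "gb2312")   # position in the GB table
print(jis_position, gb_position)

# Different legacy encodings produce different bytes for the same character...
legacy_bytes = {name: cha.encode(name) for name in ("shift_jis", "gb2312", "big5")}

# ...but decoding each of them lands on the same Unicode code point.
code_points = {name: ord(data.decode(name)) for name, data in legacy_bytes.items()}
print(code_points)
```

So a search engine that decodes every page to Unicode first sees the same character whether the page declared shift_jis, a GB charset, Big5, or UTF-8.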

> Babelfish doesn't accept Unicode strings.

Do you mean Babelfish or Babelcarp? If it's the latter, and you want
to try the alpha version that searches on Chinese characters, email me.

/Lew
---
Lew Perin / perin@xxxxxxx
http://www.panix.com/~perin/babelcarp.html