Re: How do search engines index multilingual content?
- From: "Alan J. Flavell" <flavell@xxxxxxxxxxxx>
- Date: Mon, 30 Jan 2006 16:06:17 +0000
On Mon, 30 Jan 2006, Philip Ronan wrote:
> Andreas Prilop wrote:
>
> > Google Groups still ignore the charset parameter of Usenet
> > articles. Instead they use the group name and I-don't-know-
> > what-else to select an encoding for an article.
>
> That's an inevitable problem caused by putting multiple articles
> (with different charsets) in a single web page.
I can't agree.
Mozilla's Bugzilla made the same mistake, and some of the
charset-related bug reports are quite incomprehensible as a
consequence - they contain a mish-mash of Chinese, Cyrillic and
whatever else, in their different encodings, served out as raw bytes.
But the mistake was made many years back...
At least their discussion shows that they have recognised their
mistake, and understand how to correct it - mapping the various
encodings into Unicode, and serving out the results accordingly -
probably in utf-8.
(This might cause problems for people who are discussing the finer
details of Han unification, but that can't be helped now.)
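
To make the idea concrete, here is a minimal sketch in Python of mapping
articles in different encodings into Unicode and serving the combined page
as utf-8. It is not Bugzilla's or Google's actual code. The sample articles,
the function names and the fallback policy for a bad charset label are all
assumptions made for illustration.

```python
# Hypothetical sample data: each article carries its declared charset
# together with the raw bytes as they arrived.
articles = [
    ("koi8-r", "Привет".encode("koi8-r")),
    ("windows-1251", "Привет".encode("windows-1251")),
    ("big5", "中文".encode("big5")),
]


def to_unicode(raw: bytes, declared_charset: str) -> str:
    """Decode one article using its own declared charset.

    Assumed fallback policy: if the label is unknown or the bytes do not
    fit it, decode as utf-8 and substitute U+FFFD for undecodable bytes.
    """
    try:
        return raw.decode(declared_charset)
    except (LookupError, UnicodeDecodeError):
        return raw.decode("utf-8", errors="replace")


def render_page(items) -> bytes:
    """Combine all articles into one page and serve it in a single encoding."""
    body = "\n".join(to_unicode(raw, charset) for charset, raw in items)
    return body.encode("utf-8")


page = render_page(articles)
```

The point is that each article is decoded with its own charset before the
page is assembled, so the page as a whole needs only one charset label.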
Google have already, in effect, implemented something like that for
indexing web content. Otherwise it wouldn't be possible to find texts
in koi8-r and Windows-1251 when searching with a utf-8-encoded query.
That was the kind of problem that Andreas was reporting some years
back with various search engines, which (to put it briefly) took a
query in one encoding, and only returned pages which used that same
encoding.
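
A toy illustration of that principle, again in Python: pages and query are
both decoded to Unicode before anything is compared, so the encoding of the
stored bytes no longer matters. How Google actually does it is not known to
me. The page ids, sample texts, whitespace tokenising and the NFC-plus-
casefold normalisation are assumptions chosen only to keep the sketch short.

```python
import unicodedata
from collections import defaultdict


def normalise(text: str) -> str:
    """Assumed normalisation: NFC form, then case-folding."""
    return unicodedata.normalize("NFC", text).casefold()


# Hypothetical crawled pages: id -> (declared charset, raw bytes).
pages = {
    "page-koi8": ("koi8-r", "Москва столица".encode("koi8-r")),
    "page-1251": ("windows-1251", "Москва река".encode("windows-1251")),
}

# Build the index over Unicode words, not over raw bytes.
index = defaultdict(set)
for page_id, (charset, raw) in pages.items():
    for word in normalise(raw.decode(charset)).split():
        index[word].add(page_id)


def search(query_bytes: bytes, query_charset: str = "utf-8") -> set:
    """Decode the query to Unicode, then return pages containing every word."""
    words = normalise(query_bytes.decode(query_charset)).split()
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


hits = search("москва".encode("utf-8"))
```

Comparing the raw bytes instead would reproduce the old failure: the utf-8
bytes of the query occur in neither the koi8-r nor the Windows-1251 page.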
They just need to apply the same principle to what their ggroups
thingy is serving out. Admittedly, ggroups have *other*, *serious*,
problems to attend to first, such as encouraging their users to follow
netiquette - to at least the extent needed to get them out of the
widespread killfiling that they've already earned. But I digress.