Re: How do search engines index multilingual content?



On Fri, 3 Feb 2006, Dr John Stockton wrote:

> Your systems, I suppose, are set up for British (Scottish?)
> preferences,

To be honest, I don't know what they're set up for by default; on my
own account, they always seem to inherit the values which I set
before, and the original settings of /those/ are lost in the depths of
history.

> and your browser will indicate a preference for English over all
> foreign languages.

Well, I just checked the MSIE setting on this system, and it's sending
en-GB,de;q=0.7,en;q=0.3

That says "give me British English if you've got it, otherwise prefer
any kind of German to any other kind of English". But it's only set
that way because of an earlier test...
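For anyone who wants to check that reading of the header: under RFC 2616 (section 14.4), each comma-separated element is a language range with an optional ";q=" weight that defaults to 1. Sorting by weight reproduces the order described above. This is a minimal Python sketch that assumes a well-formed header; the function name parse_accept_language is hypothetical and not taken from any server or library.

```python
def parse_accept_language(header):
    """Return (language-range, q) pairs, highest q first.

    Simplified sketch of RFC 2616 section 14.4: each element is a
    language range with an optional ';q=' weight (default 1.0).
    Assumes a well-formed header; unparseable weights count as 0.
    """
    prefs = []
    for element in header.split(","):
        parts = [p.strip() for p in element.split(";")]
        lang_range = parts[0].lower()
        if not lang_range:
            continue  # skip empty elements such as a trailing comma
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((lang_range, q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_accept_language("en-GB,de;q=0.7,en;q=0.3"))
# [('en-gb', 1.0), ('de', 0.7), ('en', 0.3)]
```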

As we've already discussed: MSIE's *initial* setting is in flagrant
disregard of the useful advice in RFC2616 - so what's new?

> But if someone from the Continent phones to ask about something that
> seems strange about a page served in Foreign, then you'll want to
> look at it in Foreign. Of course, *you*'ll know how to set that up
> as a browser preference; but few others will remember.

Then whoever's on the Helldesk should refer the problem upwards until
it reaches someone who /does/ understand. I'm not sure what point
you're trying to make here.

> Information pre-configured to be sent by the browser cannot be
> trusted,

It's arguable that when a user *first* runs the browser, or otherwise
initialises a browser profile, they should be forced^Wstrongly urged
to make a choice of default text size, language preferences, and
anything else that can't easily be deduced. All right: in Windoze I
just fired up the Mozilla Profile Manager and tried to create a new
profile called Stockton. It offered me two buttons: "Choose Folder"
and "Region Selection..."

Naturally I took a look at "Region Selection", and found that it
defaulted the language to "English US". Looking tolerable so far -
but when I tried to investigate the options on the Region pulldown, I
found that it offered me precisely one choice: "US Region". So that's
not very friendly. Seems that, after all, it would be necessary to
visit the rather obvious language preferences dialogue /after/ the
browser has been started up.

Anyhow, I completed the profile, and then looked at the resulting
default settings, and here they are:

HTTP_ACCEPT_LANGUAGE = en-us,en;q=0.5

Which is a reasonable choice for a USAn - unlike MSIE, whose US
default, as I said, refuses all kinds of English other than en-US.
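To make the en-US point concrete: RFC 2616 (section 14.4) says a language range matches a tag only if it equals the tag or is a prefix of it followed by "-", and a tag gets the q-value of the longest matching range, or 0 if nothing matches. So a bare "en-us" gives en-GB a quality of 0, whereas Mozilla's extra "en;q=0.5" rescues it. This is a minimal Python sketch assuming a well-formed header; range_matches and quality are hypothetical names, not any browser's or server's code.

```python
def range_matches(lang_range, tag):
    """RFC 2616 section 14.4 prefix rule: 'en' matches 'en' and 'en-gb',
    but 'en-us' matches neither 'en-gb' nor plain 'en'."""
    lang_range, tag = lang_range.lower(), tag.lower()
    return (lang_range == "*" or tag == lang_range
            or tag.startswith(lang_range + "-"))


def quality(header, tag):
    """Quality of `tag`: q of the longest matching range, else 0."""
    best_score, best_q = -1, 0.0
    for element in header.split(","):
        parts = [p.strip() for p in element.split(";")]
        lang_range, q = parts[0], 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)  # assumes a well-formed weight
        # '*' applies only when no real range matches, so it scores lowest
        score = 0 if lang_range == "*" else len(lang_range)
        if range_matches(lang_range, tag) and score > best_score:
            best_score, best_q = score, q
    return best_q


print(quality("en-us", "en-gb"))           # 0.0  (MSIE-style default)
print(quality("en-us,en;q=0.5", "en-gb"))  # 0.5  (Mozilla default)
```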

You'll note that in doing this, Mozilla took no account of my
Windows locale setting which, not surprisingly, is set to
"English (United Kingdom)".

> unless it can be established that the user's OS/browser combination
> has configuration facilities which are completely obvious and easy
> to use.

You're not trying to tell me that /any/ of the worthwhile options in
Windows are "completely obvious and easy to use", Shirley?

> Perhaps software should be written such that directly after
> compilation all choices are explicitly undefined.

I must agree with you that the absence of a language selection list
would be a better initial choice than what the vast majority of
readers evidently got in MSIE (I'm referring back to that web server
study that I mentioned earlier).

> The intended consequence of that will be that systems are designed
> to make choosing easy and obvious.

Pull the other one! Many "surfers" have no idea that they are using
MSIE as their browser, nor do they have a clue what a URL is: they
think only that they are "opening the Internet", period.

But that doesn't change the fact that there is an IETF-specified
negotiation protocol, whether they know or care about it or not. As
and when I see fit to use it in the interests of clue-endowed
readers, I refuse to be discouraged by some amorphous mass of people
who, even if smeared with clue pheromone and dumped in a field of
randy clues... (well, you know the analogy).

But I *will* go so far as to adjust my settings so that even if they
demand en-US and nothing else, I won't go sending them the ominous
Status-406 page.
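The fallback described here, serving a sensible default variant instead of a 406 Not Acceptable response, is something RFC 2616 (section 10.4.7) explicitly permits: a server may return a response that is not acceptable according to the request's Accept headers. A minimal Python sketch of that policy, assuming the RFC 2616 longest-prefix matching rule and a well-formed header; negotiate, the variant list and the default are hypothetical illustrations, not any particular server's configuration.

```python
def negotiate(header, available, default):
    """Pick the available language variant with the highest quality.
    If every variant scores 0, return `default` rather than a 406."""

    def quality(tag):
        # q of the longest matching range (RFC 2616 section 14.4), else 0
        best_score, best_q = -1, 0.0
        for element in header.split(","):
            parts = [p.strip() for p in element.split(";")]
            lang_range, q = parts[0].lower(), 1.0
            for param in parts[1:]:
                name, _, value = param.partition("=")
                if name.strip().lower() == "q":
                    q = float(value)
            score = 0 if lang_range == "*" else len(lang_range)
            matches = (lang_range == "*" or tag == lang_range
                       or tag.startswith(lang_range + "-"))
            if matches and score > best_score:
                best_score, best_q = score, q
        return best_q

    scored = [(quality(tag.lower()), tag) for tag in available]
    # max() keeps the first of equal maxima, so `available` order breaks ties
    best_q, best_tag = max(scored, key=lambda pair: pair[0])
    return best_tag if best_q > 0 else default


# A client demanding only en-US still gets the British page, not a 406:
print(negotiate("en-us", ["en-gb", "de"], default="en-gb"))  # en-gb
```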