Re: If he bring his action



James Hogg:
The figure 643 seems very low in comparison to the 36,800,000 that
Google finds initially. Can there be so many repetitions?

Donna Richoux:
No, that's not it, as you will see if you take up their offer to show
the "omitted results."

Yes, Google estimates there are 36,000,000 whatevers in its database,
but it only *shows* you, say, 643. Often it will show you up to 999

Actually the limit is 1,000.

(we don't know why it doesn't go up that high sometimes), but nothing can
coax it to go beyond these limits. The hits are "out there" somewhere, and
some *other* search might display those pages, but its routine for
forming lists of hits will not go beyond these low ceilings.

Here's my guess. And it is only a guess; we can be pretty sure we're
talking here about algorithms that are protected as trade secrets.

1. The index that they use to find entries in their database includes
data not only about where to find the entries, but also how many
there are and some sort of contextual information that can be used
for the (clearly less reliable) estimates for phrase searches.

2. When you start a search, the first thing Google does is to
*use this data* to estimate how much of the whole database it
will need to examine in order to find the maximum 1,000 hits,
where "how much" is measured in some sort of units internal to
the database storage system.

3. Google then constructs a cache *of that much of the database*
and saves it, indexed by a key derived by hashing your specific
search details. Also cached is some information about which
server responded to your query, presumably based on your IP
address or something derived from it.

4. To construct the results page served to you, it scans the *cached
database entries*. If you then ask for additional pages of hits,
it returns to the cache to construct them.

5. When it returns any result page, if it did not find enough hits
to fill the page, it corrects the estimated number to match the
actual one, dropping the word "about". This is when you see
"Results 501-600 of about 108,000,000" followed on the next page
by "Results 601-643 of 643". And if you repeat the search later,
you get the same results, because the cache persists for hours
if not days. There is no way to ask it to search *more* of the
main database.

6. If you ask for "repeats included", it still returns to the same
cache. So if the estimate in step 2 was low (and in my experience
when I've done this, it *usually* is), then you still don't get
1,000 hits. But if it was high, and you do step through and get
to 1,000, then you never find out how many it would have served
if you kept going, before the cache was exhausted.

7. They think this is okay because they assume people are using their
searches to quickly get to the pages they most want to see, and
nobody really wants to step through as many as 1,000 pages, let
alone millions. (Or in other words, "320K is as much memory as
anyone could ever want". But they're right -- if you search for
something you think is on the web, how many hits do you look at
before deciding you need to try a different search?)
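The conjectured steps above can be sketched as a toy model in Python. Everything in it is hypothetical and only illustrates the guess: the `ToySearchEngine` class, the `estimated_density` function standing in for the index statistics of step 1, the 100-hit page size, and the SHA-256 cache key are assumptions made up for this example. None of it describes Google's actual system.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

MAX_HITS = 1000   # the ceiling on hits discussed above
PAGE_SIZE = 100   # hypothetical number of hits per results page


@dataclass
class CachedSearch:
    hits: List[str]   # matches found in the scanned slice of the database
    estimate: int     # index-based estimate, shown as "about N"


class ToySearchEngine:
    """Hypothetical model of the conjecture; not Google's actual design."""

    def __init__(self, documents: List[str],
                 estimated_density: Callable[[str], float]) -> None:
        self.documents = documents
        # Hypothetical stand-in for step 1: index statistics giving the
        # expected fraction of documents that match a query.
        self.estimated_density = estimated_density
        self.cache: Dict[str, CachedSearch] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Step 3: the cache is keyed by a hash of the search details.
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def _build_cache(self, query: str) -> CachedSearch:
        density = self.estimated_density(query)
        estimate = round(density * len(self.documents))
        # Step 2: scan only as much of the database as the estimate says
        # should yield MAX_HITS matches.
        if density > 0:
            slice_len = min(len(self.documents), int(MAX_HITS / density))
        else:
            slice_len = 0
        scanned = self.documents[:slice_len]
        hits = [doc for doc in scanned if query in doc][:MAX_HITS]
        return CachedSearch(hits=hits, estimate=estimate)

    def results_page(self, query: str, page: int) -> Tuple[str, List[str]]:
        key = self._key(query)
        if key not in self.cache:          # repeat searches reuse the cache
            self.cache[key] = self._build_cache(query)
        entry = self.cache[key]
        # Step 4: every results page is cut from the cached hits.
        start = page * PAGE_SIZE
        shown = entry.hits[start:start + PAGE_SIZE]
        if not shown:
            return f"No results beyond {len(entry.hits)}", shown
        if len(shown) < PAGE_SIZE:
            # Step 5: the page could not be filled, so report the actual count.
            total = str(len(entry.hits))
        else:
            total = f"about {entry.estimate:,}"
        return f"Results {start + 1}-{start + len(shown)} of {total}", shown


if __name__ == "__main__":
    # Every 7th document matches, but the index guesses 1 in 4.
    docs = [f"doc {i} spam" if i % 7 == 0 else f"doc {i}"
            for i in range(100_000)]
    engine = ToySearchEngine(docs, lambda q: 0.25)
    for p in (4, 5):
        print(engine.results_page("spam", p)[0])
```

Because the guessed density (1 in 4) is higher than the true one (1 in 7), the engine scans only the first 4,000 documents and caches 572 hits. The usage lines print "Results 401-500 of about 25,000" and then "Results 501-572 of 572", the same pattern as the 643 case in step 5. Asking again hits the cache, so nothing more of the database is ever examined.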

So they aren't considering people who are mainly interested in using
the results pages themselves to compile statistics -- or, at least,
they aren't considering such users *to be commercially important*.

I repeat, all of this is just my conjecture. But it makes sense to me.
--
Mark Brader | "It is only a guess, of course.
msb@xxxxxxx | I hope none of you ever finds out for certain."
Toronto | -- Insp. Grandpierre (Peter Stone, "Charade")

My text in this article is in the public domain.