Re: What does site: report and what it really is? (was Re: Part 2 - Wondering why your site is not indexed in Google?)



__/ [ Big Bill ] on Saturday 17 June 2006 22:50 \__

On 17 Jun 2006 19:35:34 GMT, John Bokma <john@xxxxxxxxxxxxxxx> wrote:

Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> wrote:

__/ [ John Bokma ] on Saturday 17 June 2006 18:16 \__

Oops. This should read "indicated that my site had over 100,000
pages". The missing 0 left room for misinterpretation.

I had no idea where the 0 was missing,

site:schestowitz.com 1 - 10 of about 709

but:

7,300 from www.schestowitz.com

You might want to fix that. Question is: how many pages does it really
have?


This used to be uniform, i.e. with or without the "www" umbilical cord, I
would get the same number. Moreover, until 2-3 days ago, "site:" was
showing about 6,700 pages. Yesterday it sank to 700 for the first time,
whether it means something or not... it's very unpredictable and it's
difficult to analyse (no good tools). All I know is that many pages are not
in the index and referrals volume (for text, not images) is down
significantly as a result. Pace of crawling is as good as ever, but unlike
Brian Waken's testimony, there is no improvement, i.e. nothing is being
added.


Question is: are there 4.something billion pages, or are there just a
few million?


I tend to (or want to *wink*) believe that the space has been wasted
on actually storing and indexing junk content.

I tend to think that the site: operator needs too many resources at this
moment to operate correctly and hence it gives a wrong number.

The question is: is it a factor, and does the factor grow?

67 from castleamber.com
Actual number: 80 (html, excluded CGI, some might be orphan).

factor: 0.84


I see that in smaller sites of mine as well, but when a CMS is used, the
number goes beyond the point that I predict. Think, for example, about
Gallery. For each photo, there are various scales of zoom.


9,640 from johnbokma.com (has some wrong URLs)
Actual number: 1117 (html, will add some more soon).


factor: 8.63
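
For anyone wanting to redo the arithmetic above, here is a minimal sketch of the factor as the reported site: count divided by the actual page count. The function name index_factor is made up for illustration; the numbers are the ones quoted in this thread.

```python
def index_factor(reported: int, actual: int) -> float:
    """Ratio of the count reported by site: to the number of real pages."""
    if actual <= 0:
        raise ValueError("actual page count must be positive")
    return reported / actual


# (reported by site:, actual HTML pages) as quoted in the thread
sites = {
    "castleamber.com": (67, 80),
    "johnbokma.com": (9640, 1117),
}

for name, (reported, actual) in sites.items():
    print(f"{name}: factor {index_factor(reported, actual):.2f}")
```

A factor below 1 suggests pages missing from the index; a factor well above 1 suggests the reported count includes URLs that are not distinct pages. Tracking it over several days would show whether it grows.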


Question is: does this factor grow, and how?


It seems to go upwards. It only ever increased before the Big Daddy
awkwardness. This climb means that old cache (or broken URLs) might leave a
trail...? One assumption I had is that a CMS was accepting parameters (and
making them concrete/including them through links). Never found an answer
and didn't mind too much to mend it. "If it ain't broken, why fix it" was
my -- shall we call it -- mantra/motto.
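
The parameter assumption could be checked roughly like this: take a list of crawled URLs (from server logs, say) and count how many remain once query strings are dropped. This is only a sketch. The example.com URLs and the zoom parameter are invented, and it assumes that URLs differing only in their query string serve the same page, which may not hold for every CMS.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def count_distinct(urls) -> int:
    """Count URLs that are still distinct after stripping parameters."""
    return len({strip_query(u) for u in urls})


# Hypothetical crawl: one photo reachable at several zoom levels
crawled = [
    "http://example.com/gallery/photo1",
    "http://example.com/gallery/photo1?zoom=2",
    "http://example.com/gallery/photo1?zoom=3",
    "http://example.com/gallery/photo2",
]

print(len(crawled), "crawled,", count_distinct(crawled), "distinct")
```

If the crawled count is several times the distinct count, parameterised links are a plausible source of the inflated number.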


Good, no more MS bashing then?

No, I promise. I know it annoys you.

It does because it's often based on lack of knowledge IMO. I did it ages
ago, until I discovered that a lot of the fans of the OS / computer I
was using were just lying and very biased. Things like: "our" OS can't
get a virus, because it's in ROM. The funny thing was, you could
overrule modules in ROM and extend them. And hence a virus could just do
the same. Anyway, when I had experience with several operating systems I
learned that each sucks, and that each OS has its own issues. Also it's
either a company, or a bunch of geek egos that make things harder than
they should be (or a combination).


I accept that. I'll leave advocacy to other, more relevant groups (400+
messages/week) and will try to abstain fully while I'm here.


Same problem: x people work there, and they are all busy.

Maybe they should employ us to increase the value of /x/. We can
develop sites for them to crawl and serve to people. And we can even
work _for them_, sometimes. *smile*

site: is not core business. So if we are going to get jobs at Google we
are probably going to work on GPay, or even GEvil.


*giggle*

GEvil (pronounced jivvel?) could become a tool where you enter a person's
name into a textarea, then wait for Google to scan the Internet for patterns
and determine if the person is evil. Given the hype over Trends, I can see
people using it. Maybe they can have a 1-to-10 scale for levels of evil.
This might work rather nicely assuming that names are unique. It's a bit
like Copyscape with something extra on top, I suppose.


I suspect I know what Roy wants to work on at Google... :-))


They have some nice massage tables. *smile*

Best wishes,

Roy

--
Roy S. Schestowitz
http://Schestowitz.com | GNU/Linux ¦ PGP-Key: 0x74572E8E
4:25am up 51 days 9:39, 12 users, load average: 1.45, 1.07, 1.00
http://iuron.com - next generation of search paradigms


