Re: What does site: report and what it really is? (was Re: Part 2 - Wondering why your site is not indexed in Google?)
- From: Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx>
- Date: Sun, 18 Jun 2006 04:42:15 +0100
__/ [ Big Bill ] on Saturday 17 June 2006 22:50 \__
On 17 Jun 2006 19:35:34 GMT, John Bokma <john@xxxxxxxxxxxxxxx> wrote:
Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> wrote:
__/ [ John Bokma ] on Saturday 17 June 2006 18:16 \__
Oops. This should read "indicated that my site had over 100,000
pages". The missing 0 left room for misinterpretation.
I had no idea where the 0 was missing,
site:schestowitz.com 1 - 10 of about 709
but:
7,300 from www.schestowitz.com
You might want to fix that. Question is: how many pages does it really
have?
This used to be uniform, i.e. with or without the "www" umbilical cord, I
would get the same number. Moreover, until 2-3 days ago, "site:" was
showing about 6,700 pages. Yesterday it sank to 700 for the first time,
whether it means something or not... it's very unpredictable and it's
difficult to analyse (no good tools). All I know is that many pages are not
in the index and referrals volume (for text, not images) is down
significantly as a result. Pace of crawling is as good as ever, but unlike
Brian Waken's testimony, there is no improvement, i.e. nothing is being
added.
Question is: are there 4.something billion pages, or are there just a
few million?
I tend to (or want to *wink*) believe that the space has been wasted
on actually storing and indexing junk content.
I tend to think that the site: operator needs too many resources at this
moment to operate correctly and hence it gives a wrong number.
The question is: is it a factor, and does the factor grow?
67 from castleamber.com
Actual number: 80 (html, excluded CGI, some might be orphan).
factor: 0.84
I see that in smaller sites of mine as well, but when a CMS is used, the
number goes beyond the point that I predict. Think, for example, about
Gallery. For each photo, there are various scales of zoom.
9,640 from johnbokma.com (has some wrong URLs)
Actual number: 1117 (html, will add some more soon).
factor: 8.63
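The factors quoted above are just the count reported by site: divided by the number of pages actually on the site. A minimal Python sketch of that arithmetic, using only the figures from this thread (the function name `index_factor` is made up for illustration):

```python
def index_factor(reported: int, actual: int) -> float:
    """Ratio of the page count reported by site: to the real page count."""
    if actual <= 0:
        raise ValueError("actual page count must be positive")
    return reported / actual

# Figures quoted in this thread: (reported by site:, actual HTML pages).
sites = {
    "castleamber.com": (67, 80),
    "johnbokma.com": (9640, 1117),
}

for name, (reported, actual) in sites.items():
    print(f"{name}: factor {index_factor(reported, actual):.2f}")
```

A factor below 1 means pages are missing from the reported count; a factor above 1 means more URLs are being counted than there are real pages.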
Question is: does this factor grow, and how?
It seems to go upwards. It only ever increased before the Big Daddy
awkwardness. This climb means that old cache (or broken URLs) might leave a
trail...? One assumption I had is that a CMS was accepting parameters (and
making them concrete/including them through links). Never found an answer
and didn't mind too much to mend it. "If it ain't broken, why fix it" was
my -- shall we call it -- mantra/motto.
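One way to test the parameter assumption on your own site would be to collapse crawled URLs that differ only by query string or by the "www" prefix and compare the totals. The sketch below is only an illustration: the `canonical` helper and the example.com URLs are hypothetical, and nothing here is known to match how Google actually counts pages.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical(url: str) -> str:
    """Collapse a URL to scheme + host (without 'www.') + path.

    Assumption: query strings and fragments never select a different page.
    That is not true for every CMS, so treat the result as a rough estimate.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, "", ""))

# Hypothetical URLs, e.g. taken from a server log or a crawl of your own site.
urls = [
    "http://www.example.com/gallery/photo1.html",
    "http://example.com/gallery/photo1.html?size=large",
    "http://example.com/gallery/photo1.html?size=small",
    "http://example.com/gallery/photo2.html",
]

unique = {canonical(u) for u in urls}
print(len(urls), "crawled URLs,", len(unique), "distinct pages")
```

If the distinct count is far below the crawled count, parameterised links are a plausible source of the inflated number.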
Good, no more MS bashing then?
No, I promise. I know it annoys you.
It does because it's often based on lack of knowledge IMO. I did it ages
ago, until I discovered that a lot of the fans of the OS / computer I
was using were just lying and very biased. Things like: "our" OS can't
get a virus, because it's in ROM. The funny thing was, you could
overrule modules in ROM and extend them. And hence a virus could just do
the same. Anyway, when I had experience with several operating systems I
learned that each sucks, and that each OS has its own issues. Also it's
either a company, or a bunch of geek egos that make things harder than
they should be (or a combination).
I accept that. I'll leave advocacy to other, more relevant groups (400+
messages/week) and will try to abstain fully while I'm here.
same problem: x people work there, and they are all busy.
Maybe they should employ us to increase the value of /x/. We can
develop sites for them to crawl and serve to people. And we can even
work _for them_, sometimes. *smile*
site: is not core business. So if we are going to get jobs at Google we
are probably going to work on GPay, or even GEvil.
*giggle*
GEvil (pronounced jivvel?) could become a tool where you enter a person's
name into a textarea, then wait for Google to scan the Internet for patterns
and determine if the person is evil. Given the hype over Trends, I can see
people using it. Maybe they can have a 1-to-10 scale for levels of evil.
This might work rather nicely assuming that names are unique. It's a bit
like Copyscape with something extra on top, I suppose.
I suspect I know what Roy wants to work on at Google... :-))
They have some nice massage tables. *smile*
Best wishes,
Roy
--
Roy S. Schestowitz
http://Schestowitz.com | GNU/Linux ¦ PGP-Key: 0x74572E8E
4:25am up 51 days 9:39, 12 users, load average: 1.45, 1.07, 1.00
http://iuron.com - next generation of search paradigms