Last week, I offered to explain how the Google Collection relates to the Index (or, rather, vice versa) and Karen took me up on it.
First, I’m going to refer you to last week’s post where I gave definitions for the terms “collection” and “index”. I know that they seem like simple words, but I can assure you that there has been much confusion and miscommunication surrounding them during the course of this project.
Collection = all pages that Google can find in our domain (except the ones we told it not to find).
Index = the (approximately) 5 million pages that are searched and returned when a user enters a query.
So. The Index is a subset of the Collection. That’s pretty easy.
Here’s the $64,000 question:
How does Google figure out what’s in the Index?
The short answer is “rank”. Google ranks every page in the Collection and then sifts the top 5 million of them into the Index.
Google is constantly monitoring both the Collection and the Index and will bump sites in either direction (in or out of the Index) as their rank dictates.
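For the code-minded, here is a rough Python sketch of that sifting step. It is only an illustration of "keep the top N by rank", not how Google actually does it: the function name `build_index`, the toy URLs, and the rank values are all made up, and the capacity is 2 instead of 5 million.

```python
import heapq

def build_index(collection_ranks, capacity):
    """Return the set of the `capacity` highest-ranked pages.

    collection_ranks maps a page URL to its rank score (hypothetical values).
    """
    top = heapq.nlargest(capacity, collection_ranks.items(), key=lambda item: item[1])
    return {url for url, _score in top}

# Toy Collection of four pages with invented rank scores.
collection = {"/a": 0.9, "/b": 0.2, "/c": 0.5, "/d": 0.7}
index_before = build_index(collection, capacity=2)

# If /b's rank later improves, rebuilding the Index bumps /d out and /b in.
collection["/b"] = 0.95
index_after = build_index(collection, capacity=2)
```

In this toy run, the Index starts as /a and /d; once /b outranks /d, the next pass swaps them.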
Were this late-night TV of yore, now would be the point where I would don a turban and hold an envelope up to my forehead in an attempt to divine the future.
The answer is: Proprietary information!
The question was: How does Google determine rank?
Damn, I’m good.
Right. So Google is very tetchy about sharing the details of its ranking system (if you are interested in the technical specifications on the Google page ranking algorithm, visit http://www.google.com/corporate/tech.html or http://www.whitelines.nl/html/google-page-rank.html), but it basically boils down to two things:
1. How many pages link to you.
2. The rank of those pages.
There’s more to it than that, of course, and you can find out what you can do to help improve your rank at: http://web.cornell.edu/resources/google_help/rank.html.
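If you want to see how those two factors feed each other, here is a minimal Python sketch in the spirit of the publicly described PageRank idea: each page splits its rank among the pages it links to, and the process repeats until things settle. This is emphatically not Google's real (proprietary) ranking system. The function name `simple_rank`, the toy site, and the iteration count are my own assumptions, and the damping value of 0.85 is just the figure commonly quoted in descriptions of the algorithm.

```python
def simple_rank(links, damping=0.85, iterations=50):
    """Toy link-based ranking. `links` maps a page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical four-page site: everything links to "home"; nothing links to "orphan".
site = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home"],
    "orphan": ["home"],
}
ranks = simple_rank(site)
```

In this toy site, "home" comes out on top because three pages link to it, "about" and "news" benefit from being linked by a well-ranked page, and "orphan" sits at the bottom because nobody links to it.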
Can we control what’s in the Index?
No, not directly. Not to the best of my knowledge, anyway. And, I can assure you, this is exceedingly frustrating for me.
To some extent, we are “controlling” it as we fine-tune the Collection by dropping out page hogs like databases and session IDs, but that’s really all we can do. You can help by tuning your site to boost your ranking, but we are largely at the mercy of the Google Collective.
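To make "dropping out page hogs" a bit more concrete, here is a small Python sketch of the kind of rule involved: skip URLs that live under a database path or that carry a session ID in the query string. This is not our actual crawl configuration. The path prefixes, the parameter names, the example.edu URLs, and the function name `should_crawl` are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical exclusion rules, for illustration only.
EXCLUDED_PATH_PREFIXES = ("/database/", "/cgi-bin/search")
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def should_crawl(url):
    """Return False for URLs that would bloat the Collection."""
    parsed = urlparse(url)
    if parsed.path.lower().startswith(EXCLUDED_PATH_PREFIXES):
        return False
    # Look at query parameter names only; the values do not matter here.
    query_keys = {key.lower() for key in parse_qs(parsed.query, keep_blank_values=True)}
    return not (query_keys & SESSION_PARAMS)

candidates = [
    "http://www.example.edu/about.html",
    "http://www.example.edu/database/record?id=5",
    "http://www.example.edu/page?sid=abc123",
]
kept = [url for url in candidates if should_crawl(url)]
```

Only the plain about.html page survives the filter; the database record and the session-tagged URL are left out, which frees up room for pages people actually search for.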
Resistance is Futile,