
“Fresh” Pages in Google Search Rankings

As the web has advanced over the years, so has web search. Google, the world’s most popular search engine, recently published a blog post about an improvement to its search algorithm. Beyond Hubs and Authorities and PageRank, Google has introduced a time dependence into its search, reordering results by freshness. For example, if we search for Occupy Wall Street, we probably want web pages that were updated recently rather than pages from a month ago. However, if we search for instructions on knitting a hat, we probably do not need pages that were uploaded yesterday. Other searches, such as the presidential election, have mixed requirements: because the election recurs every four years, sometimes we want to know about the upcoming election, and sometimes we want to know about the most recent one. To produce fresh results, Google relies on its Caffeine web indexing system. Caffeine allows Google to quickly crawl and index the web for the most recent and newly updated pages, using a modified version of a random search. By reordering results according to how much freshness matters for each query, Google can give more relevant results to the searcher.

In this course, we have examined two methods of ranking search results: Hubs and Authorities, and PageRank. Here, I will aim to modify PageRank to create a simplistic model of the ranking algorithm that Google uses. First, the search query must be parsed in order to differentiate between the different types of searches. For example, time-dependent queries such as current events and time-independent queries such as history need to be marked differently. To do this, I can introduce a parameter T that ranges from -1 to 1 for each page, where a bias toward older posts is represented by a negative T and a bias toward newer posts by a positive T. We then apply the same PageRank update rule as before, except that in each round each page divides its current PageRank value across its outgoing links unevenly, in amounts biased by the T values of the parent and child pages. While this method is simplistic, it offers a basic way to give the rankings of certain search results a time dependence. The hardest step is parsing the language of the searches, which requires much more complicated algorithms.
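The biased update can be sketched in a few lines of Python. This is only an illustration under assumptions of my own, not Google's method: the post does not fix a formula for the bias, so here the share a parent sends to a child is taken to be proportional to exp(strength * T[parent] * T[child]). The function name `biased_pagerank`, the `strength` parameter, and the three-page example graph are all hypothetical. Setting `strength` to 0 recovers the basic PageRank rule, in which each page splits its value equally.

```python
import math


def biased_pagerank(links, T, strength=1.0, rounds=50):
    """Basic PageRank update with a time-biased split (illustrative sketch).

    links: dict mapping each page to the list of pages it links to
           (every linked page must also appear as a key).
    T:     dict mapping each page to a value in [-1, 1].
    The weighting rule below is an assumption, not taken from the post.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(rounds):
        new_rank = {p: 0.0 for p in pages}
        for parent in pages:
            children = links[parent]
            if not children:
                # A page with no outgoing links keeps its own value.
                new_rank[parent] += rank[parent]
                continue
            # Assumed rule: children whose T agrees in sign with the
            # parent's T receive a larger share of the parent's value.
            weights = [math.exp(strength * T[parent] * T[c]) for c in children]
            total = sum(weights)
            for child, w in zip(children, weights):
                new_rank[child] += rank[parent] * w / total
        rank = new_rank
    return rank


# Hypothetical example: a hub page linking to one fresh and one old page.
links = {"hub": ["fresh", "old"], "fresh": ["hub", "old"], "old": ["hub", "fresh"]}
T = {"hub": 1.0, "fresh": 1.0, "old": -1.0}
print(biased_pagerank(links, T))
```

In this toy graph the fresh page ends up with a higher value than the old page whenever `strength` is positive, while the total PageRank in the network still sums to 1, as in the unbiased rule.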


November 2011