


A Better Search Algorithm than Google’s? A Look into Topic-Sensitive PageRank

In his research paper, Taher H. Haveliwala describes a new algorithm for ranking search results. His proposal comes in three parts. First, he challenges the effectiveness of using one generic PageRank vector and instead proposes creating a set of PageRank vectors, each “biased” toward a representative topic. Second, for “ordinary keyword search queries” he calculates a “topic-sensitive” PageRank score for the pages that satisfy the query. Third, for searches in context, which means highlighting a word on a web page and then running a search on it, he computes the topic-sensitive PageRank score using the topics discussed on the page the word was taken from. Every page that is a candidate result gets one score per PageRank vector, and each score represents how much authority the page has within that vector’s topic. The reasoning behind this approach is that a heavily linked page may be a good authority on one topic but a poor authority on other topics that are relevant to the query.
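To make the idea of “biased” vectors concrete, here is a minimal sketch in Python. It is not Haveliwala’s implementation. The four-page link graph, the topic page sets, the damping factor of 0.85, and the query-time topic weights are all assumptions made up for illustration. The only change from ordinary PageRank is that the random jump lands on pages belonging to the topic instead of on any page.

```python
def biased_pagerank(links, topic_pages, damping=0.85, iterations=50):
    """Power iteration in which the random jump lands only on topic_pages."""
    pages = sorted(links)
    # Teleport vector: uniform over the topic's pages, zero everywhere else.
    teleport = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
                for p in pages}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks hands its rank back to the topic pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport[p]
        rank = new_rank
    return rank


def topic_sensitive_score(page, topic_ranks, topic_weights):
    """Blend the per-topic scores of one page using query-time topic weights."""
    return sum(weight * topic_ranks[topic][page]
               for topic, weight in topic_weights.items())


# Hypothetical toy web: each key links to the pages in its list.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

# Hypothetical topic sets (the pages each topic's random jump may land on).
topics = {"Health": {"a"}, "Games": {"d"}}

# One biased vector per topic, computed ahead of query time.
topic_ranks = {name: biased_pagerank(links, members)
               for name, members in topics.items()}

# Assumed weights saying the query is mostly about Health.
query_weights = {"Health": 0.9, "Games": 0.1}
scores = {page: topic_sensitive_score(page, topic_ranks, query_weights)
          for page in links}
```

In this toy graph nothing links to page "d", so it gets no score at all under the Health vector and only picks up a score under the Games vector, where the random jump lands on it. That is the effect the paper is after: the same page can rank very differently depending on the topic of the query.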

In the experiment, Haveliwala used 16 topics that were determined before query time. These topics included “Arts”, “Computers”, “Games”, “Health”, “Home”, “Kids & Teens”, and “News”. He also included a “No-Bias” vector to check how his approach compares to the result from the basic generic PageRank vector. Some of the sample queries were “affirmative action”, “alcoholism”, “lyme disease”, and “amusement parks”. The topic vectors were used to determine the best links for each query. A random sample of users then judged whether the non-biased results were better than the topic-sensitive results. For almost all queries, the topic-sensitive results were rated better. Below are the results for how some of the queries did in precision for no-bias versus bias.

[Figure: precision for each query, no-bias versus topic-sensitive]

I found this topic incredibly interesting. Although I think this algorithm can definitely be improved by refining which topic is relevant to each query, I think this research could have a great impact on how we can navigate an ever-increasing database of pages.

Haveliwala’s research is relevant to the discussion of the basic PageRank algorithm that we have had in class. Although Haveliwala’s approach is much more advanced than the algorithm we used, all of the same ideas about hubs and authorities still apply.

I definitely would recommend skimming his article if you find the time, because it is a fascinating read.
