Limitations of PageRank

A natural question to ask when learning an algorithm is, “What are the limitations?”

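The PageRank algorithm assigns each page on the Internet a score based on its “importance” in the Web’s link structure. The pages and links themselves are gathered by a separate crawler. A page that is linked to by many pages, especially by other high-scoring pages, receives a higher score. A greater score implies greater authority, and combined with a match on a topic or key phrase, it suggests that the page is a good result for that topic. PageRank was once an essential part of Google’s search infrastructure, but its usefulness and weight in Google’s search results have since declined. We discuss two kinds of queries that expose the shortcomings of PageRank.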
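
For reference, one common way to write the score is the normalized form below. Descriptions and implementations vary in the details, such as how pages without outgoing links are handled. Here N is the total number of pages, B(p) is the set of pages that link to p, L(q) is the number of outgoing links on page q, and d is a damping factor commonly set to 0.85:

$$
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B(p)} \frac{PR(q)}{L(q)}
$$

A page with no incoming links receives only the baseline share (1 - d)/N. Every other page's score depends on the scores of the pages linking to it, which is why the scores are computed iteratively over the whole link graph.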

The first is that PageRank scores do not reflect current events. Current events matter to users: we use search to find the most recent information on a subject and to learn more about recent news and trends. PageRank scores are not calculated at the time of search, because that would be too expensive and slow. Instead, they are determined at the time of indexing, a process in which Google scans pages on the Internet for topics and key phrases and records which pages are relevant to each topic or key phrase. As a result, a newly published or recently updated page is not recognized as an authority on a topic until it has gained exposure and incoming links from other authoritative pages. For this reason, Google often handles news articles separately through Google News, or lists Google News results separately at the top of the general search results. (These arrangements are gradually changing as Google evolves.)
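
The following is a minimal Python sketch of why a new page lags behind. It assumes a simplified model in which scores are computed only when a hypothetical `Index` is rebuilt. The names `compute_pagerank`, `Index`, and the page names are illustrative and not Google's, and the damping factor 0.85 is the value commonly cited for PageRank. A page published after the last rebuild has no score at all. After the next rebuild, a page without incoming links still receives only the baseline share (1 - d)/N.

```python
def compute_pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iterations):
        # Every page starts each round with the baseline "random jump" share.
        new_rank = dict.fromkeys(pages, (1.0 - damping) / n)
        for page in pages:
            # A page with no outgoing links is treated as linking to every page.
            targets = links.get(page) or list(pages)
            for t in targets:
                new_rank[t] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


class Index:
    """Hypothetical index that stores PageRank scores computed at build time."""

    def __init__(self, links):
        self.rebuild(links)

    def rebuild(self, links):
        # The expensive step: run periodically, not once per query.
        self.scores = compute_pagerank(links)

    def score(self, page):
        # Query time is only a lookup; pages added since the last rebuild score 0.
        return self.scores.get(page, 0.0)


if __name__ == "__main__":
    web = {
        "old_news": ["archive"],
        "archive": ["old_news"],
        "blog": ["old_news", "archive"],
    }
    index = Index(web)
    web["breaking_story"] = ["old_news"]  # published after the index was built
    print(index.score("breaking_story"))  # 0.0 until the next rebuild
    index.rebuild(web)
    print(index.score("breaking_story"), index.score("old_news"))
```

Recomputing over the whole graph is what makes per-query scoring impractical. The lookup in `score` is cheap, but it can only be as fresh as the last rebuild.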

The second is PageRank’s inability to handle queries phrased in natural language or containing information beyond keywords. We often search using sentences such as, “How many days longer does a whale live than a human?” This query is uncommon, and both its phrasing and its logic are complex. PageRank can help surface pages relevant to “days”, “whale lifespan”, and possibly even “human lifespan”. However, it cannot understand the logical requirement of the query, which is to compare the two lifespans and express the difference in days. Furthermore, PageRank does not know which of these key phrases matters most or least to the user. Answering such a query likely involves several natural language processing and machine learning algorithms. In fact, Google has recently made significant progress in this direction, for example with RankBrain, an AI system it uses to help answer ambiguous questions.
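
To make the point concrete, here is a deliberately naive, hypothetical scorer. It multiplies keyword overlap by a precomputed, query-independent PageRank score. It is an illustration only, not how Google ranks results. The page names, snippets, scores, and stop-word list are all made up. Because the query is reduced to a set of keywords, two questions with opposite meanings produce exactly the same ranking, and neither ranking computes the actual answer.

```python
import re

# Hypothetical, query-independent PageRank scores computed at indexing time.
PAGERANK = {"whale-facts": 0.40, "human-lifespan": 0.35, "calendar-days": 0.25}

# Hypothetical page contents, reduced to short snippets.
PAGES = {
    "whale-facts": "a bowhead whale can live for many years; whale lifespan facts",
    "human-lifespan": "average human lifespan and how long a human can live",
    "calendar-days": "how many days are in a year",
}

STOP_WORDS = {"a", "an", "the", "how", "many", "does", "than", "and", "in", "for"}


def keywords(text):
    """Lowercased set of words minus stop words; word order and logic are lost."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def rank(query):
    """Order pages by keyword overlap times PageRank, a deliberately naive score."""
    query_terms = keywords(query)
    scores = {
        page: len(query_terms & keywords(text)) * PAGERANK[page]
        for page, text in PAGES.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    q1 = "How many days longer does a whale live than a human?"
    q2 = "How many days longer does a human live than a whale?"
    print(rank(q1))
    print(rank(q2))  # same order: the scorer cannot tell the two questions apart
```

The PageRank factor is identical for every query, so it can reorder pages that match the keywords but cannot add any understanding of what the question asks.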

Consulted articles:
Google is using an AI called ‘RankBrain’ to answer ambiguous questions, The Verge, http://www.theverge.com/2015/10/26/9614836/google-search-ai-rankbrain
A Comparative Analysis of Web Page Ranking Algorithms, International Journal on Computer Science and Engineering, http://www.enggjournals.com/ijcse/doc/IJCSE10-02-08-060.pdf

