More than just a Web Search algorithm: Google’s PageRank in non-Internet contexts

Most people familiar with web development and the Internet in general have heard of PageRank, Google’s most famous search result ranking algorithm. Far less well known, however, are the variety and power of its applications in non-Internet contexts. While it’s unclear whether PageRank will ever be as successful in another field as it has been in web search, the sheer number of versions and reworkings of the algorithm built to suit field-specific needs over the past 16 years makes it clear that an algorithm as general and elegant as PageRank lends itself to a wide range of situations, often in unexpected ways.

Developed by Google founders Sergey Brin and Larry Page in 1996[5], PageRank is an algorithm that uses the link structure of a graph of web pages to assign a PageRank value, or relevance factor, to each page in the graph. Google Search uses the weighting produced by PageRank to generate an ordered listing of web pages relevant to a specific search query. In practice, Google Search draws on a collection of up to 200 ranking factors to determine a page’s relevance and popularity[1], but PageRank is still arguably the most famous ranking algorithm in this collection[3]. While PageRank is best known for its association with the Internet and search engines, the algorithm has recently been used with considerable success in a variety of unrelated fields, and in this blogger’s opinion, there are hundreds more applications of this algorithm still untapped. To explore these applications, however, it’s important to first have a basic intuition for what, exactly, PageRank measures in a general network.

There are several different ways to think about the meaning or intuition behind PageRank. First and foremost, it is important to keep in mind that PageRank is designed to use the structure of a graph to quantify the importance of each node in that graph. Accordingly, every usage of PageRank outside of a web context must maintain some notion of importance, even if the interpretation of the importance of a node varies from application to application.

PageRank can be thought of as a fluid that circulates throughout a network, passing from node to node across edges, and pooling at the nodes that are the most important[2]. An equivalent way of thinking about PageRank is in terms of votes – a node acquires votes from other nodes along inward edges, and votes from more important nodes are worth more than votes from unimportant or average nodes[6]. Finally, the algorithm can be understood in terms of random “walks” through a network. The PageRank of a given node is simply the probability of ending up at that node after starting from a random node in the network and stepping through the graph one node at a time, at each step selecting a random outward edge and traveling down it[2].
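To make these intuitions concrete, here is a minimal Python sketch, not Google’s actual implementation. The `pagerank` function follows the voting (or fluid) view: on every pass, each node splits its current score evenly among the nodes it links to. The `random_walk_estimate` function follows the random-walk view and simply counts how often a walker visits each node. Two things here are assumptions that go beyond the description above: the damping factor of 0.85 (with the remaining probability, the walker jumps to a random node instead of following an edge), and the rule that a node with no outward edges always jumps to a random node. The four-node `graph` is made up purely for illustration.

```python
import random

def pagerank(graph, damping=0.85, iterations=100):
    """Voting/fluid view: repeatedly let every node pass its score along its outward edges."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start with the score spread evenly
    for _ in range(iterations):
        # Assumed damping step: every node receives a small baseline share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumed rule for dead ends: spread the score over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank

def random_walk_estimate(graph, damping=0.85, steps=200_000, seed=0):
    """Random-walk view: the fraction of time a walker spends at each node."""
    rng = random.Random(seed)
    nodes = list(graph)
    visits = {v: 0 for v in nodes}
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        targets = graph[current]
        if targets and rng.random() < damping:
            current = rng.choice(targets)  # follow a random outward edge
        else:
            current = rng.choice(nodes)    # jump to a random node
    return {v: visits[v] / steps for v in nodes}

# Hypothetical toy network: each key links to the nodes in its list.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

exact = pagerank(graph)
estimate = random_walk_estimate(graph)
for node in graph:
    print(node, round(exact[node], 3), round(estimate[node], 3))
```

In this toy network, node C collects the most score because three of the four nodes link to it, and the two columns printed for each node should roughly agree, which is the sense in which the voting and random-walk pictures describe the same quantity.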

Although this algorithm was designed for analyzing networks of web pages, its simplicity and elegance make it a much more general and powerful tool. Because PageRank uses only the structure of a graph to compute importance, it does not rely on anything specific to the web and so can be applied to a wide range of other networks. Indeed, it was only a matter of time before the power of PageRank in these other contexts was discovered and applied.

The number and variety of fields in which PageRank has been used for analysis is large and growing by the year. In 2014, David Gleich, a computer scientist at Purdue University, released a paper detailing the use of the algorithm in more than 10 different fields, including “biology, chemistry, ecology, neuroscience, physics,…and computer systems.”[4] According to Gleich, with PageRank, “[i]t’s sort of like Google invented a lens. If you have different combinations of lenses, you can look at all kinds of different systems–you can get microscopes, telescopes, or digital cameras. But you needed that unique insight of the lens.” Most of these field-specific applications modify the algorithm to some degree, but the key intuitions behind PageRank – that is, random walks, fluid pooling, or weighted voting – largely still hold. Often, PageRank is used to find a central node or subgraph within a larger graph, enabling researchers to restrict their analyses to a small subset of important nodes. In other cases, researchers use the rankings of all of the nodes in their analysis.

Some of the more interesting cases include using PageRank on

  • sports: using networks of football teams and tennis players, researchers were able to find the best teams and athletes (Jimmy Connors was returned in the top spot for tennis players)

  • literature: using a network of 19th century authors, researchers found quantitative evidence that Jane Austen and Walter Scott were the most original authors of the 19th century

  • neuroscience: neuroscientists built a network from fMRI scans in which the nodes are voxels on the scan and an edge between two nodes indicates that the voxels are strongly correlated over time. Using a version of PageRank designed for undirected graphs, they were able to identify parts of the brain that change together as subjects age.

  • toxic waste management: scientists were able to use PageRank to help determine the position of water molecules in an ionic solution, enabling them to find the best ways to remove nuclear waste and toxic chemicals. According to Aurora Clark, an associate professor at WSU, once you know the probable positions of different molecules in the solution, “…you can control the chemistry and force certain reactions to occur.” PageRank essentially maps where toxic chemicals are likely to pool in the solution, enabling a waste cleanup team to quickly and efficiently contain and remove the toxic or radioactive contaminant. [3]

  • debugging: MonitorRank is a version of PageRank designed to analyze complex, engineered systems. The algorithm “returns a ranked list of systems based on the likelihood that they contributed to, or participated in, an anomalous situation.” In other words, MonitorRank is a debugging tool like no other – instead of crawling through error pages and debugging callbacks, it analyzes the structure of the buggy system itself to suggest probable causes of error.

  • predicting road and foot traffic in urban spaces: PageRank has been found to accurately predict traffic flow on individual roads as well as on connected road maps represented as graphs, where nodes are streets and intersections are edges. PageRank has also been found to accurately reflect observed human mobility through urban spaces, including sections of San Francisco and London.
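As a rough illustration of how an application like the sports ranking above might be set up, the Python sketch below turns a list of match results into a graph and ranks the players with a basic PageRank. It is not the method of any of the studies mentioned. The player names and results are invented, the direction of the edges (each loser “votes” for the player who beat them) is one natural construction assumed here, and the damping factor of 0.85 and the handling of undefeated players are likewise assumptions.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Basic PageRank: each node splits its score evenly among the nodes it points to."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # Assumption: a node with no outward edges spreads its score over everyone.
            targets = graph[v] if graph[v] else nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical results, invented for illustration: (winner, loser)
matches = [
    ("Ana", "Ben"),
    ("Ana", "Cy"),
    ("Ana", "Dee"),
    ("Ben", "Cy"),
    ("Cy", "Dee"),
    ("Dee", "Ben"),
]

# Build the network: one node per player, one edge per match.
players = sorted({player for match in matches for player in match})
graph = {player: [] for player in players}
for winner, loser in matches:
    graph[loser].append(winner)  # losing a match counts as a vote for the winner

scores = pagerank(graph)
ranking = sorted(players, key=scores.get, reverse=True)
for player in ranking:
    print(player, round(scores[player], 3))
```

The same pattern carries over to the other examples: decide what the nodes are, decide what an edge from one node to another should mean, and let the structure of the resulting graph determine importance. Here, a win over a highly ranked player is worth more than a win over a weak one, which is exactly the “votes from important nodes count more” intuition.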

All told, the above examples represent but a small sample of the dozens of non-web applications of PageRank in the last 16 years. Clearly, the wide variety of these existing applications points to a rich future for the algorithm in research contexts of all types. It seems intuitive that any problem in any field where a network comes into play might benefit from using PageRank or another graph analysis algorithm such as HITS. The power of the network in research is only as great as our ability to extract meaning from a given network, and it’s clear that Brin and Page have given the world a remarkable tool for such a purpose and made themselves Web Kings in the process.

  1. Comstock, Ray. “So… You Think SEO Has Changed?” Search Engine Watch. N.p., 19 Mar. 2014. Web. 03 Nov. 2014. <http://searchenginewatch.com/article/2334934/So…-You-Think-SEO-Has-Changed>.
  2. Easley, David, and Jon Kleinberg. “Chapter 14.” Networks, Crowds, and Markets: Reasoning about a Highly Connected World. New York: Cambridge UP, 2010. N. pag. Print.
  3. Garling, Caleb. “Researchers Fight Toxic Waste With Google PageRank.” WIRED.com. N.p., 16 Feb. 2012. Web. 3 Nov. 2014. <http://www.wired.com/2012/02/google-pagerank-water/>.
  4. Gleich, David F. “PageRank beyond the Web.” (n.d.): n. pag. ArXiv. 18 June 2014. Web. 3 Nov. 2014. <http://arxiv.org/pdf/1407.5107v1.pdf>.
  5. Raphael Phan Chung Wei. “Resources.” New Straits Times (Computimes; 2 ed.). 16 May 2002.
  6. Rogers, Ian. “The Google Pagerank Algorithm and How It Works.” Pagerank Explained Correctly with Examples. IPR Computing Ltd., 2002. Web. 03 Nov. 2014. <http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm>.
