
Application of PageRank-style Algorithms in Natural Language Processing

In this post, I discuss the paper “PageRank on Semantic Networks, with application to Word Sense Disambiguation” by Mihalcea, Tarau, and Figa.

The main aim of Natural Language Processing is to create computational models that are good at understanding and generating “natural language” (i.e., the way human beings use language). The main difficulty in NLP is that several aspects of natural language are ambiguous to machines or computational models, whereas they are intuitive to humans. One such ambiguity is that words can have multiple meanings, and understanding which meaning is intended depends on the context. Methods that determine which of the multiple dictionary meanings of a word is intended in a particular context are called word sense disambiguation methods.

One popular word sense disambiguation method is the Lesk algorithm, which “attempts to identify the most likely meanings for the words in a given context based on a measure of contextual overlap between the dictionary definitions of the ambiguous words, or between the current context and dictionary definitions provided for a given target word” (page 3). Essentially, the Lesk algorithm goes through each dictionary definition of a word and picks the one that overlaps most with the dictionary definitions of the surrounding words.
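The overlap-counting idea behind Lesk can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the tiny inline "dictionary" and the word senses below are hypothetical stand-ins for a real lexical resource such as WordNet.

```python
# Simplified Lesk: pick the sense whose gloss shares the most words
# with the surrounding context. The dictionary below is a hypothetical
# two-sense entry, purely for illustration.

def lesk(word, context_words, dictionary):
    """Return the sense of `word` whose gloss overlaps most with the context."""
    best_sense, best_overlap = None, -1
    context = set(context_words)
    for sense, gloss in dictionary[word].items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical glosses for the ambiguous word "bank".
dictionary = {
    "bank": {
        "financial": "an institution that accepts deposits and lends money",
        "river": "the sloping land beside a body of water",
    }
}

print(lesk("bank", ["she", "deposits", "money", "in", "the", "institution"],
           dictionary))
# "financial"
```

A real system would also normalize for gloss length and remove stop words, but the core mechanism is exactly this overlap count.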

The paper “PageRank on Semantic Networks, with application to Word Sense Disambiguation” presents and analyzes a word sense disambiguation method that makes use of PageRank-style algorithms. In order to motivate such a method, the paper first explains that languages have semantic networks, which represent the semantic relations between concepts. In particular, we have WordNet, “a lexical knowledge base for English that defines words, meanings, and relations between them” (page 3). The paper explains that we can build networks by using synsets (the basic building blocks of WordNet) as vertices and the relations between two synsets as edges. The relations can be causality, entailment, attribute, etc. This way, we can convert an entire text into a directed graph to which the PageRank algorithm can be applied. The paper discusses this in more detail and analyzes the results of such a method to show that the PageRank algorithm outperforms the Lesk algorithm. Furthermore, a combination of the two methods results in the most accurate method.
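To make the graph step concrete, here is a minimal sketch of the PageRank iteration on a small directed graph. The three "synset" nodes and their edges are hypothetical stand-ins for WordNet synsets linked by semantic relations; this is not the paper's code, just the standard iteration it builds on.

```python
# PageRank iteration on a directed graph, using the unnormalized
# (1 - d) + d * sum(...) form. Nodes and edges below are hypothetical
# synsets and relations, for illustration only.

def pagerank(graph, d=0.85, iterations=50):
    """graph: dict mapping each node to the list of nodes it points to."""
    nodes = list(graph)
    score = {n: 1.0 for n in nodes}
    # Precompute incoming edges for each node.
    incoming = {n: [m for m in nodes if n in graph[m]] for n in nodes}
    for _ in range(iterations):
        score = {
            n: (1 - d) + d * sum(score[m] / len(graph[m]) for m in incoming[n])
            for n in nodes
        }
    return score

# Hypothetical synset graph; edges encode relations such as entailment.
graph = {
    "run#1": ["move#1", "exercise#1"],
    "move#1": ["run#1"],
    "exercise#1": ["move#1"],
}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # "move#1" — the most "recommended" synset
```

Here `move#1` wins because two other synsets point to it, including one that is itself well connected; in the disambiguation setting, the highest-scoring sense of each ambiguous word is the one selected.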

This paper is fascinating because it shows the shared intuition behind ranking pages on the web and ranking contextual word meanings in NLP. We determine the importance of a vertex within a graph not by relying only on local vertex-specific information but by “taking into account global information recursively computed from the entire graph” (page 1). Just as PageRank prioritizes web pages, the word meaning that the PageRank method offers as the most likely is the one that is most recommended by other related meanings in the text, with preference given to recommendations made by meanings that are in turn highly recommended by other related meanings.
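This recursive “recommendation” can be written as the PageRank recurrence, here in its commonly used unnormalized form (the symbols are standard, not quoted from the paper):

```latex
PR(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)} \frac{PR(V_j)}{|\mathrm{Out}(V_j)|}
```

where $\mathrm{In}(V_i)$ is the set of vertices pointing to $V_i$, $\mathrm{Out}(V_j)$ is the set of vertices $V_j$ points to, and $d$ is the damping factor (typically 0.85). A vertex's score depends on the scores of its recommenders, which is exactly the global, recursive computation the quotation describes.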

The paper under discussion can be found at:



October 2017