


PageRank on Semantic Networks, with Application to Word Sense Disambiguation

Link:

http://www.cse.unt.edu/~rada/papers/mihalcea.coling04.pdf

This academic paper by Rada Mihalcea, Paul Tarau, and Elizabeth Figa discusses using the PageRank algorithm to solve a problem in natural language processing. The approach relies on WordNet, a semantic knowledge graph that organizes English words into groups of synonyms called “synsets”. Each synset is a node in the WordNet graph. The edges come from semantic relations defined by WordNet, such as IS-A relations (hypernyms and hyponyms) and PART-OF relations (meronyms and holonyms). Because WordNet is structured as a graph, it lends itself well to an algorithm such as PageRank.
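To make the graph structure concrete, here is a minimal sketch of synsets as nodes and semantic relations as labeled edges. The class name `SynsetGraph`, the `INVERSE` table, and the synset ids such as `dog#1` are hypothetical names chosen for illustration; they are not real WordNet identifiers or part of any WordNet API.

```python
from collections import defaultdict

# Each relation has an inverse: if canine is a hypernym of dog,
# then dog is a hyponym of canine (same for holonym/meronym).
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "holonym": "meronym",
    "meronym": "holonym",
}


class SynsetGraph:
    """Toy graph: nodes are synset ids, edges are labeled relations."""

    def __init__(self):
        # synset id -> set of (relation, related synset id)
        self.edges = defaultdict(set)

    def add_relation(self, source, relation, target):
        """Record that `target` is the `relation` of `source`, plus the inverse edge."""
        self.edges[source].add((relation, target))
        self.edges[target].add((INVERSE[relation], source))

    def neighbors(self, synset):
        """Return all synsets linked to `synset`, ignoring the relation label."""
        return sorted(target for _, target in self.edges[synset])


graph = SynsetGraph()
graph.add_relation("dog#1", "hypernym", "canine#1")  # a dog IS-A canine
graph.add_relation("wheel#1", "holonym", "car#1")    # a wheel is PART-OF a car

print(graph.neighbors("canine#1"))
print(sorted(graph.edges["car#1"]))
```

Storing the inverse edge alongside each relation means the graph can be walked in either direction, which is what a link-analysis algorithm needs.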

The researchers set up the algorithm so that the input is raw text and the output is the same text with word-meaning annotations for all open-class words. The first step is preprocessing the text (part-of-speech tagging, identifying collocations, etc.). Once preprocessing is done, a synset graph is built for all open-class words in the text. PageRank is then run on this graph until convergence, and each ambiguous word is assigned the candidate synset with the highest PageRank score.
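The last two steps can be sketched in a few lines of Python. This is a toy illustration and not the authors' implementation: the sense ids, the relations between them, the damping factor of 0.85, and the choice to treat every relation as a pair of directed edges are all assumptions made for the example, and preprocessing is skipped entirely.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, tol=1e-8, max_iter=200):
    """Iterate PageRank over directed (source, target) pairs until scores settle."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Score held by nodes without outgoing links is spread evenly.
        dangling = sum(score[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new[dst] += damping * score[src] / len(out_links[src])
        change = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if change < tol:
            break
    return score


def disambiguate(candidates, relations):
    """Pick, for each word, the candidate sense with the highest PageRank score."""
    # Assumption: each semantic relation becomes two directed edges.
    edges = [(a, b) for a, b in relations] + [(b, a) for a, b in relations]
    scores = pagerank(edges)
    return {
        word: max(senses, key=lambda s: scores.get(s, 0.0))
        for word, senses in candidates.items()
    }


# Hypothetical senses for the open-class words of a short text about banking.
candidates = {
    "bank": ["bank#finance", "bank#river"],
    "deposit": ["deposit#money", "deposit#sediment"],
    "loan": ["loan#1"],
    "interest": ["interest#1"],
}
# Hypothetical relations among those senses.
relations = [
    ("bank#finance", "deposit#money"),
    ("bank#finance", "loan#1"),
    ("bank#finance", "interest#1"),
    ("deposit#money", "loan#1"),
    ("deposit#money", "interest#1"),
    ("bank#river", "deposit#sediment"),
]

chosen = disambiguate(candidates, relations)
print(chosen["bank"], chosen["deposit"])
```

In this toy graph the financial senses are linked to more of the other words' senses than the river senses are, so they collect more score and win the selection step.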

I think this is a very cool application of PageRank. It shows that the algorithm is very powerful, because it can be used for much more than ranking web pages. It seems that anything that can be represented as a graph could potentially be analyzed with the PageRank method. Furthermore, natural language processing is a very challenging field, and I think applying PageRank to a language problem is a very creative idea. It is fascinating to me that it produced such promising results. As discussed in the paper, the algorithm was able to identify “competing word senses”, since they tend to share targets of incoming or outgoing links. Because of advantages like this, PageRank-based word sense disambiguation exceeded the baseline by a large margin and always outperformed the Lesk algorithm (another method used for word sense disambiguation).

This paper relates to the class because it uses the PageRank algorithm. Instead of using it to rank websites for a search engine, though, these researchers used it to tackle a different problem: word sense disambiguation. The two challenges are similar in that each returns an ordered list of related things for a given text input. Google wanted to return a list of websites, whereas these researchers wanted to return a list of related semantic meanings. This shows that the material we learn in class is useful beyond the examples presented there and can be applied to many practical problems in the world today.
