PageRank and Victorian Novels
This article describes how Google’s PageRank algorithm can be applied to novels by Victorian authors in order to find the most influential female and male writers. The novels used in the project come from the United Kingdom, Ireland, and the United States, and were published between 1780 and 1900. PageRank and some machine learning techniques are built into software that extracts thematic frequency data from each novel. Every source book is then compared with each later target book by calculating the distance, or similarity, between them. The goal of the project is to find the novels that have the most and the strongest links to later books. After running the algorithm, the project found that Jane Austen and Walter Scott were the most influential authors of the 1800s.
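To make the comparison step concrete, here is a minimal sketch of how a source book might be linked to later target books. The source does not say which distance measure the project used, so cosine similarity is an assumption here, and the book names, years, and three-value theme vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical data: publication year and relative frequencies of
# three made-up themes for each book.
books = {
    "earlier_novel": (1800, [0.5, 0.3, 0.2]),
    "later_novel_a": (1850, [0.4, 0.4, 0.2]),
    "later_novel_b": (1870, [0.1, 0.1, 0.8]),
}


def influence_edges(books):
    """Link each book to every later book, weighted by similarity."""
    edges = {}
    for src, (src_year, src_vec) in books.items():
        for dst, (dst_year, dst_vec) in books.items():
            if src_year < dst_year:
                edges[(src, dst)] = cosine_similarity(src_vec, dst_vec)
    return edges


edges = influence_edges(books)
for (src, dst), weight in sorted(edges.items()):
    print(f"{src} -> {dst}: {weight:.3f}")
```

Only pairs in which the source was published before the target are linked, which mirrors the idea that a book can influence only the books that come after it.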
PageRank is an important component in ranking web pages and returning to users the most important links for their searches. It is interesting, however, to see that PageRank can be used in fields other than web search. In this case each book corresponds to a node in the network, and the books are ranked by the distances between each book node and its target nodes, the later books. The book with the most similarities and the shortest distances to later books is the one closest to them and, by the PageRank standard, the most important result to return. The flow through this network corresponds to how much influence one author has on another. Unfortunately, both Jane Austen and Walter Scott wrote very early in the chosen timeframe, so it is impossible to analyze which earlier authors had a strong influence on Austen’s and Scott’s novels.
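The ranking idea can be sketched with a small weighted PageRank written as a power iteration. This is an illustration under stated assumptions, not the project's actual software: the node names and similarity weights are hypothetical, and the edges are assumed to point from a later book to the earlier book it resembles, so that rank flows toward the influencing book in the same way a web link passes rank to the page it points at.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    edges maps (source, target) pairs to non-negative weights.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    # Total outgoing weight per node, used to normalise each edge.
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    for _ in range(iterations):
        # Nodes without outgoing weight spread their rank evenly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for (src, dst), weight in edges.items():
            if weight > 0:
                new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical similarity-weighted links, later book -> earlier book.
links = {
    ("novel_1850", "novel_1800"): 0.9,
    ("novel_1870", "novel_1800"): 0.5,
    ("novel_1870", "novel_1850"): 0.5,
}

ranks = pagerank(links)
for book in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{book}: {ranks[book]:.3f}")
```

In this toy network the earliest novel collects rank from both later novels and ends up on top, which also illustrates the limitation noted above: the books at the start of the timeframe have no earlier books in the network to pass their rank on to.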
Besides ranking the novels and deciding which author was the most influential during the period, the PageRank algorithm is able to separate authors into categories and distinguish them by their characteristics. For example, we know that the authors are split roughly evenly by gender, and with the PageRank algorithm female and male authors fall at opposite ends of a spectrum according to their style and themes. It also helps to find a few outliers, such as Uncle Tom’s Cabin by Harriet Beecher Stowe, which would be expected to resemble novels written by women but in fact shares more similarities with novels written by men.
Source: http://www.wired.co.uk/news/archive/2012-08/17/influential-literature-algorithm.