Wikipedia Articles Live in a Small World
According to Stephen Dolan, who wrote the article this post is discussing, Wikipedia contained 2,301,486 articles (nodes) with 55,550,003 links (edges) connecting them at the time the article was written. His original goal was to find the center of the network, that is, the articles closest to every other article. He quickly found this impractical: Wikipedia has chains of articles that link to each other in a line, and these make the center of the data set difficult to determine. He instead took the strongly connected portion of the network, a group of 2,111,480 articles in which every article can be reached from every other, and left out the remaining pages because they are either disambiguation pages or other articles that cannot be navigated to from other pages. Reanalyzing the data, he found that it takes an average of 3.45 clicks to get from the wiki page for ‘2007’ to any other page in his set of strongly connected articles.
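To make the "average number of clicks" measurement concrete, here is a minimal sketch of how such a figure could be computed with a breadth-first search over a link graph. This is not Dolan's actual code; the function name average_clicks and the four-page toy graph are made up for illustration, and the toy graph stands in for the real link data.

```python
from collections import deque

def average_clicks(links, start):
    """Mean number of clicks from `start` to every other reachable page.

    `links` maps a page title to the list of pages it links to.
    Breadth-first search visits pages in order of click distance,
    so the first time a page is seen is also its shortest distance.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    others = [d for page, d in dist.items() if page != start]
    return sum(others) / len(others) if others else 0.0

# Hypothetical toy graph, not real Wikipedia data.
toy_links = {
    "2007": ["A", "B"],
    "A": ["C"],
    "B": ["C", "2007"],
    "C": ["2007"],
}

avg = average_clicks(toy_links, "2007")
print(f"average clicks from '2007': {avg:.2f}")
```

In the toy graph, A and B are one click away and C is two clicks away, so the average works out to 4/3. On the real data the same idea would be run over the 2,111,480 strongly connected articles.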
This linked characteristic of Wikipedia articles has sparked game ideas such as the three degrees of Wikipedia (http://www.threewiki.com/), in which you must try to navigate link by link from one article to a target article. The natural question is why, despite the large number of nodes in the graph, the average path length between any two pages is so low. In fact it follows naturally from the fact that each node has an average of about 26 edges connected to it: the number of pages within reach multiplies with every click. As with the friendship example described in class, the growth is not perfectly clean. Just as it is unlikely that every person in our class has 26 friends that no classmate shares, it is unlikely that every Wikipedia article has 26 links that no other article shares. Even with that overlap, a handful of clicks is enough to reach most of the network, which is why the average path length between articles stays so low. Therefore we find that Wikipedia is another network that demonstrates the small world phenomenon.
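The arithmetic behind this argument can be sketched in a few lines. The calculation below assumes an idealized network in which every article has exactly 26 links and no two articles share a link target, which is the best case for reach; real overlap between links slows the growth. The names ARTICLES, LINKS_PER_ARTICLE, and clicks_to_cover are illustrative only.

```python
import math

ARTICLES = 2_111_480       # size of the strongly connected set, from the article
LINKS_PER_ARTICLE = 26     # approximate average quoted above

def clicks_to_cover(n_pages, branching):
    """Clicks needed before an idealized tree of links covers n_pages.

    Assumes every page has `branching` links and no two links ever
    point at the same page, so this gives the best-case reach.
    """
    reach, frontier, clicks = 1, 1, 0
    while reach < n_pages:
        frontier *= branching   # pages first reached at this click depth
        reach += frontier
        clicks += 1
    return clicks

# Rough closed-form estimate: log(N) / log(k)
estimate = math.log(ARTICLES) / math.log(LINKS_PER_ARTICLE)
clicks = clicks_to_cover(ARTICLES, LINKS_PER_ARTICLE)

print(f"log estimate: {estimate:.2f} clicks")
print(f"idealized tree covers all pages within {clicks} clicks")
```

Under these assumptions the reach grows as 26, 676, 17,576, 456,976 new pages per click, so about five clicks are enough to cover more than two million articles. That is the same order of magnitude as the 3.45-click average reported from the ‘2007’ page.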