Wikipedia as a Strongly Connected Component

https://en.wikipedia.org/wiki/Wikipedia:Wiki_Game

https://en.wikipedia.org/wiki/Wikipedia:Getting_to_Philosophy

https://www.huffpost.com/entry/wikipedia-philosophy_n_1093460

Wikipedia is a free, open-content online encyclopedia created through the collaborative effort of a community of users. Overall, it comprises more than 40 million articles in 301 different languages. Wikipedia also has the essential feature of extensive internal linking: each article has hyperlinks to other Wikipedia articles as well as external links.

The Wiki Game, also known as Wikipedia racing, is a hypertextual game built around Wikipedia’s extensive internal linking. Players start on a randomly selected article and must navigate to a pre-selected target article using only the links within each article. The goal is to arrive at the target article in the fewest clicks.
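Finding the fewest clicks between two articles is a shortest-path problem on a directed graph, which breadth-first search solves when every click counts the same. The sketch below is only an illustration: the function name `fewest_clicks` and the `toy_links` dictionary are made up for this example and are not real Wikipedia data.

```python
from collections import deque

def fewest_clicks(links, start, target):
    """Return the minimum number of clicks from start to target,
    or None if the target cannot be reached."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])  # (article, clicks used so far)
    while queue:
        page, clicks = queue.popleft()
        for nxt in links.get(page, []):
            if nxt == target:
                return clicks + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, clicks + 1))
    return None

# Hypothetical toy link graph: article -> articles it links to.
toy_links = {
    "Cat": ["Mammal", "Pet"],
    "Mammal": ["Animal"],
    "Pet": ["Animal", "Cat"],
    "Animal": ["Biology"],
    "Biology": ["Science"],
    "Science": [],
}

print(fewest_clicks(toy_links, "Cat", "Science"))
print(fewest_clicks(toy_links, "Science", "Cat"))
```

In this toy graph "Science" has no outgoing links, so the reverse trip from "Science" to "Cat" is impossible even though the forward trip exists. Direction matters in a hyperlink graph.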

It has been observed that almost all Wikipedia pages lead to the “Philosophy” page: clicking the first hyperlink in an article and then repeating the process on each following article will usually reach “Philosophy” eventually. As of February 2016, 97% of all articles in Wikipedia led to the article about Philosophy. The remaining articles lead to an article without any outgoing hyperlinks, to pages that do not exist, or into loops.
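The first-link process and its three possible outcomes can be sketched in a few lines. This is a minimal illustration under stated assumptions: `follow_first_links` and the `toy_first` mapping are hypothetical, and a missing or `None` entry stands in for both an article without outgoing links and a page that does not exist.

```python
def follow_first_links(first_link, start, target="Philosophy"):
    """Follow the first link of each article, starting at start.

    Returns (outcome, path), where outcome is "reached", "loop",
    or "dead end".
    """
    path = [start]
    visited = {start}
    page = start
    while page != target:
        nxt = first_link.get(page)
        if nxt is None:
            # No outgoing link, or the page is not in the mapping.
            return "dead end", path
        if nxt in visited:
            # The chain has come back to an article it already visited.
            return "loop", path + [nxt]
        path.append(nxt)
        visited.add(nxt)
        page = nxt
    return "reached", path

# Hypothetical mapping: article -> the first article it links to.
toy_first = {
    "Cat": "Mammal",
    "Mammal": "Animal",
    "Animal": "Biology",
    "Biology": "Science",
    "Science": "Knowledge",
    "Knowledge": "Philosophy",
    "A": "B",
    "B": "A",
    "Stub": None,
}

for article in ("Cat", "A", "Stub"):
    print(article, follow_first_links(toy_first, article))
```

Because each article has at most one first link, the chain from any starting article is fully determined, so it must end in exactly one of the three outcomes.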

The theory behind this phenomenon can be related to the idea that Wikipedia can be represented as a strongly connected component. Wikipedia’s style guidelines suggest that an article should define its topic at the beginning. Therefore, the first hyperlink of each article generally takes the reader to a broader subject. Eventually, articles lead to “Philosophy,” since philosophy touches on nearly every field, including physics, mathematics, ethics, law, politics, psychology, sociology, and language.

In class we learned that a strongly connected component (SCC) in a directed graph is a subset of the nodes such that (i) every node in the subset has a path to every other and (ii) the subset is not part of some larger set with the property that every node can reach every other. Wikipedia articles can be considered nodes, and the hyperlinks within each article are directed edges to other articles. The first requirement is suggested by Wikipedia racing and by the fact that almost all pages lead to “Philosophy”; strong connectivity also requires paths leading back from “Philosophy” to those articles, which Wikipedia’s extensive internal linking makes plausible. The second requirement can be satisfied by taking all of the articles that are connected to “Philosophy” and to each other, leaving out the articles without any outgoing hyperlinks, articles that do not exist, and articles in loops. Overall, with those exceptions set aside, Wikipedia can be considered a strongly connected component.
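The two-part definition can be checked directly on a small graph: the SCC containing a node is the set of nodes it can reach, intersected with the set of nodes that can reach it. The sketch below assumes a hypothetical toy graph (`toy_graph`) and helper names chosen for illustration; it is not a measurement of the real Wikipedia link graph.

```python
def reachable(links, start):
    """Return the set of nodes reachable from start by following edges."""
    seen = {start}
    stack = [start]
    while stack:
        page = stack.pop()
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def reverse(links):
    """Return a copy of the graph with every edge flipped."""
    rev = {}
    for page, outs in links.items():
        rev.setdefault(page, [])
        for nxt in outs:
            rev.setdefault(nxt, []).append(page)
    return rev

def scc_containing(links, node):
    """Nodes that node can reach AND that can reach node."""
    forward = reachable(links, node)
    backward = reachable(reverse(links), node)
    return forward & backward

# Hypothetical toy graph: article -> articles it links to.
toy_graph = {
    "Philosophy": ["Science"],
    "Science": ["Physics", "Philosophy"],
    "Physics": ["Science", "DeadEnd"],
    "Orphan": ["Philosophy"],  # reaches Philosophy, but nothing links to it
    "DeadEnd": [],             # reachable, but has no outgoing links
}

print(sorted(scc_containing(toy_graph, "Philosophy")))
```

In this toy graph "Orphan" leads to "Philosophy" yet sits outside the component, because no path leads back to it. Reaching "Philosophy" is only half of requirement (i); the intersection with the backward-reachable set supplies the other half, and it also makes the result maximal as requirement (ii) demands.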
