


Wikipedia as a Network (The Wiki Game)

Wikipedia, a free online encyclopedia, is well known for the range of topics it covers. It is also distinguished by its extensive internal linking: nearly every Wikipedia article contains hyperlinks to other Wikipedia articles, in addition to external links. Visualizing Wikipedia as a network, with articles as nodes and internal links as directed edges, therefore yields an interesting picture, because so many pairs of nodes are connected by short paths. Since the pages are indexed by topic, this graph also gives us a good idea of which topics are closely tied to a given topic.

As of now, Wikipedia consists of over 37 million pages, in addition to millions (perhaps billions) of links between these pages (source: https://en.wikipedia.org/wiki/Wikipedia:Statistics). The Wikipedia network is so large that games have been developed around it. A popular one is called ‘Speed Race’: an online, real-time, multiplayer game in which each player tries to get from a starting Wikipedia article to a destination Wikipedia article as quickly as possible. The starting and ending pages are chosen at random in each round, and players may navigate only through Wikipedia links. The winner is the player who reaches the destination page in the least time. More formally, the game tests a player's ability to get from Node A to Node B in a graph whose edges represent commonalities between topics. A variation of this game, called ‘Least Clicks’, also exists. As the name suggests, the winner is the player whose path between the two pages uses the fewest links, rather than the one who takes the least time. This is essentially a shortest-path problem of the kind solved by Dijkstra’s Shortest Path Algorithm (since every click costs the same, a plain breadth-first search would also do), except that the player has only limited knowledge of the nodes and edges of the graph.
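To make the ‘Least Clicks’ idea concrete, here is a minimal sketch of what a solver with full knowledge of the graph might do. It assumes the links are available as a plain dictionary mapping each page title to the titles it links to; the `toy_links` data and the function name `least_clicks` are invented for illustration and do not reflect real Wikipedia link structure. Because every click has the same cost, the sketch uses breadth-first search, which is what Dijkstra's algorithm reduces to on an unweighted graph.

```python
from collections import deque

def least_clicks(links, start, goal):
    """Return one shortest list of pages from start to goal, or None."""
    if start == goal:
        return [start]
    parent = {start: None}   # page -> page we arrived from
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt in parent:          # already reached by a path at least as short
                continue
            parent[nxt] = page
            if nxt == goal:
                # Walk the parent pointers back to the start.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None

# Hypothetical toy graph, not real Wikipedia data.
toy_links = {
    "Pizza": ["Italy", "Cheese"],
    "Italy": ["Europe", "Rome"],
    "Cheese": ["Milk"],
    "Europe": ["Italy", "Second World War"],
    "Rome": ["Italy"],
    "Second World War": ["Europe", "Enigma machine"],
    "Milk": [],
    "Enigma machine": [],
}

print(least_clicks(toy_links, "Pizza", "Enigma machine"))
```

A human player cannot run this search, of course, since they only see the links on the page in front of them; that gap is what makes the game interesting.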

A few interesting patterns emerge from these two games. On playing them yourself, you’ll often find yourself following an arching path. First, you look for increasingly general nodes, so that you make your way to nodes with more outgoing connections. Once you reach a suitable ‘top’ node, you begin the reverse journey, looking for progressively narrower links until you reach your destination page. This is a very intuitive strategy, and it is used by almost everyone playing the game for the first time. As a result, a large percentage of valid paths contain ‘United States’, ‘Christianity’, ‘Second World War’, ‘Europe’, ‘Jesus’, or other similarly broad terms. These nodes have so many outgoing connections that they are very often used as the ‘top’ pivot nodes. For this reason, other versions of the game, such as ‘No United States’ and ‘Five Clicks to Jesus’, have also been developed. A list of the most-referenced nodes in the Wikipedia graph can be found here: https://en.wikipedia.org/wiki/Wikipedia:Most-referenced_articles. The listed articles are often the ones used as the top-level nodes in Speed Race paths.
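One rough way to spot likely pivot nodes is simply to count links. The sketch below assumes the same dictionary-of-links representation as before and ranks pages by the sum of their incoming and outgoing link counts; the scoring rule, the function names, and the `toy_links` data are all assumptions made for illustration, not how the most-referenced list is actually compiled.

```python
from collections import Counter

def link_counts(links):
    """Return (outgoing, incoming) link counts per page, ignoring duplicate links."""
    outgoing = {page: len(set(targets)) for page, targets in links.items()}
    incoming = Counter()
    for page, targets in links.items():
        for target in set(targets):
            incoming[target] += 1
    return outgoing, incoming

def top_hubs(links, n=3):
    """Rank pages by incoming + outgoing links; ties are broken alphabetically."""
    outgoing, incoming = link_counts(links)
    pages = set(outgoing) | set(incoming)
    def score(page):
        return outgoing.get(page, 0) + incoming[page]
    return sorted(pages, key=lambda p: (-score(p), p))[:n]

# Hypothetical toy graph, not real Wikipedia data.
toy_links = {
    "United States": ["Europe", "Second World War", "Jazz"],
    "Europe": ["United States", "Second World War", "Italy"],
    "Second World War": ["United States", "Europe"],
    "Italy": ["Europe", "Pizza"],
    "Jazz": ["United States"],
    "Pizza": ["Italy"],
}

print(top_hubs(toy_links, n=2))
```

In this toy graph the broad pages come out on top, which mirrors the pattern described above: a page is a good pivot when it is both easy to reach and easy to leave.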

These Wiki Games are interesting to study because they reveal how our internal knowledge graphs line up with an actual encyclopedic knowledge graph, namely Wikipedia's. The process of traversing the relevant edges to find our destination node on the web is a fascinating problem that often throws up unexpected results!

 

Sources:

https://en.wikipedia.org/wiki/Wikipedia:Wikirace

https://en.wikipedia.org/wiki/Wikipedia:Most-referenced_articles

https://en.wikipedia.org/wiki/Wikipedia:Statistics

https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

http://thewikigame.com/
