Skip to main content



An Analysis of Wikiracing and the Web as a Network

https://medium.com/@hadrien.croubois/cheating-at-wikirace-an-incite-on-the-wikipedia-network-dda261496f8f

In class we learned that the structure of the web can be closely approximated by the structure of Wikipedia and the pages it contains. Pages have incoming links which we can’t see and outgoing links which are shown referenced on pages. By clicking on these links one can navigate through the web. Wikipedia is a scaled down version of this in that it is all contained on one website with millions of pages which each reference one another. A game that makes use of this fact is one called “Wikiracing” where players will compete against one another to make it from one wiki page to another. For instance they may start on a wiki page about the 1936 Berlin Olympics and then have to find their way to a page about Covid 19 as fast as possible through links located on each subsequent page. 

While this game is meant to be played by human players, in the article “Cheating at Wikirace: An incite on big networks with Wikipedia” author Hadrien Croubois attempts to create a computer program that competes and completes the game as fast as possible. When done by players the winner is usually determined by either the quickest time or the least amount of links travelled. He starts his process with the question “What would the shortest path look like, and how to find it?”. To solve this he draws inspiration from Dijkstra’s shortest path algorithm which divides pages into groups of found and not found pages. Once a page in the not found group is found it is moved to the found group and all outgoing links are then searched. This is a breadth-first search algorithm which aims to quickly analyze and sort pages on the network in order to find the shortest path. Interestingly, in his process for creating the algorithm he found “only a small fraction of these contain more than a few hundred links”. Drawing knowledge from class one could classify these pages as hubs with large hub scores as they point to a large number of pages.

After completing his algorithm, Croubois was able to draw some interesting conclusions. Firstly, he provides examples of a few paths between two pages which shows that the shortest path almost always consists of accessing a wiki page about a specific date. This is due to how specific a date can be and how it can point to an exact event from the page about that date. Thus one would classify date pages as both hubs with high hub scores as well as being part of the giant strongly connected component that makes up the majority of the web. Many pages that point to these date pages and pages that these date pages point to could be seen as the in and out of the giant strongly connected component. A second conclusion he drew from his algorithm is that it was very rare for shortest paths to exceed 8 links. This is very interesting as it relates to the concept of six degrees of separation that we learned about in class. The fact that almost all wiki pages on the internet can be connected to one another by a path of at most 8 wiki pages shows how connected a network the internet is. While this is only one example of an exploration of this concept, it goes to show that there are very few pages are truly their own page separate from the giant strongly connected component without any links going in or out of it. 

In conclusion, the online game Wikiracing provides an excellent platform to analyze the structure of the web as people must compete to find the fastest route of links from one page to another. An analysis of the game reaffirms concepts of the giant strongly connected component that contains a majority of web pages as well as that of the six degrees of separation. Furthermore it shows how networks can be found in the real world and are firmly grounded in reality.

Comments

Leave a Reply

Blogging Calendar

October 2022
M T W T F S S
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Archives