


Web Search and the Wikipedia Game

The Wikipedia game is a game I used to play in the computer lab in middle school. We would all sit at our computers and start on the same Wikipedia page. Then we would agree on a random destination page that didn't necessarily have any connection to the starting page. We would all start at the same time and, by clicking only the links within Wikipedia pages, try to be the first to reach the destination page.

For example, say we all decide to start on the Wikipedia “Bee” page and end on the “Harry Styles” page. Here is the order of the links I clicked to reach the destination: “Bee”, “Socially”, “Mammal”, “Hair”, “Hairstyles”, “Europe”, “London”, “Ed Sheeran”, “One Direction”, “Harry Styles”. As you can see, two pages that one might not expect to be connected can in fact be connected; it just might take more than a few steps. Someone else playing this game would probably find an entirely different path to the destination page, and the length of that path would vary as well.
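To make the idea of paths with different lengths concrete, here is a minimal Python sketch that treats pages as nodes and links as one-way edges, then uses breadth-first search to find the shortest click path. The link graph is hypothetical: it contains my path from the example plus an invented shortcut through an “Insect” page, not Wikipedia's real links, and the function name shortest_click_path is made up for illustration.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
# It holds my example path plus an invented "Insect" shortcut; these are
# NOT Wikipedia's real links.
LINKS = {
    "Bee": ["Socially", "Insect"],
    "Socially": ["Mammal"],
    "Insect": ["Europe"],
    "Mammal": ["Hair"],
    "Hair": ["Hairstyles"],
    "Hairstyles": ["Europe"],
    "Europe": ["London"],
    "London": ["Ed Sheeran"],
    "Ed Sheeran": ["One Direction"],
    "One Direction": ["Harry Styles"],
    "Harry Styles": [],
}


def shortest_click_path(links, start, goal):
    """Breadth-first search: return the shortest list of pages from start to goal, or None."""
    parent = {start: None}  # remembers which page we clicked from
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            # Walk the parent pointers back to the start, then reverse.
            path = []
            while page is not None:
                path.append(page)
                page = parent[page]
            return path[::-1]
        for nxt in links.get(page, []):
            if nxt not in parent:
                parent[nxt] = page
                queue.append(nxt)
    return None  # the goal cannot be reached by clicking links


route = shortest_click_path(LINKS, "Bee", "Harry Styles")
print(route)
```

With these made-up links, the search finds a 7-page route (6 clicks) through “Insect”, shorter than my 10-page route, because breadth-first search checks every page one click away before any page two clicks away.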

The web is a huge network of connections in which the webpages are nodes and the hyperlinks are directed edges. In my example above, my path had a total of 10 nodes, but someone else playing the game might have had 8 nodes or 15. The web itself is, for the most part, one huge component: if you ignore which direction the links point, most pages are connected to one another through some chain of links. However, the web also contains Strongly Connected Components (SCCs). In an SCC, every pair of nodes is joined by directed paths in both directions, so there is a way to get to and from every single node of the component. An SCC is also maximal: it cannot sit inside a larger strongly connected set of nodes, and if a bigger one exists, only the bigger one counts as the SCC.
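One standard way to compute SCCs is Kosaraju's algorithm: run a depth-first search and record the order in which pages finish, reverse every link, then search again starting from the page that finished last; each search in the second pass collects exactly one SCC. The sketch below is a minimal version that assumes a small hypothetical link graph (invented links, not real Wikipedia data).

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm: return a list of SCCs, each a set of pages."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}

    # Pass 1: depth-first search on the original graph, recording each page
    # when it finishes (after everything reachable from it is explored).
    visited, finish_order = set(), []

    def visit(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                visit(v)
        finish_order.append(u)

    for u in sorted(nodes):
        if u not in visited:
            visit(u)

    # Reverse the direction of every link.
    reverse = {u: [] for u in nodes}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)

    # Pass 2: search the reversed graph, latest-finishing page first.
    assigned, components = set(), []

    def collect(u, component):
        assigned.add(u)
        component.add(u)
        for v in reverse[u]:
            if v not in assigned:
                collect(v, component)

    for u in reversed(finish_order):
        if u not in assigned:
            component = set()
            collect(u, component)
            components.append(component)
    return components


# Hypothetical links: Bee <-> Insect and Europe <-> London go both ways,
# but Insect -> Europe and London -> Harry Styles are one-way.
links = {
    "Bee": ["Insect"],
    "Insect": ["Bee", "Europe"],
    "Europe": ["London"],
    "London": ["Europe", "Harry Styles"],
    "Harry Styles": [],
}

for component in strongly_connected_components(links):
    print(sorted(component))
# ['Bee', 'Insect']
# ['Europe', 'London']
# ['Harry Styles']
```

Note that “Harry Styles” ends up as an SCC of its own: a single node is trivially strongly connected with itself, so every page belongs to exactly one SCC, even if that SCC is just the page alone.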

With the Wikipedia game, you are able to find these Strongly Connected Components. In the above example, if every page on my path also has a way, through some chain of links, to get back to every other page on it, then all the nodes from “Bee” to “Harry Styles” form one SCC. However, taking only the nodes from “Bee” to “London” would not give an SCC, because those nodes sit inside a larger strongly connected set, and the larger one always takes over.
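The maximality rule can be checked directly: the SCC containing a page is the set of pages it can reach that can also reach it back. The minimal sketch below assumes (hypothetically) that each consecutive pair of pages on my ten-page path links in both directions; with that assumption, the pages from “Bee” to “London” can all reach one another, yet they do not form an SCC, because the full SCC is larger. The helper names are made up for illustration.

```python
# The ten pages from my example path.
path = ["Bee", "Socially", "Mammal", "Hair", "Hairstyles",
        "Europe", "London", "Ed Sheeran", "One Direction", "Harry Styles"]

# Assumption (hypothetical): each consecutive pair of pages links both ways.
links = {page: [] for page in path}
for a, b in zip(path, path[1:]):
    links[a].append(b)
    links[b].append(a)


def reachable(graph, start):
    """Return every page reachable from start by following links (iterative DFS)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def reverse_graph(graph):
    """Return the graph with the direction of every link flipped."""
    reverse = {u: [] for u in graph}
    for u, targets in graph.items():
        for v in targets:
            reverse.setdefault(v, []).append(u)
    return reverse


def scc_containing(graph, page):
    """The SCC of a page: pages it can reach that can also reach it."""
    return reachable(graph, page) & reachable(reverse_graph(graph), page)


def is_scc(graph, pages):
    """True only if pages is exactly one whole (maximal) SCC."""
    pages = set(pages)
    return pages == scc_containing(graph, next(iter(pages)))


print(is_scc(links, path))      # True: all ten pages form one SCC
print(is_scc(links, path[:7]))  # False: "Bee" to "London" sits inside a larger SCC
```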

Now, there might be some instances where this game results in not one big SCC but two smaller ones. This could happen if one page is so obscure that, for example, only one link leads into it and only one link leads out of it, so there is no way to go backwards from the pages after it to reach it. So one really has to pay attention to every single node, and to which direction each link points, to know whether a group of pages is a Strongly Connected Component.
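A single one-way link is enough to cause this split. The minimal sketch below loosely models that scenario on my ten-page path, under a hypothetical assumption: consecutive pages link both ways, except that nothing links back from “Europe” to “Hairstyles”. It groups pages into SCCs with a simple method (intersect the pages reachable forward and backward from an unassigned page, repeat), which is easy to read but slower than Kosaraju's or Tarjan's algorithm on large graphs.

```python
path = ["Bee", "Socially", "Mammal", "Hair", "Hairstyles",
        "Europe", "London", "Ed Sheeran", "One Direction", "Harry Styles"]

# Hypothetical: consecutive pages link both ways, except that nothing links
# back from "Europe" to "Hairstyles", so that one link only goes forward.
links = {page: [] for page in path}
for a, b in zip(path, path[1:]):
    links[a].append(b)
    if (a, b) != ("Hairstyles", "Europe"):
        links[b].append(a)


def reachable(graph, start):
    """Return every page reachable from start by following links (iterative DFS)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def all_sccs(graph):
    """Split all pages into SCCs by intersecting forward and backward reachability."""
    reverse = {u: [] for u in graph}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].append(u)
    remaining, components = set(graph), []
    while remaining:
        page = min(remaining)  # any unassigned page works; min() keeps runs repeatable
        component = reachable(graph, page) & reachable(reverse, page)
        components.append(component)
        remaining -= component
    return components


for component in all_sccs(links):
    print(sorted(component))
```

Running it prints two groups of five pages: “Bee” through “Hairstyles”, and “Europe” through “Harry Styles”. Pages in the first group can still reach the second, but nothing in the second group can get back.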

The Wikipedia game is useful to researchers for identifying connected groups of pages, and it can help machines target audiences based on what people view as related. Navigation is a large part of what the links within Wikipedia are for. Not many people use the search bar: they usually make their initial search on Google, click the Wikipedia page, and then click the hyperlinks within that page to find what they're looking for. Finding which links may be missing becomes easier when there is data on the order in which people click through the hyperlinks to reach their final destination. This started off as a Wikipedia game that people think is fun to play, but it is also useful to researchers and other data scientists in collecting data on the way people think and make connections. Further analysis of how players of the Wikipedia game think can lead to better predictions and more targeted information on the web.
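As one illustration of how click-order data could point to missing links, here is a minimal sketch of a simple heuristic; it is a hypothetical example, not the method used in the linked research. Whenever a player goes from page A to page B to page C, and A has no direct link to C, the pair (A, C) is counted as a possible shortcut; pairs that come up often are candidates for a missing link. The click paths and link sets below are invented for illustration.

```python
from collections import Counter


def missing_link_candidates(paths, links, min_count=2):
    """Count two-click detours A -> B -> C where A does not link directly to C.

    Hypothetical heuristic: frequent detours suggest a link A -> C may be missing.
    Returns [((A, C), count), ...] for pairs seen at least min_count times.
    """
    counts = Counter()
    for path in paths:
        for a, _, c in zip(path, path[1:], path[2:]):
            if a != c and c not in links.get(a, set()):
                counts[(a, c)] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Invented click paths from three hypothetical games.
paths = [
    ["Bee", "Socially", "Mammal", "Hair"],
    ["Bee", "Socially", "Mammal"],
    ["London", "Ed Sheeran", "One Direction"],
]
# Invented link sets: which pages each page links to directly.
links = {
    "Bee": {"Socially"},
    "Socially": {"Mammal"},
    "Mammal": {"Hair"},
    "London": {"Ed Sheeran"},
    "Ed Sheeran": {"One Direction"},
}

print(missing_link_candidates(paths, links))  # [(('Bee', 'Mammal'), 2)]
```

Requiring a minimum count is a design choice: a detour seen once may just be one player's habit, while one that many players repeat is stronger evidence that a direct link would help.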

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4664478/

November 2022