Analyzing Character Networks With Graph Theory
Kirell Benzi, Ph.D., is a data scientist who used graph theory to analyze the network of characters on the Star Wars fan wiki, where 21,647 articles represent characters within this fictional universe. To model this massive network, Benzi represented each character’s article as a node. If one character’s article mentioned or linked to another character’s article, that link became an edge in the network graph. To determine which characters were the most connected, Benzi found the degree (the number of edges touching a given node) for every character’s node. As expected, the most well-connected characters are the ones considered “main characters” within the larger story (Darth Vader, Luke Skywalker, and other characters from the films). This reveals a pattern in storytelling that makes intuitive sense: protagonists tend to have the widest reach across the universe because the story is told through their perspective.
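To make the degree calculation concrete, here is a minimal sketch in Python. It is not Benzi's code: the function name degree_ranking and the small edge list are hypothetical, and the sketch assumes links are treated as undirected, so a mention in either direction counts as one edge.

```python
from collections import defaultdict

def degree_ranking(edges):
    """Return (node, degree) pairs sorted from most to least connected."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore an article linking to itself
        # Assumption: links are undirected, so record both directions.
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Degree = number of distinct neighbors; ties are broken alphabetically.
    return sorted(
        ((node, len(nbrs)) for node, nbrs in neighbors.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )

# Hypothetical toy edge list for illustration only, not data from the wiki.
toy_edges = [
    ("Luke Skywalker", "Darth Vader"),
    ("Luke Skywalker", "Leia Organa"),
    ("Luke Skywalker", "Han Solo"),
    ("Leia Organa", "Han Solo"),
    ("Darth Vader", "Leia Organa"),
    ("Han Solo", "Greedo"),
]

ranking = degree_ranking(toy_edges)
for name, degree in ranking:
    print(f"{name}: {degree}")
```

Storing neighbors in a set means that repeated mentions between the same two articles count only once, which matches the idea of degree as the number of distinct connections.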
The Star Wars timeline spans over 36,000 years, so one way Benzi represented the graph of character connections was to color-code it by era, with each node’s color representing the time period in which the character lived. Benzi noted that many character articles were missing a time period, and he drew those characters as black nodes. Using graph theory, Benzi was able to predict which eras these characters lived in. From our class on graph theory, we know that nodes that are connected to each other are more likely to have similar attributes, a tendency known as homophily. Therefore, if a node is mostly connected to nodes representing characters from a certain era, we can say with some degree of confidence that this node also represents a character from that era. This ability to infer attributes of nodes is useful not only for filling in missing information on the Star Wars wiki, but also for other networks where information is missing, such as the real Wikipedia.
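One simple way to turn this reasoning into a procedure is a majority vote among a node's labeled neighbors. The sketch below is an assumption about how such a prediction could work, not a description of Benzi's actual method; the function name predict_missing_eras, the node names, and the era labels are all hypothetical placeholders.

```python
from collections import Counter, defaultdict

def predict_missing_eras(edges, known_era):
    """Guess the era of each unlabeled node from its labeled neighbors.

    Returns {node: (predicted_era, share_of_votes)}. Nodes with no
    labeled neighbors get no prediction.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    predictions = {}
    for node, nbrs in neighbors.items():
        if node in known_era:
            continue  # already labeled, nothing to predict
        votes = Counter(known_era[n] for n in nbrs if n in known_era)
        if not votes:
            continue  # no labeled neighbors to learn from
        # Most votes wins; ties are broken alphabetically by era name.
        era, count = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))
        predictions[node] = (era, count / sum(votes.values()))
    return predictions

# Hypothetical placeholder data: "unknown_*" nodes play the role of black nodes.
toy_edges = [
    ("unknown_x", "a"), ("unknown_x", "b"), ("unknown_x", "c"),
    ("a", "b"), ("unknown_y", "c"), ("unknown_z", "unknown_y"),
]
toy_eras = {"a": "era_1", "b": "era_1", "c": "era_2"}

predictions = predict_missing_eras(toy_edges, toy_eras)
for node, (era, share) in sorted(predictions.items()):
    print(f"{node}: {era} ({share:.0%} of labeled neighbors)")
```

The share of votes gives a rough sense of the "degree of confidence" mentioned above: a node whose labeled neighbors all come from one era is a safer guess than a node whose neighbors are split.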
Benzi also noticed that the graph of character connections was disconnected, meaning that it consists of multiple connected components. In this new visual representation of the network (pictured), there appears to be one massive cluster of nodes that is connected to a smaller one, along with many small, disconnected nodes. In class, we learned that it is highly unlikely for our global network to have two massive, disconnected components: all it would take to join them is for one person from one component to meet one person from the other, which is very likely to happen. We also noted in class that there might be small “pocket” communities that are isolated from the outside world, so it is interesting that such pockets also exist within this fictional universe. However, we must keep in mind that this fan wiki is not entirely accurate; because of incomplete information, it may be missing edges between characters who would otherwise be linked. Benzi reasons that, just as one can predict the era in which a character lived, one can also look for characters that are likely to be connected in order to fill in this missing information on the wiki. We could use the triadic closure property (which states that when one node is connected to two other nodes, those two nodes are more likely to be connected to each other) to identify pairs of nodes that are likely to share a missing edge.
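Both ideas in this paragraph can be sketched in a few lines of Python. The code below is an illustrative assumption rather than Benzi's implementation: it finds connected components with a breadth-first search, then ranks unlinked pairs by how many neighbors they share, which is one simple way to apply triadic closure. All function names and the toy graph are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations

def build_adjacency(nodes, edges):
    """Undirected adjacency sets; nodes without edges are kept as isolated."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def connected_components(adj):
    """Return the components as sets of nodes, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search from the start node
            node = queue.popleft()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return sorted(components, key=len, reverse=True)

def triadic_closure_candidates(adj):
    """Rank unlinked pairs by their number of shared neighbors."""
    shared = defaultdict(int)
    for nbrs in adj.values():
        # Every pair of this node's neighbors forms an open or closed triad.
        for a, b in combinations(sorted(nbrs), 2):
            if b not in adj[a]:  # open triad: a and b are not yet linked
                shared[(a, b)] += 1
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy graph: one larger cluster, one small pocket, one isolated node.
toy_nodes = ["a", "b", "c", "d", "e", "f", "g", "h"]
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e"), ("f", "g")]

adj = build_adjacency(toy_nodes, toy_edges)
components = connected_components(adj)
candidates = triadic_closure_candidates(adj)
print([len(c) for c in components])
print(candidates[:2])
```

Pairs with more shared neighbors are stronger candidates for a missing edge, but a high count is only a hint: an editor would still need to check the articles before adding the link.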
Overall, while analyzing the network of Star Wars characters may seem like a niche exercise on the surface, it speaks to the larger idea of how fiction mimics real life: the global network of relationships between all people on Earth may look and evolve much like the fictional microcosm that Benzi analyzed here. It also speaks to the nature of missing information, especially in resources like Wikipedia and in literature or history. Historians and scientists can use graph theory to identify the gaps in any knowledge-representing network and to decide how best to fill them in with further research.
Sources:
https://www.kirellbenzi.com/blog/exploring-the-star-wars-expanded-universe
https://www.kirellbenzi.com/blog/exploring-the-star-wars-expanded-universe-part-2

