Bots and the Spread of Information
In the aftermath of the 2016 U.S. presidential election, the Cambridge Analytica scandal exploded onto headlines across the country. How did Cambridge Analytica obtain information from over 50 million Facebook users? How did Facebook allow this to happen? Should someone regulate social media platforms? If so, how much regulation should there be? These questions became a focal point of conversation across the nation. The Cambridge Analytica scandal made people realize that in the age of big data, there is a heightened risk of a malicious actor exploiting your social media information. Because social networks are typically dense, information disseminates across them quickly. One strategy for infiltrating a social network is the use of bots, or social media robots. A bot is defined as a “computer algorithm that automatically produces content and interacts with humans on social media, trying to emulate and possibly alter their behavior” (Ferrara et al.). Bots can interact with other users and spread information.
Detecting bots in social networks is a difficult problem because major social media platforms support billions of users, so finding a bot in that sea of accounts can feel like finding a needle in a haystack. As pointed out in “Graph-based Anomaly Detection and Description: A Survey” (Akoglu et al.), a graph-based approach to finding anomalous actors in a network has advantages, namely “the relational nature of the problem domain and adversarial robustness.” In bot detection, the graph-based detector is robust against its adversaries, the bots, because under a network approach it is harder for a bot to completely “blend in” with the rest of the network. For example, it makes sense that a Twitter bot follows more accounts than follow it back, since a bot has to infiltrate the network by initially following other Twitter users. If we constructed a directed graph, my guess is that a bot would have a disproportionately higher out-degree than in-degree. Additionally, my guess is that a bot’s clustering coefficient would be lower than that of the other nodes in the graph, because of the triadic closure property: if Twitter users A and B follow each other and users B and C follow each other, it is highly likely that A and C follow each other. I expect that this property will typically not hold for bots, which can help detect them in large networks.
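To make these two guesses concrete, here is a minimal sketch in Python using networkx. The example graph, the threshold values, and the flag_suspicious helper are my own illustrative assumptions, not a method from Akoglu et al. or Ferrara et al.

```python
import networkx as nx


def degree_ratio(g: nx.DiGraph, node) -> float:
    """Out-degree divided by in-degree. The guess above is that bots
    skew high: they follow many accounts but attract few followers."""
    in_deg = g.in_degree(node)
    return g.out_degree(node) / in_deg if in_deg > 0 else float("inf")


def flag_suspicious(g: nx.DiGraph, ratio_threshold=5.0,
                    clustering_threshold=0.05):
    """Flag nodes with a high out/in degree ratio AND a low clustering
    coefficient, i.e. accounts that follow many others but whose
    neighborhoods do not close triangles the way triadic closure
    predicts for organic users. Thresholds are hypothetical."""
    # Clustering is computed on the undirected projection; this matches
    # the mutual-follow intuition above, though networkx also offers a
    # directed clustering coefficient.
    clustering = nx.clustering(g.to_undirected())
    return [n for n in g.nodes
            if degree_ratio(g, n) >= ratio_threshold
            and clustering.get(n, 0.0) <= clustering_threshold]


if __name__ == "__main__":
    g = nx.DiGraph()
    # Organic users A, B, C follow one another, so their triangles close.
    g.add_edges_from([("A", "B"), ("B", "A"), ("B", "C"),
                      ("C", "B"), ("A", "C"), ("C", "A")])
    # The bot follows unrelated accounts and attracts no followers, so
    # its out-degree dwarfs its in-degree and no triangles form around it.
    g.add_edges_from([("bot", "A"), ("bot", "D"), ("bot", "E")])
    print(flag_suspicious(g))  # expected: ['bot']
```

On a real network the thresholds would need careful tuning, since legitimate new users also start out following more accounts than follow them back.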
Sources:
(1) “Graph-based Anomaly Detection and Description: A Survey” by Akoglu et al. https://arxiv.org/abs/1404.4679
(2) “The Rise of Social Bots” by Ferrara et al. https://m-cacm.acm.org/magazines/2016/7/204021-the-rise-of-social-bots/fulltext?mobile=true