## MIT Study: How can we use SIR models to predict the source of viruses in computer networks?

https://dspace.mit.edu/openaccess-disseminate/1721.1/63150

https://www.theguardian.com/technology/2014/may/06/antivirus-software-fails-catch-attacks-security-expert-symantec

In lecture, we are beginning to discuss how diseases spread from person to person, and we have covered how contagion works in contact networks. An interesting application I am going to explore is the “transmission” and spread of viruses through computer networks. Computer viruses are designed to spread and replicate, using network connections to infiltrate files and steal data. The models used for epidemics of human viruses in populations can also be used to model the spread of viruses between computers. This is especially important to study because computers and the Internet have become a central part of the personal, professional, financial, and educational aspects of our lives. Despite advances in antivirus software, malware has taken on novel approaches, causing many antivirus companies to shift their missions from “protect” to “detect and respond”. Being able to identify the source of these viruses is crucial to mitigating the damage they cause.

Researchers at MIT conducted a study on identifying the source of a virus in a network, using an SIR-type epidemic model and a source estimator based on rumor centrality. The premise is that some computers in a network are infected with a virus, and all that is known (besides which computers are infected) is which computers have been communicating with each other. Other details, such as when each computer was infected, are not known. Although there has been a lot of work on understanding how the structure of a network facilitates or hinders the spread of computer viruses, little work has been done on identifying the source of these viruses, because constructing a source estimator is very complex.
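To make the setup concrete, here is a minimal sketch of the kind of spread the estimator has to work backwards from. It assumes a simplified discrete-step version of the process: at each step, one edge from an infected computer to an uninfected one is picked uniformly at random. The grid network and the names `simulate_spread` and `grid_graph` are my own illustrations, not anything taken from the study.

```python
import random

def grid_graph(rows, cols):
    """Build a rows x cols grid network as an adjacency dict (illustrative only)."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.append((rr, cc))
            adj[(r, c)] = nbrs
    return adj

def simulate_spread(adj, source, n_infected, seed=0):
    """Infect nodes one at a time, starting from `source`.

    Assumption: each step picks one infected-to-uninfected edge uniformly
    at random. Infected nodes never recover in this sketch.
    """
    rng = random.Random(seed)
    infected = {source}
    while len(infected) < n_infected:
        boundary = [(u, v) for u in sorted(infected)
                    for v in adj[u] if v not in infected]
        if not boundary:  # nothing left to infect
            break
        _, newly_infected = rng.choice(boundary)
        infected.add(newly_infected)
    return infected

network = grid_graph(5, 5)
infected = simulate_spread(network, source=(2, 2), n_infected=8)

# What the estimator gets to see: the infected computers and the links
# among them, but not the source and not the infection times.
observed_links = {frozenset((u, v)) for u in infected
                  for v in network[u] if v in infected}
print(sorted(infected))
```

The source estimator only receives `infected` and `observed_links`; the order in which the computers were infected is thrown away, which is what makes the problem hard.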

The study’s model, a susceptible-infected (SI) variant of SIR in which infected nodes never recover, uses a countably infinite set of nodes, and a virus can spread from one node to another only if there is an edge between them. The study compared two source estimators, one based on rumor centrality and the other based on distance centrality. Rumor centrality is a combinatorial quantity: roughly, it counts the number of orders in which the virus could have spread from a given node to produce the observed set of infected computers, and the rumor center is the node with the highest count. Distance centrality considers the shortest paths in the network (i.e., the distance center is the node closest to all other nodes). You can read about the derivation of the formulas in the study linked above. My main takeaway from the study (aside from the fact that the math is very complicated) is that the effectiveness of a source estimator depends on the type of network being analyzed, which is part of why virus source identification is so difficult. On trees, the choice between the two estimators does not matter, because the rumor center and the distance center coincide. In networks that are not tree-like, however, rumor centrality outperforms distance centrality. For example, in small-world computer networks, rumor centrality identifies the source of a virus more accurately (16% correct detection versus 2% for distance centrality).
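As a rough illustration of the two estimators, the sketch below computes both on a small tree of infected computers. It assumes the tree formula for rumor centrality as I understand it from the study, R(v) = n! / (product of subtree sizes when the tree is rooted at v), and it scores distance centrality as the sum of shortest-path distances to every other node. The example tree and all function names are hypothetical.

```python
from collections import deque
from math import factorial

def subtree_sizes(adj, root):
    """Size of the subtree under each node when the tree is rooted at `root`."""
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)
    size = {u: 1 for u in order}
    for u in reversed(order):  # children are processed before their parents
        if parent[u] is not None:
            size[parent[u]] += size[u]
    return size

def rumor_centrality(adj, v):
    """Number of infection orders starting at v (assumed tree formula)."""
    product = 1
    for s in subtree_sizes(adj, v).values():
        product *= s
    return factorial(len(adj)) // product

def distance_centrality(adj, v):
    """Sum of shortest-path (hop) distances from v to every other node."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())

# Hypothetical tree of 7 infected computers.
tree = {0: [1, 2, 3], 1: [0, 4, 5], 2: [0], 3: [0],
        4: [1], 5: [1, 6], 6: [5]}

rumor_center = max(tree, key=lambda v: rumor_centrality(tree, v))
distance_center = min(tree, key=lambda v: distance_centrality(tree, v))
print(rumor_center, distance_center)
```

On this tree both estimators pick node 1, which matches the study's point that the two centers coincide on trees. The formula used here only applies to trees, so for a general network the infected subgraph would first have to be reduced to a tree (for example, a breadth-first search tree), which is a step this sketch leaves out.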

Studying how to identify the source of a computer virus, in addition to how it spreads, is a practical approach that should help stop malware. I expect this combined approach to see more use as further studies are conducted.