Epidemiology and Statistical Modeling
This semester has been particularly interesting to me because I have seen tremendous overlap among many of my courses. Whether it was the analysis of network effects and friendship networks in INFO 2450 or the study of online advertisement auctions in ECE 4450, I enjoyed the practical applicability of the topics covered in Networks. Recently, my statistics course, ENGRD 2700, asked us to perform a statistical analysis of a topic of our choice. I chose to write about COVID-19, and I was pleased to see concepts we learned in Networks apply to the epidemiological modeling of diseases such as COVID. My paper focused on my attempts at creating a statistical model, and later a simulation, to study the spread of disease in pandemics, heavily inspired by 3Blue1Brown’s video on the same topic. In the end, while my model may not have been the most accurate, I learned a whole lot about epidemiology, including SIR models and more complex and accurate methods of modeling, some of which I would like to share here!
Starting at a baseline, the acronym SIR stands for Susceptible, Infectious, and Recovered (or Removed, depending on who you ask). These three compartments provide a simple yet powerful way to model epidemics by dividing a population into people who can catch the disease, people who currently have it and can transmit it, and people who can no longer catch or spread it. Integral to these models is the probability of infection when exposed to the pathogen. In simple models this can be a flat rate, such as 20%: if we began our simulation with one patient zero, anyone else in “contact” with this person would have a 20% chance of contracting the virus. More complex models typically simulate transmission itself and use a range of parameters to determine a dynamic probability of infection. In the 3Blue1Brown video, for example, “nodes” in the simulation must be within a defined distance of each other to spread the disease, and even this simple change is enough to describe the effects of measures such as social distancing. The video also uses travel and simulated communities to refine the notion of proximity, and varies the base transmission rate to approximate measures such as wearing masks or quarantining infected individuals. As with every model, there is always room to add factors and variables to depict a particular scenario more accurately, so it is important to keep in mind the inherent limitations of all models, especially when dealing with something as complex as epidemiology. Towards the end, the video builds some really cool and thought-provoking models and uses them to observe possible outcomes of differing levels of mask adoption, social distancing, and other preventative measures.
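A minimal sketch of the two transmission rules described above, assuming agents are points on a 2D plane: a flat 20% chance of infection per contact, where only agents within a fixed radius count as contacts. The names `spread_step`, `TRANSMISSION_PROB`, and `INFECTION_RADIUS`, and the radius value, are hypothetical choices for illustration; this is not the code behind the 3Blue1Brown video.

```python
import math
import random

# Illustrative assumptions, not values taken from the video:
TRANSMISSION_PROB = 0.20  # flat per-contact chance of infection
INFECTION_RADIUS = 1.0    # agents at most this far apart count as "in contact"


def spread_step(positions, infected, rng):
    """Return the set of infected agents after one round of contacts.

    positions: list of (x, y) coordinates, one per agent
    infected:  set of indices of agents infected at the start of the step
    """
    newly_infected = set()
    for i, (xi, yi) in enumerate(positions):
        if i in infected:
            continue
        for j in infected:
            xj, yj = positions[j]
            # Proximity rule: only nearby infected agents can transmit.
            if math.hypot(xi - xj, yi - yj) <= INFECTION_RADIUS:
                # Flat rule: each close contact is an independent 20% chance.
                if rng.random() < TRANSMISSION_PROB:
                    newly_infected.add(i)
                    break
    # Agents infected during this step only start spreading on the next one.
    return infected | newly_infected


rng = random.Random(0)
positions = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]
infected = {0}
for _ in range(50):
    infected = spread_step(positions, infected, rng)
print(sorted(infected))  # agent 2 is never close enough to be infected
```

With k infected agents nearby, a susceptible agent escapes all of them with probability (1 - 0.2)^k, so its chance of infection is 1 - 0.8^k. Keeping agents apart shrinks k, which is how distancing shows up in this kind of model.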
SIR models belong to a broad class of epidemiological models known as compartmental models, which are typically built on systems of ordinary differential equations (ODEs) describing how people move between compartments. For a basic SIR model, a common form of these equations is:
$$\frac{dS}{dt} = -\frac{\beta S I}{N}$$
$$\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I$$
$$\frac{dR}{dt} = \gamma I$$
where 𝛽 describes the infection (transmission) rate,
𝛾 represents the recovery rate,
and N = S + I + R is the total population.
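As a sketch of how 𝛽, 𝛾, and N interact, the SIR equations can be stepped forward numerically with a simple forward-Euler scheme. The function name, step size, and parameter values below are illustrative assumptions; a library ODE solver such as `scipy.integrate.solve_ivp` would be more accurate.

```python
def simulate_sir(beta, gamma, n, i0, days, dt=0.1):
    """Integrate the SIR ODEs with forward Euler.

    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(round(1 / dt)):
            new_infections = beta * s * i / n * dt  # flow S -> I
            new_recoveries = gamma * i * dt         # flow I -> R
            s -= new_infections                     # dS/dt = -beta*S*I/N
            i += new_infections - new_recoveries    # dI/dt = beta*S*I/N - gamma*I
            r += new_recoveries                     # dR/dt = gamma*I
        history.append((s, i, r))
    return history


small = simulate_sir(beta=0.3, gamma=0.1, n=1_000, i0=1, days=160)
large = simulate_sir(beta=0.3, gamma=0.1, n=100_000, i0=100, days=160)
peak_day = max(range(len(small)), key=lambda d: small[d][1])
print(f"infections peak on day {peak_day}")
# Same starting fractions: both populations trace the same curve as fractions of N.
print(small[peak_day][1] / 1_000, large[peak_day][1] / 100_000)
```

Because every flow is proportional to the compartments themselves, S + I + R stays equal to N at every step, and scaling N while keeping the starting fractions fixed leaves the curves unchanged when expressed as fractions of the population.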
It is important to notice that while the SIR model can be extended to include features such as disease latency and control efforts, running a deterministic SIR simulation multiple times with exactly the same parameters will produce exactly the same results every time. Furthermore, because the infection rate depends only on the fraction of the population that is infectious, I/N, deterministic compartmental models such as SIR produce the same proportional infection dynamics in a population of 1,000 as in a population of 100,000, given the same starting fractions. Therefore, for my model, I chose to research agent-based modeling (ABM) as a way to simulate a pandemic more realistically. ABMs are more powerful in this respect because they simulate the actions of individual “agents,” or nodes, each representing a person. Besides making it possible to simulate physical proximity, ABMs are a great way to extend basic SIR models with a wide range of agent behaviors, such as social distancing and differing individual risk levels, to name a few.
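A minimal agent-based sketch of this idea, assuming random daily mixing rather than physical positions: each agent carries its own state, a hypothetical risk multiplier, and a flag for whether it follows social distancing. All names and numbers (10 contacts per day, a 3% chance of transmission per contact, a 10% daily chance of recovery, and halving of contacts and risk when distancing) are illustrative assumptions, not parameters from EMOD or Covasim.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    state: str = "S"          # "S", "I", or "R"
    risk: float = 1.0         # hypothetical individual susceptibility multiplier
    distancing: bool = False  # whether this agent follows social distancing


def make_population(n, i0, distancing_share, rng):
    agents = [Agent(risk=rng.uniform(0.5, 1.5),
                    distancing=rng.random() < distancing_share)
              for _ in range(n)]
    for agent in rng.sample(agents, i0):  # seed the initial infections
        agent.state = "I"
    return agents


def step(agents, rng, contacts=10, p_transmit=0.03, p_recover=0.1):
    """Advance the population by one day of random mixing."""
    infectious = [a for a in agents if a.state == "I"]
    for agent in infectious:
        # Distancing agents meet half as many people.
        k = contacts // 2 if agent.distancing else contacts
        for other in rng.sample(agents, k):
            if other.state != "S":
                continue
            # Per-contact risk depends on the contacted agent's own traits.
            p = p_transmit * other.risk * (0.5 if other.distancing else 1.0)
            if rng.random() < p:
                other.state = "I"
    # Only agents infectious at the start of the day can recover today.
    for agent in infectious:
        if rng.random() < p_recover:
            agent.state = "R"


def run(n=2_000, i0=5, distancing_share=0.0, days=120, seed=1):
    rng = random.Random(seed)
    agents = make_population(n, i0, distancing_share, rng)
    curve = []
    for _ in range(days):
        step(agents, rng)
        curve.append(sum(a.state == "I" for a in agents))
    return agents, curve


_, no_distancing = run(distancing_share=0.0)
_, with_distancing = run(distancing_share=0.6)
print("peak infected, no distancing:  ", max(no_distancing))
print("peak infected, 60% distancing: ", max(with_distancing))
```

Since behavior lives on each agent, partial compliance is modeled directly (only some agents distance) instead of being folded into a single population-wide 𝛽.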
One final modification that can make ABMs or compartmental models more realistic is switching from deterministic models to stochastic ones. Starting from the simple ODE model above, we can make the infection and recovery processes random instead of moving a fixed fraction of people between compartments at each step. By using Bernoulli random variables to decide whether each individual becomes infected or recovers, repeated runs with the same parameters produce different outcomes, which better reflects the chance events that shape real outbreaks (for example, an outbreak that fizzles out early by luck).
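To make the contrast concrete, here is a sketch of a single infection step under both approaches, assuming 990 susceptible people who each face a hypothetical 5% chance of infection during that step. The deterministic version always moves exactly S × p people; the stochastic version runs one Bernoulli trial per person.

```python
import random


def deterministic_new_infections(s, p):
    # Deterministic model: exactly s * p people move from S to I.
    return s * p


def stochastic_new_infections(s, p, rng):
    # Stochastic model: each susceptible is an independent Bernoulli(p) trial.
    return sum(rng.random() < p for _ in range(s))


rng = random.Random(42)
s, p = 990, 0.05
print(deterministic_new_infections(s, p))                       # identical on every call
print([stochastic_new_infections(s, p, rng) for _ in range(5)])  # varies from draw to draw
```

Averaged over many draws, the stochastic count matches the deterministic value, but any single run can land above or below it.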
The Institute for Disease Modeling (https://idmod.org/) provides an amazing resource for anyone attempting to create epidemiological models, including several open-source platforms for running ABMs and traditional compartmental models. IDM offers a wide range of modeling software, and for my 2700 project I tried my best to implement my own rudimentary versions of EMOD and Covasim, two stochastic ABM frameworks. I would have loved to become more proficient in both systems, but due to limited time and skill, I didn’t get as far as I had hoped in creating complex models. However, after hours of poring over documentation, I did manage to get my own installations of the two programs running (available at https://github.com/InstituteforDiseaseModeling/covasim and https://github.com/InstituteforDiseaseModeling/EMOD/releases/tag/v2.20.0).
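Both platforms are built on discrete-time stochastic models, so for my implementation I opted to use difference equations in which, at each time step of length δ, the numbers of new infections and new recoveries are drawn as binomial random variables (sums of the individual Bernoulli trials):
$$I^{\mathrm{new}}_t \sim \mathrm{Binomial}(S_t, P_i), \qquad R^{\mathrm{new}}_t \sim \mathrm{Binomial}(I_t, P_r)$$
$$S_{t+\delta} = S_t - I^{\mathrm{new}}_t$$
$$I_{t+\delta} = I_t + I^{\mathrm{new}}_t - R^{\mathrm{new}}_t$$
$$R_{t+\delta} = R_t + R^{\mathrm{new}}_t$$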
where $P_i$ represents the probability of infection during one time step of length δ, given by $P_i = 1 - e^{-\lambda \delta}$,
$P_r$ is the recovery probability, $P_r = 1 - e^{-\gamma \delta}$,
and λ, which the documentation calls the “force of infection,” is equal to $\lambda = \beta I / N$.
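A minimal sketch of these update rules, written as a chain-binomial simulation: in each step of length δ, every susceptible person is infected with probability P_i = 1 - e^(-λδ) and every infectious person recovers with probability P_r = 1 - e^(-γδ), with λ = βI/N. The function names and parameter values are illustrative assumptions; this is neither EMOD's nor Covasim's actual implementation.

```python
import math
import random


def binomial_draw(n, p, rng):
    # Number of successes in n independent Bernoulli(p) trials.
    return sum(rng.random() < p for _ in range(n))


def stochastic_sir(beta, gamma, n, i0, days, delta=1.0, seed=None):
    """Discrete-time stochastic SIR; returns (S, I, R) at the start and after each step."""
    rng = random.Random(seed)
    s, i, r = n - i0, i0, 0
    history = [(s, i, r)]
    for _ in range(round(days / delta)):
        force = beta * i / n                      # lambda, the force of infection
        p_infect = 1 - math.exp(-force * delta)   # P_i
        p_recover = 1 - math.exp(-gamma * delta)  # P_r
        new_infections = binomial_draw(s, p_infect, rng)
        new_recoveries = binomial_draw(i, p_recover, rng)
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        history.append((s, i, r))
    return history


# Same parameters, different random seeds: the epidemics differ from run to run.
for seed in range(5):
    _, _, recovered = stochastic_sir(0.3, 0.1, n=1_000, i0=1, days=160, seed=seed)[-1]
    print(f"seed {seed}: {recovered} of 1000 recovered by day 160")
```

With a single initial case, some runs can end after only a handful of infections because the first case recovers before passing the disease on, an outcome the deterministic ODE model cannot produce. Using 1 - e^(-rate × δ) rather than rate × δ also keeps both probabilities between 0 and 1 for any step size.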
EMOD and Covasim both offer extensive parameters that can be tweaked to model specific phenomena such as travel between cities and the effects of quarantining and social distancing. I encourage anyone interested to tinker with the platforms, as they are both wonderfully complex and very interesting. Hopefully you have better luck than I did, but I will say I was proud to get the whole thing working, if only partially! I modified one of the provided starter programs, which models a disease in present-day Seattle complete with accurate census and geographic data. By tweaking some of the model’s parameters, such as setting R0 to values between 2 and 4.5 (the range of estimates for COVID), and by enabling partial social distancing that only some percentage of agents follow, I was able to produce these pretty cool graphs, which nicely illustrate the so-called “second wave” in a simulated population of 300,000. Overall, I hope to explore the program more extensively, and I was pleasantly surprised by how much I learned and accomplished in the process!
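A rough deterministic sketch of how temporary, partial distancing can produce a second wave, assuming a basic SIR model in which R0 = β/γ. Here a hypothetical share of the population (`compliance`) cuts its contacts by a hypothetical fraction (`reduction`) between two days, crudely modeled as scaling β by 1 - compliance × reduction. All values, including the 7-day infectious period, are illustrative assumptions rather than the Seattle starter program's settings, and this simplification is not how EMOD or Covasim implement distancing.

```python
def sir_with_distancing(r0, infectious_days, n, i0, days,
                        start, end, compliance, reduction, dt=0.1):
    """Forward-Euler SIR in which transmission drops while distancing is in force.

    Returns the number of infectious people at day 0 and after each day.
    (R is not tracked; it is implied as n - S - I.)
    """
    gamma = 1 / infectious_days
    beta = r0 * gamma  # basic SIR: R0 = beta / gamma
    s, i = float(n - i0), float(i0)
    infectious_by_day = [i]
    for day in range(days):
        # Crude mean-field assumption: distancing scales overall transmission.
        factor = 1 - compliance * reduction if start <= day < end else 1.0
        for _ in range(round(1 / dt)):
            new_infections = factor * beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
        infectious_by_day.append(i)
    return infectious_by_day


curve = sir_with_distancing(r0=3.0, infectious_days=7, n=300_000, i0=10, days=300,
                            start=20, end=100, compliance=0.8, reduction=0.9)
print(f"infectious when distancing starts (day 20): {curve[20]:.0f}")
print(f"infectious when distancing ends (day 100):  {curve[100]:.0f}")
second_peak_day = max(range(100, len(curve)), key=curve.__getitem__)
print(f"second peak on day {second_peak_day}: {curve[second_peak_day]:.0f}")
```

With these numbers, 3.0 × (1 - 0.8 × 0.9) = 0.84, so the effective reproduction number stays below 1 while distancing is in force and infections shrink; once it is lifted, most of the population is still susceptible and infections climb again into a larger second peak.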