I want to devote this post to a very different modeling style which neither statisticians nor ML-types devote much attention to: what I will refer to as mechanistic models. I think these are worth discussing for a number of reasons.
- In one sense, they represent one of the best arguments against the ML viewpoint in terms of identifying where human intelligence and understanding become important to science.
- I want to distinguish mechanistic from interpretable in this context. In particular, my concerns are not really about the benefits of mechanistic models (although this is also an interesting topic) and I want to clarify this.
- Statisticians rarely think of modeling in these terms and I think this represents one of the discipline’s greatest deficiencies.
The sense in which I use mechanistic is somewhat broader than is sometimes employed (i.e., it encompasses more than simply physical mechanics). The distinction I am making is between these and what I would describe as data-descriptive models; it also roughly distinguishes the models employed by applied mathematicians from those used by statisticians.
To make it clear for the physicists: I use the word interpretable to describe a property of the mathematical form that a model takes, not of its real-world meaning. I.e., I am asking “Should we worry about whether we can understand what the mathematics does?” I am aware of the vagueness of the term “understand”; that’s a large part of the reason for this blog.
Essentially, mechanistic models are generally dynamic models based around a description of processes that we believe are happening in a system, even if we cannot observe these particularly well; i.e., they provide a mechanism that generates the patterns we see. They are often given by ordinary differential equations, but this has mostly been because ODEs are easy to analyze, and we can be broader than that. ***
The simplest example that I can think of is the SIR model to describe epidemics and I think this will make a good exposition. We want to describe how a disease spreads through a population. To do so, we’ll divide the population into susceptible individuals (S) who have not been exposed to the disease, infectious (I) who are currently sick, and recovered (R) who have recovered and are now immune***. Any individual has a progression through these stages S -> I -> R; we now need to describe how the progression comes about.
I -> R is the easiest of these to model — people get sick and stay sick for a while before recovering. Since each individual is different, we can expect the length of time that an individual stays sick to be random. For convenience, an exponential distribution is often used (say with parameter m), although the realism of this is debatable.
S -> I is more tricky. In order to become sick you must get infected, presumably by contact with someone in the I group. This means that we must describe both how often you come in contact with an I, and the chances of becoming infected if you do. The simplest models envision that the more I’s there are around, the sooner an S will bump into one and become infected. If we model this waiting time by an exponential distribution (for each S) we give it parameter bI so the more I there are, the sooner you get infected.
If you turn this individual-level model into aggregate numbers (assuming exponential distributions again because of their memoryless property), you get I -> R at rate mI (since we’re talking about the whole I population) and S -> I at rate bSI. You can simulate the model for individuals, or in terms of aggregate quantities, or if the population is large enough (and you re-scale so we don’t have individuals, but a proportion) we can approximate it all by an ODE:
DS = – bSI
DI = bSI – mI
DR = mI
where DS means the time-derivative of S. Doing this turns the model into a deterministic system which can be a reasonable approximation, especially for mathematical analysis, although in real data the noise from individual variability is often evident.
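To make the two descriptions concrete, here is a minimal Python sketch (mine, not taken from any epidemiological package) that simulates the aggregate stochastic model one event at a time and integrates the ODE with a fixed-step Runge-Kutta scheme. The function names, the parameter values and the choice of integrator are all illustrative assumptions. Note that once the ODE is written for proportions rather than counts, b has to be multiplied by the population size.

```python
import random


def sir_ode(s, i, r, b, m, t_end, dt=0.01):
    """Integrate DS = -bSI, DI = bSI - mI, DR = mI with fixed-step RK4.

    s, i, r are proportions of the population; returns a list of (t, s, i, r).
    """
    def deriv(state):
        s_, i_, _ = state
        infections = b * s_ * i_   # flow S -> I
        recoveries = m * i_        # flow I -> R
        return (-infections, infections - recoveries, recoveries)

    def shift(state, slope, h):
        return tuple(x + h * dx for x, dx in zip(state, slope))

    state = (s, i, r)
    path = [(0.0,) + state]
    for k in range(1, int(round(t_end / dt)) + 1):
        k1 = deriv(state)
        k2 = deriv(shift(state, k1, dt / 2))
        k3 = deriv(shift(state, k2, dt / 2))
        k4 = deriv(shift(state, k3, dt))
        slope = tuple((a + 2 * b2 + 2 * c + d) / 6
                      for a, b2, c, d in zip(k1, k2, k3, k4))
        state = shift(state, slope, dt)
        path.append((k * dt,) + state)
    return path


def sir_gillespie(S, I, R, b, m, t_end, rng):
    """Simulate the aggregate stochastic model on integer counts.

    Events: S -> I at rate b*S*I and I -> R at rate m*I, with exponential
    waiting times between events. Returns a list of (t, S, I, R).
    """
    t = 0.0
    path = [(t, S, I, R)]
    while I > 0:
        infection_rate = b * S * I
        recovery_rate = m * I
        total = infection_rate + recovery_rate
        t += rng.expovariate(total)  # waiting time to the next event of either kind
        if t > t_end:
            break
        if rng.random() * total < infection_rate:
            S, I = S - 1, I + 1      # one new infection
        else:
            I, R = I - 1, R + 1      # one recovery
        path.append((t, S, I, R))
    return path


if __name__ == "__main__":
    N = 1000
    b_count, m = 0.5 / N, 0.1        # illustrative values only
    stoch = sir_gillespie(N - 10, 10, 0, b_count, m, 200.0, random.Random(1))
    ode = sir_ode(0.99, 0.01, 0.0, b_count * N, m, 200.0)
    print("stochastic final (t, S, I, R):", stoch[-1])
    print("ODE final (t, s, i, r):", ode[-1])
```

Running the stochastic version with a few different seeds and dividing the counts by N gives a feel for how much individual-level noise the ODE smooths away.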
There are obviously many ways to make this model more complicated — stages of disease progression, sub-populations that mix with each other more than others, geographic spread, visitors, immunization, loss of immunity and a whole bunch of others. The epidemiological literature is littered with these types of elaboration.
The point of this model is that it tells a coherent story about what is happening in the system and how it is happening, hence the moniker “mechanistic”. This is in contrast to most statistical and ML models that seek to describe static relationships without concern as to how they came about — even time-series models are usually explanation-free. I have also avoided the term “causal” — although it would be quite appropriate here — in order to not confuse it with the statistical notions of causal modeling as studied by Judea Pearl, which are similarly static.
Having gone through all this, there are some observations that I now want to make:
1. I think we can distinguish mechanistic versus interpretable here. My father would be inclined to view this type of model as the only type worth interpreting: he sniffily dismissed the models I examined earlier as all being “correlational”, and would presumably say the same thing of causal models in Pearl’s sense.
I’m not sure he’s wrong in that (see below), but it’s not quite the problem that I want to examine in this blog and I think I can make some distinctions here: while the structure of the SIR model above is clearly motivated by mechanisms, a substantial part of it is dictated by mathematical convenience rather than realism. The exponential distribution, and an assumption that an S is as likely to run into one I as any other are cases in point. Moreover there is no particular reason why the description of some of these mechanisms should have algebraically elegant forms. Newton’s law of gravity, for example, would still be a mechanistic description if the force decayed according to some algebraically-complicated function of the distance between objects rather than the inverse square (even if this would be less mathematically elegant).
Indeed, one might imagine employing ML to obtain some term in a mechanistic model if the relationship was complex and there were data that allowed ML to be used. For example, the bSI term in the SIR model is an over-simplification and is often made more complex — it’s not clear that using some black-box model here would really remove much by the way of interpretation. My central concern — esoteric though it may be — is with regard to the algebraic (or, more generally, cognitive) simplicity of the mathematical functions that we use.
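As a rough illustration of that last point, the S -> I term can be treated as a pluggable function while the rest of the mechanistic structure stays put. The sketch below is only that: the saturating form, the parameter k and all the function names are assumptions of mine chosen for illustration, and a fitted black-box predictor could be passed in exactly the same way.

```python
def mass_action(b):
    """The S -> I term used in the basic SIR model: b * S * I."""
    return lambda s, i: b * s * i


def saturating(b, k):
    """Illustrative alternative: infection pressure levels off as I grows."""
    return lambda s, i: b * s * i / (1.0 + k * i)


def simulate(incidence, m, s=0.99, i=0.01, r=0.0, t_end=100.0, dt=0.01):
    """Euler-integrate the SIR ODE with an arbitrary S -> I term.

    `incidence(s, i)` can be any callable returning a non-negative rate;
    the I -> R flow is kept at m * I. Returns the final (s, i, r).
    """
    for _ in range(int(round(t_end / dt))):
        infections = incidence(s, i)
        recoveries = m * i
        s, i, r = (s - dt * infections,
                   i + dt * (infections - recoveries),
                   r + dt * recoveries)
    return s, i, r


if __name__ == "__main__":
    print("mass action:", simulate(mass_action(0.5), m=0.1))
    print("saturating: ", simulate(saturating(0.5, 5.0), m=0.1))
```

Whatever goes into `incidence`, the compartments and the flows between them still tell the same story; only the algebraic form of one arrow has changed.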
2. Mechanistic models do, however, provide some more-compelling responses to the ML philosophy. A mechanistic understanding of a system is more suggestive of which additional measurements of a system are going to allow for better prediction and therefore what we might want to target. In work I do with colleagues in ecology, we believe that some dynamics are driven by sub-species structure and this suggests we will be able to parse this out better after genotyping individuals. Similarly, it allows us to conceive of interventions in the system that we might hope will either test our hypotheses, or pin down certain system parameters more accurately.
An ML philosophy might retort that we can, of course, predict the future with a black box model, just give us some data; that mechanistic interpretation is mostly developed post-hoc, and humans have many times been shown to be very good at making up stories to explain whatever data they see (more on that in another post); and that active learning looks at what new observations would be most helpful, and you could pose this problem in that context, too. Of course, this does rather rely on the circular argument “interpretation is bullshit therefore interpretation is bullshit”.
3. As a statistician who has spent a considerable amount of time working on these types of models, I am distressed at how foreign this type of modeling is to most of my colleagues. Almost all models taught (and used) in statistics are some variant on linear regression, and basically none attempt to explain how the relationships we see in data come about — even the various time series models (ARIMA, GARCH etc) take a punt on this. The foreignness of these modeling frameworks to statisticians is, I suspect, because they make up no part of the statistical curriculum (when faced with a particularly idiotic referee report I’m somewhat inclined to say it’s that statisticians just aren’t that good at math, myself included) and I think this is the case for three reasons:
a) On the positive side, statisticians have had a healthy skepticism of made-up models (and ODEs really do tend to not fit data well at all). Much of the original statistical modeling categorized the world in terms of levels of an experiment so that exact relationships did not have to be pinned down: your model described plant growth at 0.5kg of fertilizer and 1kg of fertilizer separately and didn’t worry about what happened at 0.75kg. I’m fairly sure many statisticians would be as skeptical about SIR models as an ML proponent, particularly given all the details they leave out.
b) More neutrally, in many disciplines such mechanistic explanations simply aren’t available, or are too remote from the observations to be useful. To return to the agricultural example above, we know something about plant biochemistry, but there is a long chain of connections between fertilizing soil, wash-out with water, nutrient uptake and physical growth. When the desire is largely to assess evidence for the effectiveness of the fertilizer, something more straightforward is probably useful.
Of course, statisticians have chosen these fields, and have not attempted to generate mechanistic models of the processes involved. I sometimes feel that this is due to an inclination to work with colleagues who are less mathematically sophisticated than the statistician and hence cannot question their expertise. I sometimes also think it’s due to a lack of interest in the science, or at least the very generalist approach that statisticians take which means that they don’t know enough of any science to attempt a mechanistic explanation. Both of these may be unfair — see uncharitable parenthetical comments above.
c) Most damningly, it isn’t particularly easy to conduct the sort of theoretical analysis that statisticians like to engage in for these models. And it makes this type of work difficult to publish in journals that have a theory fetish. There are plenty of screeds out there condemning this aspect of statistics and I won’t add another here: it’s not as bad as it used to be (in fact, it never was) and theory can be of enormous practical import. However, convenient statistical theory does tend to drive methodology more than it ought, and it does drive the models that statisticians like to examine.
Of course, everyone thinks that all other researchers should do only what they do. *** Case in point was Session 80 at ENAR 2014, which confirmed my cynical view that “Big Data” did indeed have a precise definition: it’s whatever the statistician in question found interesting. I’m not an exception to this, but then blogs are a vehicle for espousing opinions that couldn’t get published in more thoughtful journals, so….
In any case, mechanistic modeling might be an answer to ML (see Nate Silver for practical corroboration) and I might explore that in more detail. Mechanistic models are distinct from interpretable models: although they generally employ interpretable mathematical forms, they need not do so. Up next: what can we understand besides simple algebra?
*** Anyone who has examined data from outside the physical sciences should find the idea that an ODE generated it to be laughable, although the ODE can be a useful first approximation.
*** Alternatively R can mean “removed” or dead.
*** This is foolish: who wants all that competition?