## Bayesian Networks and Genetic Evidence in Paternity Cases

A Bayesian network is represented by a graph in which the nodes are probabilistic variables and the edges represent conditional probability relationships, based on Bayes’ Rule, between those nodes.  If no edge exists between nodes, then those random variables are independent.  For example, if there are three nodes in a Bayesian network, labeled A, B, and C, and a directed edge exists from node A to node C, then according to Bayes’ Rule, there exists the relationship between nodes A and C such that

If there is no edge connecting nodes A and B, they are independent, so

Not surprisingly, these graphical depictions of probabilistic dependencies have a multitude of uses. Many such networks exist, with important applications in economics, computer science, and almost any other field that involves statistical decision-making.  One particularly interesting use of such a network structure is at the intersection of biology and law.  In her 2010 paper, “Improving Forensic Identification Using Bayesian Networks and Relatedness Estimation: Allowing for Population Substructure,” Amanda Hepler explores how population substructure can affect the genetic properties of different subpopulations and how these differences in DNA can be exploited for use in paternity cases.

In general, humans tend to mate in relatively nonrandom ways and tend to mostly take partners that share the same geographical location and race as themselves, among many other factors.  This means that certain genetic allele combinations have higher probabilities of occurrence in certain subpopulations than others.  To formalize this idea, Hepler utilizes a formula, called the “population allele frequency,” that takes this population structure into account when determining the probability that a certain genetic sequence will be present in an individual’s genome.  The equation is as follows,

where Ai is the allele in question, ni is the number of that type of allele that has already been observed, pi is the frequency of that particular allele in the subpopulation under study, n is the total number of alleles, and θ represents the “inbreeding coefficient” for that specific population.

In her paper, Hepler elaborates on this idea with a simple two allele example, in which she constructs a Bayesian network to describe the genotype of an individual.  For example, in a paternity case, the putative father’s genetic network might look like this,

where “pfpg” represents the putative father’s paternal genotype, “pfmg” represents the putative father’s maternal genotype, and “pfgt” represents the putative father’s genotype.  The probabilities of the values of “pfpg” and “pfmg” can be calculated using the “population allele frequency” equation above, and because this network of genes is Bayesian, it also follows that .

This same structure can be extended to include all individuals involved in the paternity case, meaning that when the mother’s genes, the hypothetical “true” father’s genes, and the child’s genes are all considered, the overall Bayesian network for the paternity case will look something like this:

Notice that, in essence, this Bayesian network will allow one to calculate the probability that the “true” father’s genes will be two specific alleles, as determined by (or “given”) the alleles of the putative father, the mother, and the child.  This will in turn allow one to calculate the probability that the putative father and the “true” father are indeed the same person.

Clearly, Bayesian networks like these can have far-reaching consequences.  In the Networks class textbook, Bayes’ Rule is presented as “A Model of Decision-Making Under Uncertainty” and this idea is even more influential than it might at first appear.  In fact, Bayes’ Rule could be considered to be one of the most powerful theorems in statistics.  Amanda Hepler shows how these principles can be extended so far as to determine the probability that an individual is a child’s father.  Undoubtedly, many more important questions stand to be answered with the insight gained through Bayesian network models like the one presented here, which could in turn lead to exciting new developments in numerous fields.

Amanda B. Hepler’s paper on Bayesian Networks, Relatedness, and Population Substructure is available on the National Criminal Justice Reference Service web site at this URL: https://www.ncjrs.gov/pdffiles1/nij/grants/231831.pdf

-jjb284