Applying Bayes’ Theorem to Cornell Demographics
Bayes’ Theorem is an essential tool in statistics. Named after Thomas Bayes, it describes the conditional probability of an event given prior knowledge of a related event. In Chapter 16 of the textbook, Bayes’ Theorem is introduced to analyze information cascades. Beyond networks, however, Bayes’ Theorem helps answer many other questions, and one interesting application is gaining insight into Cornell demographics.
There are a variety of questions about Cornell demographics that we could answer with Bayes’ Theorem. For example, if we pick an Asian person at random in Ithaca, what is the probability that he or she is currently a student at Cornell University? To answer this, we can apply Bayes’ Theorem:
- P(Asian | Cornell) = 0.18 (https://www.collegesimply.com/colleges/new-york/cornell-university/students/)
- P(Cornell) = total population of Cornell students / total population of Ithaca = 25582 / 32108 = 0.7967, assuming every Cornell student is counted in the Ithaca population (https://www.cornell.edu/about/facts.cfm)
- P(Asian in Ithaca) = 0.172 (https://www.census.gov/quickfacts/fact/table/ithacacitynewyork/PST040221)
Thus, P(Cornell | Asian) = P(Asian | Cornell) * P(Cornell) / P(Asian) = 0.18 * 0.7967 / 0.172 = 0.8338. This means that if we pick an Asian person at random in Ithaca, the probability that he or she is currently studying at Cornell University is about 0.8338.
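As a quick check on the arithmetic, the calculation above can be sketched in a few lines of Python. The helper name bayes_posterior and the variable names are illustrative assumptions, not something from the textbook or the sources; the numbers are the ones quoted in the list above. Because the prior is carried through without rounding, the result comes out at roughly 0.834.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(A | B) = P(B | A) * P(A) / P(B).

    Hypothetical helper for illustration only.
    """
    if evidence <= 0:
        raise ValueError("evidence P(B) must be positive")
    return likelihood * prior / evidence


# Figures quoted in the post.
p_asian_given_cornell = 0.18   # P(Asian | Cornell)
p_cornell = 25582 / 32108      # P(Cornell): Cornell students / Ithaca population
p_asian = 0.172                # P(Asian in Ithaca)

p_cornell_given_asian = bayes_posterior(p_asian_given_cornell, p_cornell, p_asian)
print(f"P(Cornell | Asian) = {p_cornell_given_asian:.4f}")
```

Keeping the three inputs as named variables makes it easy to swap in updated figures from the sources and rerun the estimate.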
Another example: if we pick a female student at random on the Cornell campus on a school day (assuming that she is currently a Cornell student), what is the probability that she is an undergraduate in Dyson? Again, we can apply Bayes’ Theorem:
- P(Female | Dyson Undergrad) = 0.41 (https://dyson.cornell.edu/about/)
- P(Dyson Undergrad) = total population of Dyson Undergrad / total population of Cornell students = 684 / 25582 = 0.02674 (https://www.cornell.edu/about/facts.cfm)
- P(Female at Cornell) = 0.51 (https://www.collegesimply.com/colleges/new-york/cornell-university/students/)
Thus, P(Dyson | Female) = P(Female | Dyson Undergrad) * P(Dyson Undergrad) / P(Female) = 0.41 * 0.02674 / 0.51 = 0.02150. This means that if we pick a female Cornell student at random, the probability that she is currently a Dyson undergraduate is about 0.02150.
Beyond the two examples above, Bayes’ Theorem can answer many other interesting questions about Cornell demographics. Anyone interested in exploring further is welcome to do more research on this topic and have some fun!
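The Dyson calculation can be checked the same way. This is a minimal sketch under the same assumptions as the post: the helper name bayes_posterior and the variable names are hypothetical, and the inputs are the figures quoted in the list above. With the prior left unrounded, the result is roughly 0.0215, or about 2.15%.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(A | B) = P(B | A) * P(A) / P(B).

    Hypothetical helper for illustration only.
    """
    if evidence <= 0:
        raise ValueError("evidence P(B) must be positive")
    return likelihood * prior / evidence


# Figures quoted in the post.
p_female_given_dyson = 0.41   # P(Female | Dyson Undergrad)
p_dyson = 684 / 25582         # P(Dyson Undergrad): Dyson undergrads / Cornell students
p_female = 0.51               # P(Female at Cornell)

p_dyson_given_female = bayes_posterior(p_female_given_dyson, p_dyson, p_female)
print(f"P(Dyson | Female) = {p_dyson_given_female:.4f}")
print(f"That is about {p_dyson_given_female:.2%} of female Cornell students.")
```

The posterior is slightly lower than the prior of about 2.67% because women make up a smaller share of Dyson undergraduates (41%) than of Cornell students overall (51%).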
Sources:
https://www.collegesimply.com/colleges/new-york/cornell-university/students/
https://www.census.gov/quickfacts/fact/table/ithacacitynewyork/PST040221
https://www.cornell.edu/about/facts.cfm
https://dyson.cornell.edu/about/