## Bayes’ Rule in Machine Learning

In class, our examples of applications of Bayes’ rule have been fairly straightforward, such as drawing marbles out of a bag, or the probability that someone has a disease given that they tested positive. Another area where Bayes’ rule is applied is machine learning. Machine learning may sound like a “buzzword” for some complicated corner of artificial intelligence, but at its heart machine learning can be simply an application of Bayes’ rule.

For example, machine learning is often used to create a classifier: a function that takes in various input parameters and predicts the “class” those inputs correspond to. One kind of machine learning classifier uses what is known as the naive Bayes algorithm. It is called “naive” because it assumes that the input parameters are conditionally independent of one another given the class. This is a deliberate simplification: without it, we would have to estimate the joint distribution of all the inputs together for every class, which requires far more data and computation.

For our classifier, we want to predict the most likely class given the inputs, that is, the class with the highest pr(Class|Inputs). Look familiar? This is the start of a Bayes’ rule application: pr(Class|Inputs) = pr(Inputs|Class)*pr(Class)/pr(Inputs).

To “train” the classifier, we are given random sample inputs labeled with their desired output class. From this data we can estimate pr(Class), for instance as the fraction of training samples that belong to each class. Calculating pr(Inputs|Class) is where the naive assumption comes in: we assume pr(Inputs|Class) = pr(Input 1|Class)*pr(Input 2|Class)*..., so the probability of all the inputs given a class is the product of the individual probabilities given that class. Each pr(Input i|Class) can be modeled with a likelihood function fitted to the training data (such as a Gaussian distribution), and multiplying all of those together gives pr(Inputs|Class).
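The training and prediction steps described above can be sketched in Python. This is a minimal sketch, not a reference implementation: it assumes numeric input parameters with a Gaussian likelihood for each pr(Input i|Class), and the function names `fit_gaussian_nb`, `gaussian_log_pdf` and `predict`, the small variance constant, and the toy data are all hypothetical, chosen for illustration. It adds log-probabilities instead of multiplying raw probabilities, which selects the same class but avoids numerical underflow when there are many inputs.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate pr(Class) and a Gaussian (mean, variance) per input for each class."""
    by_class = defaultdict(list)
    for x, c in zip(samples, labels):
        by_class[c].append(x)

    model = {}
    for c, rows in by_class.items():
        prior = len(rows) / len(samples)  # pr(Class): fraction of samples in class c
        params = []
        for j in range(len(rows[0])):
            column = [row[j] for row in rows]
            mean = sum(column) / len(column)
            # tiny constant (an assumption) keeps the variance strictly positive
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            params.append((mean, var))
        model[c] = (prior, params)
    return model


def gaussian_log_pdf(x, mean, var):
    """Log of the Gaussian density, used here as log pr(Input i|Class)."""
    return -((x - mean) ** 2) / (2 * var) - 0.5 * math.log(2 * math.pi * var)


def predict(model, x):
    """Return the class maximizing log pr(Class) + sum of log pr(Input i|Class).

    pr(Inputs) is never computed: it is the same for every class.
    """
    best_class, best_score = None, -math.inf
    for c, (prior, params) in model.items():
        # naive assumption: the product over inputs becomes a sum of logs
        score = math.log(prior) + sum(
            gaussian_log_pdf(xi, mean, var) for xi, (mean, var) in zip(x, params)
        )
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# hypothetical toy training data: two numeric inputs per sample
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 0.5], [3.2, 0.7], [2.9, 0.4]]
y = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]))  # A
print(predict(model, [3.1, 0.6]))  # B
```

Each class here gets half the training samples, so both priors are 0.5 and the decision is driven entirely by how close the new inputs are to each class’s fitted means.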
Interestingly enough, it isn’t actually necessary to calculate pr(Inputs). The classifier only needs to return the most likely class for a given set of inputs, and pr(Inputs) is the same denominator for every class, so it does not change which class has the highest probability. In practice, the classifier just compares pr(Inputs|Class)*pr(Class) across the classes. And that is all the information needed for the naive Bayes machine learning algorithm! It essentially uses Bayes’ rule and some sample data to calculate the most likely class given some inputs. One practical application of naive Bayes is in Natural Language Processing, for example sorting pieces of text into categories, and the same approach carries over to many other classification problems.
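A quick numerical check of why pr(Inputs) can be skipped: dividing every class’s score pr(Inputs|Class)*pr(Class) by the same positive number rescales the scores but cannot change which one is largest. The class names and probabilities below are made-up values for illustration, and pr(Inputs) is computed by summing the scores over all classes (the law of total probability), assuming the listed classes are the only possible ones.

```python
def normalize(scores):
    """Divide each pr(Inputs|Class)*pr(Class) by pr(Inputs) to get pr(Class|Inputs)."""
    evidence = sum(scores.values())  # pr(Inputs), by the law of total probability
    return {c: s / evidence for c, s in scores.items()}


# made-up values of pr(Inputs|Class) * pr(Class) for three hypothetical classes
scores = {"spam": 0.6 * 0.3, "ham": 0.1 * 0.6, "promo": 0.2 * 0.1}
posteriors = normalize(scores)

print(max(scores, key=scores.get))          # spam
print(max(posteriors, key=posteriors.get))  # spam
print(round(sum(posteriors.values()), 6))   # 1.0
```

Ranking by the unnormalized scores is enough to pick a class; normalizing only matters if you want the actual probability values, for example to report how confident the classifier is.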

For a worked example of a naive Bayes classifier applied to NLP, see: https://monkeylearn.com/blog/practical-explanation-naive-bayes-classifier/