Naive Bayes in Machine Learning

One of the simplest and fastest machine learning algorithms, Naive Bayes, is based on Bayes' rule. In a supervised classification task, we are given training data in which each example has features x1, x2, …, xn and a corresponding label Y, and we must then assign labels to previously unseen data. For example, in spam classification, our features could be the number of occurrences of different words, and our labels could be 1 for spam and 0 for regular email. We then classify new data by asking ourselves: given the features we see in this example, which label is the most likely? In other words, which Y maximizes P(Y|X), where X denotes the full set of features x1, x2, …, xn?
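As a rough sketch of this setup (the word list, counts, and labels below are made up for illustration, not taken from any real dataset), the training data can be represented as a matrix of word counts plus a vector of labels:

```python
import numpy as np

# Hypothetical toy data: columns are counts of the words ["free", "win", "meeting"]
# in each email; labels are 1 for spam and 0 for regular email.
X_train = np.array([
    [3, 2, 1],   # spam
    [1, 0, 2],   # regular email
    [2, 1, 0],   # spam
    [0, 1, 3],   # regular email
])
y_train = np.array([1, 0, 1, 0])

# A new, unlabeled email: the goal is to pick the Y that maximizes P(Y | X).
x_new = np.array([1, 1, 0])
```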

To solve for P(Y|X), we can use Bayes' rule, since P(Y|X) = P(X|Y)P(Y) / P(X). However, it is not immediately clear how to find P(X|Y), since X contains many features. To get around this, we make the Naive Bayes assumption: given a particular label, the features xi are conditionally independent of each other. Then, P(X|Y) simply becomes P(x1|Y) * P(x2|Y) * … * P(xn|Y). We can simplify further: since we are trying to maximize P(Y|X) by varying Y, and P(X) does not depend on Y, we can simply discard P(X). Thus, to find the label Y that maximizes P(Y|X), we only need P(xi|Y) for each feature xi and P(Y).
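A minimal sketch of that decision rule, assuming count-valued features as in the spam example and hypothetical class_priors and cond_probs structures holding P(Y) and P(xi|Y), might look like this:

```python
import numpy as np

def predict(x, class_priors, cond_probs):
    """Return the label Y maximizing P(Y) * prod_i P(x_i | Y).

    class_priors: dict mapping label -> P(Y)
    cond_probs:   dict mapping label -> array of P(word_i | Y)
    Working in log space avoids underflow from multiplying many small probabilities;
    each word's log-probability is weighted by its count in x (count features).
    """
    best_label, best_score = None, -np.inf
    for label, prior in class_priors.items():
        score = np.log(prior) + np.sum(x * np.log(cond_probs[label]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Since the logarithm is monotonic, maximizing the sum of logs picks out the same label as maximizing the product P(Y) * P(x1|Y) * … * P(xn|Y) itself.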

Finding P(Y) is easy: it is just the number of examples with label Y divided by the total number of examples. Finding P(xi|Y) depends on the type of data we are dealing with. If the features xi are categorical, meaning each takes a value from a set of categories a, b, c, …, then P(xi = a|Y) is just the number of examples with label Y and feature xi = a, divided by the number of examples with label Y. If the features are multinomial, meaning they represent counts, then P(xi|Y) is the total number of times feature xi appears across all examples labeled Y, divided by the total count of all features in those examples. Either way, P(xi|Y) represents the probability of seeing feature xi given label Y. With these computed values, Bayes' rule, and the Naive Bayes independence assumption, we have a simple classification algorithm that does surprisingly well on tasks such as spam classification.
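The count-based estimates described above can be sketched as follows for the multinomial case, assuming the same count-matrix representation as in the earlier toy example (the remark about Laplace smoothing is a common practical addition, not something covered in this post):

```python
import numpy as np

def fit_multinomial_nb(X, y):
    """Estimate P(Y) and P(word_i | Y) from count data as described above."""
    class_priors, cond_probs = {}, {}
    for label in np.unique(y):
        X_label = X[y == label]
        # P(Y): number of examples with label Y divided by the total number of examples.
        class_priors[label] = len(X_label) / len(X)
        # P(word_i | Y): total count of word i across examples labeled Y,
        # divided by the total count of all words in those examples.
        word_counts = X_label.sum(axis=0)
        cond_probs[label] = word_counts / word_counts.sum()
        # In practice, Laplace (add-one) smoothing, e.g. (word_counts + 1) /
        # (word_counts.sum() + vocab_size), is often used so that a word never
        # seen with a label does not force the whole product to zero.
    return class_priors, cond_probs
```

Running fit_multinomial_nb on the toy X_train and y_train from earlier, and passing the result to predict along with x_new, carries out exactly the maximization described in the previous paragraphs.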

Sources:

https://towardsdatascience.com/all-about-naive-bayes-8e13cef044cf

https://blog.floydhub.com/naive-bayes-for-machine-learning/
