Here’s some Naïve ML for you guys
https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
https://www.geeksforgeeks.org/naive-bayes-classifiers/
You most probably have heard of a lil’ thing called machine learning – that stuff’s everywhere! Everyone from tech giants like Apple and Google to smaller enterprises and banks uses it extensively to build great, almost futuristic products and to aid decision making. Fields like augmented reality, computer vision, artificial intelligence and algorithmic trading are being revolutionized by machine learning. This blog post explores one of the most widely used machine learning algorithms, the Naïve Bayes algorithm.
Naïve Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of the other features, all of these properties independently contribute to the probability that the fruit is an apple, and that is why the method is known as ‘Naive’.

What does it mean for two events to be independent? Let A and B be two events. If A and B are independent, then the probability of both occurring, P(A and B), is simply the product of their individual probabilities, P(A) · P(B). Independence can be interpreted intuitively – essentially, event A happening or not happening does not affect the chances of event B occurring.
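To make the product rule concrete, here’s a tiny Python sketch (my own illustration, not from the original articles) that simulates two events defined on separate dice and checks that P(A and B) comes out close to P(A) · P(B):

```python
import random

random.seed(0)
N = 100_000

# Event A: the first die shows an even number (probability 1/2).
# Event B: the second die shows a number greater than 4 (probability 1/3).
a_count = b_count = both_count = 0
for _ in range(N):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    a = d1 % 2 == 0
    b = d2 > 4
    a_count += a
    b_count += b
    both_count += a and b

p_a, p_b, p_both = a_count / N, b_count / N, both_count / N
print(p_a * p_b, p_both)  # both should be close to 1/2 * 1/3 = 1/6
```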
To make the Naïve Bayes algorithm easier to understand, I thought it’d be better to work through an example. Consider the dataset below (taken from GeeksforGeeks) that describes the weather conditions for playing a game of golf. Given the weather data, each row classifies the conditions as either fit (‘Yes’) or unfit (‘No’) for playing.
|    | OUTLOOK  | TEMPERATURE | HUMIDITY | WINDY | PLAY GOLF |
|----|----------|-------------|----------|-------|-----------|
| 0  | Rainy    | Hot         | High     | False | No        |
| 1  | Rainy    | Hot         | High     | True  | No        |
| 2  | Overcast | Hot         | High     | False | Yes       |
| 3  | Sunny    | Mild        | High     | False | Yes       |
| 4  | Sunny    | Cool        | Normal   | False | Yes       |
| 5  | Sunny    | Cool        | Normal   | True  | No        |
| 6  | Overcast | Cool        | Normal   | True  | Yes       |
| 7  | Rainy    | Mild        | High     | False | No        |
| 8  | Rainy    | Cool        | Normal   | False | Yes       |
| 9  | Sunny    | Mild        | Normal   | False | Yes       |
| 10 | Rainy    | Mild        | Normal   | True  | Yes       |
| 11 | Overcast | Mild        | High     | True  | Yes       |
| 12 | Overcast | Hot         | Normal   | False | Yes       |
| 13 | Sunny    | Mild        | High     | True  | No        |
The dataset is divided into two parts, namely the feature matrix and the response vector (a short code sketch of this split follows the list).
- The feature matrix contains all the rows of the dataset, where each row holds the values of the predictor features. In the above dataset, the features are ‘Outlook’, ‘Temperature’, ‘Humidity’ and ‘Windy’.
- The response vector contains the value of the class variable (the prediction, or output) for each row of the feature matrix. In the above dataset, the class variable is ‘Play Golf’.
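Here’s a minimal sketch (assuming pandas is installed; the variable and column names are my own) that builds the table above as a DataFrame and splits it into the feature matrix X and the response vector y:

```python
import pandas as pd

# The golf dataset from the table above.
data = [
    ("Rainy",    "Hot",  "High",   False, "No"),
    ("Rainy",    "Hot",  "High",   True,  "No"),
    ("Overcast", "Hot",  "High",   False, "Yes"),
    ("Sunny",    "Mild", "High",   False, "Yes"),
    ("Sunny",    "Cool", "Normal", False, "Yes"),
    ("Sunny",    "Cool", "Normal", True,  "No"),
    ("Overcast", "Cool", "Normal", True,  "Yes"),
    ("Rainy",    "Mild", "High",   False, "No"),
    ("Rainy",    "Cool", "Normal", False, "Yes"),
    ("Sunny",    "Mild", "Normal", False, "Yes"),
    ("Rainy",    "Mild", "Normal", True,  "Yes"),
    ("Overcast", "Mild", "High",   True,  "Yes"),
    ("Overcast", "Hot",  "Normal", False, "Yes"),
    ("Sunny",    "Mild", "High",   True,  "No"),
]
df = pd.DataFrame(data, columns=["Outlook", "Temperature", "Humidity", "Windy", "PlayGolf"])

X = df[["Outlook", "Temperature", "Humidity", "Windy"]]  # feature matrix
y = df["PlayGolf"]                                       # response vector
```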
Now, we make the ‘naive’ assumption that the features are independent of one another – for example, that the temperature has nothing to do with whether it is windy or not. We also assume that each feature contributes equally to the outcome.
We can now apply Bayes’ theorem in the following way:

P(y | X) = P(X | y) · P(y) / P(X)

where y is the class variable and X is the feature vector (of size n):

X = (x1, x2, ..., xn)

Assuming independence of the features, the likelihood factorizes and we get:

P(y | x1, ..., xn) = P(y) · P(x1 | y) · P(x2 | y) · ... · P(xn | y) / P(x1, ..., xn)
Now, we need to create a classifier model. For this, we find the probability of the given set of inputs for every possible value of the class variable y and pick the value with the maximum probability. Since the denominator P(x1, ..., xn) is the same for every class, we can drop it and simply choose the y that maximizes P(y) · P(x1 | y) · ... · P(xn | y).
So, finally, we are left with the task of calculating P(y) and P(xi | y).
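Continuing with the df and y from the earlier sketch (again, just an illustrative sketch, not the only way to do it), both quantities can be read off the data with simple frequency counts:

```python
import pandas as pd  # already imported if you ran the earlier snippet

# Class probabilities P(y): the fraction of rows carrying each label.
priors = y.value_counts(normalize=True)
print(priors)  # Yes: 9/14, No: 5/14

# Conditional probabilities P(xi | y): for each feature, the fraction of
# rows *within each class* that take a given value.
conditionals = {
    col: pd.crosstab(df[col], y, normalize="columns")
    for col in ["Outlook", "Temperature", "Humidity", "Windy"]
}
print(conditionals["Outlook"])  # e.g. P(Sunny | Yes) = 3/9, P(Sunny | No) = 2/5
```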
Please note that P(y) is also called the class probability and P(xi | y) the conditional probability. The different Naive Bayes classifiers differ mainly in the assumptions they make about the distribution of P(xi | y); for the categorical data here we simply use frequency counts. Here’s an example:
Let today = (Sunny, Hot, Normal, False)
Therefore, the probability of playing golf is given by:

P(Yes | today) ∝ P(Sunny | Yes) · P(Hot | Yes) · P(Normal | Yes) · P(False | Yes) · P(Yes) = 3/9 · 2/9 · 6/9 · 6/9 · 9/14 ≈ 0.0212

Similarly, the probability of not playing golf is:

P(No | today) ∝ P(Sunny | No) · P(Hot | No) · P(Normal | No) · P(False | No) · P(No) = 2/5 · 2/5 · 1/5 · 2/5 · 5/14 ≈ 0.0046

These two numbers don’t sum to 1 because we dropped the common denominator P(today). Scaling them so that they do, we get a probability of playing of 0.0212 / (0.0212 + 0.0046) ≈ 0.82, and of not playing ≈ 0.18. Therefore, our predicted outcome is that we should play golf.
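As a sanity check, here’s a sketch using scikit-learn’s CategoricalNB, reusing the X and y from the earlier snippet (assuming scikit-learn is installed). Note that CategoricalNB applies Laplace smoothing by default (alpha=1.0), so a tiny alpha is used here to approximate the un-smoothed frequency counts from the hand calculation:

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Encode the string/boolean categories as integers, as CategoricalNB expects.
enc = OrdinalEncoder()
X_enc = enc.fit_transform(X.astype(str).to_numpy())

clf = CategoricalNB(alpha=1e-9)  # near-zero smoothing to mimic raw frequency counts
clf.fit(X_enc, y)

today = enc.transform([["Sunny", "Hot", "Normal", "False"]])
print(dict(zip(clf.classes_, clf.predict_proba(today)[0])))
# roughly {'No': 0.18, 'Yes': 0.82}
```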
Wohoo, Bayes ftw!