Here’s some Naïve ML for you guys
https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
https://www.geeksforgeeks.org/naive-bayes-classifiers/
You most probably have heard of a lil’ thing called machine learning – that stuff’s everywhere! Everyone from tech giants like Apple and Google to smaller enterprises and banks uses it extensively to build great, almost futuristic products and to aid decision making. Fields like augmented reality, computer vision, artificial intelligence and algorithmic trading are being revolutionized by machine learning. This blog post explores one of the most widely used machine learning algorithms, the Naïve Bayes algorithm.
Naïve Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of the other features, all of these properties independently contribute to the probability that the fruit is an apple, and that is why the method is known as ‘Naive’.

What does it mean for two events to be independent? Let A and B be two events. If A and B are independent, then the probability of both occurring, P(A and B), is simply the product of their individual probabilities, P(A) · P(B). Independence can be interpreted intuitively – essentially, event A happening or not happening does not affect the chances of event B occurring.
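To make the product rule concrete, here’s a tiny Python sketch (my own illustration, not from the original articles) that simulates two events defined on separate dice and checks that P(A and B) comes out close to P(A) · P(B):

```python
import random

random.seed(0)
N = 100_000

# Event A: the first die shows an even number (probability 1/2).
# Event B: the second die shows a number greater than 4 (probability 1/3).
a_count = b_count = both_count = 0
for _ in range(N):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    a = d1 % 2 == 0
    b = d2 > 4
    a_count += a
    b_count += b
    both_count += a and b

p_a, p_b, p_both = a_count / N, b_count / N, both_count / N
print(p_a * p_b, p_both)  # both should be close to 1/2 * 1/3 = 1/6
```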
To make the Naïve Bayes algorithm easier to understand, I thought it’d be better to work through an example. Consider the dataset below (taken from GeeksforGeeks) that describes the weather conditions for playing a game of golf. Given the weather data, each row classifies the conditions as either fit (‘Yes’) or unfit (‘No’) for playing.
|    | OUTLOOK  | TEMPERATURE | HUMIDITY | WINDY | PLAY GOLF |
|----|----------|-------------|----------|-------|-----------|
| 0  | Rainy    | Hot         | High     | False | No        |
| 1  | Rainy    | Hot         | High     | True  | No        |
| 2  | Overcast | Hot         | High     | False | Yes       |
| 3  | Sunny    | Mild        | High     | False | Yes       |
| 4  | Sunny    | Cool        | Normal   | False | Yes       |
| 5  | Sunny    | Cool        | Normal   | True  | No        |
| 6  | Overcast | Cool        | Normal   | True  | Yes       |
| 7  | Rainy    | Mild        | High     | False | No        |
| 8  | Rainy    | Cool        | Normal   | False | Yes       |
| 9  | Sunny    | Mild        | Normal   | False | Yes       |
| 10 | Rainy    | Mild        | Normal   | True  | Yes       |
| 11 | Overcast | Mild        | High     | True  | Yes       |
| 12 | Overcast | Hot         | Normal   | False | Yes       |
| 13 | Sunny    | Mild        | High     | True  | No        |
The dataset is divided into two parts, namely the feature matrix and the response vector (a short code sketch of this split follows the list).
- The feature matrix contains all the rows of the dataset, where each row holds the values of the predictor features. In the above dataset, the features are ‘Outlook’, ‘Temperature’, ‘Humidity’ and ‘Windy’.
- The response vector contains the value of the class variable (the prediction, or output) for each row of the feature matrix. In the above dataset, the class variable is ‘Play Golf’.
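Here’s a minimal sketch (assuming pandas is installed; the variable and column names are my own) that builds the table above as a DataFrame and splits it into the feature matrix X and the response vector y:

```python
import pandas as pd

# The golf dataset from the table above.
data = [
    ("Rainy",    "Hot",  "High",   False, "No"),
    ("Rainy",    "Hot",  "High",   True,  "No"),
    ("Overcast", "Hot",  "High",   False, "Yes"),
    ("Sunny",    "Mild", "High",   False, "Yes"),
    ("Sunny",    "Cool", "Normal", False, "Yes"),
    ("Sunny",    "Cool", "Normal", True,  "No"),
    ("Overcast", "Cool", "Normal", True,  "Yes"),
    ("Rainy",    "Mild", "High",   False, "No"),
    ("Rainy",    "Cool", "Normal", False, "Yes"),
    ("Sunny",    "Mild", "Normal", False, "Yes"),
    ("Rainy",    "Mild", "Normal", True,  "Yes"),
    ("Overcast", "Mild", "High",   True,  "Yes"),
    ("Overcast", "Hot",  "Normal", False, "Yes"),
    ("Sunny",    "Mild", "High",   True,  "No"),
]
df = pd.DataFrame(data, columns=["Outlook", "Temperature", "Humidity", "Windy", "PlayGolf"])

X = df[["Outlook", "Temperature", "Humidity", "Windy"]]  # feature matrix
y = df["PlayGolf"]                                       # response vector
```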
Now, we make the ‘naive’ assumption that the features are independent of one another – for example, that the temperature has nothing to do with whether it is windy or not. We also assume that each feature contributes equally to the outcome.
We can now apply Bayes’ theorem in the following way:

P(y | X) = P(X | y) · P(y) / P(X)

where y is the class variable and X is the feature vector (of size n):

X = (x1, x2, ..., xn)

Assuming independence of the features, the likelihood factorizes and we get:

P(y | x1, ..., xn) = P(y) · P(x1 | y) · P(x2 | y) · ... · P(xn | y) / P(x1, ..., xn)
Now, we need to create a classifier model. For this, we find the probability of the given set of inputs for every possible value of the class variable y and pick the value with the maximum probability. Since the denominator P(x1, ..., xn) is the same for every class, we can drop it and simply choose the y that maximizes P(y) · P(x1 | y) · ... · P(xn | y).
So, finally, we are left with the task of calculating P(y) and P(xi | y).
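Continuing with the df and y from the earlier sketch (again, just an illustrative sketch, not the only way to do it), both quantities can be read off the data with simple frequency counts:

```python
import pandas as pd  # already imported if you ran the earlier snippet

# Class probabilities P(y): the fraction of rows carrying each label.
priors = y.value_counts(normalize=True)
print(priors)  # Yes: 9/14, No: 5/14

# Conditional probabilities P(xi | y): for each feature, the fraction of
# rows *within each class* that take a given value.
conditionals = {
    col: pd.crosstab(df[col], y, normalize="columns")
    for col in ["Outlook", "Temperature", "Humidity", "Windy"]
}
print(conditionals["Outlook"])  # e.g. P(Sunny | Yes) = 3/9, P(Sunny | No) = 2/5
```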
Please note that P(y) is also called the class probability and P(xi | y) the conditional probability. The different Naive Bayes classifiers differ mainly in the assumptions they make about the distribution of P(xi | y); for the categorical data here we simply use frequency counts. Here’s an example:
Let today = (Sunny, Hot, Normal, False)
Therefore, the probability of playing golf is given by:

P(Yes | today) ∝ P(Sunny | Yes) · P(Hot | Yes) · P(Normal | Yes) · P(False | Yes) · P(Yes) = 3/9 · 2/9 · 6/9 · 6/9 · 9/14 ≈ 0.0212

Similarly, the probability of not playing golf is:

P(No | today) ∝ P(Sunny | No) · P(Hot | No) · P(Normal | No) · P(False | No) · P(No) = 2/5 · 2/5 · 1/5 · 2/5 · 5/14 ≈ 0.0046

These two numbers don’t sum to 1 because we dropped the common denominator P(today). Scaling them so that they do, we get a probability of playing of 0.0212 / (0.0212 + 0.0046) ≈ 0.82, and of not playing ≈ 0.18. Therefore, our predicted outcome is that we should play golf.
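As a sanity check, here’s a sketch using scikit-learn’s CategoricalNB, reusing the X and y from the earlier snippet (assuming scikit-learn is installed). Note that CategoricalNB applies Laplace smoothing by default (alpha=1.0), so a tiny alpha is used here to approximate the un-smoothed frequency counts from the hand calculation:

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Encode the string/boolean categories as integers, as CategoricalNB expects.
enc = OrdinalEncoder()
X_enc = enc.fit_transform(X.astype(str).to_numpy())

clf = CategoricalNB(alpha=1e-9)  # near-zero smoothing to mimic raw frequency counts
clf.fit(X_enc, y)

today = enc.transform([["Sunny", "Hot", "Normal", "False"]])
print(dict(zip(clf.classes_, clf.predict_proba(today)[0])))
# roughly {'No': 0.18, 'Yes': 0.82}
```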
Wohoo, Bayes ftw!