What is Naive Bayes in Machine Learning

Naive Bayes is a simple but surprisingly powerful probabilistic machine learning algorithm used for predictive modeling and classification tasks. Typical applications of Naive Bayes include spam filtering, sentiment prediction, and document classification. It is popular mainly because it is easy to implement and makes predictions very quickly, which in turn makes solutions built on it scalable. Naive Bayes is traditionally considered the algorithm of choice for practical applications where instantaneous responses to user requests are required.

It is named after the Rev. Thomas Bayes, on whose work it is based. Before starting off with Naive Bayes, it is important to learn about Bayesian learning, 'Conditional Probability', and 'Bayes' Rule'.

Bayesian learning is a supervised learning technique where the goal is to build a model of the distribution of class labels given the input attributes. Naive Bayes is based on applying Bayes' theorem with the naive assumption of independence between every pair of features.

What is Conditional Probability?

Let us start with the primitives by understanding Conditional Probability with some examples.

Example 1

Consider you have a fair coin and a fair die. When you flip the coin, there is an equal chance of getting either a head or a tail. So you can say that the probability of getting heads (or tails) is 50%.

Now if you roll the fair die, the probability of getting any one of the 6 numbers is 1/6 ≈ 0.167. The probability is the same for every number on the die.

Example 2

Consider another example of playing cards. You are asked to pick a card from the deck. Can you guess the probability of getting a king given the card is a heart?

The given condition here is that the card is a heart, so the denominator has to be 13 (there are 13 hearts in a deck of cards) and not 52. Since there is only one king among the hearts, the probability that the card is a king given that it is a heart is 1/13 ≈ 0.077.

So when you say the conditional probability of A given B, it refers to the probability of the occurrence of A given that B has already occurred. This is a typical example of conditional probability.

Mathematically, the conditional probability of A given B is defined as P(A|B) = P(A ∩ B) / P(B).
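
Applying this to the card example above: P(King | Heart) = P(King ∩ Heart) / P(Heart) = (1/52) / (13/52) = 1/13 ≈ 0.077.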

Example 3

Let us see another slightly complicated example to understand conditional probability better.

Consider a school with a total population of 100 people. These 100 people can be classified either as 'Students' and 'Teachers' or as 'Males' and 'Females'.

Given the table below of the 100 people, what is the conditional probability that a person from the school is a 'Student', given that she is a 'Female'?


           Female   Male   Total
Teacher    10       10     20
Student    30       50     80
Total      40       60     100

To compute this, you can filter down to the sub-population of 40 females and focus on the 30 female students among them. So the required probability is P(Student | Female) = 30/40 = 0.75.

P(Student | Female) = P(Student ∩ Female) / P(Female) = (30/100) / (40/100) = 30/40 = 0.75

This is the probability of the intersection (∩) of Student (A) and Female (B), divided by the probability of Female (B). Similarly, the conditional probability of B given A can be calculated with the same expression, with the roles of A and B swapped.
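
As a quick sanity check, here is a minimal Python sketch (the variable names are my own) that reproduces the calculation from the raw counts:

# Counts taken from the table above
female_students = 30   # Student AND Female
females_total = 40     # all females

p_student_given_female = female_students / females_total
print(p_student_given_female)  # 0.75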

What is Bayes' Theorem?

Bayes' Theorem lets you compute the probability of an event based on prior knowledge of conditions related to that event. Its uses are mainly found in probability theory and statistics. The term naive is used in the sense that the features given to the model are assumed not to depend on each other: changing the value of one feature does not directly influence or change the value of the other features.

Consider, for example, estimating whether the price of a house is high: the estimate is better if we have some prior information, such as the facilities around the house, than if the assessment is made without any knowledge of the location.

P(A|B) = [P(B|A)P(A)]/[P(B)]

The equation above shows the basic representation of the Bayes' theorem where A and B are two events and:

P(A|B): The conditional probability that event A occurs, given that B has occurred. This is termed the posterior probability. 

P(A) and P(B): The marginal probabilities of A and B, considered independently of each other. 

P(B|A):  The conditional probability of the occurrence of event B, given that A has occurred.

Now the question is how you can use Bayes' Theorem in your machine learning models. To understand it clearly, let us take an example. 

Consider a simple problem where you need to learn a machine learning model from a given set of attributes. Then you will have to describe a hypothesis or a relation to a response variable and then using this relation, you will have to predict a response, given the set of attributes you have. 

Using Bayes' Theorem, you can create a learner that predicts the probability of the response variable belonging to some class, given a new set of attributes. 

Consider the previous question again and then assume that A is the response variable and B is the given attribute. So according to the equation of Bayes' Theorem, we have:

P(A|B): The conditional probability of the response variable that belongs to a particular value, given the input attributes, also known as the posterior probability.

P(A): The prior probability of the response variable.

P(B): The probability of the training data (input attributes), also called the evidence.

P(B|A): This is termed the likelihood of the training data.

The Bayes' Theorem can be reformulated in correspondence with the machine learning algorithm as:

posterior = (prior x likelihood) / (evidence)

Let's look at another problem. Consider a situation where the number of attributes is n and the response is a Boolean value, i.e. either True or False. The attributes themselves are categorical (2 categories each in this case). You need to train the classifier over all the values in the instance and response space.

This is practically impossible for most machine learning setups, since you need to estimate 2 ∗ (2^n − 1) parameters to learn this model. For 30 Boolean attributes, that means learning more than 2 billion parameters, which is unrealistic.
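
A quick sanity check of that count in Python:

# Parameters needed for n Boolean attributes and a Boolean response
n = 30
print(2 * (2 ** n - 1))  # 2147483646, i.e. over 2 billion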

What is a Naive Bayes Classifier?

A classifier is a machine learning model that is used to classify different objects based on certain features. Naive Bayes classifiers are a family of simple probabilistic machine learning models based on Bayes' Theorem. In simple words, it is a classification technique with an assumption of independence among predictors.

The Naive Bayes classifier reduces the complexity of the Bayesian classifier by making an assumption of conditional independence over the training dataset.

Consider you are given variables X, Y, and Z. X is conditionally independent of Y given Z if and only if the probability distribution of X is independent of the value of Y, given Z. This is the assumption of conditional independence.

In other words, you can also say that X and Y are conditionally independent given Z if and only if, the knowledge of the occurrence of X provides no information on the likelihood of the occurrence of Y and vice versa, given that Z occurs. This assumption is the reason behind the term naive in Naive Bayes.

The likelihood can be written considering n different attributes as:

P(X₁, …, Xₙ | Y) = ∏ᵢ₌₁ⁿ P(Xᵢ | Y)

In this expression, the Xᵢ represent the attributes and Y represents the response variable. So P(X|Y) equals the product of the probability distributions of each attribute given Y.
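
To make the factorization concrete, here is a minimal Python sketch (the function name is my own) that multiplies the per-attribute conditional probabilities:

def naive_likelihood(cond_probs):
    # Product of P(Xi | Y) over all attributes, under conditional independence
    result = 1.0
    for p in cond_probs:
        result *= p
    return result

# e.g. P(x1|y) = 0.8, P(x2|y) = 0.7, P(x3|y) = 0.9
print(naive_likelihood([0.8, 0.7, 0.9]))  # 0.504 (up to floating-point rounding)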

Maximizing a Posteriori

If you want to find the posterior probability P(Y|X) for multiple values of Y, you need to calculate the expression for each of the different values of Y. 

Let us assume a new instance X_NEW. You need to calculate the probability that Y takes any given value, given the observed attributes of X_NEW and the distributions P(Y) and P(X|Y) estimated from the training dataset. 

In order to predict the response variable from the different values obtained for P(Y|X), you take the most probable value, i.e. the maximum over the computed values. Hence, this method is known as maximum a posteriori (MAP) estimation.

Maximizing Likelihood

You can simplify the Naive Bayes algorithm if you assume that the response variable is uniformly distributed, which means that every response is equally likely. The advantage of this assumption is that the prior, P(Y), becomes a constant value. 

Since the prior and the evidence are now both independent of the response variable, they can be removed from the equation. So maximizing the posterior becomes a maximum likelihood problem.

How to make predictions with a Naive Bayes model?

Consider a situation where you have 1000 fruits which are either ‘banana’ or ‘apple’ or ‘other’. These will be the possible classes of the variable Y.

You have data for the following X variables, all of which are binary (0 or 1):

  • Long 
  • Sweet
  • Yellow

The training dataset will look like this:

Fruit    Long (x1)   Sweet (x2)   Yellow (x3)
Apple    0           0            1
Banana   1           0            1
Apple    0           1            0
Other    1           1            1
...      ...         ...          ...

Now let us sum up the training dataset to form a count table as below:

Type     Long   Not Long   Sweet   Not Sweet   Yellow   Not Yellow   Total
Banana   400    100        350     150         450      50           500
Apple    0      300        150     150         300      0            300
Other    100    100        150     50          50       150          200
Total    500    500        650     350         800      200          1000

The goal of the classifier is to predict whether a given fruit is a 'Banana', an 'Apple', or 'Other' when the three attributes (long, sweet, and yellow) are known.

Consider a case where you're told that a fruit is long, sweet and yellow, and you need to predict what type of fruit it is. This is the same as predicting Y when the X attributes are known. You can easily solve this problem by using Naive Bayes.

All you need to do is compute the 3 posterior probabilities, i.e. the probability of being a banana, an apple, or other. The one with the highest probability will be your answer. 

Step 1:

First of all, you need to compute the proportion of each fruit class out of all the fruits from the population which is the prior probability of each fruit class. 

The Prior probability can be calculated from the training dataset:

P(Y=Banana) = 500 / 1000 = 0.50

P(Y=Apple) = 300 / 1000 = 0.30

P(Y=Other) = 200 / 1000 = 0.20

The training dataset contains 1000 records, out of which you have 500 bananas, 300 apples and 200 others. So the priors are 0.5, 0.3 and 0.2 respectively. 

Step 2:

Secondly, you need to calculate the probability of the evidence, which goes into the denominator. It is simply the product of the marginal probabilities of all the X attributes:

P(x1=Long) = 500 / 1000 = 0.50

P(x2=Sweet) = 650 / 1000 = 0.65

P(x3=Yellow) = 800 / 1000 = 0.80

Step 3:

The third step is to compute the likelihood, which is the product of the conditional probabilities of the 3 attributes given the class. 

The Probability of Likelihood for Banana:

P(x1=Long | Y=Banana) = 400 / 500 = 0.80

P(x2=Sweet | Y=Banana) = 350 / 500 = 0.70

P(x3=Yellow | Y=Banana) = 450 / 500 = 0.90

Therefore, the overall likelihood for banana will be the product of the above three, i.e. 0.8 ∗ 0.7 ∗ 0.9 = 0.504.

Step 4:

The last step is to substitute all three quantities into the mathematical expression of Naive Bayes to get the probability.

P(Banana | Long, Sweet, Yellow)
  = [P(Long|Banana) ∗ P(Sweet|Banana) ∗ P(Yellow|Banana) ∗ P(Banana)] / [P(Long) ∗ P(Sweet) ∗ P(Yellow)]
  = (0.8 ∗ 0.7 ∗ 0.9 ∗ 0.5) / P(Evidence)
  = 0.252 / P(Evidence)

P(Apple | Long, Sweet, Yellow) = 0, because P(Long|Apple) = 0

P(Other | Long, Sweet, Yellow) = 0.01875 / P(Evidence)

The probabilities for 'Apple' and 'Other' shown above are computed in exactly the same way. The denominator is the same in all three cases, so it can be ignored when comparing them. 

Banana gets the highest probability, so that will be considered as the predicted class.
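
The whole four-step calculation fits in a few lines of Python. Here is a minimal sketch using the counts from the table above (all names are my own); since the evidence is identical for every class, the sketch simply drops it and compares the numerators:

# Counts from the count table: class -> (long, sweet, yellow, class total)
counts = {
    'Banana': (400, 350, 450, 500),
    'Apple':  (0,   150, 300, 300),
    'Other':  (100, 150, 50,  200),
}
total = 1000

scores = {}
for fruit, (long_, sweet, yellow, n) in counts.items():
    prior = n / total
    likelihood = (long_ / n) * (sweet / n) * (yellow / n)
    scores[fruit] = prior * likelihood  # numerator of Bayes' rule

print(scores)                       # Banana ≈ 0.252, Apple = 0.0, Other ≈ 0.01875
print(max(scores, key=scores.get))  # Banana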

What are the types of Naive Bayes classifier?

The main types of Naive Bayes classifier are mentioned below:

  • Multinomial Naive Bayes — These classifiers are usually used for document classification problems. They check whether a document belongs to a particular category, such as sports, technology, or politics, and classify it accordingly. The predictors used for classification in this technique are the frequencies of the words present in the document. 
  • Complement Naive Bayes — This is basically an adaptation of the multinomial naive bayes that is particularly suited for imbalanced datasets.  
  • Bernoulli Naive Bayes — This classifier is analogous to multinomial naive bayes, but instead of word counts, the predictors are Boolean values. The parameters used to predict the class variable take only yes or no values, for example, whether a word occurs in the text or not. 
  • Out-of-Core Naive Bayes — This classifier is used to handle cases of large scale classification problems for which the complete training dataset might not fit in the memory. 
  • Gaussian Naive Bayes — In Gaussian Naive Bayes, the predictors take continuous values that are assumed to be sampled from a Gaussian distribution, also called a normal distribution.
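
In scikit-learn, these variants map to the classes MultinomialNB, ComplementNB, BernoulliNB, and GaussianNB in sklearn.naive_bayes; the out-of-core case is handled by calling partial_fit on one of these classes with the data fed in chunks rather than a separate class. A minimal sketch with toy word counts (the data below is made up):

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word-count matrix: 4 documents x 3 vocabulary words
X = np.array([[2, 1, 0],
              [3, 0, 1],
              [0, 2, 4],
              [1, 0, 3]])
y = np.array(['sports', 'sports', 'tech', 'tech'])

clf = MultinomialNB()
clf.fit(X, y)
print(clf.predict([[0, 1, 5]]))  # expected to lean towards 'tech'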

Since the likelihood of the features is assumed to be Gaussian, the conditional probability will change in the following manner:

P(xᵢ | y) = (1 / √(2πσᵧ²)) exp( −(xᵢ − μᵧ)² / (2σᵧ²) )
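
Written as code, the density above is straightforward. A minimal sketch (the function name is my own):

import math

def gaussian_likelihood(x, mu, var):
    # P(x | y) for Gaussian Naive Bayes, with class mean mu and class variance var
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# e.g. feature value 5.1 for a class with mean 5.0 and variance 0.16
print(gaussian_likelihood(5.1, 5.0, 0.16))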

What are the pros and cons of the Naive Bayes?

The naive Bayes algorithm has both its pros and its cons. 

Pros of Naive Bayes —

  • It is easy and fast to predict the class of a test data set. 
  • It performs well in multiclass prediction.
  • It performs well compared to other models such as logistic regression when the independence assumption holds.
  • It requires less training data. 
  • It performs better with categorical input variables than with numerical variables.

Cons of Naive Bayes —

  • The model is not able to make a prediction in situations where a categorical variable has a category that was not observed in the training data set, and it assigns a 0 (zero) probability to it. This is known as the 'zero-frequency' problem. You can solve this using Laplace estimation.
  • Since Naive Bayes is known to be a bad estimator of probabilities, its probability outputs should not be taken too seriously.
  • Naive Bayes works on the principle of assumption of independent predictors, but it is practically impossible to get a set of predictors that are completely independent.

What is Laplace Correction?

When you have a model with many attributes, the entire probability can become zero because one of the feature's conditional probabilities is zero. To overcome this situation, you can increase the count of every feature value by a small number (such as 1) in the numerator, so that the overall probability never comes out as zero. 

This type of correction is called the Laplace Correction (or Laplace smoothing). Most Naive Bayes implementations expose it as a tunable parameter.
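
For example, P(Long | Apple) above was 0/300, which zeroed out the whole product for 'Apple'. A minimal sketch of add-one (Laplace) smoothing (the function name is my own); scikit-learn's discrete Naive Bayes classifiers expose the same idea through their alpha parameter:

def smoothed_prob(count, class_total, n_values, alpha=1.0):
    # Laplace-smoothed estimate of P(x | y) for a feature with n_values categories
    return (count + alpha) / (class_total + alpha * n_values)

# Without smoothing: P(Long | Apple) = 0 / 300 = 0
print(smoothed_prob(0, 300, 2))  # (0 + 1) / (300 + 2) ≈ 0.0033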

What are the applications of Naive Bayes? 

There are a lot of real-life applications of the Naive Bayes classifier, some of which are mentioned below:

  • Real-time prediction — It is a fast and eager machine learning classifier, so it is used for making predictions in real-time. 
  • Multi-class prediction — It can predict the probability of multiple classes of the target variable. 
  • Text Classification / Spam Filtering / Sentiment Analysis — Naive Bayes classifiers are mostly used in text classification problems because they handle multiple classes well and the independence assumption suits word features reasonably well. They are used for identifying spam emails and for identifying negative and positive customer sentiment on social platforms. 
  • Recommendation Systems — A recommendation system can be built by combining a Naive Bayes classifier with Collaborative Filtering. It filters unseen information and predicts whether a user would like a given resource, using machine learning and data mining techniques. 

How to build a Naive Bayes Classifier in Python?

In Python, the Naive Bayes classifier is implemented in the scikit-learn library. Let us look into an example by importing the standard iris dataset to predict the Species of flowers:

# Import packages
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

# Import data
training = pd.read_csv('/content/iris_training.csv')
test = pd.read_csv('/content/iris_test.csv')

# Create the X, Y, Training and Test
X_Train = training.drop('Species', axis=1)
Y_Train = training.loc[:, 'Species']
X_Test = test.drop('Species', axis=1)
Y_Test = test.loc[:, 'Species']

# Init the Gaussian Classifier
model = GaussianNB()

# Train the model
model.fit(X_Train, Y_Train)

# Predict Output
pred = model.predict(X_Test)

# Plot Confusion Matrix
mat = confusion_matrix(pred, Y_Test)  # rows: predicted labels, columns: true labels
names = np.unique(pred)
sns.heatmap(mat, square=True, annot=True, fmt='d', cbar=False,
        xticklabels=names, yticklabels=names)
plt.xlabel('Truth')
plt.ylabel('Predicted')

The output will be a confusion-matrix heatmap, along with the return value of the final plotting call:

Text(89.18, 0.5, 'Predicted')
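
As a follow-up, a single accuracy number can be computed from the same variables (a minimal addition, assuming the script above has already run):

from sklearn.metrics import accuracy_score
print('Accuracy:', accuracy_score(Y_Test, pred))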

How to improve a Naive Bayes Model?

You can improve the power of a Naive Bayes model by following these tips:

  1. Transform variables using transformations like Box-Cox and Yeo-Johnson to bring continuous features closer to a normal distribution (see the sketch after this list).
  2. Use Laplace Correction for handling zero values in X variables and to predict the class of test data set for zero frequency issues. 
  3. Check for correlated features and remove the highly correlated ones, because correlated features are effectively counted twice in the model, inflating their importance.
  4. Combine different features to make a new feature that makes intuitive sense. 
  5. Provide more realistic prior probabilities to the algorithm based on knowledge from the business. 
  6. Use ensemble methods like bagging and boosting to reduce the variance. 
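
As a sketch of the first tip, scikit-learn's PowerTransformer implements the Yeo-Johnson transform and can be chained with Gaussian Naive Bayes in a pipeline (the toy data below is made up):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.naive_bayes import GaussianNB

# Toy skewed features; the transform pulls them towards a normal distribution
X = np.array([[1.0, 200.0], [2.0, 900.0], [1.5, 150.0], [8.0, 5000.0]])
y = np.array([0, 0, 1, 1])

model = make_pipeline(PowerTransformer(method='yeo-johnson'), GaussianNB())
model.fit(X, y)
print(model.predict(X))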

Summary

Let us see what we have learned so far —

  • Naive Bayes and its types
  • Pros and Cons of Naive Bayes
  • Applications of Naive Bayes
  • How a Naive Bayes model makes the prediction
  • Creating a Naive Bayes classifier
  • Improving a Naive Bayes model

Naive Bayes is widely used in real-world applications like sentiment analysis, spam filtering, and recommendation systems. It is extremely fast and easy to implement compared to other machine learning models. However, its biggest drawback is the requirement that the predictors be independent. In most real-life cases the predictors are dependent, which hinders the performance of the classifier.

We have covered most of the topics related to algorithms in our series of machine learning blogs. If you are inspired by the opportunities provided by machine learning, enrol in our Data Science and Machine Learning courses for more lucrative career options in this landscape.

Priyankur Sarkar

Data Science Enthusiast

Priyankur Sarkar loves to play with data and get insightful results out of it, and then turn those insights into business growth. He is an electronics engineer with versatile experience as an individual contributor and leading teams, and has actively worked towards building Machine Learning capabilities for organizations.
