What is Naive Bayes in Machine Learning


Naive Bayes is a simple but surprisingly powerful probabilistic machine learning algorithm used for predictive modeling and classification tasks. Typical applications include spam filtering, sentiment prediction, and document classification. It is popular mainly because it is easy to implement and makes predictions quickly, which helps a solution scale. Naive Bayes is traditionally considered the algorithm of choice for practical applications, especially where users' requests need near-instantaneous responses.

It is named after the Rev. Thomas Bayes, on whose work it is based. Before starting with Naive Bayes, it is important to understand Bayesian learning, conditional probability, and Bayes' rule.

Bayesian learning is a supervised learning technique in which the goal is to build a model of the distribution of class labels for a clearly defined target attribute. Naïve Bayes applies Bayes' theorem with the naïve assumption that every pair of features is independent, given the class label.

What is Conditional Probability?

Let us start with the primitives by understanding Conditional Probability with some examples.

Example 1

Suppose you have a fair coin and a fair die. When you flip the coin, there is an equal chance of getting a head or a tail, so the probability of heads and the probability of tails are each 50%.

If you roll the die, the probability of getting any one of the 6 numbers, say 1, is 1/6 ≈ 0.167. The probability is the same for every other number on the die.

Example 2

Consider another example of playing cards. You are asked to pick a card from the deck. Can you guess the probability of getting a king given the card is a heart?

The given condition here is that the card is a heart, so the denominator has to be 13 (there are 13 hearts in a deck of cards) and not 52. Since there is only one king of hearts, the probability that the card is a king given that it is a heart is 1/13 ≈ 0.077.

So the conditional probability of A given B is the probability that A occurs given that B has already occurred. The card example is a typical case of conditional probability.

Mathematically, the conditional probability of A given B is defined as P(A|B) = P(A ∩ B) / P(B), provided that P(B) > 0.
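The card example can be checked by brute-force enumeration. The sketch below is only an illustration: the deck representation and the helper names (`conditional_probability`, `is_king`, `is_heart`) are assumptions made for this example, not part of the original article.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) outcomes


def conditional_probability(event_a, event_b, outcomes):
    """Return P(A | B) = P(A and B) / P(B) for equally likely outcomes."""
    in_b = [o for o in outcomes if event_b(o)]
    if not in_b:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    in_a_and_b = [o for o in in_b if event_a(o)]
    # Both probabilities share the denominator len(outcomes), so it cancels.
    return Fraction(len(in_a_and_b), len(in_b))


def is_king(card):
    return card[0] == "K"


def is_heart(card):
    return card[1] == "hearts"


p_king_given_heart = conditional_probability(is_king, is_heart, deck)
print(p_king_given_heart)         # 1/13
print(float(p_king_given_heart))  # roughly 0.077
```

Using `Fraction` keeps the result exact, which makes it easy to compare against the hand calculation.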

Example 3

Let us look at a slightly more complicated example to understand conditional probability better.

Consider a school with a total population of 100 people. These 100 people can be classified either as ‘Students’ and ‘Teachers’ or as ‘Males’ and ‘Females’.

Given the table below, which tabulates the 100 people, what is the conditional probability that a certain person in the school is a ‘Student’ given that she is a ‘Female’?


          Female   Male   Total
Teacher       10     10      20
Student       30     50      80
Total         40     60     100

To compute this, filter down to the sub-population of 40 females and focus only on the 30 female students. The required probability is P(Student | Female) = 30/40 = 0.75.

P(Student | Female) = P(Student ∩ Female) / P(Female) = (30/100) / (40/100) = 0.75

This is the probability of the intersection (∩) of Student (A) and Female (B) divided by the probability of Female (B). The conditional probability of B given A can be calculated with the same expression, with the roles of A and B swapped.
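The same calculation can be written directly from the table. This is a minimal sketch; the dictionary layout and variable names are chosen here for illustration only.

```python
from fractions import Fraction

# Counts from the school table in Example 3: (role, gender) -> number of people.
counts = {
    ("Teacher", "Female"): 10,
    ("Teacher", "Male"): 10,
    ("Student", "Female"): 30,
    ("Student", "Male"): 50,
}
total = sum(counts.values())  # 100 people

# P(Student and Female): the single cell for female students.
p_student_and_female = Fraction(counts[("Student", "Female")], total)

# P(Female): sum of every cell in the Female column.
female_count = sum(n for (role, gender), n in counts.items() if gender == "Female")
p_female = Fraction(female_count, total)

p_student_given_female = p_student_and_female / p_female
print(p_student_given_female)  # 3/4
```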

What is Bayes' Theorem?

Bayes' Theorem lets you compute the probability of an event based on prior knowledge of another event that is related to it. It is used mainly in probability theory and statistics. The term naive refers to the assumption that the features given to the model do not depend on each other. In simple terms, changing the value of one feature does not directly influence or change the value of any other feature.

For example, the probability that the price of a house is high can be estimated better if you have some prior information, such as the facilities around it, than if you assess it without knowing the location of the house.

P(A|B) = [P(B|A)P(A)]/[P(B)]

The equation above shows the basic form of Bayes' theorem, where A and B are two events and:

P(A|B): The conditional probability that event A occurs, given that B has occurred. This is called the posterior probability.

P(A) and P(B): The probabilities of A and B on their own, each without regard to the other.

P(B|A): The conditional probability that event B occurs, given that A has occurred.

Now the question is how you can use Bayes' Theorem in your machine learning models. To understand it clearly, let us take an example. 

Consider a simple problem where you need to learn a machine learning model from a given set of attributes. You describe a hypothesis, or a relation between the attributes and a response variable, and then use this relation to predict the response for a given set of attributes.

Using Bayes' Theorem, you can create a learner that predicts the probability that the response variable belongs to a particular class, given a new set of attributes.

Now assume that A is the response variable and B is the given set of attributes. According to Bayes' Theorem, we have:

P(A|B): The conditional probability that the response variable takes a particular value, given the input attributes. This is the posterior probability.

P(A): The prior probability of the response variable.

P(B): The probability of the training data (input attributes), also called the evidence.

P(B|A): The likelihood of the training data, given the response.

Bayes' Theorem can be restated in machine learning terms as:

posterior = (prior × likelihood) / evidence
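As a quick sanity check of this formula, the sketch below reuses the school table from Example 3 and reverses the conditioning: it derives P(Female | Student) from P(Student | Female). The function name `posterior` and the choice of this particular example are assumptions made for illustration.

```python
from fractions import Fraction


def posterior(prior, likelihood, evidence):
    """Bayes' theorem: posterior = (prior * likelihood) / evidence."""
    if evidence <= 0:
        raise ValueError("evidence must be positive")
    return prior * likelihood / evidence


# A = Female, B = Student, with values read off the school table.
p_female = Fraction(40, 100)               # prior P(A)
p_student_given_female = Fraction(30, 40)  # likelihood P(B|A)
p_student = Fraction(80, 100)              # evidence P(B)

p_female_given_student = posterior(p_female, p_student_given_female, p_student)
print(p_female_given_student)  # 3/8
```

The result agrees with reading the table directly: 30 of the 80 students are female, and 30/80 = 3/8.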

Let’s look at another problem. Consider a situation where the number of attributes is n and the response is a Boolean value, i.e. either True or False. The attributes are also categorical, with 2 categories each. To train the classifier without any further assumptions, you need an estimate for every combination of values in the instance space and the response space.

This is not practical for most machine learning problems, since you need to estimate 2 × (2^n − 1) parameters to learn this model. For 30 Boolean attributes, that is 2 × (2^30 − 1) = 2,147,483,646, or more than 2 billion parameters, which is unrealistic.
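The growth of this parameter count is easy to check numerically. In the sketch below the function names are hypothetical. The comparison figure of 2n + 1 for a model that treats the attributes as conditionally independent (one P(Xᵢ = True | Y) per attribute and class, plus one prior) is an assumption added here for contrast, not a number taken from the article.

```python
def full_joint_parameters(n):
    """Parameters needed without any independence assumption: 2 * (2^n - 1)."""
    return 2 * (2 ** n - 1)


def naive_bayes_parameters(n):
    """Assumed count under conditional independence: 2 per attribute + 1 prior."""
    return 2 * n + 1


for n in (10, 20, 30):
    print(n, full_joint_parameters(n), naive_bayes_parameters(n))
# 10 2046 21
# 20 2097150 41
# 30 2147483646 61
```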

What is a Naive Bayes Classifier?

A classifier is a machine learning model that assigns objects to classes based on their features. Naive Bayes classifiers are a family of simple probabilistic machine learning models based on Bayes' Theorem. In simple words, Naive Bayes is a classification technique that assumes independence among predictors.

The Naive Bayes classifier reduces the complexity of the Bayesian classifier by making an assumption of conditional independence over the training dataset.

Suppose you are given variables X, Y, and Z. X is conditionally independent of Y given Z if and only if the probability distribution of X is independent of the value of Y once Z is known. This is the assumption of conditional independence.

In other words, X and Y are conditionally independent given Z if and only if, given that Z occurs, knowing whether X occurs provides no information about the likelihood of Y occurring, and vice versa. This assumption is the reason for the term naive in Naive Bayes.

The likelihood can be written considering n different attributes as:

P(X₁, ..., Xₙ | Y) = ∏ᵢ₌₁ⁿ P(Xᵢ | Y)

In this expression, X₁, ..., Xₙ are the attributes and Y is the response variable. So P(X|Y) becomes the product of the conditional probabilities of each attribute given Y.
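The product form translates almost literally into code. The following is a minimal sketch, assuming each attribute's conditional distribution is stored as a dictionary from value to probability. That data layout and the function name `naive_likelihood` are illustrative choices, and the numbers are borrowed from the banana example later in the article.

```python
import math


def naive_likelihood(x, conditionals):
    """Return P(x_1, ..., x_n | Y) as the product of P(x_i | Y).

    x            -- observed attribute values, one per attribute
    conditionals -- one dict per attribute mapping value -> P(X_i = value | Y)
    """
    return math.prod(cond[value] for value, cond in zip(x, conditionals))


# P(Long | Banana), P(Sweet | Banana), P(Yellow | Banana) and their complements.
banana_conditionals = [
    {1: 0.8, 0: 0.2},
    {1: 0.7, 0: 0.3},
    {1: 0.9, 0: 0.1},
]

likelihood = naive_likelihood([1, 1, 1], banana_conditionals)
print(round(likelihood, 3))  # 0.504
```

`math.prod` requires Python 3.8 or newer. For models with many attributes, a sum of log-probabilities is commonly used instead to avoid numerical underflow.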

Maximizing a Posteriori

If you want to find the posterior probability P(Y|X) for multiple values of Y, you need to evaluate the expression for each value of Y.

Suppose you have a new instance X_NEW. You need to calculate the probability that Y takes each possible value, given the observed attributes of X_NEW and the distributions P(Y) and P(X|Y) estimated from the training dataset.

To predict the response variable from the values obtained for P(Y|X), you choose the most probable value of Y, that is, the one with the maximum posterior. Hence, this method is known as maximum a posteriori (MAP) estimation.
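A MAP decision can be sketched in a few lines. The class labels and all probabilities below are hypothetical numbers invented for this illustration. Because the evidence is the same for every class, the sketch compares only prior × likelihood.

```python
def map_predict(priors, likelihoods):
    """Return the class with the largest prior * likelihood, plus all scores."""
    scores = {label: priors[label] * likelihoods[label] for label in priors}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical two-class problem: P(Y) and P(X_NEW | Y) for each class.
priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {"spam": 0.05, "ham": 0.01}

prediction, scores = map_predict(priors, likelihoods)
print(prediction)  # spam
```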

Maximizing Likelihood

You can simplify the Naive Bayes algorithm if you assume that the response variable is uniformly distributed, meaning every response is equally likely. The advantage of this assumption is that the prior P(Y) becomes a constant.

Since neither the prior nor the evidence then depends on the value of the response variable, both can be dropped from the equation. Maximizing the posterior thus reduces to maximizing the likelihood.
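The difference between the two rules shows up only when the prior is not uniform. The sketch below uses made-up numbers (the labels, priors and likelihoods are all hypothetical) to show a case where the maximum likelihood and maximum a posteriori decisions disagree, and that they agree once the prior is uniform.

```python
def argmax_likelihood(likelihoods):
    """Maximum likelihood decision: ignore the prior entirely."""
    return max(likelihoods, key=likelihoods.get)


def argmax_posterior(priors, likelihoods):
    """Maximum a posteriori decision: weight each likelihood by its prior."""
    return max(priors, key=lambda label: priors[label] * likelihoods[label])


likelihoods = {"spam": 0.05, "ham": 0.01}   # hypothetical P(X_NEW | Y)
skewed_priors = {"spam": 0.1, "ham": 0.9}   # hypothetical non-uniform P(Y)
uniform_priors = {"spam": 0.5, "ham": 0.5}  # every response equally likely

ml_choice = argmax_likelihood(likelihoods)
map_choice_skewed = argmax_posterior(skewed_priors, likelihoods)
map_choice_uniform = argmax_posterior(uniform_priors, likelihoods)

print(ml_choice, map_choice_skewed, map_choice_uniform)  # spam ham spam
```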

How to make predictions with a Naive Bayes model?

Consider a situation where you have 1000 fruits, each of which is a ‘banana’, an ‘apple’ or ‘other’. These are the possible classes of the variable Y.

The data has the following X variables, all of which are binary (0 or 1):

  • Long 
  • Sweet
  • Yellow

The training dataset will look like this:

Fruit    Long (x1)   Sweet (x2)   Yellow (x3)
Apple    0           0            1
Banana   1           0            1
Apple    0           1            0
Other    1           1            1
..       ..          ..           ..

Now let us sum up the training dataset to form a count table as below:

Type     Long   Not Long   Sweet   Not Sweet   Yellow   Not Yellow   Total
Banana    400        100     350         150      450           50     500
Apple       0        300     150         150      300            0     300
Other     100        100     150          50       50          150     200
Total     500        500     650         350      800          200    1000
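Going from raw records to a count table is a simple aggregation. The sketch below uses only the four sample rows shown in the training data excerpt, since the remaining rows are not listed, so its counts are far smaller than those in the full table. The function name `count_table` and the tuple layout are assumptions made for illustration.

```python
FEATURE_NAMES = ("Long", "Sweet", "Yellow")

# Only the four rows visible in the excerpt: (fruit, long, sweet, yellow).
records = [
    ("Apple", 0, 0, 1),
    ("Banana", 1, 0, 1),
    ("Apple", 0, 1, 0),
    ("Other", 1, 1, 1),
]


def count_table(rows):
    """Count, per fruit, how many rows have each binary feature set to 1."""
    table = {}
    for fruit, *values in rows:
        row = table.setdefault(fruit, {"Long": 0, "Sweet": 0, "Yellow": 0, "Total": 0})
        row["Total"] += 1
        for name, value in zip(FEATURE_NAMES, values):
            row[name] += value  # value is 0 or 1, so this counts the 1s
    return table


table = count_table(records)
print(table["Apple"])  # {'Long': 0, 'Sweet': 1, 'Yellow': 1, 'Total': 2}
```

The "Not Long", "Not Sweet" and "Not Yellow" columns follow by subtracting each count from the row total.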

The goal of the classifier is to predict whether a given fruit is a ‘Banana’, an ‘Apple’ or ‘Other’ when the three attributes (long, sweet and yellow) are known.

Suppose you are told that a fruit is long, sweet and yellow, and you need to predict what type of fruit it is. This is the same as predicting Y when only the X attributes are known. You can solve this problem with Naive Bayes.

All you need to do is compute 3 probabilities, i.e. the probability of the fruit being a banana, an apple or other. The class with the highest probability is your answer.

Step 1:

First, compute the proportion of each fruit class among all the fruits in the population. This is the prior probability of each fruit class.

The prior probabilities can be calculated from the training dataset:

P(Y=Banana) = 500 / 1000 = 0.50

P(Y=Apple) = 300 / 1000 = 0.30

P(Y=Other) = 200 / 1000 = 0.20

The training dataset contains 1000 records, of which 500 are bananas, 300 are apples and 200 are others. So the priors are 0.5, 0.3 and 0.2 respectively.

Step 2:

Second, calculate the probability of the evidence, which goes into the denominator. Here it is taken as the product of the marginal probabilities P(x) of the three observed attributes:

P(x1=Long) = 500 / 1000 = 0.50

P(x2=Sweet) = 650 / 1000 = 0.65

P(x3=Yellow) = 800 / 1000 = 0.80

Step 3:

The third step is to compute the likelihood of the evidence, which is the product of the conditional probabilities of the 3 attributes given the class.

The likelihood for Banana:

P(x1=Long | Y=Banana) = 400 / 500 = 0.80

P(x2=Sweet | Y=Banana) = 350 / 500 = 0.70

P(x3=Yellow | Y=Banana) = 450 / 500 = 0.90

Therefore, the overall likelihood for banana is the product of the above three, i.e. 0.8 × 0.7 × 0.9 = 0.504.

Step 4:

The last step is to substitute the results of the 3 previous steps into the Naive Bayes expression to get the probability.

P(Banana | Long, Sweet and Yellow) = [P(Long|Banana) × P(Sweet|Banana) × P(Yellow|Banana) × P(Banana)] / [P(Long) × P(Sweet) × P(Yellow)]
= (0.8 × 0.7 × 0.9 × 0.5) / P(Evidence) = 0.252 / P(Evidence)

P(Apple | Long, Sweet and Yellow) = 0, because P(Long|Apple) = 0

P(Other | Long, Sweet and Yellow) = (0.5 × 0.75 × 0.25 × 0.2) / P(Evidence) = 0.01875 / P(Evidence)

The probabilities for ‘Apple’ and ‘Other’ are computed in the same way as for ‘Banana’. The denominator is the same in all cases, so it does not affect which class has the highest probability.

Banana gets the highest probability, so it is the predicted class.
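The four steps can be reproduced from the count table in a short script. This is a sketch rather than a reference implementation: the dictionary layout and variable names are assumptions. The last step normalizes by the sum of the three numerators so that the posteriors add up to 1, which is one common choice and differs from the product of marginals used in Step 2. Either way the denominator is shared by all classes, so the predicted class is unchanged.

```python
# Counts of fruits that are long, sweet and yellow, taken from the count table.
counts = {
    "Banana": {"Long": 400, "Sweet": 350, "Yellow": 450, "Total": 500},
    "Apple": {"Long": 0, "Sweet": 150, "Yellow": 300, "Total": 300},
    "Other": {"Long": 100, "Sweet": 150, "Yellow": 50, "Total": 200},
}
features = ["Long", "Sweet", "Yellow"]
n_fruits = sum(row["Total"] for row in counts.values())  # 1000

numerators = {}
for fruit, row in counts.items():
    prior = row["Total"] / n_fruits             # Step 1: P(Y)
    likelihood = 1.0
    for feature in features:
        likelihood *= row[feature] / row["Total"]  # Step 3: product of P(x_i | Y)
    numerators[fruit] = prior * likelihood       # numerator of Step 4

prediction = max(numerators, key=numerators.get)

# Assumed normalization: divide by the sum of numerators over all classes.
normalizer = sum(numerators.values())
posteriors = {fruit: value / normalizer for fruit, value in numerators.items()}

for fruit in counts:
    print(fruit, round(numerators[fruit], 5), round(posteriors[fruit], 3))
print("Predicted class:", prediction)
```

The numerators match the hand calculation: 0.252 for Banana, 0 for Apple and 0.01875 for Other.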

What are the types of Naive Bayes classifier?

The main types of Naive Bayes classifier are mentioned below:

  • Multinomial Naive Bayes: This classifier is typically used for document classification, i.e. deciding whether a document belongs to a category such as sports, technology or politics. The predictors are the frequencies of the words present in the document.
  • Complement Naive Bayes: An adaptation of multinomial Naive Bayes that is particularly suited to imbalanced datasets.
  • Bernoulli Naive Bayes: Similar to multinomial Naive Bayes, but the predictors are Boolean values instead of word frequencies. Each predictor takes only a yes or no value, for example whether a word occurs in the text or not.
  • Out-of-Core Naive Bayes: Used for large-scale classification problems where the complete training dataset might not fit in memory.
  • Gaussian Naive Bayes: The predictors take continuous values that are assumed to have been sampled from a Gaussian distribution, also called a normal distribution.

Since the likelihood of the features is assumed to be Gaussian, the conditional probability will change in the following manner:

P(xᵢ|y) = 1/√(2πσᵧ²) · exp[−(xᵢ − μᵧ)² / (2σᵧ²)]
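As a rough illustration of the formula above, the Gaussian likelihood of a single feature value can be computed as follows. The function name and arguments are hypothetical; `mean` and `var` stand for the per-class mean μᵧ and variance σᵧ² estimated from the training data.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Density of x under a normal distribution with the given mean and variance."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * var)
    exponent = -((x - mean) ** 2) / (2.0 * var)
    return coeff * math.exp(exponent)

# Density at the mean of a standard normal distribution
print(gaussian_likelihood(0.0, mean=0.0, var=1.0))
```

A Gaussian Naive Bayes model evaluates this density once per feature and per class, then multiplies the results together with the class prior.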

What are the pros and cons of the Naive Bayes?

The naive Bayes algorithm has both its pros and its cons. 

Pros of Naive Bayes —

  • It is easy and fast to predict the class of the test data set. 
  • It performs well in multiclass prediction.
  • When the independence assumption holds, it can perform better than other models such as logistic regression.
  • It requires less training data. 
  • It performs better with categorical input variables than with numerical variables.

Cons of Naive Bayes —

  • If a categorical variable has a category in the test data that was not observed in the training data set, the model assigns it a probability of 0 (zero) and cannot make a prediction. This is known as the ‘Zero Frequency’ problem. You can solve it using Laplace estimation.
  • Naive Bayes is known to be a poor probability estimator, so its probability outputs should not be taken too literally.
  • Naive Bayes relies on the assumption of independent predictors, but in practice it is almost impossible to get a set of predictors that are completely independent.

What is Laplace Correction?

When you have a model with a lot of attributes, the entire probability can become zero because the count for one feature value is zero. To overcome this, you can add a small value (usually 1) to the count in the numerator so that the overall probability does not come out as zero. 

This type of correction is called the Laplace Correction. Most Naive Bayes implementations expose it as a parameter.
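A minimal sketch of the correction is shown below. The function name is hypothetical, and the counts (a class with 300 training examples and a feature with 2 possible values) are made-up numbers chosen only for illustration. Adding `alpha * n_values` to the denominator is the usual companion to the numerator adjustment, so that the smoothed probabilities over all feature values still sum to 1.

```python
def laplace_smoothed(count, class_total, n_values, alpha=1.0):
    """Estimate P(feature value | class) with additive (Laplace) smoothing."""
    return (count + alpha) / (class_total + alpha * n_values)

# Hypothetical counts: the value was never seen among 300 examples of the class
unsmoothed = 0 / 300
smoothed = laplace_smoothed(count=0, class_total=300, n_values=2)

print(unsmoothed)  # zero wipes out the whole product of likelihoods
print(smoothed)    # small but non-zero, so the other features still count
```

In scikit-learn, for instance, `MultinomialNB` exposes this smoothing strength as its `alpha` parameter.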

What are the applications of Naive Bayes? 

There are a lot of real-life applications of the Naive Bayes classifier, some of which are mentioned below:

  • Real-time prediction — It is a fast and eager machine learning classifier, so it is used for making predictions in real-time. 
  • Multi-class prediction — It can predict the probability of multiple classes of the target variable. 
  • Text Classification / Spam Filtering / Sentiment Analysis — Naive Bayes classifiers are widely used in text classification because of their good results on multi-class problems and the independence assumption. They are used to identify spam emails and to detect negative and positive customer sentiment on social platforms. 
  • Recommendation Systems — A recommendation system can be built by combining a Naive Bayes classifier with Collaborative Filtering. It filters unseen information and predicts whether the user would like a given resource or not using machine learning and data mining techniques. 

How to build a Naive Bayes Classifier in Python?

In Python, the Naive Bayes classifier is implemented in the scikit-learn library. Let us look into an example by importing the standard iris dataset to predict the Species of flowers:

# Import packages
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

# Import data
training = pd.read_csv('/content/iris_training.csv')
test = pd.read_csv('/content/iris_test.csv')

# Create the X, Y, Training and Test
X_Train = training.drop('Species', axis=1)
Y_Train = training.loc[:, 'Species']
X_Test = test.drop('Species', axis=1)
Y_Test = test.loc[:, 'Species']

# Init the Gaussian Classifier
model = GaussianNB()

# Train the model
model.fit(X_Train, Y_Train)

# Predict Output
pred = model.predict(X_Test)

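# Plot Confusion Matrix
# Rows are predictions and columns are true labels, matching the axis labels below
names = model.classes_  # every class seen in training, even if never predicted
mat = confusion_matrix(pred, Y_Test, labels=names)
sns.heatmap(mat, square=True, annot=True, fmt='d', cbar=False,
        xticklabels=names, yticklabels=names)
plt.xlabel('Truth')
plt.ylabel('Predicted')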

The output is a confusion-matrix heatmap, with the true species on the x-axis ('Truth') and the predicted species on the y-axis ('Predicted').
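If the CSV files used above are not at hand, a self-contained variant can be sketched with the copy of the iris data that ships with scikit-learn. The 70/30 split, the `random_state` value, and the use of accuracy as the metric are choices made for this sketch and are not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the bundled iris data as feature matrix X and label vector y
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing, keeping class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the Gaussian Naive Bayes model and predict the held-out species
model = GaussianNB()
model.fit(X_train, y_train)
pred = model.predict(X_test)

accuracy = accuracy_score(y_test, pred)
print(f"Accuracy: {accuracy:.3f}")
```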

How to improve a Naive Bayes Model?

You can improve the power of a Naive Bayes model by following these tips:

  1. Apply transformations like BoxCox and YeoJohnson to bring continuous features closer to a normal distribution.
  2. Use Laplace Correction to handle zero counts in the X variables, so that the class of test data can still be predicted when the zero frequency issue occurs. 
  3. Check for correlated features and remove the highly correlated ones, because they are effectively counted twice in the model and can over-inflate their importance.
  4. Combine different features into a new feature where the combination makes intuitive sense. 
  5. Provide more realistic prior probabilities to the algorithm based on business knowledge. 
  6. Ensemble methods like bagging and boosting can be tried to reduce the variance, although Naive Bayes is already a low-variance model, so the gains are often small. 
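The point about correlated features can be made concrete with a small sketch. The two classes A and B, the likelihood values 0.8 and 0.4, and the equal priors are hypothetical numbers chosen for illustration. Feeding the same feature to the model twice is the extreme case of perfect correlation.

```python
def posterior_a(lik_a, lik_b, copies, prior_a=0.5):
    """Posterior of class A when the same feature is counted `copies` times."""
    num_a = prior_a * lik_a ** copies
    num_b = (1.0 - prior_a) * lik_b ** copies
    return num_a / (num_a + num_b)

once = posterior_a(0.8, 0.4, copies=1)   # feature used once
twice = posterior_a(0.8, 0.4, copies=2)  # same feature duplicated

print(once, twice)
```

The duplicated feature adds no new information, yet the posterior for class A moves from about 0.67 to 0.80, which is the over-inflation that removing highly correlated features is meant to avoid.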

Summary

Let us see what we have learned so far —

  • Naive Bayes and its types
  • Pros and Cons of Naive Bayes
  • Applications of Naive Bayes
  • How a Naive Bayes model makes the prediction
  • Creating a Naive Bayes classifier
  • Improving a Naive Bayes model

Naive Bayes is widely used in real-world applications such as sentiment analysis, spam filtering, and recommendation systems. It is extremely fast and easy to implement compared to other machine learning models. However, its biggest drawback is the requirement that the predictors be independent. In most real-life cases, the predictors are dependent, which hinders the performance of the classifier.

Priyankur Sarkar

Data Science Enthusiast

Priyankur Sarkar loves to play with data and get insightful results out of it, then turn those insights and results into business growth. He is an electronics engineer with versatile experience as an individual contributor and as a team lead, and has actively worked towards building Machine Learning capabilities for organizations.

Join the Discussion

Your email address will not be published. Required fields are marked *

Suggested Blogs

Role of Unstructured Data in Data Science

Data has become the new game changer for businesses. Typically, data scientists categorize data into three broad divisions - structured, semi-structured, and unstructured data. In this article, you will get to know about unstructured data, sources of unstructured data, unstructured data vs. structured data, the use of structured and unstructured data in machine learning, and the difference between structured and unstructured data. Let us first understand what is unstructured data with examples. What is unstructured data? Unstructured data is a kind of data format where there is no organized form or type of data. Videos, texts, images, document files, audio materials, email contents and more are considered to be unstructured data. It is the most copious form of business data, and cannot be stored in a structured database or relational database. Some examples of unstructured data are the photos we post on social media platforms, the tagging we do, the multimedia files we upload, and the documents we share. Seagate predicts that the global data-sphere will expand to 163 zettabytes by 2025, where most of the data will be in the unstructured format. Characteristics of Unstructured DataUnstructured data cannot be organized in a predefined fashion, and is not a homogenous data model. This makes it difficult to manage. Apart from that, these are the other characteristics of unstructured data. You cannot store unstructured data in the form of rows and columns as we do in a database table. Unstructured data is heterogeneous in structure and does not have any specific data model. The creation of such data does not follow any semantics or habits. Due to the lack of any particular sequence or format, it is difficult to manage. Such data does not have an identifiable structure. Sources of Unstructured Data There are various sources of unstructured data. 
Some of them are: Content websites Social networking sites Online images Memos Reports and research papers Documents, spreadsheets, and presentations Audio mining, chatbots Surveys Feedback systems Advantages of Unstructured Data Unstructured data has become exceptionally easy to store because of MongoDB, Cassandra, or even using JSON. Modern NoSQL databases and software allows data engineers to collect and extract data from various sources. There are numerous benefits that enterprises and businesses can gain from unstructured data. These are: With the advent of unstructured data, we can store data that lacks a proper format or structure. There is no fixed schema or data structure for storing such data, which gives flexibility in storing data of different genres. Unstructured data is much more portable by nature. Unstructured data is scalable and flexible to store. Database systems like MongoDB, Cassandra, etc., can easily handle the heterogeneous properties of unstructured data. Different applications and platforms produce unstructured data that becomes useful in business intelligence, unstructured data analytics, and various other fields. Unstructured data analysis allows finding comprehensive data stories from data like email contents, website information, social media posts, mobile data, cache files and more. Unstructured data, along with data analytics, helps companies improve customer experience. Detection of the taste of consumers and their choices becomes easy because of unstructured data analysis. Disadvantages of Unstructured data Storing and managing unstructured data is difficult because there is no proper structure or schema. Data indexing is also a substantial challenge and hence becomes unclear due to its disorganized nature. Search results from an unstructured dataset are also not accurate because it does not have predefined attributes. Data security is also a challenge due to the heterogeneous form of data. 
Problems faced and solutions for storing unstructured data. Until recently, it was challenging to store, evaluate, and manage unstructured data. But with the advent of modern data analysis tools, algorithms, CAS (content addressable storage system), and big data technologies, storage and evaluation became easy. Let us first take a look at the various challenges used for storing unstructured data. Storing unstructured data requires a large amount of space. Indexing of unstructured data is a hectic task. Database operations such as deleting and updating become difficult because of the disorganized nature of the data. Storing and managing video, audio, image file, emails, social media data is also challenging. Unstructured data increases the storage cost. For solving such issues, there are some particular approaches. These are: CAS system helps in storing unstructured data efficiently. We can preserve unstructured data in XML format. Developers can store unstructured data in an RDBMS system supporting BLOB. We can convert unstructured data into flexible formats so that evaluating and storage becomes easy. Let us now understand the differences between unstructured data vs. structured data. Unstructured Data Vs. Structured Data In this section, we will understand the difference between structured and unstructured data with examples. 
STRUCTUREDUNSTRUCTUREDStructured data resides in an organized format in a typical database.Unstructured data cannot reside in an organized format, and hence we cannot store it in a typical database.We can store structured data in SQL database tables having rows and columns.Storing and managing unstructured data requires specialized databases, along with a variety of business intelligence and analytics applications.It is tough to scale a database schema.It is highly scalable.Structured data gets generated in colleges, universities, banks, companies where people have to deal with names, date of birth, salary, marks and so on.We generate or find unstructured data in social media platforms, emails, analyzed data for business intelligence, call centers, chatbots and so on.Queries in structured data allow complex joining.Unstructured data allows only textual queries.The schema of a structured dataset is less flexible and dependent.An unstructured dataset is flexible but does not have any particular schema.It has various concurrency techniques.It has no concurrency techniques.We can use SQL, MySQL, SQLite, Oracle DB, Teradata to store structured data.We can use NoSQL (Not Only SQL) to store unstructured data.Types of Unstructured Data Do you have any idea just how much of unstructured data we produce and from what sources? Unstructured data includes all those forms of data that we cannot actively manage in an RDBMS system that is a transactional system. We can store structured data in the form of records. But this is not the case with unstructured data. Before the advent of object-based storage, most of the unstructured data was stored in file-based systems. Here are some of the types of unstructured data. Rich media content: Entertainment files, surveillance data, multimedia email attachments, geospatial data, audio files (call center and other recorded audio), weather reports (graphical), etc., comes under this genre. 
Document data: Invoices, text-file records, email contents, productivity applications, etc., are included under this genre. Internet of Things (IoT) data: Ticker data, sensor data, data from other IoT devices come under this genre. Apart from all these, data from business intelligence and analysis, machine learning datasets, and artificial intelligence data training datasets are also a separate genre of unstructured data. Examples of Unstructured Data There are various sources from where we can obtain unstructured data. The prominent use of this data is in unstructured data analytics. Let us now understand what are some examples of unstructured data and their sources – Healthcare industries generate a massive volume of human as well as machine-generated unstructured data. Human-generated unstructured data could be in the form of patient-doctor or patient-nurse conversations, which are usually recorded in audio or text formats. Unstructured data generated by machines includes emergency video camera footage, surgical robots, data accumulated from medical imaging devices like endoscopes, laparoscopes and more.  Social Media is an intrinsic entity of our daily life. Billions of people come together to join channels, share different thoughts, and exchange information with their loved ones. They create and share such data over social media platforms in the form of images, video clips, audio messages, tagging people (this helps companies to map relations between two or more people), entertainment data, educational data, geolocations, texts, etc. Other spectra of data generated from social media platforms are behavior patterns, perceptions, influencers, trends, news, and events. Business and corporate documents generate a multitude of unstructured data such as emails, presentations, reports containing texts, images, presentation reports, video contents, feedback and much more. 
These documents help to create knowledge repositories within an organization to make better implicit operations. Live chat, video conferencing, web meeting, chatbot-customer messages, surveillance data are other prominent examples of unstructured data that companies can cultivate to get more insights into the details of a person. Some prominent examples of unstructured data used in enterprises and organizations are: Reports and documents, like Word files or PDF files Multimedia files, such as audio, images, designed texts, themes, and videos System logs Medical images Flat files Scanned documents (which are images that hold numbers and text – for example, OCR) Biometric data Unstructured Data Analytics Tools  You might be wondering what tools can come into use to gather and analyze information that does not have a predefined structure or model. Various tools and programming languages use structured and unstructured data for machine learning and data analysis. These are: Tableau MonkeyLearn Apache Spark SAS Python MS. Excel RapidMiner KNIME QlikView Python programming R programming Many cloud services (like Amazon AWS, Microsoft Azure, IBM Cloud, Google Cloud) also offer unstructured data analysis solutions bundled with their services. How to analyze unstructured data? In the past, the process of storage and analysis of unstructured data was not well defined. Enterprises used to carry out this kind of analysis manually. But with the advent of modern tools and programming languages, most of the unstructured data analysis methods became highly advanced. AI-powered tools use algorithms designed precisely to help to break down unstructured data for analysis. Unstructured data analytics tools, along with Natural language processing (NLP) and machine learning algorithms, help advanced software tools analyze and extract analytical data from the unstructured datasets. 
Before using these tools for analyzing unstructured data, you must properly go through a few steps and keep these points in mind. Set a clear goal for analyzing the data: It is essential to clear your intention about what insights you want to extract from your unstructured data. Knowing this will help you distinguish what type of data you are planning to accumulate. Collect relevant data: Unstructured data is available everywhere, whether it's a social media platform, online feedback or reviews, or a survey form. Depending on the previous point, that is your goal - you have to be precise about what data you want to collect in real-time. Also, keep in mind whether your collected details are relevant or not. Clean your data: Data cleaning or data cleansing is a significant process to detect corrupt or irrelevant data from the dataset, followed by modifying or deleting the coarse and sloppy data. This phase is also known as the data-preprocessing phase, where you have to reduce the noise, carry out data slicing for meaningful representation, and remove unnecessary data. Use Technology and tools: Once you perform the data cleaning, it is time to utilize unstructured data analysis tools to prepare and cultivate the insights from your data. Technologies used for unstructured data storage (NoSQL) can help in managing your flow of data. Other tools and programming libraries like Tableau, Matplotlib, Pandas, and Google Data Studio allows us to extract and visualize unstructured data. Data can be visualized and presented in the form of compelling graphs, plots, and charts. How to Extract information from Unstructured Data? With the growth in digitization during the information era, repetitious transactions in data cause data flooding. The exponential accretion in the speed of digital data creation has brought a whole new domain of understanding user interaction with the online world. 
According to Gartner, 80% of the data created by an organization or its application is unstructured. While extracting exact information through appropriate analysis of organized data is not yet possible, even obtaining a decent sense of this unstructured data is quite tough. Until now, there are no perfect tools to analyze unstructured data. But algorithms and tools designed using machine learning, Natural language processing, Deep learning, and Graph Analysis (a mathematical method for estimating graph structures) help us to get the upper hand in extracting information from unstructured data. Other neural network models like modern linguistic models follow unsupervised learning techniques to gain a good 'knowledge' about the unstructured dataset before going into a specific supervised learning step. AI-based algorithms and technologies are capable enough to extract keywords, locations, phone numbers, analyze image meaning (through digital image processing). We can then understand what to evaluate and identify information that is essential to your business. ConclusionUnstructured data is found abundantly from sources like documents, records, emails, social media posts, feedbacks, call-records, log-in session data, video, audio, and images. Manually analyzing unstructured data is very time-consuming and can be very boring at the same time. With the growth of data science and machine learning algorithms and models, it has become easy to gather and analyze insights from unstructured information.  According to some research, data analytics tools like MonkeyLearn Studio, Tableau, RapidMiner help analyze unstructured data 1200x faster than the manual approach. Analyzing such data will help you learn more about your customers as well as competitors. Text analysis software, along with machine learning models, will help you dig deep into such datasets and make you gain an in-depth understanding of the overall scenario with fine-grained analyses.
5737
Role of Unstructured Data in Data Science

Data has become the new game changer for busines... Read More

What Is Statistical Analysis and Its Business Applications?

Statistics is a science concerned with collection, analysis, interpretation, and presentation of data. In Statistics, we generally want to study a population. You may consider a population as a collection of things, persons, or objects under experiment or study. It is usually not possible to gain access to all of the information from the entire population due to logistical reasons. So, when we want to study a population, we generally select a sample. In sampling, we select a portion (or subset) of the larger population and then study the portion (or the sample) to learn about the population. Data is the result of sampling from a population.Major ClassificationThere are two basic branches of Statistics – Descriptive and Inferential statistics. Let us understand the two branches in brief. Descriptive statistics Descriptive statistics involves organizing and summarizing the data for better and easier understanding. Unlike Inferential statistics, Descriptive statistics seeks to describe the data, however, it does not attempt to draw inferences from the sample to the whole population. We simply describe the data in a sample. It is not developed on the basis of probability unlike Inferential statistics. Descriptive statistics is further broken into two categories – Measure of Central Tendency and Measures of Variability. Inferential statisticsInferential statistics is the method of estimating the population parameter based on the sample information. It applies dimensions from sample groups in an experiment to contrast the conduct group and make overviews on the large population sample. Please note that the inferential statistics are effective and valuable only when examining each member of the group is difficult. Let us understand Descriptive and Inferential statistics with the help of an example. Task – Suppose, you need to calculate the score of the players who scored a century in a cricket tournament.  
Solution: Using Descriptive statistics you can get the desired results.   Task – Now, you need the overall score of the players who scored a century in the cricket tournament.  Solution: Applying the knowledge of Inferential statistics will help you in getting your desired results.  Top Five Considerations for Statistical Data AnalysisData can be messy. Even a small blunder may cost you a fortune. Therefore, special care when working with statistical data is of utmost importance. Here are a few key takeaways you must consider to minimize errors and improve accuracy. Define the purpose and determine the location where the publication will take place.  Understand the assets to undertake the investigation. Understand the individual capability of appropriately managing and understanding the analysis.  Determine whether there is a need to repeat the process.  Know the expectation of the individuals evaluating reviewing, committee, and supervision. Statistics and ParametersDetermining the sample size requires understanding statistics and parameters. The two being very closely related are often confused and sometimes hard to distinguish.  StatisticsA statistic is merely a portion of a target sample. It refers to the measure of the values calculated from the population.  A parameter is a fixed and unknown numerical value used for describing the entire population. The most commonly used parameters are: Mean Median Mode Mean :  The mean is the average or the most common value in a data sample or a population. It is also referred to as the expected value. Formula: Sum of the total number of observations/the number of observations. Experimental data set: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20  Calculating mean:   (2 + 4 + 6 + 8 + 10 + 12 + 14 + 16 + 18 + 20)/10  = 110/10   = 11 Median:  In statistics, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. 
It’s the mid-value obtained by arranging the data in increasing order or descending order. Formula:  Let n be the data set (increasing order) When data set is odd: Median = n+1/2th term Case-I: (n is odd)  Experimental data set = 1, 2, 3, 4, 5  Median (n = 5) = [(5 +1)/2]th term      = 6/2 term       = 3rd term   Therefore, the median is 3 When data set is even: Median = [n/2th + (n/2 + 1)th] /2 Case-II: (n is even)  Experimental data set = 1, 2, 3, 4, 5, 6   Median (n = 6) = [n/2th + (n/2 + 1)th]/2     = ( 6/2th + (6/2 +1)th]/2     = (3rd + 4th)/2      = (3 + 4)/2      = 7/2      = 3.5  Therefore, the median is 3.5 Mode: The mode is the value that appears most often in a set of data or a population. Experimental data set= 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4,4,5, 6  Mode = 3 (Since 3 is the most repeated element in the sequence.) Terms Used to Describe DataWhen working with data, you will need to search, inspect, and characterize them. To understand the data in a tech-savvy and straightforward way, we use a few statistical terms to denote them individually or in groups.  The most frequently used terms used to describe data include data point, quantitative variables, indicator, statistic, time-series data, variable, data aggregation, time series, dataset, and database. Let us define each one of them in brief: Data points: These are the numerical files formed and organized for interpretations. Quantitative variables: These variables present the information in digit form.  Indicator: An indicator explains the action of a community's social-economic surroundings.  Time-series data: The time-series defines the sequential data.  Data aggregation: A group of data points and data set. Database: A group of arranged information for examination and recovery.  Time-series: A set of measures of a variable documented over a specified time. Step-by-Step Statistical Analysis ProcessThe statistical analysis process involves five steps followed one after another. 
Step 1: Design the study and find the population of the study. Step 2: Collect data as samples. Step 3: Describe the data in the sample. Step 4: Make inferences with the help of samples and calculations Step 5: Take action Data distributionData distribution is an entry that displays entire imaginable readings of data. It shows how frequently a value occurs. Distributed data is always in ascending order, charts, and graphs enabling visibility of measurements and frequencies. The distribution function displaying the density of values of reading is known as the probability density function. Percentiles in data distributionA percentile is the reading in a distribution with a specified percentage of clarifications under it.  Let us understand percentiles with the help of an example.  Suppose you have scored 90th percentile on a math test. A basic interpretation is that merely 4-5% of the scores were higher than your scores. Right? The median is 50th percentile because the assumed 50% of the values are higher than the median. Dispersion Dispersion explains the magnitude of distribution readings anticipated for a specific variable and multiple unique statistics like range, variance, and standard deviation. For instance, high values of a data set are widely scattered while small values of data are firmly clustered. Histogram The histogram is a pictorial display that arranges a group of data facts into user detailed ranges. A histogram summarizes a data series into a simple interpreted graphic by obtaining many data facts and combining them into reasonable ranges. It contains a variety of results into columns on the x-axis. The y axis displays percentages of data for each column and is applied to picture data distributions. Bell Curve distribution Bell curve distribution is a pictorial representation of a probability distribution whose fundamental standard deviation obtained from the mean makes the bell, shaped curving. 
The peak point on the curve symbolizes the maximum likely occasion in a pattern of data. The other possible outcomes are symmetrically dispersed around the mean, making a descending sloping curve on both sides of the peak. The curve breadth is therefore known as the standard deviation. Hypothesis testingHypothesis testing is a process where experts experiment with a theory of a population parameter. It aims to evaluate the credibility of a hypothesis using sample data. The five steps involved in hypothesis testing are:  Identify the no outcome hypothesis.  (A worthless or a no-output hypothesis has no outcome, connection, or dissimilarities amongst many factors.) Identify the alternative hypothesis.  Establish the importance level of the hypothesis.  Estimate the experiment statistic and equivalent P-value. P-value explains the possibility of getting a sample statistic.  Sketch a conclusion to interpret into a report about the alternate hypothesis. Types of variablesA variable is any digit, amount, or feature that is countable or measurable. Simply put, it is a variable characteristic that varies. The six types of variables include the following: Dependent variableA dependent variable has values that vary according to the value of another variable known as the independent variable.  Independent variableAn independent variable on the other side is controllable by experts. Its reports are recorded and equated.  Intervening variableAn intervening variable explicates fundamental relations between variables. Moderator variableA moderator variable upsets the power of the connection between dependent and independent variables.  Control variableA control variable is anything restricted to a research study. The values are constant throughout the experiment. Extraneous variableExtraneous variable refers to the entire variables that are dependent but can upset experimental outcomes. Chi-square testChi-square test records the contrast of a model to actual experimental data. 
Data is unsystematic, underdone, equally limited, obtained from independent variables, and a sufficient sample. It relates the size of any inconsistencies among the expected outcomes and the actual outcomes, provided with the sample size and the number of variables in the connection. Types of FrequenciesFrequency refers to the number of repetitions of reading in an experiment in a given time. Three types of frequency distribution include the following: Grouped, ungrouped Cumulative, relative Relative cumulative frequency distribution. Features of FrequenciesThe calculation of central tendency and position (median, mean, and mode). The measure of dispersion (range, variance, and standard deviation). Degree of symmetry (skewness). Peakedness (kurtosis). Correlation MatrixThe correlation matrix is a table that shows the correlation coefficients of unique variables. It is a powerful tool that summarises datasets points and picture sequences in the provided data. A correlation matrix includes rows and columns that display variables. Additionally, the correlation matrix exploits in aggregation with other varieties of statistical analysis. Inferential StatisticsInferential statistics use random data samples for demonstration and to create inferences. They are measured when analysis of each individual of a whole group is not likely to happen. Applications of Inferential StatisticsInferential statistics in educational research is not likely to sample the entire population that has summaries. For instance, the aim of an investigation study may be to obtain whether a new method of learning mathematics develops mathematical accomplishment for all students in a class. Marketing organizations: Marketing organizations use inferential statistics to dispute a survey and request inquiries. It is because carrying out surveys for all the individuals about merchandise is not likely. 
Finance departments: Finance departments apply inferential statistics to forecast budgets and resource expenses, especially when several factors are uncertain. However, economists cannot estimate everything using probability.
Economic planning: In economic planning, there are powerful methods such as index numbers, time series analysis, and estimation. Inferential statistics are used to measure national income and its components. They gather information about revenue, investment, saving, and spending to establish relationships among them.

Key Takeaways
Statistical analysis is the collection and interpretation of data to uncover patterns and trends.
Two divisions of statistical analysis are statistical and non-statistical analyses.
Descriptive and inferential statistics are the two main categories of statistical analysis. Descriptive statistics describe data, whereas inferential statistics compare differences between sample groups.
Statistics aims to teach individuals how to use limited samples to draw intelligent and accurate conclusions about a large group.
Mean, median, and mode are the statistical parameters used to measure central tendency.

Conclusion
Statistical analysis is the process of collecting and examining data to recognize patterns and trends. It uses random samples of data obtained from a population to describe and make inferences about a group. Inferential statistics supports economic planning with powerful methods such as index numbers, time series analysis, and estimation. Statistical analysis finds applications in all the major sectors: marketing, finance, economics, operations, and data mining. Statistical analysis helps marketing organizations conduct surveys and pose queries about their products.

Measures of Dispersion: All You Need to Know

What is Dispersion in Statistics
Dispersion in statistics is a way of describing how spread out a set of data is. Dispersion is the state of data being dispersed, stretched, or spread out across different values. It involves finding how widely the values of a given variable in the dataset are distributed. Statistically, dispersion describes how far numeric data is likely to vary around an average value.
Dispersion helps one understand a dataset by summarizing it with specific measures such as variance, standard deviation, and range.
Dispersion is a set of measures that helps one determine the quality of data in an objectively quantifiable manner.
Most measures of dispersion have the same unit as the quantity being measured. There are many measures of dispersion that help us get more insight into the data:
Range
Variance
Standard Deviation
Skewness
IQR

Types of Measure of Dispersion
Measures of dispersion are divided into two main categories, which offer ways of measuring the diverse nature of data. They are mainly used in biological statistics. We can classify them by checking whether they carry units or not. Accordingly, we can divide the measures into two categories:
Absolute Measure of Dispersion
Relative Measure of Dispersion

Absolute Measure of Dispersion
An absolute measure of dispersion is one with units; it has the same unit as the original dataset. It is expressed in terms of the average of the deviations, such as the standard deviation or mean deviation. An absolute measure can be expressed in units such as rupees, centimetres, marks, kilograms, or other quantities, depending on the situation.

Types of Absolute Measure of Dispersion:
Range: Range is the difference between the largest and smallest values in the data.
The range is the simplest measure of dispersion. Example: 1, 2, 3, 4, 5, 6, 7. Range = Highest value − Lowest value = (7 − 1) = 6
Mean (μ): Mean is calculated as the average of the numbers. To calculate the mean, add all the values and then divide the sum by the total number of terms. Example: 1, 2, 3, 4, 5, 6, 7, 8. Mean = (sum of all the terms / total number of terms) = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8) / 8 = 36 / 8 = 4.5
Variance (σ²): In simple terms, the variance is calculated by summing the squared distance of each term in the distribution from the mean, and then dividing this sum by the total number of terms in the distribution. It shows how far, on average, numbers such as students' marks in an exam lie from the mean of the entire class. Formula: σ² = ∑ (X − μ)² / N
Standard Deviation: Standard deviation is the square root of the variance. To find the standard deviation of any data, you need to find the variance first. Formula: Standard Deviation σ = √(σ²)
Quartile: Quartiles divide the ordered list of numbers or data into quarters.
Quartile Deviation: Quartile deviation is half the difference between the upper and lower quartiles, and is also known as the semi-interquartile range. The full difference between the quartiles is the interquartile range. Formula: Interquartile Range = Q3 − Q1; Quartile Deviation = (Q3 − Q1) / 2
Mean deviation: Mean deviation is also known as average deviation; it can be computed using the mean or median of the data. It is the arithmetic average of the absolute deviations of the items from a measure of central tendency. Formula: As mentioned, the mean deviation can be calculated using the mean or the median.
Mean Deviation using Mean: ∑ |X − M| / N, where M is the mean
Mean Deviation using Median: ∑ |X − X1| / N, where X1 is the median

Relative Measure of Dispersion
Relative measures of dispersion are values without units. A relative measure of dispersion is used to compare the distributions of two or more datasets.
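As a minimal sketch, the absolute measures described above (range, mean, variance, standard deviation, quartiles, and mean deviation) can be computed with Python's standard `statistics` module. The function name `absolute_dispersion` and the dictionary keys are illustrative choices, not part of the article. The sketch assumes population formulas (division by N), takes quartile deviation as half the interquartile range, and uses the module's default quartile method; other quartile conventions can give slightly different Q1 and Q3.

```python
import statistics


def absolute_dispersion(values):
    """Return common absolute measures of dispersion for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Quartile conventions differ; this uses the module's default method.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "range": max(values) - min(values),
        "mean": mean,
        "variance": statistics.pvariance(values),  # sum((x - mean)^2) / N
        "std_dev": statistics.pstdev(values),      # square root of the variance
        "iqr": q3 - q1,
        "quartile_deviation": (q3 - q1) / 2,
        "mean_dev_about_mean": sum(abs(x - mean) for x in values) / n,
        "mean_dev_about_median": sum(abs(x - median) for x in values) / n,
    }


# The 1..8 example used for the mean above.
summary = absolute_dispersion([1, 2, 3, 4, 5, 6, 7, 8])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Every value in the result carries the same unit as the input data, except the variance, which is in squared units.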
A relative measure of dispersion is derived from the corresponding absolute measure; the only difference is that it is expressed as a ratio, so it carries no unit.

Types of Relative Measure of Dispersion:
A relative measure of dispersion is computed as a co-efficient of dispersion and is used to compare two series that differ widely in their averages. The main use of a co-efficient of dispersion is to compare two series with different measurement units.
1. Co-efficient of Range: It is calculated as the ratio of the difference between the largest and smallest terms of the distribution to the sum of the largest and smallest terms of the distribution. Formula: (L − S) / (L + S), where L = largest value and S = smallest value
2. Co-efficient of Variation: The co-efficient of variation is used to compare two datasets with respect to homogeneity or consistency. Formula: C.V = (σ / X̄) × 100, where σ = standard deviation and X̄ = mean
3. Co-efficient of Standard Deviation: The co-efficient of standard deviation is the ratio of the standard deviation to the mean of the distribution of terms. Formula: σ / X̄, where the standard deviation is σ = √(∑ (X − X̄)² / (N − 1)), (X − X̄) = deviation of each term from the mean, and N = total number of terms
4. Co-efficient of Quartile Deviation: The co-efficient of quartile deviation is the ratio of the difference between the upper quartile and the lower quartile to the sum of the upper quartile and lower quartile. Formula: (Q3 − Q1) / (Q3 + Q1), where Q3 = upper quartile and Q1 = lower quartile
5. Co-efficient of Mean Deviation: The co-efficient of mean deviation can be computed using the mean or median of the data. It is the mean deviation divided by the average (mean or median) from which the deviations were measured.
Mean Deviation using Mean: ∑ |X − M| / N, where M is the mean
Mean Deviation using Median: ∑ |X − X1| / N, where X1 is the median

Why dispersion is important in statistics
Knowledge of dispersion is vital to understanding statistics. It helps one understand concepts such as the diversity of the data, how the data is spread, and how it is clustered around the central value or central tendency.
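The co-efficients listed above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name `relative_dispersion` and the key names are assumptions, the standard deviation is the population form (division by N), the co-efficient of mean deviation is taken about the mean, and the data must have a non-zero mean for the ratios to be defined.

```python
import statistics


def relative_dispersion(values):
    """Return unit-free co-efficients of dispersion for a list of numbers."""
    largest, smallest = max(values), min(values)
    mean = statistics.fmean(values)
    # Quartile conventions differ; this uses the module's default method.
    q1, _, q3 = statistics.quantiles(values, n=4)
    mean_dev = sum(abs(x - mean) for x in values) / len(values)
    return {
        "coeff_range": (largest - smallest) / (largest + smallest),
        "coeff_variation_pct": statistics.pstdev(values) / mean * 100,
        "coeff_quartile_dev": (q3 - q1) / (q3 + q1),
        "coeff_mean_dev": mean_dev / mean,
    }


# Illustrative input: the numbers 1 to 8.
coefficients = relative_dispersion([1, 2, 3, 4, 5, 6, 7, 8])
for name, value in coefficients.items():
    print(f"{name}: {value:.4f}")
```

Because each result is a ratio of two quantities in the same unit, the units cancel, which is what makes these co-efficients comparable across series measured in different units.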
Moreover, dispersion in statistics gives us a way to get better insight into the data distribution. For example, three distinct samples can have the same mean, median, or range but completely different levels of variability.

How to Calculate Dispersion
Dispersion can be calculated using the various measures already described in the types of measures of dispersion above. Before measuring the data, it is important to understand the spread and variation of the terms. One can use the following measures to calculate the dispersion:
Mean
Standard deviation
Variance
Quartile deviation
For example, let us consider two datasets:
Data A: 97, 98, 99, 100, 101, 102, 103
Data B: 70, 80, 90, 100, 110, 120, 130
On calculating the mean and median of the two datasets, both have the same value, which is 100. However, the other dispersion measures computed by the above methods are very different. The range of B, for instance, is 10 times that of A (60 versus 6).

How to represent Dispersion in Statistics
Dispersion in statistics can be represented graphically. Some of the ways used include:
Dot Plots
Box Plots
Stem-and-Leaf Plots
Example: What is the variance of the values 3, 8, 6, 10, 12, 9, 11, 10, 12, 7? The variance can be calculated using the following formula: σ² = ∑ (X − μ)² / N. Here the mean is μ = 88 / 10 = 8.8 and the squared deviations sum to 73.6, so σ² = 73.6 / 10 = 7.36

What is an example of dispersion?
One example of dispersion outside the world of statistics is the rainbow, where white light is split into 7 different colours separated by wavelength. Some statistical ways of measuring dispersion are:
Standard deviation
Range
Mean absolute difference
Median absolute deviation
Interquartile range
Average deviation

Conclusion:
Dispersion in statistics refers to the measure of variability of data or terms. Such variability may arise from random measurement errors, where some of the instrumental measurements are imprecise.
It is a statistical way of describing how the terms are spread out in different datasets. The more widely the values differ from one another, the more scattered the data is and the larger the measures of dispersion become. A dataset can contain anywhere from 5 to 10 values up to 1,000 to 10,000 values. This spread of data is described by a range of descriptive statistics. Dispersion in statistics can be represented using a dot plot, a box plot, and other plots.
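The worked figures in this article (Data A versus Data B, and the variance of 3, 8, 6, 10, 12, 9, 11, 10, 12, 7) can be checked with a short script. This is a sketch using Python's standard `statistics` module; the helper name `spread` and the variable names are illustrative, and the population formulas (division by N) are assumed, matching the variance formula used in the article.

```python
import statistics

data_a = [97, 98, 99, 100, 101, 102, 103]
data_b = [70, 80, 90, 100, 110, 120, 130]


def spread(values):
    """Summarize the centre and spread of a list of numbers."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "std_dev": statistics.pstdev(values),  # population standard deviation
    }


spread_a = spread(data_a)
spread_b = spread(data_b)
print("A:", spread_a)
print("B:", spread_b)

# Variance example from the article: sum((x - mean)^2) / N.
example_values = [3, 8, 6, 10, 12, 9, 11, 10, 12, 7]
example_variance = statistics.pvariance(example_values)
print("variance:", round(example_variance, 2))
```

Both datasets share the same mean and median, yet the range and standard deviation of Data B are ten times those of Data A, which is the point the comparison is meant to illustrate.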