What are Decision Trees in Machine Learning (Classification And Regression)

Introduction to Machine Learning and its types

Machine Learning is an interdisciplinary field of study and a sub-domain of Artificial Intelligence. It gives computers the ability to learn and infer from large amounts of data without being explicitly programmed.

Types of Machine Learning: Machine Learning can broadly be classified into three types:

Supervised Learning: If the available dataset has predefined features and labels on which the machine learning models are trained, the type of learning is known as Supervised Machine Learning. Supervised models can broadly be classified into two sub-parts, Classification and Regression, which are discussed in detail below.

Unsupervised Learning: If the available dataset has predefined features but lacks labels, the Machine Learning algorithms operate on the data to assign labels to it or to reduce its dimensionality. The most common Unsupervised Learning techniques are Principal Component Analysis (PCA) and Clustering.

Reinforcement Learning: Reinforcement Learning is a more advanced type of learning in which the model learns from "experience". Features and labels are not clearly defined; the model is given a "situation" and is rewarded or penalized based on the "outcome". The model thus learns to act so as to maximize the reward, and its behaviour improves with experience.

Classification

Classification is the prediction of the category to which a data point belongs. A Supervised Learning algorithm learns to draw inferences from the features of a given dataset and to predict which class, group or category a particular data point belongs to.

Example of Classification: Assume we are given images of handwritten digits (0-9). The problem statement is to "teach" the machine to identify which digit each image corresponds to. The machine has to be trained so that, given any such handwritten digit as input, it correctly states which digit the image represents. This is called classification of handwritten digits.

For a second example that is not image-based, consider 2D data (features x1 and x2) plotted on a graph, where red and green dots represent two different classes of data. Given a new dot of unknown class, the classifier should, based on its features, correctly decide whether that dot belongs to the red or the green class. A decision boundary through the middle of the plot correctly separates the majority of the dots.

Applications of Classification: Some real-world applications of classification algorithms are listed below.

Face Recognition: Face recognition is used in smartphones and anywhere else biometric security is needed. It is essentially face detection followed by classification: the classification algorithm determines whether the face in the image matches the registered user.

Medical Image Classification: Given patient data, a well-trained model is often used to classify whether the patient has a malignant tumor (cancer), heart ailments, fractures, and so on.

Regression

Regression is also a type of supervised learning.
Unlike classification, it does not predict the class of a given data point. Instead, it predicts a value for each data point based on the "features" it encounters.

Example of Regression: Consider a dataset of California housing prices. It has several columns; each column is a "feature" from which the machine learning algorithm predicts the target, the housing price. The primary goal of the regression algorithm is that, given the features of a house, it should be able to estimate the price of that house accurately. This is called a regression problem. It is similar to curve fitting and is often confused with it.

Applications of Regression: Some real-world applications of regression algorithms are listed below.

Stock Market Prediction: Regression algorithms are used to predict the future price of stocks based on past features such as time of day or seasonal effects. Stock market prediction also falls under a subdomain of study called Time Series Analysis.

Object Detection: Object detection is the process of locating a given object in an image or video. It returns the pixel coordinates of the object's position in the image; these coordinates are determined by regression algorithms working alongside classification.

Classification | Regression
Assigns specific classes to the data based on its features. | Predicts values based on the features of the dataset.
Prediction is discrete or categorical in nature. | Prediction is continuous in nature.

Introduction to the building blocks of Decision Trees

To get started with Decision Trees, it is important to understand their basic building blocks. We therefore build up the concepts slowly with some basic theory.

1. Entropy

Definition: Entropy is a commonly used concept in Information Theory and is a measure of the impurity (or lack of "purity") of an arbitrary collection of examples.

Mathematical equation: For a collection S containing positive and negative examples, the entropy of S is

Entropy(S) = -p(+) log2 p(+) - p(-) log2 p(-)

where p(+) and p(-) are the proportions of positive and negative examples in S.

In its more general form, for a target that can take several values, entropy is given by

Entropy(S) = sum over all classes i of ( -p_i * log2 p_i )

Example: Take a sample S containing 14 data samples, of which 9 are positive and 5 are negative, denoted [9+, 5-]. The entropy of this sample can be calculated as

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940

2. Information Gain

Definition: Using entropy, we can calculate how much relevant information an attribute contributes about the target for a given sample; this quantity is known as Information Gain.

Mathematical equation:

Gain(S, A) = Entropy(S) - sum over v in Values(A) of ( |S_v| / |S| ) * Entropy(S_v)

Here Gain(S, A) is the information gain of an attribute A relative to a sample S, Values(A) is the set of all possible values of attribute A, and S_v is the subset of S for which A takes the value v.

Example: Let S again be the collection of 14 training examples ([9+, 5-]). Consider the attribute Wind, whose values are Weak and Strong. In addition to the previous example, assume that 6 positive and 2 negative samples have Wind = Weak, and the remaining samples (3 positive and 3 negative) have Wind = Strong.
Under these circumstances, the information gained by the attribute Wind is

Gain(S, Wind) = Entropy(S) - (8/14) * Entropy(S_Weak) - (6/14) * Entropy(S_Strong)
             = 0.940 - (8/14)(0.811) - (6/14)(1.000)
             ≈ 0.048

Decision Tree

Introduction: With the basic building blocks out of the way, let us look at what a Decision Tree actually is. As the name suggests, it is a tree built from decisions the algorithm makes in accordance with the data it has been trained on. In simple words, a Decision Tree uses the features in the given data to perform supervised learning and build a tree-like data structure whose branches are arranged so that, given a feature set, the tree can predict the expected output reasonably accurately.

Example: Consider the structure of a decision tree learned from the classic "PlayTennis" dataset. The target of the model is to predict whether the weather conditions are suitable for playing tennis. The dataset records certain information (features) for each day: the feature attributes Outlook, Temperature, Humidity and Wind, and the target attribute PlayTennis. Each attribute can take certain values; for example, Outlook takes the values Sunny, Rain and Overcast. Given values for each of the attributes, the learned decision tree gives a clear answer as to whether the weather is suitable for tennis or not.

Algorithm: With this intuition in place, the formal ID3 algorithm is as follows.

ID3(Samples, Target_attribute, Attributes):
- Create a root node Root for the tree.
- If all Samples are positive, return the single-node tree Root with label = +.
- If all Samples are negative, return the single-node tree Root with label = -.
- If Attributes is empty, return the single-node tree Root with label = the most common value of Target_attribute among the Samples.
- Otherwise:
  - A <- the attribute from Attributes that best classifies the Samples.
  - The decision attribute for Root <- A.
  - For each possible value vi of A:
    - Add a new branch below Root, corresponding to the test A = vi.
    - Let Samples_vi be the subset of Samples that have value vi for A.
    - If Samples_vi is empty, add below this branch a leaf node with label = the most common value of Target_attribute among the Samples.
    - Else, add below this branch the subtree ID3(Samples_vi, Target_attribute, Attributes - {A}).
- Return Root.

Connecting the dots: Now that the overall idea of decision trees has been explained, let us see how Entropy and Information Gain fit into the process. Entropy is used to calculate Information Gain, which identifies the attribute of the dataset that provides the most information about the target. The attribute that provides the most information is considered to contribute most to the outcome of the classifier and is therefore placed higher in the tree. For example, in the PlayTennis dataset, if we calculate the Information Gain for the attributes Humidity and Wind, we find that Humidity plays a more important role in deciding whether to play tennis; Humidity is therefore the better classifier of the two, and the detailed calculation follows the same Gain formula shown above, applied attribute by attribute.
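To make the numbers above concrete, the short Python sketch below recomputes the entropy and information-gain figures for the Wind attribute using only the counts quoted in the text (9 positive and 5 negative samples overall; 6 positive and 2 negative with Wind = Weak). The helper function is illustrative and not part of any particular library.

import math

def entropy(pos, neg):
    # Entropy of a sample with `pos` positive and `neg` negative examples
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

# Overall sample S: [9+, 5-]
e_s = entropy(9, 5)        # ~0.940

# Subsets by the attribute Wind: Weak = [6+, 2-], Strong = [3+, 3-]
e_weak = entropy(6, 2)     # ~0.811
e_strong = entropy(3, 3)   # 1.000

# Gain(S, Wind) = Entropy(S) - sum(|S_v|/|S| * Entropy(S_v))
gain_wind = e_s - (8 / 14) * e_weak - (6 / 14) * e_strong
print(round(e_s, 3), round(gain_wind, 3))   # 0.94 0.048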
Applications of Decision Trees

With the basic idea covered, let us look at where decision trees can be used:

Selecting a flight: Decision trees are very good at classification and can be used to select the flight that gives the best value for money. There are many parameters to consider, such as whether the flight is connecting or non-stop, how reliable the airline's service record is, and so on.

Selecting between alternative products: Companies often need to determine which product will be more profitable at launch. Given attributes such as market conditions, competition, price, availability of raw materials and demand, a decision tree classifier can help determine which of the products is likely to maximize profit.

Sentiment Analysis: Sentiment analysis determines the overall opinion expressed in a piece of text, and is especially used to determine whether a writer's comment about a product or service is positive, neutral or negative. Decision trees are versatile classifiers and are used for sentiment analysis in many Natural Language Processing (NLP) applications.

Energy Consumption: Electricity supply boards need to predict near-future energy consumption for each region, so that unused power can be diverted to areas with higher demand and the grid can maintain a regular, uninterrupted supply. Decision trees are often used to determine which regions are expected to require more or less power in the upcoming time frame.

Fault Diagnosis: In engineering, one of the widely used applications of decision trees is fault detection. For load-bearing rotating machines, it is important to determine which components have failed and which can be directly or indirectly affected by the failure. This is determined from a set of measurements; unfortunately, many measurements are available and some of them are not relevant to detecting the fault. A decision tree classifier can quickly determine which measurements are relevant to diagnosing the fault.

Advantages of Decision Trees

Some advantages of decision trees are listed below:

Comprehensive: A significant advantage of a decision tree is that it forces the algorithm to consider all possible outcomes of a decision and traces each path to a conclusion.

Specific: The output of a decision tree is very specific and reduces uncertainty in the prediction, which is why decision trees are considered good classifiers.

Easy to use: Decision trees are among the simplest yet most versatile algorithms in Machine Learning. They are based on simple mathematics with no complex formulas, and they are easy to visualize, understand and explain.

Versatile: Many business problems can be solved using decision trees. They find applications in engineering, management, medicine and, essentially, any situation where data is available and a decision needs to be taken under uncertainty.

Resistant to data abnormalities: Data is never perfect, and datasets usually contain abnormalities. Some of the most common abnormalities are outliers, missing data and noise.
While many Machine Learning algorithms degrade with even a small number of such abnormalities, decision trees are quite resilient and can handle a fair proportion of them without large changes in the results.

Visualization of the decision taken: Data scientists often struggle to explain why a model produces a certain set of outputs. For most algorithms it is not possible to clearly trace and visualize the process that leads to the final outcome. Decision trees, however, are easy to visualize: once trained, the tree can be drawn, and the programmer can see exactly how and why a conclusion was reached. The "tree" visualization also makes the outcome easy to explain to a non-technical audience. This is why many organizations prefer decision trees over other Machine Learning algorithms.

Limitations of Decision Trees

Some limitations of decision trees are listed below:

Sensitivity to hyperparameter tuning: Decision trees are very sensitive to hyperparameter tuning. Hyperparameters are the parameters under the programmer's control that can be tuned to get better performance out of a model. The output of a decision tree can vary drastically if the hyperparameters are poorly chosen.

Overfitting: Decision trees are prone to overfitting. Overfitting occurs when a model learns the training data too well, so that it performs well on the training set but fails on the test set. Decision trees overfit when the breadth and depth of the tree are set too high for a relatively simple dataset.

Underfitting: Decision trees are also prone to underfitting, which occurs when the model is too simple to learn the dataset effectively. A decision tree underfits if its depth, breadth or number of nodes is set too low; the model then cannot fit the data properly and fails to learn.

Code Examples

With the theory out of the way, let us look at a practical implementation of decision tree classifiers and regressors.

1. Classification

For classification, a diabetes dataset from Kaggle (the Pima Indians diabetes data) is used; it can be downloaded from Kaggle. The first step in any data science application is inspecting the data. Each row describes one patient through columns such as the number of pregnancies, glucose level, blood pressure, skin thickness, insulin, BMI, diabetes pedigree and age, and a label column indicates whether the patient has diabetes; this label is the target value the model is expected to predict from the other parameters.

Load the libraries. We use pandas to load and manipulate the data, and scikit-learn for the machine learning model and the accuracy metric.

# Load libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier        # Decision Tree classifier
from sklearn.model_selection import train_test_split   # train/test split helper
from sklearn import metrics                             # metrics module for accuracy calculation

Load the data. Pandas is used to read the data from the CSV file.

col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
# load dataset
pima = pd.read_csv("pima-indians-diabetes.csv", header=None, names=col_names)

Feature selection. The relevant features are selected for classification.

# split dataset into features and target variable
feature_cols = ['pregnant', 'insulin', 'bmi', 'age', 'glucose', 'bp', 'pedigree']
X = pima[feature_cols]   # Features
y = pima.label           # Target variable

Splitting the data. The dataset is split into training and testing sets: the training data is used to train the model, while the testing data is used to evaluate the model's performance on unseen data.

# Split dataset into training set and test set: 70% training and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

Building the decision tree. These few lines initialize the classifier, train it, and predict on the test set.

# Create Decision Tree classifier object
clf = DecisionTreeClassifier()
# Train the Decision Tree classifier
clf = clf.fit(X_train, y_train)
# Predict the response for the test dataset
y_pred = clf.predict(X_test)

The model's accuracy is evaluated using scikit-learn's metrics module.

# Model accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

Output: Accuracy: 0.6753246753246753

The trained tree itself can also be rendered as a diagram.
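Since the article emphasizes that decision trees are easy to visualize, one way to draw the trained tree is scikit-learn's built-in plotting helper. This is a minimal sketch that assumes the clf and feature_cols objects from the example above; the class names and figure size are illustrative choices.

# Visualize the trained classifier (assumes `clf` and `feature_cols` from above)
import matplotlib.pyplot as plt
from sklearn import tree

plt.figure(figsize=(20, 10))
tree.plot_tree(clf,
               feature_names=feature_cols,
               class_names=["no diabetes", "diabetes"],
               filled=True,      # color nodes by majority class
               max_depth=3,      # limit the drawing depth for readability
               fontsize=8)
plt.show()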
2. Regression

For the regression example, we generate a NumPy array that simulates a scatter plot resembling a sine wave, with a few randomly perturbed (noisy) elements.

# Import the necessary modules and libraries
import numpy as np
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

This time we create two regression models to see what overfitting looks like for a decision tree. We initialize two Decision Tree regressors with different maximum depths and train both on the same data.

# Fit regression models
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_1.fit(X, y)
regr_2.fit(X, y)

After fitting the models, we predict on a dense test grid and plot the results to see how each one performed.

# Predict
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)

# Plot the results
plt.figure()
plt.scatter(X, y, s=20, edgecolor="black", c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue", label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()

In the resulting plot we can clearly see that, for this simple dataset, the model with max_depth=5 (green) starts to overfit and learns the pattern of the noise along with the sine wave; such models do not generalize well. The model with max_depth=2 (blue), by contrast, fits the dataset in a more reasonable way.
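The Limitations section noted that decision trees are sensitive to hyperparameters such as depth. Rather than guessing those values, one common option is a small cross-validated grid search. The sketch below is only an illustration; it assumes the X_train and y_train variables from the classification example above and uses standard scikit-learn APIs.

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Search a small grid of depth and leaf-size values with 5-fold cross-validation
param_grid = {"max_depth": [2, 3, 4, 5, None],
              "min_samples_leaf": [1, 5, 10, 20]}
search = GridSearchCV(DecisionTreeClassifier(random_state=1),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))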
Conclusion

In this article we built up an intuition for decision trees, starting from the basic theory behind how a decision tree classifier works. Covering every detail is beyond the scope of a single article, so a dedicated machine learning textbook is recommended for a deeper treatment of the specifics. The code snippets then introduced the "Hello World" of training a Decision Tree model on both real-world and artificially generated data and predicting with it. Together, these should give a newcomer a balanced theoretical and practical idea of how Classification and Regression Trees work and how they are implemented.

Animikh Aich

Computer Vision Engineer

Animikh Aich is a Deep Learning enthusiast, currently working as a Computer Vision Engineer. His work includes three International Conference publications and several projects based on Computer Vision and Machine Learning.

Posts by Animikh Aich

Overfitting and Underfitting With Algorithms in Machine Learning

Curve fitting is the process of determining the best-fit mathematical function for a given set of data points. It examines the relationship between one or more independent variables (predictors) and a dependent variable (response) in order to determine the "best fit" curve. Curve fitting does not necessarily mean that the curve must pass through each and every data point; rather, the fitted curve is the most appropriate one that represents all the data points adequately.

Curve Fitting vs. Machine Learning

As discussed, curve fitting refers to finding the "best fit" curve or line for a given set of data points. Although this is also part of what Machine Learning and Data Science do, their scope goes far beyond curve fitting. The major difference is that in curve fitting the entire dataset is available to the developer, whereas in Machine Learning the data available to the developer is only a sample of the real-world data on which the fitted model will eventually be applied.

Machine Learning is a vast interdisciplinary field and consists of much more than just curve fitting. It can be broadly classified into Supervised, Unsupervised and Reinforcement Learning. Since most real-world problems are solved with Supervised Learning, this article concentrates on it. Supervised learning can be further classified into Classification and Regression; Regression is the part whose job is similar to what curve fitting achieves. To get a broader idea, the differences between Classification and Regression are summarized below:

Classification | Regression
Separates two or more types of data into categories or classes based on their characteristics. | Determines the "best fit" curve for the given data such that, on unseen data, points on the curve accurately represent the desired result.
Output values are discrete in nature (e.g. 0, 1, 2, 3) and are known as "classes". | Output values are continuous in nature (e.g. 0.1, 1.78, 9.54).
Two classes of points are separated by one or more decision boundaries; this is an example of classification. | A single "best fit" curve runs through the data points; this is an example of regression.
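To ground the "best fit" idea discussed above, the sketch below fits a low-degree polynomial to noisy points with NumPy. It is only an illustration of curve fitting on data that is fully available up front, which is exactly the distinction drawn against machine learning; all names are local to this example and the quadratic trend is an arbitrary choice.

import numpy as np
import matplotlib.pyplot as plt

# Noisy samples around a quadratic trend (all data available up front)
rng = np.random.RandomState(0)
x = np.linspace(-3, 3, 40)
y = 0.5 * x**2 - x + 1 + rng.normal(scale=0.8, size=x.shape)

# "Best fit" here = least-squares polynomial of degree 2
coeffs = np.polyfit(x, y, deg=2)
fit = np.poly1d(coeffs)

plt.scatter(x, y, label="data points")
plt.plot(x, fit(x), color="red", label="fitted curve")
plt.legend()
plt.show()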
Noise in Data

Data obtained from the real world is not ideal or noise-free. It contains a lot of noise, which needs to be handled before Machine Learning algorithms are applied. A few anomalous data points can make a considerable difference to where the "best fit" line ends up, so it is important to apply preprocessing techniques to deal with them. Two of the most common types of noise in data are:

Outliers: Outliers are data points that do not belong to the general distribution of the rest of the dataset; their values are either far too high or far too low. They usually arise from misrepresentation or accidental entry of wrong data. There are several statistical methods for detecting and removing such outliers.

Missing Data: In contrast to outliers, missing data is another major challenge in a dataset. It is quite common in tabular datasets (e.g. CSV files) and becomes a serious problem when the number of missing values exceeds roughly 10% of the dataset. Most Machine Learning algorithms perform poorly on such datasets, although certain algorithms such as decision trees are quite resilient to missing data and can provide reasonable results even with such noisy datasets. As with outliers, there are statistical methods for handling missing or "NaN" (Not a Number) values; the most common is simply to remove or "drop" the rows containing the missing data.

Training of Data

"Training" is Machine Learning terminology for "fitting" the model to, or "learning" from, the data. This is the step in which the model learns from the given data so that it can later predict on similar but unseen data. This step is crucial, since the model's final predictions depend on how well it was able to acquire the patterns of the training data.

Training in Machine Learning: The training methodology varies with the type of data; here we assume simple tabular (e.g. CSV) data. Before the model can be fitted to the data, a few steps have to be followed (a small sketch of the cleaning and splitting steps is given after this list):

Data Cleaning/Preprocessing: Raw data obtained from the real world is likely to contain a good amount of noise. In addition, the data might not be homogeneous: the values of different features may lie in very different ranges. Hence, after the removal of noise, the data needs to be normalized or scaled to make it homogeneous.

Feature Engineering: In a tabular dataset, the columns that describe the data are called "features", and they are what the model uses to predict the target value. However, data often contains columns that are irrelevant to the model's output; these need to be removed or statistically processed so that they do not interfere with training on the relevant features. It is also often useful to create new, relevant features from the existing ones so that the model can learn better; this process is also called "feature extraction".

Train, Validation and Test Split: After preprocessing, the data is split into training, validation and test sets, usually in a ratio of around 60:20:20; the exact ratio varies with the amount of data and the application. This is done to ensure that the model does not unnecessarily overfit or underfit and performs equally well when deployed in the real world.

Training: Finally, the training data is fed to the model. Multiple models can be trained simultaneously and their performance compared on the validation set, from which the best model is selected; this is called "model selection". The selected model is then used to predict on the test set to obtain a final test score, which reflects, more or less accurately, the performance of the model on the given dataset.
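The cleaning and splitting steps described above can be sketched in a few lines of pandas and scikit-learn. The file name and the "target" column are placeholders for this illustration; the 60:20:20 split is obtained by splitting twice.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical tabular dataset; "target" is the label column
df = pd.read_csv("data.csv")

# Simple cleaning: drop rows with missing values (one common strategy)
df = df.dropna()

X = df.drop(columns=["target"])
y = df["target"]

# 60% train, 20% validation, 20% test (split twice)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=1)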
Training in Deep Learning: Deep Learning is a part of Machine Learning, but rather than relying mainly on classical statistical methods, deep learning techniques aim to mimic the neural structure of the biological brain and are therefore usually referred to as neural networks. The training process for deep learning is quite similar to that of machine learning, except that explicit feature engineering is usually unnecessary. Because deep learning models rely on learned weights to express the importance of each input feature, the model automatically learns which features are relevant: it assigns high weights to relevant features and low weights to irrelevant ones, which removes the need for a separate feature engineering step.

Improper Training of Data: As discussed above, training is the most crucial step of any Machine Learning workflow, and improper training can drastically degrade the model's performance at deployment. At a high level, improper training produces two main outcomes: underfitting and overfitting.

Underfitting

When the complexity of the model is too low for it to learn the data it is given, the model is said to underfit. In other words, an excessively simple model fails to learn the intricate patterns and underlying trends of the given dataset. Underfitting corresponds to a model with low variance and high bias.

Underfitting visualization: It is important to be able to recognize an underfitting model during training. Since supervised learning consists of classification and regression, underfitting shows up in both settings:

Classification: A model trained to separate two classes (say, circles and crosses) underfits when its decision boundary, for example a single straight line, fails to classify either class properly.

Regression: The data points follow a clear pattern, but the model is unable to fit them because its complexity is too low.

Detection of an underfitting model: A model may underfit the data, but it is necessary to know when it does so. The following checks are used to determine whether a model is underfitting (a short, hedged code illustration follows this list):

Training and validation loss: During training and validation, it is important to monitor the loss produced by the model. If the model is underfitting, the loss will be significantly high for both training and validation. In deep learning terms, the loss will not decrease at the expected rate if the model has saturated or is underfitting.

Over-simplistic prediction graph: If a plot of the data points and the fitted curve shows an over-simplistic curve, the model is suffering from underfitting and a more complex model should be tried.

Classification: Many classes will be misclassified in the training set as well as the validation set; visualization would indicate that a more complex model could classify more of them correctly.

Regression: The final "best fit" line fails to fit the data points effectively; on visualization it is clear that a more complex curve could fit the data better.
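As described above, detecting underfitting usually comes down to comparing training and validation scores: both are poor. The sketch below shows that pattern with a deliberately over-simple model (a depth-1 decision tree) against a slightly deeper one on data following a sine wave; the data and depths are arbitrary choices for illustration.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a sine wave with a little noise
rng = np.random.RandomState(0)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# A depth-1 tree is too simple for a sine wave: its R^2 stays low on BOTH sets (underfitting),
# while a moderately deeper tree recovers most of the structure.
stump = DecisionTreeRegressor(max_depth=1).fit(X_train, y_train)
better = DecisionTreeRegressor(max_depth=4).fit(X_train, y_train)
print("depth-1 train/val R^2:", round(stump.score(X_train, y_train), 2), round(stump.score(X_val, y_val), 2))
print("depth-4 train/val R^2:", round(better.score(X_train, y_train), 2), round(better.score(X_val, y_val), 2))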
Fixes for an underfitting model: If the model is underfitting, the developer can take the following steps to recover from the underfitting state:

Train longer: Since underfitting often reflects insufficient learning, training for longer can help the model pick up more complex patterns. This is especially true for deep learning.

Train a more complex model: The main reason a model underfits is that its complexity is lower than the data requires, so the most obvious fix is to use a more complex model. In deep learning, a deeper network can be used.

Obtain more features: If the dataset lacks enough features to support a clear inference, feature engineering or collecting additional features will help fit the data better.

Decrease regularization: Regularization helps generalize the model by preventing overfitting. If the model is learning too little or underfitting, it is better to reduce or remove regularization so that the model can learn more.

New model architecture: Finally, if none of the above approaches works, a different model architecture may provide better results.

Overfitting

When the complexity of the model is too high compared to the data it is trying to learn from, the model is said to overfit. With increasing model complexity, the model tends to fit the noise present in the data (e.g. outliers); it learns the data too well and therefore fails to generalize. Overfitting corresponds to a model with high variance and low bias.

Overfitting visualization: As with underfitting, overfitting can be illustrated in both forms of supervised learning:

Classification: The model trained to separate circles and crosses now learns too well; it even classifies the noise in the data by forming an excessively complex decision boundary.

Regression: Instead of finding the least complex model that fits the data adequately, the overfit model follows the data points too closely compared with an appropriate fit.

Detection of an overfitting model: The signals to look for are similar to those for underfitting:

Training and validation loss: As already mentioned, it is important to measure the loss of the model during training and validation. A very low training loss accompanied by a high validation loss signifies that the model is overfitting.
Additionally, in deep learning, if the training loss keeps decreasing while the validation loss stagnates or starts to increase, the model is overfitting.

Overly complex prediction graph: If a plot of the data points and the fitted curve shows a curve far more complex than the simplest solution that fits the points appropriately, the model is overfitting.

Classification: If every single training example is classified correctly only by forming a very complex decision boundary, there is a good chance the model is overfitting.

Regression: If the final "best fit" line passes through every single data point by forming an unnecessarily complex curve, the model is likely overfitting.

Fixes for an overfitting model: If the model is overfitting, the developer can take the following steps to recover from the overfitting state (a brief illustration of two of them follows this list):

Early stopping during training: This is especially prevalent in deep learning. Allowing the model to train for too many epochs (iterations) may lead to overfitting, so training should be stopped once the model starts to overfit. This is done by monitoring the validation loss and stopping when it has not decreased for a given number of epochs.

Train with more data: The available training data is often small relative to the model complexity. Increasing the size of the training dataset often helps the model fit appropriately.

Train a less complex model: As mentioned earlier, the main cause of overfitting is excessive model complexity for a relatively simple dataset, so reducing model complexity helps avoid overfitting. In deep learning, this means reducing the number of layers and neurons.

Remove features: In contrast to the underfitting case, if there are too many features, the model tends to overfit. Reducing the number of unnecessary or irrelevant features often leads to a better and more generalized model. Deep learning models are usually less affected by this.

Regularization: Regularization artificially simplifies the model without giving up the flexibility that comes with higher complexity. As regularization increases, the effective model complexity decreases, which prevents overfitting.

Ensembling: Ensembling is a Machine Learning method that combines the predictions of multiple separate models. It reduces the errors of the individual models by exploiting the strengths of each. Two of the most commonly used ensembling methods are bagging and boosting.
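As a concrete, simplified illustration of two of the fixes above (reducing effective complexity and ensembling), the sketch below compares an unconstrained tree, a tree regularized through min_samples_leaf, and a bagged ensemble (a random forest) on held-out data. The synthetic dataset and parameter values are arbitrary choices for the demonstration, not a recommendation.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "unconstrained tree": DecisionTreeClassifier(random_state=0),
    "regularized tree":   DecisionTreeClassifier(min_samples_leaf=20, random_state=0),
    "random forest":      RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    # An overfit model shows a large gap between training and validation accuracy
    print(name,
          "train:", round(model.score(X_train, y_train), 2),
          "val:", round(model.score(X_val, y_val), 2))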
Any attempt to generalize an already underfitting model will lead to further underfitting, since it tends to reduce model complexity.
Generalization and its effect on an Overfitting Model: If a model is overfitting, then it is the ideal candidate for applying generalization techniques. This is primarily because an overfitting model has already learned the intricate details and patterns of the dataset. Applying generalization techniques on this kind of model leads to a reduction in model complexity and hence prevents overfitting. In addition to that, the model will be able to predict more accurately on unseen but similar data.
Generalization Techniques: There are no separate Generalization techniques as such; rather, generalization is achieved when a model performs equally well on both training and validation data. Hence, it can be said that if we apply the techniques to prevent overfitting (e.g., Regularization, Ensembling, etc.) on a model that has properly acquired the complex patterns, then a successful generalization of some degree can be achieved.
Relationship between Overfitting and Underfitting with Bias-Variance Tradeoff
Bias-Variance Tradeoff: Bias denotes the simplicity of the model. A model with high bias will have a simpler architecture than a model with lower bias. Similarly, complementing Bias, Variance denotes how complex the model is and how well it can fit data with a high degree of diversity.
An ideal model should have Low Bias and Low Variance. However, when it comes to practical datasets and models, it is nearly impossible to achieve "zero" Bias and Variance. These two are complementary to each other: if one decreases beyond a certain limit, the other starts increasing. This is known as the Bias-Variance Tradeoff. Under such circumstances, there is a "sweet spot", as shown in the figure, where both bias and variance are at their optimal values.
Bias-Variance and Generalization: As is clear from the above graph, Bias and Variance are linked to Underfitting and Overfitting. A model with high Bias is Underfitting the given data, and a model with High Variance is Overfitting the given data.
Hence, at the optimal region of the Bias-Variance tradeoff, the model is neither underfitting nor overfitting, and it can also be said that the model is most Generalized, as under these conditions it is expected to perform equally well on Training and Validation Data. Thus, the graph depicts that the Generalization Error is minimum at the optimal value of the degree of Bias and Variance.
Conclusion
To summarize, the learning capabilities of a model depend on both model complexity and data diversity. Hence, it is necessary to keep a balance between the two, so that the Machine Learning Models thus trained can perform equally well when deployed in the real world.
In most cases, Overfitting and Underfitting can be taken care of in order to determine the most appropriate model for the given dataset. However, even though there are certain rule-based steps that can be followed to improve a model, the insight to achieve a properly Generalized model comes with experience.

What is Bias-Variance Tradeoff in Machine Learning

What is Machine Learning? Machine Learning is a multidisciplinary field of study, which gives computers the ability to solve complex problems, which otherwise would be nearly impossible to be hand-coded by a human being. Machine Learning is a scientific field of study which involves the use of algorithms and statistics to perform a given task by relying on inference from data instead of explicit instructions. Machine Learning Process:The process of Machine Learning can be broken down into several parts, most of which is based around “Data”. The following steps show the Machine Learning Process. 1. Gathering Data from various sources: Since Machine Learning is basically the inference drawn from data before any algorithm can be used, data needs to be collected from some source. Data collected can be of any form, viz. Video data, Image data, Audio data, Text data, Statistical data, etc. 2. Cleaning data to have homogeneity: The data that is collected from various sources does not always come in the desired form. More importantly, data contains various irregularities like Missing data and Outliers.These irregularities may cause the Machine Learning Model(s) to perform poorly. Hence, the removal or processing of irregularities is necessary to promote data homogeneity. This step is also known as data pre-processing. 3. Model Building & Selecting the right Machine Learning Model: After the data has been correctly pre-processed, various Machine Learning Algorithms (or Models) are applied on the data to train the model to predict on unseen data, as well as to extract various insights from the data. After various models are “trained” to the data, the best performing model(s) that suit the application and the performance criteria are selected.4. Getting Insights from the model’s results: Once the model is selected, further data is used to validate the performance and accuracy of the model and get insights as to how the model performs under various conditions. 5. Data Visualization: This is the final step, where the model is used to predict unseen and real-world data. However, these predictions are not directly understandable to the user, and hence, data Visualization or converting the results into understandable visual graphs is necessary. At this stage, the model can be deployed to solve real-world problems.How is Machine Learning different from Curve Fitting? To get the similarities out of the way, both, Machine Learning and Curve Fitting rely on data to infer a model which, ideally, fits the data perfectly. The difference comes in the availability of the data. Curve Fitting is carried out with data, all of which is already available to the user. Hence, there is no question of the model to encounter unseen data.However, in Machine Learning, only a part of the data is available to the user at the time of training (fitting) the model, and then the model has to perform equally well on data that it has never encountered before. Which is, in other words, the generalization of the model over a given data, such that it is able to correctly predict when it is deployed.A high-level introduction to Bias and Variance through illustrative and applied examples Let’s initiate the idea of Bias and Variance with a case study. Let’s assume a simple dataset of predicting the price of a house based on its carpet area. Here, the x-axis represents the carpet area of the house, and the y-axis represents the price of the property. 
The plotted data (in a 2D graph) is shown in the graph below: The goal is to build a model to predict the price of the house, given the carpet area of the property. This is a rather easy problem to solve and can easily be achieved by fitting a curve to the given data points. But, for the time being, let's concentrate on solving the same using Machine Learning.
In order to keep this example simple and concentrate on Bias and Variance, a few assumptions are made:
Adequate data is present in order to come up with a working model capable of making relatively accurate predictions.
The data is homogeneous in nature and hence no major pre-processing steps are involved.
There are no missing values or outliers, and hence they do not interfere with the outcome in any way.
The y-axis data-points are independent of the order of the sequence of the x-axis data-points.
With the above assumptions, the data is processed to train the model using the following steps:
1. Shuffling the data: Since the y-axis data-points are independent of the order of the sequence of the x-axis data-points, the dataset is shuffled in a pseudo-random manner. This is done to avoid unnecessary patterns from being learned by the model. During the shuffling, it is imperative to keep each x-y pair data point intact; mixing them up would change the dataset itself and the model would learn inaccurate patterns.
2. Data Splitting: The dataset is split into three categories: Training Set (60%), Validation Set (20%), and Testing Set (20%). These three sets are used for different purposes:
Training Set - This part of the dataset is used to train the model. It is also known as the Development Set.
Validation Set - This is separate from the Training Set and is only used for model selection. The model does not train or learn from this part of the dataset.
Testing Set - This part of the dataset is used for performance evaluation and is completely independent of the Training or Validation Sets. Similar to the Validation Set, the model does not train on this part of the dataset.
3. Model Selection: Several Machine Learning Models are applied to the Training Set and their Training and Validation Losses are determined, which then helps determine the most appropriate model for the given dataset.
During this step, we assume that a polynomial equation fits the data correctly. The general equation is given below:
y = a0 + a1x + a2x^2 + ... + anx^n
The process of "Training", mathematically, is nothing more than figuring out the appropriate values for the parameters a0, a1, ..., an, which is done automatically by the model using the Training Set.
The developer does, however, have control over how high the degree of the polynomial can be. Settings like this, which are chosen by the developer rather than learned from the data, are called Hyperparameters. These hyperparameters play a key role in deciding how well the model learns and how well the learned parameters generalize.
Given below are two graphs representing the prediction of the trained model on the training data. The graph on the left represents a linear model with an error of 3.6, and the graph on the right represents a polynomial model with an error of 1.7. By looking at the errors, it can be concluded that the polynomial model performs significantly better than the linear model (the lower the error, the better the model performs). However, when we use the same trained models on the Testing Set, the models perform very differently.
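As a rough illustration of this experiment, the sketch below fits a straight line and a high-degree polynomial to synthetic "carpet area vs. price" data and compares their errors on the training and testing portions. The data and the resulting numbers are made up purely for demonstration and will not match the 3.6 and 929.12 quoted here.

```python
# Hedged sketch on synthetic data: compare a linear fit with a high-degree
# polynomial fit on the training set and on a held-out testing set.
import numpy as np

rng = np.random.default_rng(0)
area = rng.uniform(0.5, 3.0, size=100)                   # carpet area (thousands of sq. ft.)
price = 50 + 100 * area + rng.normal(0, 20, size=100)    # noisy, roughly linear prices

# Shuffle and split 60/20/20 while keeping each x-y pair together
idx = rng.permutation(len(area))
train, test = idx[:60], idx[80:]

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

for degree in (1, 10):                                   # the degree is the hyperparameter
    model = np.poly1d(np.polyfit(area[train], price[train], deg=degree))
    print(f"degree {degree}: "
          f"train MSE = {mse(price[train], model(area[train])):.1f}, "
          f"test MSE = {mse(price[test], model(area[test])):.1f}")

# Typically the degree-10 polynomial reports a lower training error but a much
# higher testing error than the straight line, mirroring the behaviour described above.
```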
The graph on the left represents the same linear model's prediction on the Testing Set, and the graph on the right represents the Polynomial model's prediction on the Testing Set. It is clearly visible that the Polynomial model predicts the outputs far less accurately than the Linear model.
In terms of error, the total error for the Linear model is 3.6 and for the Polynomial model is a whopping 929.12. Such a big difference in errors between the Training and Testing Set clearly signifies that something is wrong with the Polynomial model. This drastic change in error is due to a phenomenon called the Bias-Variance Tradeoff.
What is "Error" in Machine Learning? Error in Machine Learning is the difference between the expected output and the predicted output of the model. It is a measure of how well the model performs over a given set of data.
There are several methods to calculate error in Machine Learning. The error is usually expressed through a Loss/Cost Function, one of the most commonly used being the Mean Squared Error (or MSE), given by the following equation:
MSE = (1/n) * Σ (yi - ŷi)², where yi is the expected output, ŷi is the predicted output, and n is the number of data points.
The necessity of minimization of Errors: As is obvious from the previously shown graphs, the higher the error, the worse the model performs. Hence, the error of the prediction of a model can be considered as a performance measure: the lower the error of a model, the better it performs. In addition to that, a model judges its own performance and trains itself based on the error between its own output and the expected output. The primary target of the model is to minimize the error so as to get the parameters that best fit the data.
Total Error: The error mentioned above is the Total Error and consists of three components: the Bias (squared), the Variance, and the Irreducible Error.
Total Error = Bias² + Variance + Irreducible Error
Even for an ideal model, it is impossible to get rid of all of these components. The "irreducible" error is caused by the presence of noise in the data and hence is not removable. However, the Bias and Variance errors can be reduced to a minimum and hence the total error can also be reduced significantly.
Why is the splitting of data important? Ideally, the complete dataset is not used to train the model. The dataset is split into three sets: Training, Validation and Testing Sets. Each of these serves a specific role in the development of a model which performs well under most conditions.
Training Set (60-80%): The largest portion of the dataset is used for training the Machine Learning Model. The model extracts the features and learns to recognize the patterns in the dataset. The quality and quantity of the training set determine how well the model is going to perform.
Testing Set (15-25%): The main goal of every Machine Learning Engineer is to develop a model which generalizes best over a given dataset. This is achieved by training the model(s) on a portion of the dataset and testing its performance by applying the trained model on another portion of the same/similar dataset that has not been used during training (the Testing Set). This is important since the model might perform too well on the training set, but perform poorly on unseen data, as was the case with the example given above.
The Testing Set is primarily used for model performance evaluation.
Validation Set (15-25%): In addition to the above, because more than one Machine Learning Algorithm (model) is usually tried, it is not recommended to compare the performance of multiple models on the Testing Set and then choose the best one. This process is called Model Selection, and for it a separate part of the dataset is set aside, known as the Validation Set. A validation set behaves similarly to a testing set, but it is used for model selection rather than for performance evaluation.
Bias and Variance - A Technical Introduction
What is Bias? Bias is used to allow the Machine Learning Model to learn in a simplified manner. Ideally, the simplest model that is able to learn the entire dataset and predict correctly on it is the best model. Hence, bias is introduced into the model with the view of achieving the simplest possible model.
Parameter-based learning algorithms usually have high bias and hence are faster to train and easier to understand. However, too much bias causes the model to be oversimplified, and hence it underfits the data. Such models are less flexible and often fail when applied to complex problems.
Mathematically, bias is the difference between the model's average prediction and the expected value.
What is Variance? Variance is the variability of the model's predictions when different Training Data is used; using different training data would significantly change the estimate of the target function. Statistically, for a given random variable, Variance is the expectation of the squared deviation from its mean. In other words, the higher the variance of the model, the more complex the model is and the more complex the functions it is able to learn. However, if the model is too complex for the given dataset, where a simpler solution is possible, the high Variance causes the model to overfit. When the model performs well on the Training Set but fails to perform on the Testing Set, the model is said to have high Variance.
Characteristics of a biased model: A biased model will have the following characteristics:
Underfitting: A model with high bias is simpler than it should be and hence tends to underfit the data. In other words, the model fails to learn and acquire the intricate patterns of the dataset.
Low Training Accuracy: A biased model will not fit the Training Dataset properly and hence will have low training accuracy (or high training loss).
Inability to solve complex problems: A biased model is too simple and hence is often incapable of learning complex features and solving relatively complex problems.
Characteristics of a model with Variance: A model with high Variance will have the following characteristics:
Overfitting: A model with high Variance will have a tendency to be overly complex. This causes the model to overfit.
Low Testing Accuracy: A model with high Variance will have very high training accuracy (or very low training loss), but it will have low testing accuracy (and high testing loss).
Overcomplicating simpler problems: A model with high variance tends to be overly complex and ends up fitting a much more complex curve to relatively simple data. The model is thus capable of solving complex problems but incapable of solving simple problems efficiently.
What is Bias-Variance Tradeoff? From the understanding of bias and variance individually thus far, it can be concluded that the two are complementary to each other.
In other words, if the bias of a model is decreased, the variance of the model automatically increases. The vice versa is also true: if the variance of a model decreases, the bias starts to increase.
Hence, it can be concluded that it is nearly impossible to have a model with no bias or no variance, since decreasing one increases the other. This phenomenon is known as the Bias-Variance Tradeoff.
A graphical introduction to Bias-Variance Tradeoff
In order to get a clear idea about the Bias-Variance Tradeoff, let us consider the bulls-eye diagram. Here, the central red portion of the target can be considered the location where the model correctly predicts the values. As we move away from the central red circle, the error in the prediction starts to increase. Each of the several hits on the target is achieved by a repetition of the model-building process; each hit represents an individual realization of the model. As can be seen in the diagram below, the bias and the variance together influence the predictions of the model under different circumstances.
Another way of looking at the Bias-Variance Tradeoff graphically is to plot error, bias, and variance versus the complexity of the model. In the graph shown below, the green dotted line represents variance, the blue dotted line represents bias and the red solid line represents the error in the prediction of the concerned model. Since bias is high for a simpler model and decreases with an increase in model complexity, the line representing bias falls as the model complexity increases. Similarly, Variance is high for a more complex model and low for simpler models, so the line representing variance rises as the model complexity increases. Finally, it can be seen that on either side the generalization error is quite high: both high bias and high variance lead to a higher error rate. The most optimal complexity of the model is right in the middle, where the bias and variance curves intersect. This part of the graph is shown to produce the least error and is preferred. Also, as discussed earlier, the model underfits in high-bias situations and overfits in high-variance situations.
Mathematical Expression of Bias-Variance Tradeoff
The expected (true) values form a vector y, and the model's predicted output for an input vector x is denoted ŷ = f̂(x). The relationship between the true values and the inputs can be taken as y = f(x) + e, where e is normally distributed noise with zero mean and variance σe². The expected prediction error at a point x can then be decomposed as:
Err(x) = (E[f̂(x)] - f(x))² + E[(f̂(x) - E[f̂(x)])²] + σe² = Bias² + Variance + Irreducible Error
The third term, the irreducible error, represents the noise term and cannot be fundamentally reduced by any given model. If, hypothetically, infinite data were available, it would be possible to tune the model so that the bias and variance terms approach zero, but this is not possible in practice. Hence, there is always a tradeoff between the minimization of bias and variance.
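To make the decomposition above concrete, here is a hedged, self-contained sketch on synthetic data: it repeatedly refits polynomial models on freshly generated noisy samples of a known target function and estimates the Bias² and Variance terms empirically. The target function, noise level and polynomial degrees are arbitrary choices made purely for illustration.

```python
# Hedged sketch: estimate Bias^2 and Variance empirically by refitting models on
# freshly generated noisy datasets drawn from a known target function.
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(x)              # the (normally unknown) target function f(x)
x_grid = np.linspace(0.0, np.pi, 50)      # points at which predictions are compared
noise_sd = 0.3                            # standard deviation of the noise term e

def fit_and_predict(degree):
    """Fit a polynomial of the given degree on one noisy sample and predict on x_grid."""
    x = rng.uniform(0.0, np.pi, 30)
    y = true_f(x) + rng.normal(0.0, noise_sd, size=30)
    return np.poly1d(np.polyfit(x, y, deg=degree))(x_grid)

for degree in (1, 3, 9):
    preds = np.array([fit_and_predict(degree) for _ in range(200)])  # 200 refits
    avg_pred = preds.mean(axis=0)                                    # E[f_hat(x)]
    bias_sq = np.mean((avg_pred - true_f(x_grid)) ** 2)              # Bias^2 term
    variance = np.mean(preds.var(axis=0))                            # Variance term
    print(f"degree {degree}: bias^2 ~ {bias_sq:.3f}, variance ~ {variance:.3f}")

# A low-degree fit tends to show high bias^2 and low variance, while a high-degree
# fit tends to show the opposite, which is the tradeoff described above.
```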
Detection of Bias and Variance of a model
In model building, it is imperative to have the knowledge to detect whether the model is suffering from high bias or high variance. The methods to detect high bias and high variance are given below:
Detection of High Bias:
The model suffers from a very high Training Error.
The Validation Error is similar in magnitude to the Training Error.
The model is underfitting.
Detection of High Variance:
The model suffers from a very low Training Error.
The Validation Error is very high when compared to the Training Error.
The model is overfitting.
A graphical method to detect a model suffering from High Bias or High Variance is shown below: The graph shows the change in error rate with respect to model complexity for the training and validation errors. The left portion of the graph suffers from High Bias: the training error is quite high along with the validation error, and the model complexity is quite low. The right portion of the graph suffers from High Variance: the training error is very low, yet the validation error is very high and starts increasing with increasing model complexity.
A systematic approach to solve a Bias-Variance Problem by Dr. Andrew Ng:
Dr. Andrew Ng proposed a simple-to-follow, step-by-step approach to detect and solve High Bias and High Variance errors in a model. The block diagram is shown below:
Detection and Solution to the High Bias problem - if the training error is high:
Train longer: High bias usually means a less complex model, and hence it requires more training iterations to learn the relevant patterns. Hence, longer training sometimes resolves the error.
Train a more complex model: As mentioned above, high bias is a result of less than optimal complexity in the model. Hence, to avoid high bias, the existing model can be swapped out for a more complex model.
Obtain more features: It is often possible that the existing dataset lacks the essential features required for effective pattern recognition. To remedy this problem, more features can be collected for the existing data, or Feature Engineering can be performed on existing features to extract more non-linear features.
Decrease regularization: Regularization is a process to decrease model complexity by regularizing the inputs at different stages, promoting generalization and preventing overfitting in the process. Decreasing regularization allows the model to learn the training dataset better.
New model architecture: If all of the above-mentioned methods fail to deliver satisfactory results, then it is suggested to try out other, new model architectures.
Detection and Solution to the High Variance problem - if the validation error is high:
Obtain more data: High variance is often caused by a lack of training data. The model complexity and the quantity of training data need to be balanced: a model of higher complexity requires a larger quantity of training data. Hence, if the model is suffering from high variance, more data can reduce the variance.
Decrease number of features: If the dataset consists of too many features for each data-point, the model often starts to suffer from high variance and starts to overfit. Hence, decreasing the number of features is recommended.
Increase Regularization: As mentioned above, regularization is a process to decrease model complexity.
Hence, if the model is suffering from high variance (which is caused by a complex model), then an increase in regularization can decrease the complexity and help the model generalize better.
New model architecture: Similar to the solution for a model suffering from high bias, if all of the above-mentioned methods fail to deliver satisfactory results, then it is suggested to try out other, new model architectures.
Conclusion
To summarize, Bias and Variance play a major role in the training process of a model. It is necessary to reduce each of these individually to the minimum possible value. However, it should be kept in mind that an effort to decrease one of them beyond a certain limit increases the probability of the other increasing. This phenomenon is called the Bias-Variance Tradeoff and is an important factor to consider during model building.

Machine Learning Algorithms: [With Essentials, Principles, Types & Examples covered]

The advancements in Science and Technology are making every step of our daily life more comfortable. Today, the use of Machine learning systems, which is an integral part of Artificial Intelligence, has spiked and is seen playing a remarkable role in every user’s life. For instance, the widely popular, Virtual Personal Assistant being used for playing a music track or setting an alarm, face detection or voice recognition applications are the awesome examples of the machine learning systems that we see everyday. Machine learning, a subset of artificial intelligence, is the ability of a system to learn or predict the user’s needs and perform an expected task without human intervention. The inputs for the desired predictions are taken from user’s previously performed tasks or from relative examples.Why should you choose Machine Learning?Wonder why one should choose Machine Learning? Simply put, machine learning makes complex tasks much easier.  It makes the impossible possible!The following scenarios explain why we should opt for machine learning:During facial recognition and speech processing, it would be tedious to write the codes manually to execute the process, that's where machine learning comes handy.For market analysis, figuring customer preferences or fraud detection, machine learning has become essential.For the dynamic changes that happen in real-time tasks, it would be a challenging ordeal to solve through human intervention alone.Essentials of Machine Learning AlgorithmsTo state simply, machine learning is all about predictions – a machine learning, thinking and predicting what’s next. Here comes the question – what will a machine learn, how will a machine analyze, what will it predict.You have to understand two terms clearly before trying to get answers to these questions:DataAlgorithmDataData is what that is fed to the machine. For example, if you are trying to design a machine that can predict the weather over the next few days, then you should input the past ‘data’ that comprise maximum and minimum air temperatures, the speed of the wind, amount of rainfall, etc. All these come under ‘data’ that your machine will learn, and then analyse later.If we observe carefully, there will always be some pattern or the other in the input data we have. For example, the maximum and minimum ranges of temperatures may fall in the same bracket; or speeds of the wind may be slightly similar for a given season, etc. But, machine learning helps analyse such patterns very deeply. And then it predicts the outcomes of the problem we have designed it for.AlgorithmWhile data is the ‘food’ to the machine, an algorithm is like its digestive system. An algorithm works on the data. It crushes it; analyses it; permutates it; finds the gaps and fills in the blanks.Algorithms are the methods used by machines to work on the data input to them.What to consider before finalizing a Machine Learning algorithm?Depending on the functionality expected from the machine, algorithms range from very basic to highly complex. You should be wise in making a selection of an algorithm that suits your ML needs. Careful consideration and testing are needed before finalizing an algorithm for a purpose.For example, linear regression works well for simple ML functions such as speech analysis. In case, accuracy is your first choice, then slightly higher level functionalities such as Neural networks will do.This concept is called ‘The Explainability- Accuracy Tradeoff’. 
The following diagram explains this better:
Besides, with regard to machine learning algorithms, you need to remember the following aspects very clearly:
No algorithm is an all-in-one solution to every type of problem; an algorithm that fits one scenario is not guaranteed to fit another.
Comparing algorithms in isolation mostly does not make sense, as each one has its own features and functionality. Many factors such as the size of the data, data patterns, the accuracy needed, the structure of the dataset, etc. play a major role in comparing two algorithms.
The Principle behind Machine Learning Algorithms
As we learnt, an algorithm churns the given data and finds patterns in it. Thus, all machine learning algorithms, especially the ones used for supervised learning, follow one similar principle:
If the input variables or data are X and you expect the machine to give a prediction or output Y, the machine works by learning a target function 'f' whose form is not known to us.
Thus, Y = f(X) holds for every supervised machine learning algorithm. This is otherwise also called Predictive Modeling or Predictive Analysis, which ultimately aims to provide the most accurate prediction possible.
Types of Machine Learning Algorithms
Diving further into machine learning, we will first discuss the types of algorithms it has. Machine learning algorithms can be classified as:
Supervised algorithms
Unsupervised algorithms
Semi-supervised algorithms
Reinforcement algorithms
A brief description of these types of algorithms is given below:
1. Supervised machine learning algorithms
In this method, to get the output for a new set of user inputs, a model is trained to predict the results by using an old set of inputs and its corresponding known set of outputs. In other words, the system uses examples from the past.
A data scientist trains the system on identifying the features and variables it should analyze. After training, these models compare the new results to the old ones and update their data accordingly to improve the prediction pattern.
An example: If there is a basket full of fruits, then based on specifications given to the system earlier, such as color, shape and size, the model will be able to classify the fruits.
There are two techniques in supervised machine learning, and the technique used to develop a model is chosen based on the type of data it has to work on.
A) Techniques used in Supervised learning
Supervised algorithms use either of the following techniques to develop a model, based on the type of data.
Regression
Classification
1. Regression Technique
In a given dataset, this technique is used to predict a numeric value or continuous values (a range of numeric values) based on the relation between variables obtained from the dataset.
An example would be predicting the price of a house a year from now, based on the current price, total area, locality and number of bedrooms. Another example is predicting the room temperature in the coming hours, based on the volume of the room and the current temperature. A small illustrative sketch of the regression technique is given below.
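As a minimal, hedged sketch of the regression technique, the snippet below fits scikit-learn's LinearRegression to a handful of made-up house records (area and number of bedrooms against price); all of the numbers are invented purely for illustration.

```python
# Hedged sketch of the regression technique: predicting a house price from two
# numeric features. The data below is made up purely for demonstration.
from sklearn.linear_model import LinearRegression

# Features: [total area in sq. ft., number of bedrooms]; target: price (in lakhs)
X = [[1000, 2], [1500, 3], [1800, 3], [2400, 4], [3000, 4]]
y = [50, 75, 88, 118, 145]

model = LinearRegression()
model.fit(X, y)                    # learn the relation between the features and the price

print(model.predict([[2000, 3]]))  # predicted price for an unseen 2000 sq. ft., 3-bedroom house
```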
2. Classification Technique
This is used if the input data can be categorized based on patterns or labels. Examples include email classification, such as recognizing spam mail, and face detection, both of which use patterns to predict the output.
In summary, the Regression technique is used when the value to be predicted is a quantity, and the Classification technique is used when the value to be predicted is a label.
B) Algorithms that use Supervised Learning
Some of the machine learning algorithms which use the supervised learning method are:
Linear Regression
Logistic Regression
Random Forest
Gradient Boosted Trees
Support Vector Machines (SVM)
Neural Networks
Decision Trees
Naive Bayes
We shall discuss some of these algorithms in detail as we move ahead in this post.
2. Unsupervised machine learning algorithms
This method does not involve training the model on old data, i.e., there is no "teacher" or "supervisor" to provide the model with previous examples. The system is not trained by providing any set of inputs and corresponding outputs. Instead, the model itself learns and predicts the output based on its own observations.
For example, consider a basket of fruits which are not labeled or given any specifications this time. The model will learn and organize them only by comparing color, size and shape.
A. Techniques used in unsupervised learning
The techniques used in unsupervised learning are discussed below:
Clustering
Dimensionality Reduction
Anomaly detection
Neural networks
1. Clustering
It is the method of dividing or grouping the data in the given dataset based on similarities. Data is explored to make groups or subsets based on meaningful separations. Clustering is used to determine the intrinsic grouping among the unlabeled data present. A small clustering sketch is given after this list of techniques.
An example where the clustering principle is used is digital image processing, where this technique plays its role in dividing an image into distinct regions and identifying the image border and the object.
2. Dimensionality reduction
In a given dataset, there can be multiple conditions based on which data has to be segmented or classified. These conditions are the features that the individual data elements have, and they may not be unique. If a dataset has too many such features, it becomes a complex process to segregate the data.
To handle such complex scenarios, the dimensionality reduction technique can be used. It is a process that aims to reduce the number of variables or features in the given dataset without losing important information, and it is done through feature selection or feature extraction. Email classification can be considered a good example of where this technique is used.
3. Anomaly Detection
Anomaly detection is also known as outlier detection. It is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data. Examples of its usage are identifying structural defects, errors in text and medical problems.
4. Neural Networks
A neural network is a framework for many different machine learning algorithms to work together and process complex data inputs. It can be thought of as a "complex function" which gives some output when an input is given. The neural network consists of three parts which are needed in the construction of the model:
Units or Neurons
Connections or Parameters
Biases
Neural networks are used in a wide range of applications such as coastal engineering, hydrology and medicine, where they are being used in identifying certain types of cancers.
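Here is the clustering sketch referred to above: a minimal, hedged example that uses scikit-learn's KMeans to group a few made-up 2D points without any labels.

```python
# Hedged sketch of the clustering technique: grouping unlabeled 2D points with k-means.
# The points below are made up purely for demonstration.
from sklearn.cluster import KMeans

points = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],    # one loose group of points
          [5.0, 5.2], [5.3, 4.8], [4.9, 5.1]]    # another loose group of points

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)   # no labels are supplied; the groups are discovered

print(labels)                         # e.g. [0 0 0 1 1 1], i.e. two discovered clusters
print(kmeans.cluster_centers_)        # the centre of each discovered group
```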
B. Algorithms that use unsupervised learning
Some of the most common algorithms in unsupervised learning are:
Hierarchical clustering
K-means
Mixture models
DBSCAN
OPTICS algorithm
Autoencoders
Deep Belief Nets
Hebbian Learning
Generative Adversarial Networks
Self-organizing map
We shall discuss some of these algorithms in detail as we move ahead in this post.
3. Semi-Supervised Algorithms
As the name suggests, semi-supervised learning is a mix of both supervised and unsupervised learning. Here both labelled and unlabelled examples exist, and in many scenarios of semi-supervised learning, the count of unlabelled examples is greater than that of labelled ones. Classification and regression form typical examples for semi-supervised algorithms.
The algorithms under semi-supervised learning are mostly extensions of other methods, and the machines that are trained in the semi-supervised method make assumptions when dealing with unlabelled data.
Examples of Semi-Supervised Learning: Google Photos is a good example of this model of learning. You may have observed that at first you define the user name in a picture and teach the features of the user by choosing a few photos. Then the algorithm sorts the rest of the pictures accordingly and asks you in case it has any doubts during classification.
Comparing with the previous supervised and unsupervised types of learning models, we can make the following inferences for semi-supervised learning:
Labels are entirely present in the case of supervised learning, while for unsupervised learning they are totally absent. Semi-supervised learning is thus a hybrid of these two.
The semi-supervised model fits well in cases where cost constraints are present for machine learning modelling. One can label the data as per the cost requirements and leave the rest of the data to the machine to take up.
Another advantage of semi-supervised learning methods is that they have the potential to exploit the unlabelled data of a group in cases where the data carries important unexploited information.
4. Reinforcement Learning
In this type of learning, the machine learns from the feedback it receives. It constantly learns and upgrades its existing skills by taking feedback from the environment it is in. The Markov Decision Process is the standard framework used in reinforcement learning.
In this mode of learning, the machine iteratively learns the correct output. Based on the reward obtained from each iteration, the machine learns what is right and what is wrong. This iteration keeps going until the full range of probable outputs is covered.
Process of Reinforcement Learning
The steps involved in reinforcement learning are as shown below:
The input state is taken by the agent
A predefined function indicates the action to be performed
Based on the action, the reward is obtained by the machine
The resulting pair of feedback and action is stored for future purposes
Examples of Reinforcement Learning Algorithms
Computer-based games such as chess
Artificial hands that are based on robotics
Driverless cars / self-driven cars
Most Used Machine Learning Algorithms - Explained
In this section, let us discuss the following most widely used machine learning algorithms in detail:
Decision Trees
Naive Bayes Classification
The Autoencoder
Self-organizing map
Hierarchical clustering
OPTICS algorithm
1.
Decision TreesThis algorithm is an example of supervised learning.A Decision tree is a pictorial representation or a graphical representation which depicts every possible outcome of a decision.The various elements involved here are node, branch and leaf where ‘node’ represents an ‘attribute’, ‘branch’ representing a ‘decision’ and ‘leaf’ representing an ‘outcome’ of the feature after applying that particular decision.A decision tree is just an analogy of how a human thinks to take a decision with yes/no questions.The below decision tree explains a school admission procedure rule, where Age is primarily checked, and if age is < 5, admission is not given to them. And for the kids who are eligible for admission, a check is performed on Annual income of parents where if it is < 3 L p.a. the students are further eligible to get a concession on the fees.2. Naive Bayes ClassificationThis supervised machine learning algorithm is a powerful and fast classifying algorithm, using the Bayes rule in determining the conditional probability and to predict the results.Its popular uses are, face recognition, filtering spam emails, predicting the user inputs in chat by checking communicated text and to label news articles as sports, politics etc.Bayes Rule: The Bayes theorem defines a rule in determining the probability of occurrence of an “Event” when information about “Tests” is provided.“Event” can be considered as the patient having a Heart disease while “tests” are the positive conditions that match with the event3. The AutoencoderIt comes under the category of unsupervised learning using neural networking techniques.An autoencoder is intended to learn or encode a representation for a given data set.This also involves the process of dimensional reduction which trains the network to remove the "noise" signal.In hand, with the reduction, it also works in reconstruction where the model tries to rebuild or generate a representation from the reduced encoding which is equivalent to the original input.I.e. without the loss of important and needed information from the given input, an Autoencoder removes or ignores the unnecessary noise and also works on rebuilding the output.Pic sourceThe most common use of Autoencoder is an application that converts black and white image to color. Based on the content and object in the image (like grass, water, sky, face, dress) coloring is processed.4. Self-organizing mapThis comes under the unsupervised learning method.Self-Organizing Map uses the data visualization technique by operating on a given high dimensional data.The Self-Organizing Map is a two-dimensional array of neurons: M = {m1,m2,......mn}It reduces the dimensions of the data to a map, representing the clustering concept by grouping similar data together.SOM reduces data dimensions and displays similarities among data.SOM uses clustering technique on data without knowing the class memberships of the input data where several units compete for the current object.In short, SOM converts complex, nonlinear statistical relationships between high-dimensional data into simple geometric relationships on a low-dimensional display.5. 
Hierarchical clustering
Hierarchical clustering uses one of the clustering techniques below to determine a hierarchy of clusters. The hierarchy thus produced resembles a tree structure, which is called a "Dendrogram". The techniques used in hierarchical clustering are:
K-Means
DBSCAN
Gaussian Mixture Models
The two methods of finding hierarchical clusters are:
Agglomerative clustering
Divisive clustering
Agglomerative clustering: This is a bottom-up approach, where each data point starts in its own cluster. These clusters are then joined greedily, by taking the two most similar clusters together and merging them.
Divisive clustering: Inverse to agglomerative, this uses a top-down approach, wherein all data points start in the same cluster, after which a parametric clustering algorithm like K-Means is used to divide the cluster into two clusters. Each cluster is further divided into two clusters until the desired number of clusters is reached.
6. OPTICS algorithm
OPTICS is an abbreviation for Ordering Points To Identify the Clustering Structure. In principle, OPTICS works like an extended DBSCAN algorithm run for an infinite number of distance parameters smaller than a generating distance. From a wide range of parameter settings, OPTICS outputs a linear ordering of all objects under analysis, grouped into clusters based on their density.
How to Choose Machine Learning Algorithms in Real Time
When implementing algorithms in real time, you need to keep in mind three main aspects: Space, Time, and Output. Besides, you should clearly understand the aim of your algorithm:
Do you want to make predictions for the future?
Are you just categorizing the given data?
Is your targeted task simple, or does it comprise multiple sub-tasks?
The following scenarios show which algorithm is best suited to each situation, and why:
Scenario: A simple, straightforward dataset with no complex computations. Best suited algorithm: Linear Regression. Why: It takes into account all factors involved and predicts the result with a simple error-rate explanation; for simple computations you need not spend much computational power, and linear regression runs with minimal computational power.
Scenario: Classifying already labeled data into sub-labels. Best suited algorithm: Logistic Regression. Why: This algorithm splits every data point into two subcategories, hence it is best for sub-labeling; a logistic regression model works best when you have multiple targets.
Scenario: Sorting unlabelled data into groups. Best suited algorithm: K-Means clustering. Why: This algorithm groups and clusters data by measuring the spatial distance between points; you can also choose from related methods such as the Mean-Shift algorithm and Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
Scenario: Supervised text classification (analyzing reviews, comments, etc.). Best suited algorithms: Naive Bayes (the simplest model, performs powerful pre-processing and cleaning of text, removes filler stop words effectively, and is computationally inexpensive); Logistic Regression (sorts words one by one and assigns a probability; ranks next to Naive Bayes in simplicity); Linear Support Vector Machine (can be chosen when performance matters); Bag-of-words model (suits best when the vocabulary and the set of known words are known).
Scenario: Image classification. Best suited algorithm: Convolutional neural network. Why: Best suited for complex computations such as analyzing visual imagery; it consumes more computational power and gives the best results.
Scenario: Stock market predictions. Best suited algorithm: Recurrent neural network. Why: Best suited for time-series analysis with well-defined, supervised data; it works efficiently by taking into account the relation between the data and its time distribution.
How to Run Machine Learning Algorithms?
Till now you have learned in detail about various machine learning algorithms, their features, selection and application in real time. When implementing an algorithm in real time, you can do it in any programming language that works well for machine learning. All that you need to do is use the standard libraries of the programming language that you have chosen, or program everything from scratch.
Need more help? You can check these links for more clarity on coding machine learning algorithms in various programming languages.
How To Get Started With Machine Learning Algorithms in R
How to Run Your First Classifier in Weka
Machine Learning Algorithm Recipes in scikit-learn
Where do we stand in Machine Learning?
Machine learning is slowly making strides into as many fields of our daily life as possible. Some businesses are insisting on transparent algorithms that do not affect their business privacy or data security. They are even framing regulations and performing audit trails to check whether there is any discrepancy in the above-said data policies.
The point to note here is that a machine working on machine learning principles and algorithms gives output after processing the data through many nonlinear computations. If one needs to understand how a machine predicts, perhaps it is possible only through another machine learning algorithm!
Applications of Machine Learning
Currently, the roles of Machine Learning and Artificial Intelligence in human life are intertwined. With the advent of evolving technologies, AI and ML have marked their existence in all possible aspects. Machine learning finds a plethora of applications in several domains of our day-to-day life. A list of fields where machine learning is currently in use is shown in the diagram here. An explanation for the same follows below:
Financial Services: Banks and financial services are increasingly relying on machine learning to identify financial fraud, manage portfolios, and identify and suggest good investment options for customers.
Police Department: Apps based on facial recognition and other machine learning techniques are being used by the police to identify and get hold of criminals.
Online Marketing and Sales: Machine learning is helping companies a great deal in studying the shopping and spending patterns of customers and in making personalized product recommendations to them. Machine learning also eases customer support, product recommendations and advertising ideas for e-commerce.
Healthcare: Doctors are using machine learning to predict and analyze the health status and disease progress of patients. Machine learning has proven its accuracy in detecting health conditions, heartbeat and blood pressure, and in identifying certain types of cancer. Advanced techniques of machine learning are being implemented in robotic surgery too.
Household Applications: Household appliances that use face detection and voice recognition are gaining popularity as security devices and personal virtual assistants at homes.
Oil and Gas: In analyzing underground minerals and carrying out exploration and mining, geologists and scientists are using machine learning for improved accuracy and reduced investment.
Transport: Machine learning can be used to identify vehicles that are moving in prohibited zones, for traffic control and safety monitoring purposes.
Social Media: In social media, spam is a big nuisance.
Companies are using machine learning to filter spam. Machine learning also aptly solves the purpose of sentiment analysis in social media.Trading and Commerce: Machine learning techniques are being implemented in online trading to automate the process of trading. Machines learn from the past performances of trading and use this knowledge to make decisions about future trading options.Future of Machine LearningMachine learning is already making a difference in the way businesses are offering their services to us, the customers. Voice-based search and preferences based ads are just basic functionalities of how machine learning is changing the face of businesses.ML has already made an inseparable mark in our lives. With more advancement in various fields, ML will be an integral part of all AI systems. ML algorithms are going to be made continuously learning with the day-to-day updating information.With the rapid rate at which ongoing research is happening in this field, there will be more powerful machine learning algorithms to make the way we live even more sophisticated!From 2013- 2017, the patents in the field of machine learning has recorded a growth of 34%, according to IFI Claims Patent Services (Patent Analytics). Also, 60% of the companies in the world are using machine learning for various purposes.A peek into the future trends and growth of machine learning through the reports of Predictive Analytics and Machine Learning (PAML) market shows a 21% CAGR by 2021.ConclusionUltimately, machine learning should be designed as an aid that would support mankind. The notion that automation and machine learning are threats to jobs and human workforce is pretty prevalent. It should always be remembered that machine learning is just a technology that has evolved to ease the life of humans by reducing the needed manpower and to offer increased efficiency at lower costs that too in a shorter time span. The onus of using machine learning in a responsible manner lies in the hands of those who work on/with it.However, stay tuned to an era of artificial intelligence and machine learning that makes the impossible possible and makes you witness the unseen!AI is likely to be the best thing or the worst thing to happen to humanity. – Stephen Hawking

What is Machine Learning and Why It Matters: Everything You Need to Know

If you are a machine learning enthusiast and stay in touch with the latest developments, you would have definitely come across the news “Machine learning identifies links between the world's oceans”. Wait, we all know how complex it would be to analyse a concept such as oceans and their behaviour which would undoubtedly involve billions of data points associated with many critical parameters such as wind velocities, temperatures, earth’s rotation and many such. Doesn’t this piece of information gives you a glimpse of the wondrous possibilities of machine learning and its potential uses? And this is just a drop in the ocean!As you move across this post, you would get a comprehensive idea of various aspects that you ought to know about machine learning.What is Machine Learning and Why It Matters?Machine learning is a segment of artificial intelligence. It is designed to make computers learn by themselves and perform operations without human intervention, when they are exposed to new data. It means a computer or a system designed with machine learning will identify, analyse and change accordingly and give the expected output when it comes across a new pattern of data, without any need of humans.The power behind machine learning’s self-identification and analysis of new patterns, lies in the complex and powerful ‘pattern recognition’ algorithms that guide them in where to look for what. Thus, the demand for machine learning programmers who have extensive knowledge on working with complex mathematical calculations and applying them to big data and AI is growing year after year.Machine learning, though a buzz word only since recent times, has conceptually been in existence since World War II when Alan Turing’s Bombe, an enigma deciphering machine was introduced to the world. However, it's only in the past decade or so that there has been such great progress made in context to machine learning and its uses, driven mainly by our quest for making this world more futuristic  with lesser human intervention and more precision. Pharma, education technology, industries, science and space, digital inventions, maps and navigation, robotics – you name the domain and you will have instances of machine learning innovations made in it.The Timeline of Machine Learning and the Evolution of MachinesVoice activated home appliances, self-driven cars and online marketing campaigns are some of the applications of machine learning that we experience and enjoy the benefit of in our day to day life. However, the development of such amazing inventions date back to decades. Many great mathematicians and futuristic thinkers were involved in the foundation and development of machine learning.A glimpse of the timeline of machine learning reveals many hidden facts and the efforts of great mathematicians and scientists to whom we should attribute all the fruits that we are enjoying today.1812- 1913: The century that laid the foundation of machine learningThis age laid the mathematical foundation for the development of machine learning. Bayes’ theorem and Markovs Chains took birth during this period.Late 1940s: First computers Computers were recognised as machines that can ‘store data’. The famous Manchester Small-Scale Experimental Machine (nicknamed 'The Manchester Baby') belongs to this era.1950: The official Birth of Machine LearningDespite many researches and theoretical studies done prior to this year, it was the year 1950 that is always remembered as the foundation of the machine learning that we are witnessing today. 
Alan Turing, researcher, mathematician, computer genius and thinker, submitted a paper where he mentioned something called ‘imitation game’ and astonished the world by questioning “Can Machines Think?”. His research grabbed the attention of the BBC which took an exclusive interview with Alan.1951: The First neural networkThe first artificial neural network was built by Marvin Minsky and Dean Edmonds this year. Today, we all know that artificial neural networks play a key role in the thinking process of computers and machines. This should be attributed to the invention made by these two scientists.1974: Coining of the term ‘Machine Learning’Though there were no specific terms till then for the things that machines did by thinking on their own, it was in 1974 that the term ‘machine learning’ was termed. Other words such as artificial intelligence, informatics and computational intelligence were also proposed the same year.1996: Machine beats man in a game of chessIBM developed its own computer called Deep Blue, that can think. This machine beat the world famous champion in chess, Garry Kasparov. It was then proved to the world that machines can really think like humans.2006-2017: Backpropagation, external memory access and AlphaGoBack propagation is an important technique that machines use for image recognition. This technique was developed in this period of time.Besides in 2014, a neural network developed by DeepMind, a British based company, developed a neural network that can access external memory and get things done.In 2016, AlphaGo was designed by DeepMind researchers. It beat the world famous Go players Lee Sedol and Ke Jie and proved that machines have come a long way.What’s next?Scientists are talking about ‘singularity’ –a phenomenon that would occur if humans develop a humanoid robot that could think better than humans and will recreate itself. So far, we have been witnessing how AI is entering our personal lives too in the form of voice activated devices, smart systems and many more. The results of this singularity – we shall have to wait and watch!Basics of Machine LearningTo put it simply, machine learning involves learning by machines. It means computers learn and there are many concepts, methods, algorithms and processes involved in making this happen. Let us try to understand some of the more important machine learning terms.Three concepts – artificial intelligence, machine learning and deep learning – are often thought to be synonymous. Though they belong to the same family, conceptually they are different.Machine LearningIt implies that machines can ‘learn on their own’ and give the output without any need of programming explicitly.Artificial IntelligenceThis term means machines can ‘think on their own’ just like humans and take decisions.Deep LearningThis involves creation of artificial neural networks which can think and act based on algorithms.How do machines learn?Quite simply, machines learn just like humans do. Humans learn from their training, experiences and through teachers. Sometimes they use knowledge that is fed into their brains, or sometimes take decisions by analysing the current situation using their past experiences.Similarly, machines learn from the inputs given to them which tell them which is right and which is wrong. Then they are given data that they would have to analyse based on the training they have received so far. In some other cases, they do not have any idea of which is right or wrong, but just take the decision based on their own experiences. 
We will now analyse the various concepts of learning and the methods involved.

How Machine Learning Works
The process of machine learning occurs in five steps, as shown in the diagram below. The steps are explained in simple words here:
Gathering the data involves collecting data of various formats and types from different sources, such as text files, Word documents or Excel sheets.
Data preparation involves extracting the actual data of interest from everything that has been fed in; only data that is meaningful to the machine is used for processing. This step also covers checking for missing data, removing unwanted data and treating outliers.
Training involves choosing an appropriate algorithm and modelling the data. The data prepared in the previous step is split into two parts: one part is used as training data to build the model, and the other is held back for testing.
Evaluating the model means measuring its accuracy on the held-back data, that is, on data the model did not see during training.
Finally, performance is improved by tuning the model or choosing a different one that better suits the data. This is the step where the model is reconsidered until the option best suited to the data at hand is selected.

Examples of Machine Learning
The examples below show where machine learning is used in real life:
Speech recognition: Voice-based search and call rerouting are typical examples. The principle is to translate spoken words into text and then segment them on the basis of their frequencies.
Image recognition: We use this daily when sorting pictures in Google Drive or Photos. The main technique is to classify pictures based on pixel intensity (for black-and-white images) or on the intensities of the red, green and blue channels (for colour images).
Healthcare: Many diagnoses are increasingly supported by machine learning. Clinical parameters are fed to the machine, which then predicts the disease status and other health indicators of the person under study.
Financial services: Machine learning helps predict the likelihood of financial fraud, customers' credit habits, spending patterns and so on. The financial and banking sector also uses machine learning techniques for market analysis.

Machine Learning – Methods
Machine learning is all about machines learning from the inputs provided. This learning is carried out in the following ways:

Supervised Learning
As the name says, the machine learns under supervision; the entire learning process takes place in the presence of a teacher. It follows these basic steps:
First, the machine is trained on predefined, "labelled" data.
Then, the correct answers are fed to the computer, which allows it to understand what right and wrong answers look like.
Lastly, the system is given a new, unlabelled set of data, which it analyses using techniques such as classification and regression to predict the correct outcome.
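To make the five-step workflow and the supervised learning steps above concrete, here is a minimal sketch in Python using scikit-learn. The choice of the built-in Iris dataset and of a decision tree classifier is an illustrative assumption, not something prescribed by this article.

```python
# A minimal sketch of the five-step workflow described above.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Gather the data (a small built-in dataset stands in for real data sources).
X, y = load_iris(return_X_y=True)

# 2. Prepare the data: split it into a training part and a held-back test part.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Train: fit a model on the labelled training data.
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)

# 4. Evaluate: measure accuracy on data the model did not see during training.
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))

# 5. Improve: try a different model or different parameters and compare the scores.
```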
Example: Consider a shape-sorting game that kids play. A bunch of wooden pieces of different shapes is given to the kids, say squares, triangles, circles and stars. Assume that all blocks of a given shape share a unique colour. First, you teach the kids which shape is which, and then you ask them to do the sorting on their own. Similarly, in machine learning, you teach the machine through labelled data; the machine is then given unknown data, which it analyses in the light of the earlier labelled data to give the correct outcome.
In this example, sorting the blocks by colour or by shape is in both cases a classification task, because the output is a category. The two techniques are distinguished as follows:
Classification: a classification problem is one where the output variable is a category, such as "red" or "blue", or "disease" and "no disease".
Regression: a regression problem is one where the output variable is a real value, such as "dollars" or "weight"; predicting the weight of each block, rather than its shape, would be a regression task.

Unsupervised Learning
In this type of learning there is no prior knowledge, no previous training and no teacher to supervise; the learning is based entirely on the data available at the time.
Example: Consider a kid playing with a mix of tomatoes and capsicums. They would sort them spontaneously by shape or colour, without any predefined set of attributes or training. A machine working on unsupervised learning produces results by a similar mechanism. For this purpose it commonly uses two techniques:
Clustering: grouping data points into clusters. This is used, for example, in analysing online customers' purchase patterns and shopping habits (a short code sketch of clustering appears at the end of this section).
Association: discovering rules that link items which frequently occur together, for example finding that people who buy a given item also tend to buy certain related items.

Semi-supervised Learning
The name itself describes the pattern of this approach. It is a hybrid of supervised and unsupervised learning and uses both labelled and unlabelled data to predict results. In most cases there is far more unlabelled data than labelled data, because labelling is costly. For example, in a folder of thousands of photographs, the machine groups pictures by their shared features (unsupervised) and makes use of the names of people already tagged in some pictures, if any (supervised).

Reinforcement Learning
In reinforcement learning there is no correct answer known to the system. The system learns from its own experience through a reinforcement agent: since the answer is not known, the agent decides what to do with the given task, using only its experience of the current situation.
Example: In a robotic game that involves finding hidden treasure, the algorithm works towards the best outcome through trial and error. Three main components are involved in this type of learning: the agent, the environment and the actions the agent performs. Rewards and penalties adjust the agent's behaviour and guide it towards the best result that can be achieved.
The diagram below summarizes the four types of learning discussed so far.
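Before moving on to algorithms, here is a minimal sketch of the clustering technique mentioned under unsupervised learning above, assuming scikit-learn's KMeans. The synthetic two-dimensional data and the choice of three clusters are assumptions made purely for the example.

```python
# A minimal sketch of unsupervised learning via clustering.
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled data: 2-D points with no predefined categories.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(50, 2)),
])

# The algorithm groups the points purely from their structure,
# without ever being told what the "right" groups are.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print("Cluster sizes:", np.bincount(labels))
print("Cluster centres:\n", kmeans.cluster_centers_)
```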
Machine Learning – Algorithms
Machine learning is rich in algorithms, allowing programmers to pick the one that best suits the context. Some of the machine learning algorithms are:
Neural networks
Decision trees
Random forests
Support vector machines
Nearest-neighbor mapping
k-means clustering
Self-organizing maps
Expectation maximization
Bayesian networks
Kernel density estimation
Principal component analysis
Singular value decomposition

Machine Learning Tools and Libraries
To start the journey with machine learning, a learner should know the tools and libraries that are essential to writing machine learning code. Here is a list of such tools and libraries:

Tools
Programming language: Machine learning code is usually written in Python or in the R programming language. Of late, Python has become more popular due to its rich libraries, ease of learning and coding friendliness.
IDE: Machine learning is widely coded in Jupyter Notebook, which simplifies writing Python code and embedding plots and charts. Google Colab is another free tool that can be used for the same purpose.

Libraries
Scikit-Learn: A very popular and beginner-friendly library. It supports most of the standard supervised and unsupervised learning algorithms and offers utilities for data pre-processing and result analysis, but has limited support for deep learning.
TensorFlow: Supports neural networks and deep learning. It is bulkier than scikit-learn but offers high computational efficiency, and it also supports many classical machine learning algorithms.
Pandas: The data gathering and preparation stage of machine learning described earlier is largely handled by Pandas. This library gathers and prepares data that other machine learning libraries can use later, reads data from sources such as text files, SQL databases, MS Excel or JSON files, and contains many statistical functions for working on the gathered data. A short sketch of this stage appears after the list of processes below.
NumPy and SciPy: NumPy supports the array-based and linear-algebra operations needed while working on data, while SciPy offers many scientific computing routines. NumPy is the more widely used of the two in real-world machine learning applications.
Matplotlib: A plotting library with an extensive collection of plots and charts. Several other visualisation packages are built on top of it; of these, Seaborn is the most popular and is widely used for statistical plots.
PyTorch and Keras: Both are known for their use in deep learning. PyTorch is prized for its fast computation and is very popular among deep learning programmers. Keras runs on top of backends such as TensorFlow and is well suited to building neural networks quickly.

Machine Learning – Processes
Besides algorithms, machine learning offers many tools and processes that pair well with big data. Processes and tools at hand for developers include:
Data quality and management
GUIs that simplify building models and process flows
Interactive data exploration
Visualised outputs for models
Choosing the best learning model by comparison
Automatic model evaluation that identifies the best performers
User-friendly model deployment and data-to-decision processes
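As a small illustration of the data gathering and preparation stage handled by Pandas and NumPy, here is a minimal sketch. The tiny inline table of ages and blood pressures is a made-up stand-in for a real data source such as a CSV file, SQL database or Excel sheet.

```python
# A minimal sketch of data gathering and preparation with Pandas and NumPy.
import numpy as np
import pandas as pd

# Gather: in practice this would come from pd.read_csv, a SQL database or an
# Excel sheet; a small inline table stands in for that here.
df = pd.DataFrame({
    "age": [34, 51, np.nan, 29, 62],
    "blood_pressure": [120, 140, 130, np.nan, 150],
})

# Explore: look at the structure and summary statistics of the data.
print(df.describe())

# Prepare: treat missing values and derive features before training.
df = df.dropna()                              # drop rows with missing values
df["log_bp"] = np.log(df["blood_pressure"])   # derive a new feature

# The cleaned values can then be handed to a machine learning library as arrays.
features = df[["age", "log_bp"]].to_numpy()
print(features.shape)
```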
Machine Learning Use Cases
Here is a list of five use cases based on machine learning:
PayPal: The online money-transfer giant uses machine learning to detect suspicious activity in financial transactions.
Amazon: The company's digital assistant, Alexa, is a good example of a speech-processing application of machine learning. The online retailing giant also uses machine learning to display recommendations to its customers.
Facebook: The social media company uses machine learning extensively to filter out spam posts and forwards and to weed out poor-quality content.
IBM: The company's self-driving vehicle technology uses machine learning to decide whether to hand driving control to a human or to the computer.
Kaspersky: The anti-virus company uses machine learning to detect security breaches and unknown malware threats, and to provide high-quality endpoint security for businesses.

Which Industries Use Machine Learning?
As we have just seen, machine learning is being adopted in many industries for the advantages it offers. It can be applied to any industry that deals with huge volumes of data and has many open challenges. Organizations in the following domains, for instance, are making the best use of the technology:

Pharmaceuticals
The pharma industry spends billions of dollars every year on drug design and testing. Machine learning helps cut these costs and improve accuracy by analysing data on drugs and their chemical compounds and comparing it against many other parameters.

Banks and Financial Services
This industry has two major needs: attracting investors and increasing investments, and staying alert to and preventing financial fraud and cyber threats. Machine learning supports both tasks with speed and accuracy.

Health Care and Treatments
By predicting the diseases that could affect a patient, based on medical, genetic and lifestyle data, machine learning helps patients stay alert to probable health threats. Wearable smart devices are one example of machine learning applications in health care.

Online Sales
Companies use machine learning to study the patterns that online shoppers follow and use the results to display related ads, offers and discounts. Personalisation of the shopping experience, merchandise supply planning and marketing campaigns are all driven by machine learning results.

Mining, Oil and Gas
Machine learning helps predict the most likely locations of minerals, oil, gas and other natural resources, a task that would otherwise require huge investment, manpower and time.

Government Schemes
Many governments use machine learning to study the interests and needs of their people, and apply the results to plans and schemes, both for the betterment of citizens and for optimal use of financial resources.

Space Exploration and Science Studies
Machine learning helps in studying stars, planets and other celestial bodies with far smaller investments and manpower, and scientists also use it to discover fascinating facts about the earth itself.

Future of Machine Learning
Currently, machine learning is entering our lives with baby steps. Over the next decade, radical changes can be expected in machine learning and in the way it affects our lives.
Customers have already begun to trust the power and convenience of machine learning, and will no doubt welcome more such innovations in the near future.
Gartner says: "Artificial Intelligence and Machine Learning have reached a critical tipping point and will increasingly augment and extend virtually every technology enabled service, thing, or application."
So it would not be surprising if, in the future, machine learning were to:
Enter almost every aspect of human life
Become omnipresent in businesses and industries, irrespective of their size
Enter cloud-based services
Bring drastic changes to CPU design to meet the need for computational efficiency
Altogether change the shape of data, its processing and its usage
Change the way connected systems work and look, owing to the ever-increasing data on the internet

Conclusion
Machine learning occupies a distinctive place among modern technologies. While many experts raise concerns over our ever-increasing dependence on it and its growing presence in everyday life, on the positive side machine learning can work wonders, and the world is already witnessing this in health care, finance, the automotive industry, image processing, voice recognition and many other fields. While many of us worry that machines may take over the world, it is entirely up to us to design machines that are effective yet safe and controllable. There is no doubt that machine learning will change the way we do many things, including education, business and health services, making the world a safer and better place.