
Machine Learning Algorithms: [With Essentials, Principles, Types & Examples covered]


Advancements in science and technology are making every aspect of our daily lives more comfortable. Today, the use of machine learning systems, an integral part of artificial intelligence, has spiked, and they play a remarkable role in every user's life.

For instance, the widely popular virtual personal assistants used for playing a music track or setting an alarm, along with face detection and voice recognition applications, are everyday examples of machine learning systems.

Machine learning, a subset of artificial intelligence, is the ability of a system to learn or predict a user's needs and perform an expected task without human intervention. The inputs for the desired predictions are taken from the user's previously performed tasks or from relevant examples.

Why should you choose Machine Learning?

Why should one choose machine learning? Simply put, machine learning makes complex tasks much easier. It makes the impossible possible!

The following scenarios explain why we should opt for machine learning:


  1. During facial recognition and speech processing, it would be tedious to write the code manually to execute the process; this is where machine learning comes in handy.
  2. For market analysis, figuring out customer preferences, or fraud detection, machine learning has become essential.
  3. The dynamic changes that happen in real-time tasks would be a challenging ordeal to handle through human intervention alone.

Essentials of ML Algorithms

To state it simply, machine learning is all about predictions: a machine learns, thinks, and predicts what's next. This raises a few questions: what will a machine learn, how will it analyze, and what will it predict?

You have to understand two terms clearly before trying to get answers to these questions:

  • Data
  • Algorithm


Data

Data is what is fed to the machine. For example, if you are trying to design a machine that can predict the weather over the next few days, you should input past 'data' that comprises maximum and minimum air temperatures, wind speeds, amount of rainfall, etc. All of this is the 'data' that your machine will learn from and then analyse.

If we observe carefully, there will always be some pattern or other in the input data. For example, the maximum and minimum temperatures may fall in the same bracket, or wind speeds may be similar for a given season. Machine learning helps analyse such patterns very deeply, and then predicts the outcomes of the problem we have designed it for.

Algorithm


While data is the 'food' of the machine, an algorithm is like its digestive system. An algorithm works on the data: it crunches it, analyses it, permutes it, finds the gaps and fills in the blanks.

Algorithms are the methods used by machines to work on the data input to them.

What to consider before finalizing an ML algorithm?

Depending on the functionality expected from the machine, algorithms range from very basic to highly complex. Be wise in selecting an algorithm that suits your ML needs; careful consideration and testing are needed before finalizing an algorithm for a purpose.

For example, linear regression works well for simple ML functions such as speech analysis. If accuracy is your first priority, then more complex models such as neural networks will serve better.

This concept is called the 'Explainability-Accuracy Tradeoff'. The following figure explains this better:

[Figure: The explainability-accuracy tradeoff of machine learning (image source)]

Besides, with regard to machine learning algorithms, you need to remember the following aspects very clearly:

  • No algorithm is an all-in-one solution to every type of problem; an algorithm that fits one scenario is not guaranteed to fit another.
  • Comparing algorithms out of context rarely makes sense, as each has its own features and functionality. Many factors, such as the size of the data, data patterns, the accuracy needed, and the structure of the dataset, play a major role in comparing two algorithms.

The Principle behind Machine Learning Algorithms

As we learnt, an algorithm churns the given data and finds patterns in it. Thus, all machine learning algorithms, especially the ones used for supervised learning, follow one similar principle:

If the input variables (the data) are X and you expect the machine to give a prediction or output Y, the machine works by learning a target function 'f' whose exact form is not known to us.

Thus, Y = f(X) holds for every supervised machine learning algorithm. This is also called predictive modeling or predictive analysis, and its goal is to provide the most accurate prediction possible. A minimal sketch of this principle follows.
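As a minimal sketch of this principle (using NumPy, with made-up numbers), the snippet below 'learns' an approximation of the unknown target function f from example (X, Y) pairs; the linear form and the data are purely illustrative assumptions:

```python
import numpy as np

# Example (X, Y) pairs the machine learns from (made-up numbers, roughly Y = 2X).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Learn a simple target function f(x) = a*x + b by least squares.
a, b = np.polyfit(X, Y, deg=1)

def f(x):
    """The learned approximation of the unknown target function."""
    return a * x + b

print(f(6.0))  # prediction for an unseen input; close to 12
```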

Types of Machine Learning Algorithms

Diving further into machine learning, we will first discuss the types of algorithms it has. Machine learning algorithms can be classified as:

  • Supervised algorithms
  • Unsupervised algorithms
  • Semi-supervised algorithms
  • Reinforcement algorithms

A brief description of each type of algorithm is given below:

1. Supervised machine learning algorithms

In this method, to get the output for a new set of inputs, a model is trained to predict results using an old set of inputs and their known outputs. In other words, the system learns from examples seen in the past.

A data scientist trains the system to identify the features and variables it should analyze. After training, these models compare new results to old ones and update their data accordingly to improve the prediction pattern.

An example: if there is a basket full of fruits, then based on specifications given to the system earlier, such as color, shape and size, the model will be able to classify the fruits. A small sketch of this follows.
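A hedged sketch of this fruit example with scikit-learn (the color/size encoding is a made-up assumption, not from the original article):

```python
from sklearn.tree import DecisionTreeClassifier

# Past examples: [color_code, size_cm] with known labels
# (assumed encoding: color 0 = yellow, 1 = red).
X_train = [[0, 18], [0, 19], [1, 8], [1, 7]]
y_train = ["banana", "banana", "apple", "apple"]

model = DecisionTreeClassifier().fit(X_train, y_train)

# A new, unseen fruit (red, 7.5 cm) is classified from the past examples.
print(model.predict([[1, 7.5]]))  # ['apple']
```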

There are two techniques in supervised machine learning, and the technique used to develop a model is chosen based on the type of data it has to work on.

A) Techniques used in Supervised learning

Supervised algorithms use either of the following techniques to develop a model based on the type of data.

  1. Regression
  2. Classification

1. Regression Technique 

  • In a given dataset, this technique is used to predict a numeric value or continuous values (a range of numeric values) based on the relations between variables obtained from the dataset.
  • An example would be estimating the price of a house a year from now, based on the current price, total area, locality and number of bedrooms (a sketch follows this list).
  • Another example is predicting the room temperature in the coming hours, based on the volume of the room and the current temperature.
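As a hedged sketch of the regression technique (scikit-learn, with made-up housing numbers):

```python
from sklearn.linear_model import LinearRegression

# Past data: [current_price_lakhs, area_sqft, bedrooms] -> price a year later.
X_train = [[50, 1000, 2], [75, 1500, 3], [90, 1800, 3], [120, 2400, 4]]
y_train = [55, 82, 99, 132]  # made-up next-year prices, in lakhs

reg = LinearRegression().fit(X_train, y_train)

# The prediction is a continuous numeric value: the regression setting.
print(reg.predict([[80, 1600, 3]]))
```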

2. Classification Technique 

  • This technique is used if the input data can be categorized based on patterns or labels.
  • Examples are email classification, such as recognizing spam mail, and face detection, which uses patterns to predict the output.

In summary, the regression technique is used when the value to be predicted is a quantity, and the classification technique is used when the value to be predicted is a label, as the following sketch shows.
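A hedged sketch of the classification setting (scikit-learn, with made-up spam features):

```python
from sklearn.linear_model import LogisticRegression

# Toy email features: [number_of_links, contains_word_free (0/1)].
X_train = [[0, 0], [1, 0], [5, 1], [7, 1]]
y_train = ["ham", "ham", "spam", "spam"]

clf = LogisticRegression().fit(X_train, y_train)

# The prediction is a label, not a quantity: the classification setting.
print(clf.predict([[6, 1]]))  # ['spam']
```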

B) Algorithms that use Supervised Learning

Some of the machine learning algorithms which use supervised learning method are:

  • Linear Regression
  • Logistic Regression
  • Random Forest
  • Gradient Boosted Trees
  • Support Vector Machines (SVM)
  • Neural Networks
  • Decision Trees
  • Naive Bayes

We shall discuss some of these algorithms in detail as we move ahead in this post.

2. Unsupervised machine learning algorithms

This method does not involve training the model on old data; i.e., there is no "teacher" or "supervisor" to provide the model with previous examples.

The system is not trained by providing any set of inputs with corresponding outputs. Instead, the model itself learns and predicts the output based on its own observations.

For example, consider a basket of fruits that are not labeled or given any specifications this time. The model will learn and organize them on its own by comparing color, size and shape, as sketched below.
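A hedged sketch of that unlabeled-fruits idea with scikit-learn's k-means (the feature encoding is an assumption for illustration):

```python
from sklearn.cluster import KMeans

# Unlabeled fruits: [color_code, size_cm] only; no names are ever given.
X = [[0, 18], [0, 19], [1, 8], [1, 7], [0, 17], [1, 9]]

# The model groups similar fruits together without seeing a single label.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # e.g. [0 0 1 1 0 1]: two groups found on its own
```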

A. Techniques used in unsupervised learning

The techniques used in unsupervised learning are discussed below:

  • Clustering
  • Dimensionality Reduction
  • Anomaly detection
  • Neural networks

1. Clustering

  • Clustering is the method of dividing or grouping the data in a given dataset based on similarities.
  • Data is explored to make groups or subsets based on meaningful separations.
  • Clustering is used to determine the intrinsic grouping among the unlabeled data present.
  • An example of the clustering principle in use is digital image processing, where the technique divides an image into distinct regions and identifies image borders and objects.

2. Dimensionality reduction

  • In a given dataset, there can be multiple conditions based on which data has to be segmented or classified.
  • These conditions are the features that an individual data element has, and they may not be unique.
  • If a dataset has too many such features, segregating the data becomes a complex process.
  • To handle such complex scenarios, the dimensionality reduction technique can be used: a process that aims to reduce the number of variables or features in the given dataset without losing important data.
  • This is done through feature selection or feature extraction (a feature-extraction sketch follows this list).
  • Email classification is a good example of where this technique is used.
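As a hedged illustration, principal component analysis (PCA) is one common feature-extraction route to dimensionality reduction; the sketch below compresses made-up 4-feature data down to 2 features:

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up dataset: 6 samples, 4 features each.
X = np.array([
    [2.5, 2.4, 0.5, 0.7],
    [0.5, 0.7, 2.2, 2.9],
    [2.2, 2.9, 1.9, 2.2],
    [1.9, 2.2, 3.1, 3.0],
    [3.1, 3.0, 2.3, 2.7],
    [2.3, 2.7, 2.0, 1.6],
])

# Reduce 4 features to 2 while keeping as much variance as possible.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                # (6, 2)
print(pca.explained_variance_ratio_)  # how much information each axis keeps
```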

3. Anomaly Detection

  • Anomaly detection is also known as outlier detection.
  • It is the identification of rare items, events or observations that raise suspicion by differing significantly from the majority of the data.
  • Examples of its usage include identifying structural defects, errors in text, and medical problems (a sketch follows this list).
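One common approach (a hedged sketch, not the only one) is scikit-learn's Isolation Forest, which flags points that differ significantly from the majority:

```python
from sklearn.ensemble import IsolationForest

# Mostly normal sensor readings, plus one suspicious value at the end.
X = [[10.1], [9.9], [10.2], [10.0], [9.8], [55.0]]

detector = IsolationForest(random_state=0).fit(X)
print(detector.predict(X))  # 1 = normal, -1 = anomaly; the last point should be -1
```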

4. Neural Networks

  • A Neural network is a framework for many different machine learning algorithms to work together and process complex data inputs.
  • It can be thought of as a “complex function” which gives some output when an input is given.
  • The neural network consists of 3 parts which are needed in the construction of the model:
    • Units or neurons
    • Connections or parameters
    • Biases

Neural networks are used in a wide range of applications such as coastal engineering, hydrology and medicine, where they help identify certain types of cancers. A minimal sketch follows.
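A hedged, minimal neural-network sketch (scikit-learn's MLPClassifier learning XOR, a classic problem that needs hidden neurons; the layer size is an illustrative choice):

```python
from sklearn.neural_network import MLPClassifier

# XOR is not linearly separable, so hidden units (neurons) are required.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Units (neurons), connections (weights) and biases are all learned in fit().
net = MLPClassifier(solver="lbfgs", hidden_layer_sizes=(8,), random_state=1)
net.fit(X, y)
print(net.predict(X))  # ideally [0 1 1 0]
```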

B. Algorithms that use unsupervised learning

Some of the most common algorithms in unsupervised learning are:

  1. Hierarchical clustering
  2. K-means
  3. Mixture models
  4. DBSCAN
  5. OPTICS algorithm
  6. Autoencoders
  7. Deep Belief Nets
  8. Hebbian Learning
  9. Generative Adversarial Networks
  10. Self-organizing maps

We shall discuss some of these algorithms in detail as we move ahead in this post.

3. Semi-supervised machine learning algorithms

Semi-supervised algorithms, as the name suggests, are a mix of supervised and unsupervised algorithms. Here both labelled and unlabelled examples exist, and in many semi-supervised scenarios the count of unlabelled examples is higher than that of labelled ones.

Classification and regression are typical tasks for semi-supervised algorithms.

The algorithms under semi-supervised learning are mostly extensions of other methods, and the machines that are trained in the semi-supervised method make assumptions when dealing with unlabelled data.

Examples of Semi Supervised Learning:

Google Photos is a good example of this model of learning. At first, you define the user's name in a picture and teach the algorithm the user's features by choosing a few photos. The algorithm then sorts the rest of the pictures accordingly, and asks you whenever it is in doubt during classification.

Comparing with the previous supervised and unsupervised types of learning models, we can make the following inferences for semi-supervised learning:

  • Labels are entirely present in supervised learning, while in unsupervised learning they are totally absent. Semi-supervised learning is thus a hybrid of the two.
  • The semi-supervised model fits well in cases where cost constraints are present for machine learning modelling. One can label the data as per cost requirements and leave the rest of the data to the machine to take up.
  • Another advantage of semi-supervised learning methods is that they have the potential to exploit the unlabelled data of a group in cases where data carries important unexploited information.

4. Reinforcement Learning

In this type of learning, the machine learns from the feedback it receives. It constantly learns and upgrades its existing skills by taking feedback from the environment it is in.

The Markov Decision Process is the standard formalism behind reinforcement learning.

In this mode of learning, the machine learns the correct output iteratively. Based on the reward obtained from each iteration, the machine knows what is right and what is wrong. This iteration continues until the full range of probable outputs is covered.

Process of Reinforcement Learning

The steps involved in reinforcement learning are shown below, with a minimal sketch after the list:

  1. Input state is taken by the agent
  2. A predefined function indicates the action to be performed
  3. Based on the action, the reward is obtained by the machine
  4. The resulting pair of feedback and action is stored for future purposes
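A hedged, minimal sketch of these four steps as tabular Q-learning on a toy corridor (states 0 to 4, reward only at the right end; every name and number here is illustrative):

```python
import random

N_STATES, ACTIONS = 5, [-1, +1]   # corridor positions; move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(200):
    s = 0                                      # 1. agent takes the input state
    while s != N_STATES - 1:
        if random.random() < epsilon:          # 2. a function picks the action
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0   # 3. reward is obtained
        # 4. the (state, action, reward) experience updates stored knowledge
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# After training, the learned policy should point right (+1) in every state.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```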

Examples of Reinforcement Learning Algorithms

  • Computer-based games such as chess
  • Robotic artificial hands
  • Driverless or self-driving cars

Most Used Machine Learning Algorithms - Explained

In this section, let us discuss the following most widely used machine learning algorithms in detail:

  1. Decision Trees
  2. Naive Bayes Classification
  3. The Autoencoder
  4. Self-organizing map
  5. Hierarchical clustering
  6. OPTICS algorithm

1. Decision Trees

  • This algorithm is an example of supervised learning.
  • A decision tree is a graphical representation that depicts every possible outcome of a decision.
  • The elements involved are the node, branch and leaf, where a 'node' represents an 'attribute', a 'branch' represents a 'decision' and a 'leaf' represents an 'outcome' of the feature after applying that decision.
  • A decision tree mirrors how a human thinks through a decision with yes/no questions.
  • The decision tree below explains a school admission rule: age is checked first, and if the age is less than 5, admission is not given. For kids who are eligible for admission, a check is then performed on the parents' annual income; if it is less than 3 lakh per annum, the student is further eligible for a concession on the fees. (A code sketch of the same rule follows the figure.)

[Figure: Decision tree for the school admission example]
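As an illustrative sketch (not from the original article), the same admission rule can be written as nested yes/no questions, which is exactly how the tree reads:

```python
def admission_decision(age, parents_income_lakh):
    """Nested yes/no questions mirroring the decision tree above."""
    if age < 5:                        # root node: attribute 'age'
        return "no admission"
    if parents_income_lakh < 3:        # next node: attribute 'annual income'
        return "admission with fee concession"
    return "admission at full fees"

print(admission_decision(4, 5))   # no admission
print(admission_decision(8, 2))   # admission with fee concession
print(admission_decision(8, 6))   # admission at full fees
```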

2. Naive Bayes Classification

  • This supervised machine learning algorithm is a powerful and fast classifier that uses Bayes' rule to determine conditional probabilities and predict results.
  • Its popular uses are face recognition, filtering spam emails, predicting user inputs in chat from the text typed so far, and labeling news articles as sports, politics, etc.
  • Bayes' Rule: Bayes' theorem gives a rule for determining the probability of occurrence of an "Event" when information about "Tests" is provided:

P(Event | Test) = P(Test | Event) × P(Event) / P(Test)

  • The "Event" can be the patient having heart disease, while the "tests" are the positive conditions that match the event. (A text-classification sketch follows.)
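A hedged sketch of Naive Bayes text classification with scikit-learn (the mini corpus is made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training corpus with known labels.
texts = ["win money now", "free prize claim", "meeting at noon", "lunch with the team"]
labels = ["spam", "spam", "ham", "ham"]

# Word counts feed Bayes' rule: P(label | words) ∝ P(words | label) * P(label).
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["claim your free money"]))  # ['spam']
```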

3. The Autoencoder

  • The autoencoder comes under the category of unsupervised learning and uses neural network techniques.
  • An autoencoder is intended to learn, or encode, a representation for a given dataset.
  • This involves dimensionality reduction, which trains the network to remove the "noise" in the signal.
  • Hand in hand with the reduction, it also performs reconstruction: the model tries to rebuild, from the reduced encoding, a representation equivalent to the original input.
  • That is, without losing the important information in the given input, an autoencoder removes or ignores the unnecessary noise and rebuilds the output.

[Figure: The autoencoder (pic source)]

  • A well-known use of autoencoder-style networks is converting black-and-white images to color: based on the content and objects in the image (like grass, water, sky, a face, a dress), coloring is applied. A rough sketch of the encode-and-reconstruct idea follows.
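scikit-learn has no dedicated autoencoder class, so as a rough, hedged stand-in the sketch below trains an MLPRegressor to reproduce its own input through a narrow hidden layer (the "bottleneck" encoding); a real autoencoder would usually be built in a deep learning framework:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Made-up data that secretly lives on 2 latent dimensions, embedded in 8.
rng = np.random.default_rng(0)
Z = rng.random((200, 2))      # hidden 2-D structure
X = Z @ rng.random((2, 8))    # observed 8-D inputs

# A 2-unit hidden layer forces a compressed (reduced) encoding of X.
auto = MLPRegressor(hidden_layer_sizes=(2,), max_iter=5000, random_state=0)
auto.fit(X, X)                # target = input: learn to reconstruct

X_rebuilt = auto.predict(X)
print(np.mean((X - X_rebuilt) ** 2))  # reconstruction error after training
```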

4. Self-organizing map

  • This comes under the unsupervised learning method.
  • A self-organizing map (SOM) applies a data visualization technique to high-dimensional data.
  • The self-organizing map is a two-dimensional array of neurons: M = {m1, m2, ..., mn}.
  • It reduces the dimensions of the data to a map, representing the clustering concept by grouping similar data together.
  • SOM reduces data dimensions and displays similarities among the data.
  • SOM applies a clustering technique without knowing the class memberships of the input data; several units compete for the current object.
  • In short, SOM converts complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display. (A minimal sketch of the update rule follows.)
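A hedged, minimal NumPy sketch of the SOM update rule (find the best-matching unit, then pull it and its grid neighbours toward the input; all sizes and rates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 5, 5, 3               # a 5x5 map of 3-dimensional neurons
weights = rng.random((grid_h, grid_w, dim))
data = rng.random((100, dim))               # made-up high-dimensional inputs
lr, radius = 0.5, 1.5

for x in data:
    # Competition: several units compete; the closest neuron (BMU) wins.
    dists = np.linalg.norm(weights - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(dists), dists.shape)
    # Cooperation: grid neighbours of the winner move too, less strongly.
    for i in range(grid_h):
        for j in range(grid_w):
            d = np.hypot(i - bi, j - bj)
            if d <= radius:
                h = np.exp(-d**2 / (2 * radius**2))
                weights[i, j] += lr * h * (x - weights[i, j])

# Similar inputs now map to nearby positions on the 2-D grid.
dists = np.linalg.norm(weights - data[0], axis=2)
print(np.unravel_index(np.argmin(dists), dists.shape))
```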

5. Hierarchical clustering

  • Hierarchical clustering determines a hierarchy of clusters; the hierarchy thus produced resembles a tree structure, called a "dendrogram".
  • Flat clustering techniques such as K-Means, DBSCAN and Gaussian Mixture Models can be used as building blocks within hierarchical clustering.
  • The two methods of finding hierarchical clusters are:
  1. Agglomerative clustering
  2. Divisive clustering

Agglomerative clustering

  • This is a bottom-up approach, where each data point starts in its own cluster.
  • These clusters are then joined greedily, by taking the two most similar clusters and merging them.

Divisive clustering

  • Inverse to agglomerative, this uses a top-down approach: all data points start in the same cluster, after which a parametric clustering algorithm like K-Means is used to divide it into two clusters.
  • Each cluster is further divided into two until the desired number of clusters is reached. (An agglomerative sketch follows.)
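A hedged sketch of agglomerative (bottom-up) clustering with scikit-learn, on made-up 2-D points:

```python
from sklearn.cluster import AgglomerativeClustering

# Two visually obvious groups of 2-D points (made-up data).
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

# Bottom-up: every point starts in its own cluster, then the closest merge.
agg = AgglomerativeClustering(n_clusters=2).fit(X)
print(agg.labels_)  # e.g. [1 1 1 0 0 0]
```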

6. OPTICS algorithm

  • OPTICS is an abbreviation of "ordering points to identify the clustering structure".
  • OPTICS works, in principle, like an extended DBSCAN algorithm run for an infinite number of distance parameters smaller than a generating distance.
  • Across a wide range of parameter settings, OPTICS outputs a linear ordering of all objects under analysis, grouped into clusters based on their density. (A short sketch follows.)
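scikit-learn ships an OPTICS implementation; a hedged sketch on made-up points:

```python
from sklearn.cluster import OPTICS

# Two dense groups plus one lone point far away (made-up data).
X = [[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]]

clustering = OPTICS(min_samples=2).fit(X)
print(clustering.labels_)  # density-based cluster ids; -1 marks noise
```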

How to Choose Machine Learning Algorithms in Real Time

When implementing algorithms in real time, you need to keep in mind three main aspects: Space, Time, and Output.

Besides, you should clearly understand the aim of your algorithm:

  • Do you want to make predictions for the future?
  • Are you just categorizing the given data?
  • Is your targeted task simple, or does it comprise multiple sub-tasks?

The following scenarios show which algorithm is best suited to each real-time situation, and why:

Scenario: A simple, straightforward data set with no complex computations
Best-suited algorithm: Linear Regression
  • It takes all the factors involved into account and predicts the result with a simple explanation of the error rate.
  • Simple computations do not demand much computational power, and linear regression runs with minimal computational power.

Scenario: Classifying already labeled data into sub-labels
Best-suited algorithm: Logistic Regression
  • This algorithm splits every data point into two subcategories, which makes it well suited for sub-labeling.
  • A logistic regression model works best when you have multiple targets.

Scenario: Sorting unlabelled data into groups
Best-suited algorithm: K-Means clustering
  • This algorithm groups and clusters data by measuring the spatial distance between points.
  • You can also choose related variants such as the Mean-Shift algorithm and Density-Based Spatial Clustering of Applications with Noise (DBSCAN).

Scenario: Supervised text classification (analyzing reviews, comments, etc.)
Best-suited algorithms:
  • Naive Bayes: the simplest model, yet it supports powerful pre-processing and cleaning of text; removes filler stop words effectively; computationally inexpensive.
  • Logistic regression: sorts words one by one and assigns a probability; ranks next to Naive Bayes in simplicity.
  • Linear Support Vector Machine: can be chosen when performance matters.
  • Bag-of-words model: suits best when the vocabulary and the set of known words are fixed.

Scenario: Image classification
Best-suited algorithm: Convolutional neural network (CNN)
  • Best suited for complex computations such as analyzing visual data.
  • Consumes more computational power but gives the best results.

Scenario: Stock market predictions
Best-suited algorithm: Recurrent neural network (RNN)
  • Best suited for time-series analysis with well-defined, supervised data.
  • Works efficiently by taking the relation between data points and their time distribution into account.

How to Run Machine Learning Algorithms?

So far, you have learned in detail about various machine learning algorithms, their features, and how to select and apply them in real time.

When implementing an algorithm in real time, you can use any programming language that works well for machine learning.

All you need to do is use the standard libraries of the programming language you have chosen, or program everything from scratch.

Need more help? You can check these links for more clarity on coding machine learning algorithms in various programming languages.

How To Get Started With Machine Learning Algorithms in R

How to Run Your First Classifier in Weka

Machine Learning Algorithm Recipes in scikit-learn

Where do we stand in Machine Learning?

Machine learning is steadily making strides into as many fields of our daily life as possible. Some businesses now insist on transparent algorithms that do not compromise their business privacy or data security. They are even framing regulations and performing audit trails to check for discrepancies against these data policies.

The point to note here is that a machine working on machine learning principles and algorithms gives its output after processing the data through many nonlinear computations. If one needs to understand how a machine makes its predictions, perhaps that is only possible through another machine learning algorithm!

Applications of Machine Learning


Currently, the roles of machine learning and artificial intelligence in human life are intertwined. With the advent of evolving technologies, AI and ML have marked their presence in all possible aspects.

Machine learning finds a plethora of applications in several domains of our day-to-day life. A list of fields where machine learning is currently in use follows:

  1. Financial Services: Banks and financial services increasingly rely on machine learning to identify financial fraud, manage portfolios, and identify and suggest good investment options for customers.
  2. Police Department: Apps based on facial recognition and other machine learning techniques are being used by the police to identify and apprehend criminals.
  3. Online Marketing and Sales: Machine learning is helping companies a great deal in studying the shopping and spending patterns of customers and in making personalized product recommendations to them. Machine learning also eases customer support, product recommendations and advertising ideas for e-commerce.
  4. Healthcare: Doctors are using machine learning to predict and analyze the health status and disease progression of patients. Machine learning has proven its accuracy in monitoring health conditions, heartbeat and blood pressure, and in identifying certain types of cancer. Advanced machine learning techniques are being implemented in robotic surgery too.
  5. Household Applications: Household appliances that use face detection and voice recognition are gaining popularity as security devices and personal virtual assistants at homes.
  6. Oil and Gas: In analyzing underground minerals and carrying out the exploration and mining, geologists and scientists are using machine learning for improved accuracy and reduced investments.
  7. Transport: Machine learning can be used to identify vehicles moving in prohibited zones, for traffic control and safety monitoring purposes.
  8. Social Media: In social media, spam is a big nuisance. Companies are using machine learning to filter spam. Machine learning also aptly solves the purpose of sentiment analysis in social media.
  9. Trading and Commerce: Machine learning techniques are being implemented in online trading to automate the process of trading. Machines learn from the past performances of trading and use this knowledge to make decisions about future trading options.

Future of Machine Learning

Machine learning is already making a difference in the way businesses offer their services to us, the customers. Voice-based search and preference-based ads are just basic examples of how machine learning is changing the face of business.

ML has already made an inseparable mark on our lives. With more advancement in various fields, ML will be an integral part of all AI systems. ML algorithms will keep learning continuously from day-to-day updates of information.

With the rapid rate at which ongoing research is happening in this field, there will be more powerful machine learning algorithms to make the way we live even more sophisticated!

From 2013 to 2017, patents in the field of machine learning recorded a growth of 34%, according to IFI Claims Patent Services (Patent Analytics). Also, 60% of companies in the world are using machine learning for various purposes.

A peek into the future trends and growth of machine learning through reports on the Predictive Analytics and Machine Learning (PAML) market shows a 21% CAGR through 2021.

Conclusion

Ultimately, machine learning should be designed as an aid that supports mankind. The notion that automation and machine learning are threats to jobs and the human workforce is quite prevalent. It should be remembered that machine learning is simply a technology that has evolved to ease human life: it reduces the manpower needed and offers increased efficiency at lower costs, in a shorter time span. The onus of using machine learning responsibly lies in the hands of those who work on and with it.

So stay tuned for an era of artificial intelligence and machine learning that makes the impossible possible and lets you witness the unseen!

AI is likely to be the best thing or the worst thing to happen to humanity. – Stephen Hawking

Suggested Blogs

What is Gradient Descent For Machine Learning

In our day-to-day lives, we are optimizing variables based on our personal decisions and we don’t even recognize the process consciously. We are constantly using optimization techniques all day long, for example, while going to work, choosing a shorter route in order to minimize traffic woes, figuring out and managing a quick walk around the campus during a snack break, or scheduling a cab in advance to reach the airport on time.Optimization is the ultimate goal, whether you are dealing with actual events in real-life or creating a technology-based product. Optimization is at the heart of most of the statistical and machine learning techniques which are widely used in data science. To gain more knowledge and skills on data science and machine learning, join the  certification course now.Optimization for Machine LearningAccuracy is the word with which we are most concerned, while we are dealing with problems related to machine learning and artificial intelligence. Any rate of errors cannot be tolerated while dealing with real-world problems and neither should they be compromised.Let us consider a case of self-driving cars. The model fitted in the car detects any obstacles that come in the way and takes appropriate actions, which can be slowing down the speed or pulling on the brakes and so on. Now we need to keep this in mind that there is no human in the car to operate or withdraw the actions taken by the self-driving car. In such a scenario, suppose the model is not accurate. It will not be able to detect other cars or any pedestrians and end up crashing leading to several lives at risk.This is where we need optimization algorithms to evaluate our model and judge whether the model is performing according to our needs or not. The evaluation can be made easy by calculating the cost function (which we will look into in a while in this article in detail). It is basically a mapping function that tells us about the difference between the desired output and what our model is computing. We can accordingly correct the model and avoid any kind of undesired activities.Optimization may be defined as the process by which an optimum is achieved. It is all about designing an optimal output for your problems with the use of resources available. However, optimization in machine learning is slightly different. In most of the cases, we are aware of the data, the shape and size, which also helps us know the areas we need to improve. But in machine learning we do not know how the new data may look like, this is where optimization acts perfectly. Optimization techniques are performed on the training data and then the validation data set is used to check its performance.There are a lot of advanced applications of optimization which are widely used in airway routing, market basket analysis, face recognition and so on. Machine learning algorithms such as linear regression, KNN, neural networks completely depend on optimization techniques. Here, we are going to look into one such popular optimization technique called Gradient Descent.What is Gradient Descent?Gradient descent is an optimization algorithm which is mainly used to find the minimum of a function. In machine learning, gradient descent is used to update parameters in a model. Parameters can vary according to the algorithms, such as coefficients in Linear Regression and weights in Neural Networks.Let us relate gradient descent with a real-life analogy for better understanding. Think of a valley you would like to descend when you are blind-folded. 
Any sane human will take a step and check the slope of the valley, whether it goes up or down. Once you are sure of the downward slope, you will follow it and repeat the step again and again until you have descended completely (or reached the minima).

Similarly, let us consider another analogy. Suppose you have a ball and you place it on an inclined plane (at position A). As per the laws of physics, it will start rolling until it reaches a gentle plane where it becomes stationary (at position B, as shown in the figure below).

This is exactly what happens in gradient descent. The inclined (and often irregular) plane is the cost function when it is plotted, and the role of gradient descent is to provide the direction and the velocity (learning rate) of the movement in order to attain the minima of the function, i.e. the point where the cost is minimum.

How does Gradient Descent work?

The primary goal of machine learning algorithms is always to build a model, which is basically a hypothesis that can be used to estimate Y based on X. Let us consider an example of a model based on certain housing data which comprises the sale price of the house, the size of the house, etc. Suppose we want to predict the price of a house based on its size. It is clearly a regression problem where, given some inputs, we would like to predict a continuous output.

The hypothesis is usually presented as

h(x) = θ0 + θ1x

where the theta values are the parameters.

Let us look at some examples and visualize the hypothesis. Taking θ0 = 1.5 and θ1 = 0 yields h(x) = 1.5 + 0x. 0x means no slope, and y will always be the constant 1.5. This looks like:

Now let us consider θ0 = 1 and θ1 = 0.5, which gives h(x) = 1 + 0.5x.

Cost Function

The objective in the case of gradient descent is to find a line of best fit for some given inputs, or X values, and any number of Y values, or outputs. A cost function is defined as “a function that maps an event or values of one or more variables onto a real number intuitively representing some cost associated with the event.”

With a known set of inputs and their corresponding outputs, a machine learning model attempts to make predictions on a new set of inputs.

Machine Learning Process

The error would be the difference between the prediction and the actual output. This relates to the idea of a cost function or loss function.

A cost function/loss function tells us “how good” our model is at making predictions for a given set of parameters. The cost function has a curve and a gradient; the slope of this curve helps us update our parameters and build an accurate model.

Minimizing the Cost Function

It is always the primary goal of any machine learning algorithm to minimize the cost function. Minimizing the cost function results in a lower error between the predicted values and the actual values, which also denotes that the algorithm has learned well. How do we actually minimize a function?

Generally, the cost function is of the form Y = X². In a Cartesian coordinate system, this is the equation of a parabola, which can be graphically represented as:

Parabola

Now, in order to minimize the function above, we first need to find the value of X which produces the lowest value of Y (in this case, the red dot). With lower dimensions (like 2D in this case) it is easy to locate the minima, but it is not so while dealing with higher dimensions. For such cases, we need to use the gradient descent algorithm to locate the minima.
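The parabola above is easy to reproduce yourself. A minimal sketch, assuming NumPy and matplotlib are available, that plots Y = X² and marks its minimum (the red dot described above):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 100)
plt.plot(x, x ** 2)              # the parabola Y = X^2
plt.scatter([0], [0], c="red")   # X = 0 produces the lowest value of Y
plt.title("Y = X^2")
plt.show()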
Now a function is required which will minimize the parameters over a dataset. The most common function used is the mean squared error (MSE). It measures the difference between the estimated value (the prediction) and the estimator (the dataset):

MSE = (1/m) Σ (y(i) − h(x(i)))²

It turns out we can adjust the equation a little to make the calculation down the track a little simpler. Now a question may arise: why do we take the squared differences and not simply the absolute differences? Because the squared differences make it easier to derive a regression line. Indeed, to find that line we need to compute the first derivative of the cost function, and it is much harder to compute the derivative of absolute values than of squared values. Also, the squared differences increase the error distance, thus making the bad predictions more pronounced than the good ones.

The adjusted equation looks like this:

J(θ) = (1/2m) Σ (h(x(i)) − y(i))²

Let us apply this cost function to the following data: the points (1, 1), (2, 2) and (3, 3). Here we will calculate some of the theta values and then plot the cost function by hand. Since this function passes through (0, 0), we will look only at a single value of theta. From now on, let us refer to the cost function as J(θ).

When the value of θ is 1, for J(1), we get 0. You will notice that θ = 1 gives a straight line which fits the data perfectly. Now let us try θ = 0.5: the MSE function gives us a value of 0.58. Let’s plot both our values so far:

J(1) = 0
J(0.5) = 0.58

With J(1) and J(0.5)

Let us go ahead and calculate some more values of J(θ). Now, if we join the dots carefully, we will get:

Visualizing the cost function J(θ)

As we can see, the cost function is at a minimum when θ = 1, which means the initial data is a straight line with a slope or gradient of 1, as shown by the orange line in the figure above.

Using a trial-and-error method, we minimized J(θ): we tried out a lot of values and used visualizations. Gradient descent does the same thing in a much better way, by changing the theta values, or parameters, until it descends to the minimum value.

You may refer to the Python code below to compute the cost function:

import matplotlib.pyplot as plt
import numpy as np

# original data set
X = [1, 2, 3]
y = [1, 2, 3]

# slopes of the three candidate best-fit lines
hyps = [0.5, 1.0, 1.5]

# multiply the original X values by theta
# to produce hypothesis values for each X
def multiply_matrix(mat, theta):
    mutated = []
    for i in range(len(mat)):
        mutated.append(mat[i] * theta)
    return mutated

# calculate cost by looping over each sample:
# subtract hyp(x) from y, square the result, sum them all together
def calc_cost(m, y, hyp):
    total = 0
    for i in range(m):
        squared_error = (y[i] - hyp[i]) ** 2
        total += squared_error
    return total * (1 / (2 * m))

# calculate cost for each hypothesis
for i in range(len(hyps)):
    hyp_values = multiply_matrix(X, hyps[i])
    print("Cost for ", hyps[i], " is ", calc_cost(len(X), y, hyp_values))

This prints:

Cost for 0.5 is 0.5833333333333333
Cost for 1.0 is 0.0
Cost for 1.5 is 0.5833333333333333
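As a side note, the loop-based code above can be written much more compactly with NumPy vectorization. A minimal sketch (the variable and function names here are ours, not part of the listing above):

import numpy as np

X = np.array([1, 2, 3])
y = np.array([1, 2, 3])

def j(theta):
    # J(theta) = (1/2m) * sum((theta * x - y)^2)
    m = len(X)
    return np.sum((theta * X - y) ** 2) / (2 * m)

for theta in (0.5, 1.0, 1.5):
    print("Cost for", theta, "is", j(theta))   # 0.5833..., 0.0, 0.5833...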
Learning Rate

Let us now start by initializing θ0 and θ1 to any two values, say 0 for both, and go from there. The algorithm is as follows:

Repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)    (simultaneously for j = 0 and j = 1)
}

where α (alpha) is the learning rate, or how rapidly we want to move towards the minimum. We can overshoot if the value of α is too large.

The derivative, which refers to the slope of the function, is calculated next. Here we calculate the partial derivative of the cost function. It tells us the direction (sign) in which the coefficient values should move so that they attain a lower cost on the following iteration:

∂/∂θj J(θ) = (1/m) Σ (h(x(i)) − y(i)) · xj(i)

Once we know the direction from the derivative, we can update the coefficient values. The learning rate parameter controls how much the coefficients change on each update:

coefficient = coefficient − (alpha * delta)

This process is repeated until the cost of the coefficients is 0.0 or close enough to zero. For our linear hypothesis, this turns out to be:

θ0 := θ0 − α · (1/m) Σ (h(x(i)) − y(i))
θ1 := θ1 − α · (1/m) Σ (h(x(i)) − y(i)) · x(i)

(Image from Andrew Ng’s machine learning course)

Which gives us linear regression!

Types of Gradient Descent Algorithms

Gradient descent variants’ trajectory towards the minimum

1. Batch Gradient Descent: In this type of gradient descent, all the training examples are processed for each iteration of gradient descent. This gets computationally expensive if the number of training examples is large, which is when batch gradient descent is not preferred; rather, stochastic gradient descent or mini-batch gradient descent is used.

Algorithm for batch gradient descent: Let hθ(x) be the hypothesis for linear regression and let Σ represent the sum over all training examples from i = 1 to m. Then the cost function is

J(θ) = (1/2m) Σ (hθ(x(i)) − y(i))²

and the update rule is

Repeat {
    θj := θj − α · (1/m) Σ (hθ(x(i)) − y(i)) · xj(i)    (for every j = 0 … n)
}

where xj(i) represents the jth feature of the ith training example. So if m is very large, every single update requires a full pass over the training set, and progress towards the global minimum becomes slow and expensive.

2. Stochastic Gradient Descent: The word stochastic relates to a system or process that is linked with random probability. In Stochastic Gradient Descent (SGD), therefore, samples are selected at random for each iteration instead of the entire data set being used. When the number of training examples is too large, batch gradient descent becomes computationally expensive; SGD uses only a single sample, i.e. a batch size of one, to perform each iteration. The sample is randomly shuffled and selected for performing the iteration. The parameters are updated after every iteration, in which only one example has been processed, so SGD is faster than batch gradient descent.

Algorithm for stochastic gradient descent: First, shuffle the data set randomly in order to train the parameters evenly for each type of data. As mentioned above, SGD takes one example into consideration per iteration. Let (x(i), y(i)) be a training example. Then:

Repeat {
    For i = 1 to m {
        θj := θj − α · (hθ(x(i)) − y(i)) · xj(i)    (for every j = 0 … n)
    }
}
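To make the difference between the two variants concrete, here is a minimal sketch, assuming NumPy and the toy dataset used earlier (the points (1, 1), (2, 2), (3, 3)), implementing one batch update rule and one stochastic update rule for the hypothesis h(x) = θ0 + θ1x:

import numpy as np

X = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
alpha = 0.1

# Batch gradient descent: every update uses all m examples
t0, t1 = 0.0, 0.0
for _ in range(2000):
    error = t0 + t1 * X - y               # h(x) - y for all samples
    t0 -= alpha * error.mean()            # partial derivative w.r.t. theta0
    t1 -= alpha * (error * X).mean()      # partial derivative w.r.t. theta1
print("batch:", round(t0, 3), round(t1, 3))   # approaches (0, 1), the line y = x

# Stochastic gradient descent: every update uses one randomly chosen example
rng = np.random.default_rng(0)
t0, t1 = 0.0, 0.0
for _ in range(2000):
    i = rng.integers(len(X))
    err = t0 + t1 * X[i] - y[i]
    t0 -= alpha * err
    t1 -= alpha * err * X[i]
print("sgd:  ", round(t0, 3), round(t1, 3))   # also approaches (0, 1), via a noisier path

Both runs arrive at the same line; the stochastic path is noisier, but each of its updates is far cheaper, which is what makes SGD attractive for large m.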
3. Mini-Batch Gradient Descent: This type of gradient descent is considered to be faster than both batch gradient descent and stochastic gradient descent. Even if the number of training examples is large, it processes them in batches in one go. Also, the number of updates needed is smaller in spite of working with larger training samples.

Algorithm for mini-batch gradient descent: Let b be the number of examples in one batch, where b < m. For each iteration, a batch of b examples is picked and the parameters are updated using the average gradient over that batch:

Repeat {
    For i = 1, 1+b, 1+2b, … {
        θj := θj − α · (1/b) Σ (hθ(x(k)) − y(k)) · xj(k)    (sum over the b examples in the batch, for every j = 0 … n)
    }
}

The following Python code implements gradient descent and visualizes the steps it takes towards the minimum. The definitions of function and deriv specify the curve being descended; here f(x) = 3(x − 2)² + 1, whose derivative is 6(x − 2) and whose minimum lies at x = 2, is assumed as the example function.

import numpy as np
import matplotlib.pyplot as plt

# example function to minimize and its derivative
# (assumed here; any differentiable function with a minimum will do)
def function(x):
    return 3 * (x - 2) ** 2 + 1

def deriv(x):
    return 6 * (x - 2)

def step(x_new, x_prev, precision, l_r):
    # lists of all x-s and y-s for later visualization of the path
    x_list = [x_new]
    y_list = [function(x_new)]
    while abs(x_new - x_prev) > precision:
        # change the value of x
        x_prev = x_new
        # get the derivative at the old value of x
        d_x = - deriv(x_prev)
        # get the new value of x by adding the product of the derivative and the learning rate to the previous value
        x_new = x_prev + (l_r * d_x)
        # record the new point for later visualization of the path
        x_list.append(x_new)
        y_list.append(function(x_new))
    print("Local minimum occurs at: " + str(x_new))
    print("Number of steps: " + str(len(x_list)))

    x = np.linspace(0.5, 3.5, 100)   # range over which the curve is drawn (assumed)
    plt.subplot(1, 2, 1)
    plt.scatter(x_list, y_list, c="g")
    plt.plot(x_list, y_list, c="g")
    plt.plot(x, function(x), c="r")
    plt.xlim([1.0, 2.1])
    plt.title("Zoomed in Gradient descent to Key Area")
    plt.subplot(1, 2, 2)
    plt.scatter(x_list, y_list, c="g")
    plt.plot(x_list, y_list, c="g")
    plt.plot(x, function(x), c="r")
    plt.title("Gradient descent")
    plt.show()

# Implement gradient descent (all the arguments are arbitrarily chosen)
step(0.5, 0, 0.001, 0.05)

For the function used in the original run, this printed:

Local minimum occurs at: 1.9980265135950486
Number of steps: 25

(With the example function assumed above, the script converges to x ≈ 1.998 in about 20 steps.)

Summary

In this article, you have learned about gradient descent for machine learning. Here we tried to cover most of the topics. To learn more about machine learning algorithms in depth, click here. Let us summarize all that we have covered in this article:

- Optimization is the heart and soul of machine learning.
- Gradient descent is a simple optimization technique which can be used with other machine learning algorithms.
- Batch gradient descent refers to calculating the derivative from all training data before calculating an update.
- Stochastic gradient descent refers to calculating the derivative from each training data instance and calculating the update immediately.

If you are inspired by the opportunities provided by Data Science, enrol in our Data Science and Machine Learning Courses for more lucrative career options in this landscape.

Top 30 Machine Learning Skills required to get a Machine Learning Job

Machine learning has been making a silent revolution in our lives over the past decade. From capturing selfies with a blurry background and a focused face to getting our queries answered by virtual assistants such as Siri and Alexa, we increasingly depend on products and applications that implement machine learning at their core.

In more basic terms, machine learning is one of the steps involved in artificial intelligence. Machines learn through machine learning. How exactly? Just like how humans learn: through training, experience, and feedback. Once machines learn through machine learning, they apply the knowledge so acquired for many purposes including, but not limited to, sorting, diagnosis, robotics, analysis and predictions in many fields. It is these implementations and applications that have made machine learning an in-demand skill in the field of programming and technology.

Look at the stats that show a positive trend for machine learning projects and careers:

- Gartner’s report on artificial intelligence showed that as many as 2.3 million jobs in machine learning would be available across the globe by 2020.
- Another study from Indeed, the online job portal giant, revealed that machine learning engineers, data scientists and software engineers with these skills top the list of most in-demand professionals.
- High-profile companies such as Univa, Microsoft, Apple, Google, and Amazon have invested millions of dollars in machine learning research and design and are developing their future projects on it.

With so much happening around machine learning, it is no surprise that any enthusiast keen on shaping their career in software programming and technology would prefer machine learning as a foundation. This post is aimed at guiding such enthusiasts, and it gives comprehensive information on the skills needed to become a machine learning engineer who is ready to dive into real-time challenges.

Machine Learning Skills

Organizations are showing massive interest in using machine learning in their products, which would in turn bring plenty of opportunities for machine learning enthusiasts. When you ask machine learning engineers the question “What do you do as a machine learning engineer?”, chances are high that individual answers will differ from one professional to another. This may sound a little puzzling, but yes, it is true!

Hence, a beginner to machine learning needs to understand clearly that there are different roles that they can perform with machine learning skills, and accordingly, the skill set that they should possess will differ. This section gives clarity on the machine learning skills needed to perform various machine learning roles.

Broadly, three main roles come into the picture when you talk about machine learning skills:

- Data Engineer
- Machine Learning Engineer
- Machine Learning Scientist

One must understand that data science, machine learning and artificial intelligence are interlinked. The following quote explains this better:

“Data science produces insights. Machine learning produces predictions. Artificial intelligence produces actions.”

A machine learning engineer is someone who deals with huge volumes of data to train a machine and impart it with knowledge that it uses to perform a specified task.
However, in practice, there may be a little more to add to this:

Data Engineer
Skills required: Python, R, and databases; parallel and distributed computing; knowledge of quality and reliability; virtual machines and cloud environments; MapReduce and Hadoop.
Roles and responsibilities: Cleaning, manipulating and extracting the required data; developing code for data analysis and manipulation; plays a major role in the statistical analysis of data.

Machine Learning Engineer
Skills required: Concepts of computer science and software engineering; data analysis and feature engineering; metrics involved in ML; ML algorithm selection and cross-validation; math and statistics.
Roles and responsibilities: Analyses and checks the suitability of an algorithm for the current task; plays the main role in deciding and selecting machine learning libraries for a given task.

Machine Learning Scientist
Skills required: Expert knowledge in robotics and machine learning, cognitive science, engineering, and mathematics and mathematical models.
Roles and responsibilities: Designs new machine learning models and algorithms; researches intensively on machine learning and publishes research papers.

Thus, gaining machine learning skills should be a task associated with clarity on the job role and, of course, the passion to learn. As is widely known, becoming a machine learning engineer is not as straightforward as becoming a web developer or a tester. Irrespective of the role, a learner is expected to have solid knowledge of data science. Besides, many other subjects are intricately intertwined with machine learning, and it requires a lot of patience and zeal for a learner to acquire these skills and build on them as they move ahead in their career.

The following skills are in demand year after year:

- AI (Artificial Intelligence)
- TensorFlow
- Apache Kafka
- Data Science
- AWS (Amazon Web Services)

In the coming sections, we will discuss each of these skills in detail and how proficient you are expected to be in them.

Technical skills required to become an ML Engineer

Becoming a machine learning engineer means preparing oneself to handle interesting and challenging tasks that would change the way humanity experiences things. It demands both technical and non-technical expertise. Firstly, let’s talk about the technical skills. Here is a list of technical skills a machine learning engineer is expected to possess:

1. Applied Mathematics
2. Neural Network Architectures
3. Physics
4. Data Modeling and Evaluation
5. Advanced Signal Processing Techniques
6. Natural Language Processing
7. Audio and Video Processing
8. Reinforcement Learning

Let us delve into each skill in detail now.

1. Applied Mathematics

Mathematics plays an important role in machine learning, and hence it is first on the list. If you wish to see yourself as a proven machine learning engineer, you ought to love math and be an expert in the following specializations. But first, let us understand why a machine learning engineer would need math at all. There are many scenarios where a machine learning engineer depends on math.
For example:

- Choosing the right algorithm that suits the final needs
- Understanding and working with parameters and their settings
- Deciding on validation strategies
- Approximating confidence intervals

How much proficiency in math does a machine learning engineer need to have? It depends on the level at which the engineer works. The following breakdown gives an idea of how important various concepts of math are for a machine learning enthusiast:

A) Linear Algebra: 15%
B) Probability Theory and Statistics: 25%
C) Multivariate Calculus: 15%
D) Algorithms and Optimization: 15%
E) Other concepts: 10%

A) Linear Algebra

This concept plays a main role in machine learning. One has to be skilled in the following sub-topics of linear algebra:

- Principal Component Analysis (PCA), Singular Value Decomposition (SVD)
- Eigendecomposition of a matrix
- LU Decomposition
- QR Decomposition/Factorization
- Symmetric Matrices
- Orthogonalization & Orthonormalization
- Matrix Operations
- Projections
- Eigenvalues & Eigenvectors
- Vector Spaces and Norms

B) Probability Theory and Statistics

The core aim of machine learning is to reduce the probability of error in the final output and decision making of the machine. Thus, it is no wonder that probability and statistics play a major role. The following topics are important in these subjects:

- Combinatorics
- Probability Rules & Axioms
- Bayes’ Theorem
- Random Variables
- Variance and Expectation
- Conditional and Joint Distributions
- Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian)
- Moment Generating Functions, Maximum Likelihood Estimation (MLE)
- Prior and Posterior
- Maximum a Posteriori Estimation (MAP)
- Sampling Methods

C) Multivariate Calculus

In calculus, the following concepts have notable importance in machine learning:

- Integral Calculus
- Partial Derivatives
- Vector-Valued Functions
- Directional Gradient
- Hessian, Jacobian, Laplacian and Lagrangian Distributions

D) Algorithms and Optimization

The scalability and computational efficiency of a machine learning algorithm depend on the chosen algorithm and the optimization technique adopted. The following areas are important from this perspective:

- Data structures (Binary Trees, Hashing, Heap, Stack, etc.)
- Dynamic Programming
- Randomized & Sublinear Algorithms
- Graphs
- Gradient/Stochastic Descent
- Primal-Dual Methods

E) Other Concepts

Besides the ones mentioned above, other concepts of mathematics are also important for a learner of machine learning:

- Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits, Cauchy Kernel, Fourier Transforms)
- Information Theory (Entropy, Information Gain)
- Function Spaces and Manifolds

2. Neural Network Architectures

Neural networks are a predefined set of algorithms for implementing machine learning tasks. They offer a class of models and play a key role in machine learning.

The following are the key reasons why a machine learning enthusiast needs to be skilled in neural networks:

- Neural networks let one understand how the human brain works and help to model and simulate an artificial one.
- Neural networks give a deeper insight into parallel and sequential computations.

The following areas of neural networks are important for machine learning:

- Perceptrons
- Convolutional Neural Networks
- Recurrent Neural Networks
- Long/Short Term Memory Networks (LSTM)
- Hopfield Networks
- Boltzmann Machine Networks
- Deep Belief Networks
- Deep Auto-encoders

3. Physics

Having an idea of physics definitely helps a machine learning engineer.
It makes a difference in designing complex systems and is a skill that is a definite bonus for a machine learning enthusiast.

4. Data Modeling and Evaluation

A machine learning engineer has to work with huge amounts of data and leverage it into predictive analytics. Data modeling and evaluation are important in working with such bulky volumes of data and in estimating how good the final model is. For this purpose, the following concepts are worth learning:

- Classification Accuracy
- Logarithmic Loss
- Confusion Matrix
- Area Under Curve
- F1 Score
- Mean Absolute Error
- Mean Squared Error

5. Advanced Signal Processing Techniques

The crux of signal processing is to minimize noise and extract the best features of a given signal. For this purpose, it uses concepts such as:

- Convex/greedy optimization theory and algorithms
- Spectral time-frequency analysis of signals
- Algorithms such as wavelets, shearlets, curvelets, contourlets, bandlets, etc.

All these concepts find their application in machine learning as well.

6. Natural Language Processing

The importance of natural language processing in artificial intelligence and machine learning is not to be forgotten. Various libraries and techniques of natural language processing used in machine learning are listed here:

- Gensim and NLTK
- Word2vec
- Sentiment analysis
- Summarization

7. Audio and Video Processing

This differs from natural language processing in that it is applied to audio and video signals rather than text. For achieving this, the following concepts are essential for a machine learning engineer:

- Fourier transforms
- Music theory
- TensorFlow

8. Reinforcement Learning

Though reinforcement learning plays a major role in deep learning and artificial intelligence, it is good for a beginner of machine learning to know at least its basic concepts.

Programming skills required to become an ML Engineer

Machine learning, ultimately, is coding and feeding the code to the machines to get them to do the tasks we intend them to do. As such, a machine learning engineer should have hands-on expertise in software programming and related concepts. Here is a list of programming skills a machine learning engineer is expected to have knowledge of:

1. Computer Science Fundamentals and Programming
2. Software Engineering and System Design
3. Machine Learning Algorithms and Libraries
4. Distributed Computing
5. Unix

Let us look into each of these programming skills in detail now.

1. Computer Science Fundamentals and Programming

It is important that a machine learning engineer applies the concepts of computer science and programming correctly as the situation demands. The following concepts play an important role in machine learning and are a must on the list of skill sets a machine learning engineer needs to have:

- Data structures (stacks, queues, multi-dimensional arrays, trees, graphs)
- Algorithms (searching, sorting, optimization, dynamic programming)
- Computability and complexity (P vs. NP, NP-complete problems, big-O notation, approximate algorithms, etc.)
- Computer architecture (memory, cache, bandwidth, deadlocks, distributed processing, etc.)
2. Software Engineering and System Design

Whatever a machine learning engineer does, ultimately it is a piece of software code: a conglomerate of many essential concepts, and one that is entirely different from coding in other software domains. Hence, it is quintessential that a machine learning engineer has solid knowledge of the following areas of software programming and system design:

- Scaling algorithms with the size of data
- Basic best practices of software coding and design, such as requirement analysis, version control, and testing
- Communicating with different modules and components of work using library calls, REST APIs and database queries
- Best measures to avoid bottlenecks, and designing the final product such that it is user-friendly

3. Machine Learning Algorithms and Libraries

A machine learning engineer may need to work with multiple packages, libraries and algorithms as a part of day-to-day tasks. It is important that a machine learning engineer is well versed in the following aspects of machine learning algorithms and libraries:

- A thorough idea of various learning procedures, including linear regression, gradient descent, genetic algorithms, bagging, boosting, and other model-specific methods
- Sound knowledge of packages and APIs such as scikit-learn, Theano, Spark MLlib, H2O, TensorFlow, etc.
- Expertise in models such as decision trees, nearest neighbors, neural nets and support vector machines, and a knack for deciding which one fits best
- Deciding and choosing the hyperparameters that affect the learning model and the outcome
- Comfort with concepts such as gradient descent, convex optimization, quadratic programming and partial differential equations
- Selecting the algorithm which yields the best performance among random forests, support vector machines (SVMs), Naive Bayes classifiers, etc.

4. Distributed Computing

Working as a machine learning engineer means working with huge sets of data, not focused on one isolated system but spread among a cluster of systems. For this purpose, it is important that a machine learning engineer knows the concepts of distributed computing.

5. Unix

Most clusters and servers that machine learning engineers need to work on are variants of Linux (Unix). Though they occasionally work on Windows and Mac, more than half of the time they work on Unix systems. Hence, having sound knowledge of Unix and Linux is a key skill for a machine learning engineer.

Programming Languages for Machine Learning

Machine learning engineers need to code to train machines, and several programming languages can be used to do this. The programming languages that a machine learning expert should essentially know are:

1. C, C++ and Java
2. Spark and Hadoop
3. R Programming
4. Apache Kafka
5. Python
6. Weka Platform
7. MATLAB/Octave

In this section, let us see why each of these programming languages is important for a machine learning engineer.

1. C, C++ and Java

These languages give the essentials of programming and teach many concepts in a simple manner, forming a foundation stone for working on the complex programming patterns of machine learning.
Knowledge of C++ helps to improve the speed of a program, while Java is needed to work with Hadoop, Hive and other tools that are essential for a machine learning engineer.

2. Spark and Hadoop

Hadoop skills are needed for working in a distributed computing environment. Spark, a more recent relative of Hadoop, is gaining popularity among the machine learning tribe; it is a framework for implementing machine learning on a large scale.

3. R Programming

R is a programming language built by statisticians specifically for programming that involves statistics. Many mathematical computations in machine learning are based on statistics, so it is no wonder that a machine learning engineer needs sound knowledge of R.

4. Apache Kafka

Apache Kafka concepts such as Kafka Streams and KSQL play a major role in the pre-processing of data in machine learning. A sound knowledge of Apache Kafka also lets a machine learning engineer design solutions that are multi-cloud or hybrid-cloud based. Business information such as latency and model accuracy also comes from Kafka and finds use in machine learning.

5. Python

Of late, Python has become the go-to programming language for machine learning. In fact, experts quip that humans communicate with machines through the Python language.

Why is Python preferred for machine learning? Python has several key features and benefits that make it the monarch of programming languages for machine learning:

- It is an all-purpose programming language that can do a lot more than deal with statistics.
- It is beginner-friendly and easy to learn.
- It boasts rich libraries and APIs that solve the various needs of machine learning pretty easily.
- Its productivity is higher than that of its counterparts.
- It offers ease of integration and carries the workflow smoothly from the design stage to the production stage.

Python Ecosystem

Various components of Python make it a preferred language for machine learning. These components are discussed below:

1. Jupyter Notebook
2. NumPy
3. Pandas
4. Scikit-learn
5. TensorFlow

1. Jupyter Notebook

Jupyter offers an excellent computational environment for Python-based data science applications. With the help of a Jupyter notebook, a machine learning engineer can illustrate the flow of the process step by step very clearly.

2. NumPy

NumPy, or Numerical Python, is a component of Python that supports the following machine learning operations in a smooth way:

- Fourier transformation
- Linear algebraic operations
- Logical and numerical operations on arrays

Of late, NumPy has been gaining attention because it makes an excellent substitute for MATLAB, as it coordinates with Matplotlib and SciPy very smoothly.

3. Pandas

Pandas is a Python library that offers various features for loading, manipulating, analysing, modeling and preparing data. It is entirely dedicated to data analysis and manipulation.

4. Scikit-learn

Built on NumPy, SciPy and Matplotlib, scikit-learn is an open-source Python library. It offers excellent features and functionalities for the major aspects of machine learning, such as clustering, dimensionality reduction, model selection, regression and classification.

5. TensorFlow

TensorFlow is another framework usable from Python. It finds its usage in deep learning, and knowledge of its libraries, such as Keras, helps a machine learning engineer move ahead confidently in their career. A minimal sketch of how some of these pieces fit together is shown below.
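The sketch below uses pandas for data handling and scikit-learn for model training; the small housing table (column names and numbers) is invented purely for illustration:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# pandas: build (or load) a small tabular dataset
df = pd.DataFrame({
    "size_sqft": [500, 750, 1000, 1250, 1500, 1750, 2000, 2250],
    "price":     [150, 210,  270,  330,  390,  450,  510,  570],
})
X = df[["size_sqft"]]   # feature matrix
y = df["price"]         # target column

# scikit-learn: split, train and score a simple model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))   # R^2 on held-out data (1.0 here, as the toy data is exactly linear)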
6. Weka Platform

It is widely known that machine learning is a non-linear process that involves many iterations. Weka, or the Waikato Environment for Knowledge Analysis, is a platform designed specifically for applied machine learning. This tool is also slowly gaining popularity and thus is a must-have on the list of skills for a machine learning engineer.

7. MATLAB/Octave

MATLAB is a basic programming language that has long been used for the simulation of various engineering models. Though not popularly used in machine learning, sound knowledge of MATLAB makes it easy to pick up the previously mentioned Python libraries.

Soft skills or behavioural skills required to become an ML engineer

Technical skills are relevant only when they are paired with good soft skills, and the machine learning profession is no exception to this rule. Here is a list of soft skills that a machine learning engineer should have:

1. Domain knowledge
2. Communication skills
3. Problem-solving skills
4. Rapid prototyping
5. Time management
6. Love of constant learning

Let us move ahead and discuss how each of these skills makes a difference to a machine learning engineer.

1. Domain knowledge

Machine learning is a subject that needs the best of its application in real time. Choosing the best algorithm while solving a machine learning problem in academia is far different from what you do in practice: various aspects of business come into the picture when you are a real-time machine learning engineer. Hence, a solid understanding of the business and of the domain in which machine learning is applied is of utmost importance to succeed as a good machine learning engineer.

2. Communication skills

As a machine learning engineer, you need to communicate with offshore teams, clients and other business teams. Excellent communication skills are a must to boost your reputation and confidence and to present your work to peers.

3. Problem-solving skills

Machine learning is all about solving real-time challenges. One must have good problem-solving skills, be able to weigh the pros and cons of a given problem, and apply the best possible methods to solve it.

4. Rapid prototyping

Choosing the correct learning method or algorithm is a sign of a machine learning engineer’s good prototyping skills. These skills can be a great saviour in real time, as they have a huge impact on the budget and the time taken to successfully complete a machine learning project.

5. Time management

Training a machine is not a cakewalk; it takes huge time and patience. But machine learning engineers are not always allotted ample time for completing tasks. Hence, time management is an essential skill for effectively dealing with bottlenecks and deadlines.

6. Love of constant learning

Since its inception, machine learning has witnessed massive change, both in the way it is implemented and in its final form. As we have seen in the previous sections, the technical and programming skills needed for machine learning are constantly evolving. Hence, to prove oneself a successful machine learning expert, it is crucial to have the zeal to stay up to date, constantly!

Conclusion

The skills one requires to begin a journey in machine learning are exactly what we have discussed in this post. The future of machine learning is undoubtedly bright, with companies ready to offer handsome remuneration irrespective of country and location.

“Machine learning and deep learning will create a new set of hot jobs in the next five years.”
– Dave Waters

All it takes to have an amazing career in machine learning is a strong will to hone one’s skills and gain solid knowledge of them. All the best for an amazing career in machine learning!

Overfitting and Underfitting With Algorithms

Curve fitting is the process of determining the best-fit mathematical function for a given set of data points. It examines the relationship between multiple independent variables (predictors) and a dependent variable (response) in order to determine the “best fit” line.

In the figure shown, the red line represents the curve that best fits the given purple data points. It can also be seen that curve fitting does not necessarily mean that the curve should pass through each and every data point. Instead, it is the most appropriate curve that represents all the data points adequately.

Curve Fitting vs. Machine Learning

As discussed, curve fitting refers to finding the “best fit” curve or line for a given set of data points. Even though this is also part of what Machine Learning or Data Science does, the applications of Machine Learning or Data Science far outweigh those of curve fitting. The major difference is that during curve fitting, the entire data is available to the developer, whereas in machine learning, the data available to the developer is only a part of the real-world data on which the fitted model will be applied.

Even then, Machine Learning is a vast interdisciplinary field, and it consists of a lot more than just “curve fitting”. Machine Learning can be broadly classified into Supervised, Unsupervised and Reinforcement Learning. Considering the fact that most real-world problems are solved by Supervised Learning, this article concentrates on Supervised Learning itself.

Supervised learning can be further classified into Classification and Regression. Here, the work done by Regression is similar to what curve fitting achieves. To get a broader idea, let’s look at the difference between Classification and Regression:

Classification:
- It is the process of separating/classifying two or more types of data into separate categories or classes based on their characteristics.
- The output values are discrete in nature (e.g. 0, 1, 2, 3, etc.) and are known as “classes”.
- Example: two classes (red and blue colored points) clearly separated by the line(s) in the middle.

Regression:
- It is the process of determining the “best fit” curve for the given data such that, on unseen data, the data points lying on the curve accurately represent the desired result.
- The output values are continuous in nature (e.g. 0.1, 1.78, 9.54, etc.).
- Example: the curve represented by the magenta line is the “best fit” line for all the data points shown.

Noise in Data

The data that is obtained from the real world is not ideal or noise-free. It contains a lot of noise, which needs to be filtered out before applying machine learning algorithms. As shown in the image above, the few extra data points at the top of the left graph represent unnecessary noise, known in technical terms as “outliers”. As the difference between the left and right graphs shows, the presence of outliers makes a considerable difference in the determination of the “best fit” line. Hence, it is of immense importance to apply preprocessing techniques in order to remove outliers from the data.

Let us look at two of the most common types of noise in data:
Outliers: As already discussed, outliers are data points which do not belong to the original set of data. These data points are either too high or too low in value, such that they do not belong to the general distribution of the rest of the dataset. They are usually due to misrepresentation or an accidental entry of wrong data. There are several statistical algorithms which are used to detect and remove such outliers.

Missing Data: In sharp contrast to outliers, missing data is another major challenge in a dataset. The occurrence is quite common in tabular datasets (e.g. CSV files) and becomes serious if the number of missing data points exceeds 10% of the total size of the dataset. Most machine learning algorithms fail to perform on such datasets. However, certain algorithms, such as decision trees, are quite resilient to missing data and are able to provide accurate results even when supplied with such noisy datasets. Similar to outliers, there are statistical methods to handle missing data or “NaN” (Not a Number) values; the most common of them is to remove or “drop” the rows containing missing data.

Training of Data

“Training” is terminology associated with machine learning, and it basically means the “fitting” of data, or “learning” from data. This is the step where the model starts to learn from the given data in order to be able to predict on similar but unseen data. This step is crucial, since the final output (or prediction) of the model will be based on how well the model was able to acquire the patterns of the training data.

Training in Machine Learning: Depending on the type of data, the training methodology varies; here we assume simple tabular (e.g. CSV) text data. Before the model can be fitted on the data, a few steps have to be followed (a sketch of the data split described in the third step follows this list):

- Data Cleaning/Preprocessing: The raw data obtained from the real world is likely to contain a good amount of noise. In addition, the data might not be homogeneous, which means the values of different “features” might belong to different ranges. Hence, after the removal of noise, the data needs to be normalized or scaled in order to make it homogeneous.

- Feature Engineering: In a tabular dataset, all the columns that describe the data are called “features”. These features are necessary to correctly predict the target value. However, data often contains columns which are irrelevant to the output of the model. Hence, these columns need to be removed or statistically processed to make sure that they do not interfere with the training of the model on features that are relevant. In addition to the removal of irrelevant features, it is often necessary to create new relevant features from the existing ones. This allows the model to learn better, and the process is also called “feature extraction”.

- Train, Validation and Test Split: After the data has been preprocessed and is ready for training, it is split into training data, validation data and testing data, usually in the ratio of 60:20:20. This ratio varies depending on the availability of data and on the application. This is done to ensure that the model does not unnecessarily “overfit” or “underfit”, and performs equally well when deployed in the real world.

- Training: Finally, as the last step, the training data is fed into the model to train upon. Multiple models can be trained simultaneously, and their performance can be measured against each other with the help of the validation set, based on which the best model is selected. This is called “model selection”. Finally, the selected model is used to predict on the test set to get a final test score, which more or less accurately defines the performance of the model on the given dataset.
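A minimal sketch of the 60:20:20 split described above, assuming scikit-learn and a made-up dataset of 100 samples (train_test_split is applied twice to obtain the three subsets):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)   # made-up features
y = np.arange(100)                  # made-up targets

# first hold out 20% for the test set...
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# ...then hold out 25% of the remaining 80% (i.e. 20% overall) for validation
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 60 20 20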
Training in Deep Learning: Deep learning is a part of machine learning, but instead of relying on statistical methods, deep learning techniques largely depend on calculus and aim to mimic the neural structure of the biological brain; hence they are often referred to as neural networks. The training process for deep learning is quite similar to that of machine learning, except that there is no need for feature engineering. Since deep learning models largely rely on weights to specify the importance of a given input (feature), the model automatically tends to learn which features are relevant and which are not: it assigns a “high” weight to the features that are relevant and a “low” weight to those that are not. This removes the need for separate feature engineering. This difference is portrayed in the following figure:

Improper Training of Data: As discussed above, the training of data is the most crucial step of any machine learning algorithm. Improper training can lead to a drastic performance degradation of the model on deployment. On a high level, there are two main outcomes of improper training: Underfitting and Overfitting.

Underfitting

When the complexity of the model is too low for it to learn the data that is given as input, the model is said to “underfit”. In other words, the excessively simple model fails to “learn” the intricate patterns and underlying trends of the given dataset. Underfitting occurs in a model with Low Variance and High Bias.

Underfitting data visualization: With the initial idea out of the way, visualizing an underfitting model is important. This helps in determining whether the model is underfitting the given data during training. As already discussed, supervised learning is of two types: classification and regression. The following graphs show underfitting for both of these cases:

- Classification: As shown in the figure below, the model is trained to classify between circles and crosses. However, it is unable to do so properly due to the straight line, which fails to properly classify either of the two classes.

- Regression: As shown in the figure below, the data points are laid out in a given pattern, but the model is unable to “fit” properly to the given data due to low model complexity.

Detection of an underfitting model: The model may underfit the data, but it is necessary to know when it does so. The following checks are used to determine whether the model is underfitting:

- Training and Validation Loss: During training and validation, it is important to check the loss generated by the model. If the model is underfitting, the loss for both training and validation will be significantly high. In terms of deep learning, the loss will not decrease at the expected rate if the model has reached saturation or is underfitting.

- Over-simplistic Prediction Graph: If a graph is plotted showing the data points and the fitted curve, and the curve is over-simplistic (as shown in the image above), then the model is suffering from underfitting. A more complex model is to be tried out.
- Classification: A lot of classes will be misclassified in the training set as well as the validation set. On data visualization, the graph would indicate that with a more complex model, more classes would have been correctly classified.

- Regression: The final “best fit” line will fail to fit the data points effectively. On visualization, it would clearly seem that a more complex curve could fit the data better.

Fixes for an underfitting model: If the model is underfitting, the developer can take the following steps to recover from the underfitting state:

- Train Longer: Since underfitting means low model complexity, training longer can help in learning more complex patterns. This is especially true in terms of deep learning.

- Train a more complex model: The main reason behind a model underfitting is using a model of lower complexity than the data requires. Hence, the most obvious fix is to use a more complex model; in deep learning, a deeper network can be used.

- Obtain more features: If the data set lacks enough features to support clear inference, then feature engineering or collecting more features will help fit the data better.

- Decrease Regularization: Regularization is the process that helps generalize the model by avoiding overfitting. However, if the model is learning too little or underfitting, then it is better to decrease or completely remove regularization so that the model can learn better.

- New Model Architecture: Finally, if none of the above approaches works, then a new model can be used, which may provide better results.

Overfitting

When the complexity of the model is too high compared to the data it is trying to learn from, the model is said to “overfit”. In other words, with increasing model complexity, the model tends to fit the noise present in the data (e.g. outliers). The model learns the data too well and hence fails to generalize. Overfitting occurs in a model with High Variance and Low Bias.

Overfitting data visualization: With the initial idea out of the way, visualizing an overfitting model is important. Similar to underfitting, overfitting can also be showcased in the two forms of supervised learning: classification and regression. The following graphs show overfitting for both of these cases:

- Classification: As shown in the figure below, the model is trained to classify between circles and crosses, and unlike last time, this time the model learns too well. It even tends to classify the noise in the data by creating an excessively complex model (right).

- Regression: As shown in the figure below, the data points are laid out in a given pattern, and instead of determining the least complex model that fits the data properly, the model on the right has fitted the data points too well when compared to the appropriate fit (left).

Detection of an overfitting model: The parameters to look out for to determine whether the model is overfitting are similar to those for underfitting. These are listed below:
- Training and Validation Loss: As already mentioned, it is important to measure the loss of the model during training and validation. A very low training loss but a high validation loss signifies that the model is overfitting. Additionally, in deep learning, if the training loss keeps on decreasing but the validation loss remains stagnant or starts to increase, this also signifies that the model is overfitting.

- Too Complex Prediction Graph: If a graph is plotted showing the data points and the fitted curve, and the curve is too complex to be the simplest solution which fits the data points appropriately, then the model is overfitting.

- Classification: If every single class is properly classified on the training set by forming a very complex decision boundary, then there is a good chance that the model is overfitting.

- Regression: If the final “best fit” line crosses over every single data point by forming an unnecessarily complex curve, then the model is likely overfitting.

Fixes for an overfitting model: If the model is overfitting, the developer can take the following steps to recover from the overfitting state (a short sketch of the regularization fix follows this list):

- Early Stopping during Training: This is especially prevalent in deep learning. Allowing the model to train for a high number of epochs (iterations) may lead to overfitting. Hence it is necessary to stop the model from training when it has started to overfit. This is done by monitoring the validation loss and stopping the model when the loss stops decreasing over a given number of epochs (or iterations).

- Train with more data: Often, the data available for training is small compared to the model complexity. Hence, in order to get the model to fit appropriately, it is often advisable to increase the size of the training dataset.

- Train a less complex model: As mentioned earlier, the main reason behind overfitting is excessive model complexity for a relatively less complex dataset. Hence it is advisable to reduce the model complexity in order to avoid overfitting. For deep learning, the model complexity can be reduced by reducing the number of layers and neurons.

- Remove features: In contrast to the steps that fix underfitting, if the number of features is too large, then the model tends to overfit. Hence, reducing the number of unnecessary or irrelevant features often leads to a better and more generalized model. Deep learning models are usually not affected by this.

- Regularization: Regularization is the process of artificially simplifying the model without losing the flexibility that it gains from having a higher complexity. With an increase in regularization, the effective model complexity decreases, which prevents overfitting.

- Ensembling: Ensembling is a machine learning method used to combine the predictions from multiple separate models. It reduces the model complexity and reduces the errors of each model by taking the strengths of multiple models. Of the many ensembling methods, two of the most commonly used are Bagging and Boosting.
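As an illustration of the regularization fix, here is a minimal sketch, assuming scikit-learn and a small noisy sine-wave dataset (invented for illustration): the same degree-12 polynomial model is fitted once without and once with L2 (Ridge) regularization:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 15).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 15)   # noisy targets

# degree-12 polynomial, no regularization: flexible enough to chase the noise
overfit = make_pipeline(PolynomialFeatures(12), LinearRegression()).fit(X, y)

# the same polynomial with L2 (Ridge) regularization: a smoother, more general fit
regularized = make_pipeline(PolynomialFeatures(12), Ridge(alpha=0.001)).fit(X, y)

X_new = np.array([[0.05], [0.35], [0.65], [0.95]])           # unseen inputs
print(overfit.predict(X_new))
print(regularized.predict(X_new))

Increasing alpha shrinks the learned coefficients and smooths the fitted curve, trading a little bias for a much lower variance, which is exactly the bias-variance tradeoff discussed in the next section.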
Generalization

The term “generalization” in machine learning refers to the ability of a model to train on a given dataset and then predict with respectable accuracy on similar but completely new or unseen data. Model generalization can also be considered as the prevention of overfitting, by making sure that the model learns adequately.

Generalization and its effect on an underfitting model: If a model is underfitting a given dataset, then all efforts to generalize that model should be avoided. Generalization should only be the goal if the model has learned the patterns of the dataset properly and needs to generalize on top of that. Any attempt to generalize an already underfitting model will lead to further underfitting, since generalization tends to reduce model complexity.

Generalization and its effect on an overfitting model: If a model is overfitting, then it is the ideal candidate to apply generalization techniques to. This is primarily because an overfitting model has already learned the intricate details and patterns of the dataset. Applying generalization techniques to this kind of model will reduce model complexity and hence prevent overfitting. In addition, the model will be able to predict more accurately on unseen but similar data.

Generalization techniques: There are no separate generalization techniques as such, but generalization can be achieved if a model performs equally well on both training and validation data. Hence, it can be said that if we apply the techniques that prevent overfitting (e.g. regularization, ensembling, etc.) to a model that has properly acquired the complex patterns, then a successful generalization of some degree can be achieved.

Relationship between Overfitting and Underfitting and the Bias-Variance Tradeoff

Bias-Variance Tradeoff: Bias denotes the simplicity of the model; a highly biased model has a simpler architecture than a model with a lower bias. Complementing bias, variance denotes how complex the model is and how well it can fit data with a high degree of diversity. An ideal model should have Low Bias and Low Variance. However, with practical datasets and models, it is nearly impossible to achieve “zero” bias and variance. The two are complementary: if one decreases beyond a certain limit, the other starts increasing. This is known as the Bias-Variance Tradeoff. Under such circumstances, there is a “sweet spot”, as shown in the figure, where both bias and variance are at their optimal values.

Bias-Variance and Generalization: As is clear from the above graph, bias and variance are linked to underfitting and overfitting. A model with high bias is underfitting the given data, and a model with high variance is overfitting it. Hence, at the optimal region of the bias-variance tradeoff, the model is neither underfitting nor overfitting, and it can also be said that the model is most generalized, since under these conditions it is expected to perform equally well on training and validation data. Thus, the graph depicts that the generalization error is minimal at the optimal value of the degree of bias and variance.

Conclusion

To summarize, the learning capability of a model depends on both model complexity and data diversity. Hence, it is necessary to keep a balance between the two, so that the machine learning models thus trained can perform equally well when deployed in the real world.

In most cases, overfitting and underfitting can be taken care of in order to determine the most appropriate model for the given dataset. However, even though there are certain rule-based steps that can be followed to improve a model, the insight needed to achieve a properly generalized model comes with experience.
