Takeaways from the article
Machine Learning has remained a hot topic for many years, yet not everyone knows how to make sense of it or where it can actually be used. It is not a universal solution to every difficult problem. It can only be used when certain conditions are satisfied; only then does a problem qualify to be solved using a Machine Learning algorithm. In general, Python is the most preferred language for working with Machine Learning algorithms.
Machine Learning, or ML for short, is a sub-field of Artificial Intelligence (AI). ML is the art of understanding or designing an algorithm that processes data, large or small, without the programmer explicitly defining the rules the machine should follow. The machine learns the rules from the data on its own; there are no hand-written ‘if’ or ‘else’ statements to guide it.
This is very similar to how humans learn from their day-to-day experiences: how a child learns to ride a bike, or learns to read letters, then words, then sentences, and finally to hold conversations.
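To make the contrast with hand-written rules concrete, here is a minimal sketch in which a decision threshold is learned from labelled examples instead of being hard-coded in an ‘if’ statement. The data (hours studied and pass/fail labels) and the function name `learn_threshold` are made up for illustration.

```python
def learn_threshold(samples):
    """Pick the cut-off that classifies the most labelled samples correctly.

    samples: list of (value, label) pairs, where label is True or False.
    Candidate cut-offs are the midpoints between neighbouring sorted values.
    """
    values = sorted(value for value, _ in samples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def accuracy(cutoff):
        correct = sum((value > cutoff) == label for value, label in samples)
        return correct / len(samples)

    return max(candidates, key=accuracy)


# Hypothetical training data: (hours studied, passed the exam?)
training_data = [(1, False), (2, False), (3, False), (5, True), (6, True), (8, True)]

threshold = learn_threshold(training_data)
print("Learned threshold:", threshold)
print("Prediction for 7 hours:", 7 > threshold)
```

Nobody wrote the rule "more than 4 hours means a pass"; the cut-off comes out of the examples, and it would change if the examples changed.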
Python is widely used to implement machine learning algorithms because it is open-source, extremely popular and has immense community support. In addition, there are many Python packages that provide machine learning algorithms for a variety of applications.
These algorithms can be used in Python by calling simple functions, which are usually methods of classes. These classes, in turn, are organised into modules inside a package.
The ‘scikit-learn’ package for Python is one of the most popular, and it has most of the common machine learning algorithms pre-implemented and housed inside its modules. To use an algorithm, import the package (or a specific class from it), create an object of the class, and access its methods using the dot operator. In general, the following steps can serve as a blueprint for implementing any machine learning algorithm:
Define your problem, and confirm that it can be solved using machine learning (that is, it is not a trivial problem that a fixed set of rules can solve).
Prepare the data: In this step, the data needed for the model is collected from various sources. Alternatively, data can be generated using the many functions available in Python. In either case, the data has to be cleaned, structured and analysed, and outliers have to be identified. The data also has to be pre-processed so that it is easy for the algorithm to build a model from it: irrelevant columns may be removed, and missing data should be handled.
Train the model: The model needs to be trained on the data, and its hyperparameters need to be tuned to get better prediction accuracy.
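The data-preparation step can be sketched with nothing but the standard library. The records, the column names and the outlier rule (more than three median absolute deviations from the median) are assumptions chosen for illustration; real projects typically use a dedicated data library for this.

```python
import statistics

# Hypothetical raw records: one irrelevant column, one missing value,
# and one implausible price
raw_rows = [
    {"id": 1, "area": 500.0, "price": 356.0},
    {"id": 2, "area": 1000.0, "price": 578.0},
    {"id": 3, "area": None, "price": 890.0},
    {"id": 4, "area": 2000.0, "price": 1300.0},
    {"id": 5, "area": 2500.0, "price": 99999.0},
]

# 1. Remove the irrelevant column ("id" carries no predictive information)
rows = [{k: v for k, v in row.items() if k != "id"} for row in raw_rows]

# 2. Handle missing data: fill missing areas with the median of the known areas
known_areas = [r["area"] for r in rows if r["area"] is not None]
median_area = statistics.median(known_areas)
for r in rows:
    if r["area"] is None:
        r["area"] = median_area

# 3. Identify outliers: prices far from the median, measured in units of the
#    median absolute deviation (MAD)
prices = [r["price"] for r in rows]
median_price = statistics.median(prices)
mad = statistics.median([abs(p - median_price) for p in prices])
clean_rows = [r for r in rows if abs(r["price"] - median_price) <= 3 * mad]

print("Rows kept:", len(clean_rows), "of", len(rows))
```

Whether an outlier should be dropped, capped or kept is a judgement call that depends on the problem; the sketch simply drops it.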
Note: It is understood that the users have Python 3.5 or a higher stable version installed on their workstations before beginning to execute the code in the upcoming sections. Other packages can be installed as and when required.
Let us jump into a simple problem: linear regression. Linear regression is a simple machine learning algorithm that predicts the value of one variable based on the values of other variables. There are many variations of linear regression, including multivariate regression.
Before jumping into the algorithm, let us understand what linear regression means. ‘Linear’ basically means a straight line, and ‘regression’ means modelling the relationship between variables so that a continuous value can be predicted. As a machine learning technique, it learns this relationship from the data without being explicitly programmed.
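In symbols, simple linear regression with one independent variable fits a straight line to the data. The notation below (x for the independent variable, y for the dependent variable, m for the slope, c for the intercept) is a common convention and is not taken from the article itself.

```latex
\hat{y} = m x + c, \qquad
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
c = \bar{y} - m \bar{x}
```

Here \bar{x} and \bar{y} are the means of the observed values, and the formulas for m and c are the least-squares choice, i.e. the line that minimises the sum of squared differences between the observed y values and the predicted ones.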
There are various machine learning algorithms, and Linear Regression is just the beginning. The field includes supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning.
Certain tasks need intricate detailing, and patterns might not be fully unveiled if manual or simple methods are used to extract them. Machine learning, on the other hand, can extract important hidden patterns and keep working well as the amount of data grows. It also makes it easier to improve pattern recognition over time, to deliver results in a timely manner, and to get deeper and better insights into the data in hand.
The results computed using a Machine Learning algorithm can be more accurate than those of traditional methods, and the models built can serve as a foundation for other data as well. Machine learning algorithms can be classified in different ways, depending on the criterion used. The 4 basic classifications are supervised, unsupervised, semi-supervised and reinforcement learning.
Machine learning algorithms can also be classified based on how they learn, incrementally on the fly or all at once from batches of data, into 2 types: online learning and batch learning.
Machine learning algorithms can also be classified based on how they detect patterns, that is, whether they build a model of the patterns in the data or compare new data values with previously seen data values: model-based learning and instance-based learning.
Supervised learning involves human supervision. In practice, supervision is present in the form of labelled data, a feedback loop (insights on whether the machine predicted correctly and, if not, what the correct prediction should have been) and so on.
Once the algorithm is trained on such data, it can predict outputs with high accuracy for never-before-seen inputs.
Applications of supervised learning:
Supervised algorithms can further be classified into two types: classification and regression.
Applications of semi-supervised learning algorithms:
Applications of unsupervised learning:
Unsupervised learning algorithms can be classified into two categories:
A supervised learning algorithm differs from a reinforcement learning algorithm: the former has a known correct value to compare its prediction against, whereas the latter has to decide on its next action, take it, bear the result and learn from it.
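The decide, act, bear the result and learn loop of reinforcement learning can be illustrated with a multi-armed bandit and an epsilon-greedy agent. This is only one simple reinforcement learning setting, chosen here for brevity; the reward probabilities, the function name `run_bandit` and all parameter values are assumptions.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit.

    reward_probs[i] is the hidden chance that action i pays a reward of 1.
    Returns the agent's estimated value of each action.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions   # current value estimate per action
    counts = [0] * n_actions        # how often each action was taken

    for _ in range(steps):
        # Decide the next action: explore at random or exploit the best estimate
        if rng.random() < epsilon:
            action = int(rng.random() * n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Take the action and bear the result
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Learn from it: move the running average towards the observed reward
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates


estimates = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("Estimated values:", [round(v, 2) for v in estimates])
print("Best action found:", best_action)
```

The agent is never told which action is correct. It only sees the reward that follows its own choice, which is the difference from supervised learning described above.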
Applications of reinforcement learning:
Other types of learning algorithms
In online learning, models are trained continuously on new data while they are also being used to predict output. Whenever the model sees a new example, it quickly has to learn from it and adapt to it. This way, even the newly learnt example becomes part of the trained model and contributes to subsequent predictions.
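As a rough sketch of this kind of learning, the class below updates a one-feature linear model after every single example using a stochastic gradient step. The class name `OnlineLinearModel`, the learning rate and the synthetic data stream are all assumptions made for illustration.

```python
class OnlineLinearModel:
    """A one-feature linear model updated one example at a time."""

    def __init__(self, learning_rate=0.1):
        self.slope = 0.0
        self.intercept = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.slope * x + self.intercept

    def learn_one(self, x, y):
        # Gradient step on the squared error of this single example
        error = self.predict(x) - y
        self.slope -= self.learning_rate * error * x
        self.intercept -= self.learning_rate * error


model = OnlineLinearModel()

# Hypothetical stream: examples arrive one by one and follow y = 2x + 1
stream = [(x / 10, 2 * (x / 10) + 1) for x in range(10)] * 300
for x, y in stream:
    model.learn_one(x, y)   # the model adapts as soon as the example arrives

print("slope:", round(model.slope, 2), "intercept:", round(model.intercept, 2))
```

The model can be queried with `predict` at any point in the stream; it never needs the full dataset in memory at once.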
Batch learning is also known as learning from data in groups.
Data is grouped into different batches.
These batches are used to extract different patterns, since every batch may differ considerably from the others. The model learns these patterns over time.
In model-based learning, the specifications associated with a problem in a domain are converted into a model. When this model sees new data, it detects patterns in it, and these patterns are used to make predictions on the newly seen data.
Instance-based learning is the simplest form of classification and regression.
These algorithms either group the data into different classes (classification) or give continuous or discrete values as output (regression).
Classification and regression are based on how similar or different the queries are to the values in the stored data.
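A k-nearest-neighbours classifier is a common example of comparing a query with previously seen values. The sketch below is a minimal version written from scratch; the function name `knn_predict`, the choice of k = 3 and the toy animal measurements are assumptions.

```python
from collections import Counter


def knn_predict(stored, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples.

    stored: list of ((feature1, feature2), label) pairs kept as-is in memory.
    """
    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))

    nearest = sorted(stored, key=lambda item: squared_distance(item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples: (height in cm, weight in kg) -> animal
stored_examples = [
    ((25, 4), "cat"), ((23, 5), "cat"), ((28, 6), "cat"),
    ((60, 30), "dog"), ((55, 25), "dog"), ((65, 35), "dog"),
]

print(knn_predict(stored_examples, (26, 5)))
print(knn_predict(stored_examples, (58, 28)))
```

There is no training step here: the stored examples are the model, and all the work happens when a query arrives.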
In this algorithm, we work with two variables: one is an independent variable, and the other is a dependent variable. We will take a basic problem of finding the price of a house when its area is given. Assume that we have the dataset below:
|Price of house (dependent variable)|Area of the house (independent variable)|
|356|500 sq m|
|578|1000 sq m|
|890|1500 sq m|
|1300|2000 sq m|
|1800|2500 sq m|
|?|3000 sq m|
Given the above data, the price of the house in the last row has to be found from its area. Simple linear regression, which gives a decent amount of accuracy, can be used for this. When plotted on a graph, the data yields an almost straight line, which means the dependent value depends on the independent value, i.e. the area of the house matters when the price of the house is being fixed.
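As a worked example, the line can be fitted to the five known rows of the table with ordinary least squares, written out in plain Python. Least squares is assumed here as the fitting method, and the variable names are illustrative.

```python
# Data from the table above: area in sq m (independent), price (dependent)
areas = [500, 1000, 1500, 2000, 2500]
prices = [356, 578, 890, 1300, 1800]

n = len(areas)
mean_area = sum(areas) / n
mean_price = sum(prices) / n

# Least-squares slope and intercept for: price = slope * area + intercept
slope = (
    sum((a - mean_area) * (p - mean_price) for a, p in zip(areas, prices))
    / sum((a - mean_area) ** 2 for a in areas)
)
intercept = mean_price - slope * mean_area

predicted_price = slope * 3000 + intercept
print("slope:", round(slope, 3))                           # 0.722
print("intercept:", round(intercept, 1))                   # -98.2
print("price for 3000 sq m:", round(predicted_price, 1))   # 2067.8
```

The fitted line gives a price of about 2067.8 for the 3000 sq m house. Since the prices in the table rise slightly faster than a straight line would, this should be read as an approximation.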
The basic steps involved in a machine learning problem-
Code to implement linear regression using Python
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression

# Random data set generated: x is the independent variable, y the dependent one
np.random.seed(0)
x_indep = np.random.rand(100, 1)
y_dep = 5.89 + 2.45 * x_indep + np.random.rand(100, 1)

# The model is initialized using LinearRegression from the scikit-learn package
model_of_regression = LinearRegression()
# The model is fit on the data (this is the training step)
model_of_regression.fit(x_indep, y_dep)
# The output is predicted
predicted_y_val = model_of_regression.predict(x_indep)

# The model is evaluated using root mean squared error (RMSE) and the R2 score
rmse = np.sqrt(mean_squared_error(y_dep, predicted_y_val))
r2 = r2_score(y_dep, predicted_y_val)
print("The value of slope is: ", model_of_regression.coef_)
print("The intercept value is: ", model_of_regression.intercept_)
print("The Root Mean Squared Error value (RMSE) is: ", rmse)
print("The R2 score is: ", r2)

# The data is visualized using the matplotlib library
plt.scatter(x_indep, y_dep, s=8)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# The predicted values are plotted as a line and displayed on the screen
plt.plot(x_indep, predicted_y_val, color='r')
plt.show()
```
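The two evaluation metrics can also be computed by hand, which makes clear what they measure. Note that scikit-learn's `mean_squared_error` returns the mean squared error; its square root is the RMSE. The helper names and the sample values below are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

print("RMSE:", rmse(actual, predicted))
print("R2:", r2(actual, predicted))
```

A lower RMSE means the predictions sit closer to the actual values, in the same units as the dependent variable. An R2 score close to 1 means the line explains most of the variation in the data.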
Code review-Explanation of every step
In all, Machine Learning can be a game changer when its use cases are identified correctly and the right kind of algorithm is applied in the right place, with the right amount of data and the right computational resources. Linear Regression is just a simple algorithm with which to begin exploring Machine Learning. Python is the language most often used to implement Machine Learning algorithms, but other languages can also be used.