Machine Learning Tutorial

Overview - Regression & Logistic Regression

Machine learning helps solve tasks that can't be explicitly programmed, and regression is one of its core techniques. 

There are various techniques used in machine learning. These include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. 

Supervised learning algorithms 

Supervised learning is one of the most popular learning methods, since it is easy to understand, relatively easy to implement, and yields relevant outputs. 

Consider this example: How does a child learn? A child is taught how to walk, run, and talk, and is shown the difference between walking and running. 

Supervised learning works in a similar way: there is human supervision involved in the form of labelled features, feedback on predictions (whether the model predicted correctly, and if not, what the right prediction should have been), and so on. 

Once the algorithm has been fully trained on such data, it can predict outputs for never-before-seen inputs, in line with the data the model was trained on, with good accuracy. It is also understood as a task-oriented algorithm, since it focuses on a single task and is trained on a huge number of examples until it predicts outputs accurately. 

Supervised learning algorithms can be classified into regression and classification problems. Regression problems include linear regression and logistic regression, while classification problems include multi-class classification, decision trees, and much more. 

A regression problem basically means the model yields a real, continuous value. The simplest model used to predict continuous variables is linear regression. 

Linear Regression 

Linear regression refers to an approach/algorithm that helps establish a linear relationship between the dependent and the independent variable. 

As the name indicates, it is a linear process, which means it is two-dimensional, i.e. it has two variables associated with it. These variables have continuous values (in contrast to the 0s and 1s in logistic regression). The word 'regression' refers to finding the relationship between two variables, of which one is the dependent variable and the other is independent. 

How can this relationship be established? 

In simple words, it goes like this: we are provided with a basic linear equation, say y = 3x - 1. Here 'y' is the dependent variable (since it depends on the value of x) and 'x' (trivially) is the independent variable. As 'x' changes, the value of 'y' changes according to the above equation. Different values of 'x' are supplied, which yield various values of 'y'. The values of 'x' and 'y' are shown in the table below: 

x | y
--|---
1 | 2
2 | 5
3 | 8
4 | 11
5 | 14
6 | 17
7 | 20
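
These values can be reproduced, and the line recovered from them, with a few lines of NumPy; this small sketch is added here for illustration and is not part of the original walkthrough:

import numpy as np

x = np.arange(1, 8)      # x = 1, 2, ..., 7
y = 3 * x - 1            # y = 2, 5, ..., 20

# fitting a degree-1 polynomial recovers the slope and intercept
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # approximately 3.0 and -1.0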

These values are plotted on a graph, and we try to fit all these points (or most of them) to a straight line. While fitting, we try to capture as many points as possible whose vertical distance from the line is minimal. Some points don't land on the straight line, since they don't contribute to forming it; these are the ones whose vertical distance from the line isn't the smallest. 

The idea is to take all the points in the graph and fit a straight line through those that have minimum vertical distance from the line. Below is an example illustrating the same: 

When the points that don't contribute to fitting the straight line outnumber the ones that do, the 'prediction error' is considered high. The 'error' basically refers to the shortest (vertical) distance between the line and a point. 

From the above graph, it can be observed that points 1, 2, 3 and 4, beginning from the bottom-left corner, don't really fit the line and don't contribute to forming it. 

When such a linear regression model is trained, a 'cost function' is computed that measures the Root Mean Squared Error, or RMSE for short. RMSE captures the difference between the predicted values and the actual values: the differences are squared (which removes any negative signs), averaged over the total number of observations, and the square root of that average is taken.

The result is a single number that indicates how well the regression algorithm has predicted outputs for given inputs, and how close the predictions are to the actual outputs. The cost function needs to be minimal, corresponding to a minimum difference between the actual and predicted values. 
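
As a quick illustration (the numbers below are made up for the example), RMSE can be computed in a few lines of NumPy:

import numpy as np

actual = np.array([2.0, 5.0, 8.0, 11.0, 14.0])
predicted = np.array([2.5, 4.8, 8.1, 10.5, 14.2])

# square the differences, average them, then take the square root
rmse = np.sqrt(np.mean((predicted - actual) ** 2))
print(rmse)  # about 0.34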

Logistic Regression 

Logistic regression is a supervised classification algorithm used to differentiate between different events or values, for example filtering spam emails, classifying a transaction as legitimate or fraudulent, and much more. The variable in question is classified as 0 or 1, True or False, Yes or No, depending on the input. 

It is a regression model that predicts the probability of a data item belonging to a certain category. Logistic regression uses the 'sigmoid' function, defined below: 

g(z) = 1 / (1 + e^(-z))

Note: The outcome of logistic regression lies between the values 0 and 1; it can't be greater than 1 and can't be less than 0. 

Logistic regression becomes a classification method when a decision threshold comes into play. 
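
For example, here is a minimal sketch (with hypothetical values of z) showing how a 0.5 decision threshold turns sigmoid outputs into class labels:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-2.0, -0.3, 0.0, 1.5])         # hypothetical model outputs
probabilities = sigmoid(z)                   # all lie between 0 and 1
labels = (probabilities >= 0.5).astype(int)  # apply the decision threshold
print(probabilities)  # [0.119 0.426 0.5 0.818] (approximately)
print(labels)         # [0 0 1 1]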

Other types of regression include the following (a short scikit-learn sketch follows the list): 

  • Polynomial regression 
  • Stepwise regression 
  • Ridge regression 
  • Lasso regression 
  • ElasticNet regression 
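
As a rough, hypothetical illustration (assuming scikit-learn is installed; the toy data and alpha values below are made up), a few of these variants can be tried as drop-in replacements for ordinary linear regression:

import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# toy data: y = 3x - 1 with a little noise
rng = np.random.default_rng(0)
X = np.arange(1, 8, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() - 1 + rng.normal(0, 0.1, size=7)

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, model.coef_, model.intercept_)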

The sigmoid function (also called the logistic function) is an S-shaped curve. 
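
The original figure is not reproduced here, but the curve can be plotted with a short matplotlib snippet (a sketch added for illustration):

import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-10, 10, 200)
g = 1 / (1 + np.exp(-z))   # squashes any real z into the range (0, 1)

plt.plot(z, g)
plt.axhline(0.5, linestyle='--', label='decision threshold (0.5)')
plt.xlabel('z')
plt.ylabel('g(z)')
plt.legend()
plt.show()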


Logistic Regression from scratch 

Logistic regression can be implemented from scratch, without using the scikit-learn module: 

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.optimize as opt

def data_loading(path, header):
    marks_data_frame = pd.read_csv(path, header=header)
    return marks_data_frame

def sigmoid(x):
    # activation function that maps any real value into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def total_input(theta, x):
    # computes the weighted sum of the inputs
    return np.dot(x, theta)

def probability(theta, x):
    # returns the probability after the weighted sum goes through the sigmoid
    return sigmoid(total_input(theta, x))

def cost_function(theta, x, y):
    # computes the cost over all the training samples
    m = x.shape[0]
    total_cost = -(1 / m) * np.sum(
        y * np.log(probability(theta, x))
        + (1 - y) * np.log(1 - probability(theta, x)))
    return total_cost

def gradient(theta, x, y):
    # computes the gradient of the cost function at the point theta
    m = x.shape[0]
    return (1 / m) * np.dot(x.T, sigmoid(total_input(theta, x)) - y)

def fit(x, y, theta):
    opt_weights = opt.fmin_tnc(func=cost_function, x0=theta,
                               fprime=gradient, args=(x, y.flatten()))
    return opt_weights[0]

def predict(x, parameters):
    theta = parameters[:, np.newaxis]
    return probability(theta, x)

def accuracy(x, actual_classes, parameters, prob_threshold=0.5):
    predicted_classes = (predict(x, parameters) >= prob_threshold).astype(int)
    predicted_classes = predicted_classes.flatten()
    return np.mean(predicted_classes == actual_classes) * 100

if __name__ == "__main__":
    # load data from the file
    data = data_loading("path to marks.csv file", None)

    # X = feature values, all columns except the last one
    X_data = data.iloc[:, :-1]

    # y = target values, last column of the data frame
    y_data = data.iloc[:, -1]

    # filter out the applicants who were eligible
    admitted = data.loc[y_data == 1]

    # filter out the applicants who weren't eligible
    not_admitted = data.loc[y_data == 0]

    # plot the insights
    plt.scatter(admitted.iloc[:, 0], admitted.iloc[:, 1], s=10, label='Eligible')
    plt.scatter(not_admitted.iloc[:, 0], not_admitted.iloc[:, 1], s=10, label='Not eligible')
    plt.legend()
    plt.show()

    # prepend a column of ones so theta[0] acts as the intercept term
    X_data = np.c_[np.ones((X_data.shape[0], 1)), X_data]
    y_data = y_data.values[:, np.newaxis]
    theta = np.zeros((X_data.shape[1], 1))

    parameters = fit(X_data, y_data, theta)

    # plot the decision boundary: theta0 + theta1*x1 + theta2*x2 = 0
    x_values = [np.min(X_data[:, 1]) - 5, np.max(X_data[:, 2]) + 5]
    y_values = -(parameters[0] + np.dot(parameters[1], x_values)) / parameters[2]
    plt.plot(x_values, y_values, label='Decision Boundary')
    plt.xlabel('Marks in 1st Exam')
    plt.ylabel('Marks in 2nd Exam')
    plt.legend()
    plt.show()

    print(accuracy(X_data, y_data.flatten(), parameters))

Output:

88.88888888888889 

Logistic Regression implemented using the scikit-learn module 

It is implemented using Maximum Likelihood Estimation (MLE), which is an iterative process. Random initial weights are assigned to the independent variables, and the weights are updated repeatedly until optimal weights are reached, after which further changes to the weights produce little to no change in the output. 

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def data_loading(path, header):
    marks_data_frame = pd.read_csv(path, header=header)
    return marks_data_frame

if __name__ == "__main__":
    # load data from the file
    data = data_loading("path-to-marks.csv file", None)

    # X = feature values, all columns except the last one
    X_data = data.iloc[:, :-1]

    # y = target values, last column of the data frame
    y_data = data.iloc[:, -1]

    # filter out applicants who are eligible
    admitted = data.loc[y_data == 1]

    # filter out applicants who aren't eligible
    not_admitted = data.loc[y_data == 0]

    # plot the insights
    plt.scatter(admitted.iloc[:, 0], admitted.iloc[:, 1], s=10, label='Eligible')
    plt.scatter(not_admitted.iloc[:, 0], not_admitted.iloc[:, 1], s=10, label='Not eligible')
    plt.legend()
    plt.show()

    # fit the scikit-learn model and evaluate it on the training data
    model = LogisticRegression()
    model.fit(X_data, y_data)
    predicted_classes = model.predict(X_data)
    accuracy = accuracy_score(y_data, predicted_classes)
    parameters = model.coef_
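
The resulting accuracy and learned parameters can then be printed for inspection, for example by appending these lines inside the main block (a suggested continuation, not part of the original listing):

    print("Training accuracy:", accuracy)
    print("Coefficients:", parameters)
    print("Intercept:", model.intercept_)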


Applications of logistic regression 

  • Weather forecasting 
  • Stock prediction 
  • Election poll results 

Conclusion 

In this post, we looked at what logistic regression means and walked through its Python implementation, both from scratch and using the scikit-learn library. 
