
Regression is a part of machine learning that helps in solving tasks that can't be explicitly programmed.
There are various techniques used in machine learning, including supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
Supervised learning is one of the most popular of these, since it is easy to understand and relatively straightforward to implement and get relevant outputs from.
Consider this example: how does a child learn? A child is taught how to walk, run, and talk, and is made to understand the difference between walking and running.
Supervised learning works in a similar way: there is human supervision involved, in the form of labelled features and feedback on the data (whether the model predicted correctly, and if not, what the right prediction should have been), and so on.
Once the algorithm has been fully trained on such data, it can predict outputs for never-before-seen inputs, in line with the data the model was trained on, with good accuracy. It can also be understood as a task-oriented approach, since it focuses on a single task and is trained on a huge number of examples until it predicts the output accurately.
Supervised learning algorithms can be classified into regression and classification problems. Regression problems include linear regression, while classification problems include logistic regression (a regression model used with a decision threshold), multi-class classification, decision trees, and much more.
A regression problem basically means the model yields a real, continuous value. The simplest model used to predict continuous variables is Linear Regression.
Linear Regression refers to an approach/algorithm that helps establish a linear relationship between a dependent variable and an independent variable.
As the name indicates, the relationship is linear; in its simplest form it involves two variables and can therefore be visualized in two dimensions. These variables take continuous values (in contrast to the 0s and 1s of logistic regression). The word 'regression' refers to finding the relationship between two variables, one of which is dependent and the other independent.
In simple words, it goes like this: we are given a basic linear equation, say y = 3x - 1. Here 'y' is the dependent variable (since it depends on the value of x) and 'x' (trivially) is the independent variable. This means that as 'x' changes, the value of 'y' changes according to the linear equation above. Different values of 'x' are supplied, which give the corresponding values of 'y'. The values for 'x' and 'y' are shown in the table below:
| X | Y |
|---|---|
| 1 | 2 |
| 2 | 5 |
| 3 | 8 |
| 4 | 11 |
| 5 | 14 |
| 6 | 17 |
| 7 | 20 |
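As a quick check, a short Python snippet (illustrative only) reproduces the table above from the equation y = 3x - 1:

import numpy as np

# generate the table above from the linear equation y = 3x - 1
x = np.arange(1, 8)          # 1 through 7
y = 3 * x - 1                # 2, 5, 8, 11, 14, 17, 20
for xi, yi in zip(x, y):
    print(xi, yi)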
These values are plotted on a graph, and we try to fit all these points (or most of them) to a straight line. While fitting, the aim is to capture as many points as possible whose vertical distance from the line is minimal. Some points don't land on the straight line: these are the ones whose vertical distance from the line isn't small.
The idea is to fit a straight line through the points such that the vertical distances from the points to the line are as small as possible. Below is an example illustrating this:
[Figure: data points fitted to a straight line, with a few points lying off the line]

When the number of points that don't contribute to fitting the straight line is large compared to the ones that do, the 'prediction error' is considered high. The 'error' for a point is its vertical distance from the fitted line.
From the above graph, it can be observed that the first four points from the bottom-left corner don't really fit the line and don't contribute to forming it.
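To make the idea of fitting concrete, here is a minimal sketch, assuming some made-up noisy data roughly following y = 3x - 1; np.polyfit minimizes the sum of squared vertical distances:

import numpy as np

# made-up noisy observations roughly following y = 3x - 1
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([2.1, 4.7, 8.3, 10.8, 14.2, 16.9, 20.1])

# np.polyfit with deg=1 finds the slope and intercept that minimize
# the sum of squared vertical distances to the line
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)   # close to 3 and -1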
When such a linear regression model is trained, a 'cost function' is computed, commonly the Root Mean Squared Error (RMSE). RMSE is built from the differences between the predicted values and the actual values: these differences are squared (which removes any negative signs), averaged over the total number of observations, and then the square root of that average is taken.
The result is a single number that indicates how well the regression algorithm predicts outputs for given inputs, i.e. how close the predictions are to the actual outputs. The cost function needs to be minimal, corresponding to a minimum difference between the actual and predicted values.
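As a small illustration (the arrays here are made-up values for the example), RMSE can be computed directly with NumPy:

import numpy as np

# made-up actual and predicted values for illustration
actual = np.array([2.0, 5.0, 8.0, 11.0])
predicted = np.array([2.5, 4.5, 8.5, 10.0])

# square the differences, average them, then take the square root
rmse = np.sqrt(np.mean((predicted - actual) ** 2))
print(rmse)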
Logistic Regression is a supervised classification algorithm used to differentiate between events or values: for example, filtering spam emails, or classifying a transaction as legitimate or fraudulent. The variable in question is classified as 0 or 1, True or False, or Yes or No, depending on the input.
It is a regression model that predicts the probability of a data item belonging to a certain category. Logistic Regression uses the 'sigmoid' function, defined below:
g(z) = 1 / (1 + e^(-z))
Note: The outcome of Logistic Regression lies between the values 0 and 1; it can't be greater than 1 and can't be less than 0.
Logistic regression becomes a classification problem when a decision threshold comes into play.
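A minimal sketch of this idea (the input values are chosen only for illustration): the sigmoid squashes any real number into the interval (0, 1), and a 0.5 threshold turns the resulting probability into a class label:

import numpy as np

def sigmoid(z):
    # maps any real value into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(z)
labels = (probs >= 0.5).astype(int)   # decision threshold at 0.5
print(probs)    # roughly [0.05 0.38 0.5 0.62 0.95]
print(labels)   # [0 0 1 1 1]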
Other types of regression include polynomial regression, ridge regression, and lasso regression.
Logistic regression can be implemented from scratch, without using the scikit-learn module:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.optimize as opt

def data_loading(path, header):
    marks_data_frame = pd.read_csv(path, header=header)
    return marks_data_frame

def sigmoid(x):
    # activation function that maps a real value to a value between 0 and 1
    return 1 / (1 + np.exp(-x))

def total_input(theta, x):
    # computes the weighted sum of inputs
    return np.dot(x, theta)

def probability(theta, x):
    # returns the probability after the weighted sum goes through the sigmoid
    return sigmoid(total_input(theta, x))

def cost_function(theta, x, y):
    # computes the cost over all the training samples
    m = x.shape[0]
    total_cost = -(1 / m) * np.sum(
        y * np.log(probability(theta, x))
        + (1 - y) * np.log(1 - probability(theta, x)))
    return total_cost

def gradient(theta, x, y):
    # computes the gradient of the cost function at the point theta
    m = x.shape[0]
    return (1 / m) * np.dot(x.T, sigmoid(total_input(theta, x)) - y)

def fit(x, y, theta):
    opt_weights = opt.fmin_tnc(func=cost_function, x0=theta,
                               fprime=gradient, args=(x, y.flatten()))
    return opt_weights[0]

def predict(x):
    theta = parameters[:, np.newaxis]
    return probability(theta, x)

def accuracy(x, actual_classes, prob_threshold=0.5):
    predicted_classes = (predict(x) >= prob_threshold).astype(int)
    predicted_classes = predicted_classes.flatten()
    return np.mean(predicted_classes == actual_classes) * 100

if __name__ == "__main__":
    # load data from the file
    data = data_loading("path to marks.csv file", None)
    # X = feature values, all columns except the last one
    X_data = data.iloc[:, :-1]
    # y = target values, last column of the data frame
    y_data = data.iloc[:, -1]
    # filter out the applicants who were eligible
    admitted = data.loc[y_data == 1]
    # filter out the applicants who weren't eligible
    not_admitted = data.loc[y_data == 0]
    # plot the insights
    plt.scatter(admitted.iloc[:, 0], admitted.iloc[:, 1], s=10, label='Eligible')
    plt.scatter(not_admitted.iloc[:, 0], not_admitted.iloc[:, 1], s=10, label='Not eligible')
    plt.legend()
    plt.show()
    # prepend a column of ones for the intercept term and reshape the targets
    X_data = np.c_[np.ones((X_data.shape[0], 1)), X_data]
    y_data = y_data.values[:, np.newaxis]
    theta = np.zeros((X_data.shape[1], 1))
    parameters = fit(X_data, y_data, theta)
    # plot the decision boundary
    x_values = [np.min(X_data[:, 1] - 5), np.max(X_data[:, 2] + 5)]
    y_values = -(parameters[0] + np.dot(parameters[1], x_values)) / parameters[2]
    plt.plot(x_values, y_values, label='Decision Boundary')
    plt.xlabel('Marks in 1st Exam')
    plt.ylabel('Marks in 2nd Exam')
    plt.legend()
    plt.show()
    print(accuracy(X_data, y_data.flatten()))

Output: 88.88888888888889
Logistic Regression implemented using scikit-learn module
Scikit-learn fits logistic regression using Maximum Likelihood Estimation (MLE), which is an iterative process: the weights are initialized, then repeatedly updated until optimal values are reached, after which further changes to the weights produce little to no change in the output.
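To illustrate the iterative idea, here is a minimal sketch using plain gradient descent on the log loss with made-up toy data; scikit-learn's actual solvers are more sophisticated, but the principle is the same:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# toy data: an intercept column plus one feature
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.5]])
y = np.array([0, 0, 1, 1])

w = np.zeros(2)               # initial weights
for _ in range(1000):         # iterate until the weights barely change
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)
    w -= 0.1 * grad           # step that increases the likelihood
print(w)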
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def data_loading(path, header):
    marks_data_frame = pd.read_csv(path, header=header)
    return marks_data_frame

if __name__ == "__main__":
    # load data from the file
    data = data_loading("path-to-marks.csv file", None)
    # X = feature values, all columns except the last one
    X_data = data.iloc[:, :-1]
    # y = target values, last column of the data frame
    y_data = data.iloc[:, -1]
    # filter out applicants who are eligible
    admitted = data.loc[y_data == 1]
    # filter out applicants who aren't eligible
    not_admitted = data.loc[y_data == 0]
    # plot the insights
    plt.scatter(admitted.iloc[:, 0], admitted.iloc[:, 1], s=10, label='Eligible')
    plt.scatter(not_admitted.iloc[:, 0], not_admitted.iloc[:, 1], s=10, label='Not eligible')
    plt.legend()
    plt.show()
    # fit the model and evaluate it on the training data
    model = LogisticRegression()
    model.fit(X_data, y_data)
    predicted_classes = model.predict(X_data)
    accuracy = accuracy_score(y_data, predicted_classes)
    parameters = model.coef_
    print(accuracy)
Applications of Logistic Regression
Logistic regression is widely used for binary decisions such as filtering spam emails and classifying transactions as legitimate or fraudulent, as discussed above.
In this post, we understood what Logistic Regression means and walked through its Python implementation, both from scratch and using the scikit-learn library.