
Overview - Regression and Logistic Regression


Regression is a part of machine learning, the field concerned with solving tasks that cannot be explicitly programmed.

There are various techniques used in machine learning. These include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

Supervised Learning Algorithms

Supervised learning is one of the most popular learning methods, since it is easy to understand, relatively easy to implement, and yields relevant outputs.

Consider this example: how does a child learn? It is taught how to walk, run, and talk, and it is made to understand the difference between walking and running.

Supervised learning works in a similar way: there is human supervision involved in the form of labelled features and feedback on the data (whether a prediction was correct and, if not, what the right prediction should have been), and so on.

Once the algorithm has been fully trained on such data, it can predict outputs for never-before-seen inputs with good accuracy, in line with the data on which the model was trained. It can also be understood as a task-oriented approach, since it focuses on a single task and is trained on a huge number of examples until it predicts the output accurately.

Supervised learning algorithms can be classified into regression and classification problems. Regression techniques include linear regression and its variants, while classification techniques include logistic regression, multi-class classification, decision trees, and many more.

A regression problem means the model yields a real, i.e. continuous, value. The simplest model used to predict continuous variables is linear regression.

Linear Regression

Linear regression refers to an approach/algorithm that establishes a linear relationship between a dependent variable and an independent variable.

As the name indicates, it models a linear relationship: in its simplest form it involves two variables, one dependent and one independent, related by a straight line. These variables take continuous values (in contrast to the 0s and 1s in logistic regression). The word 'regression' refers to finding the relationship between the two variables, where one is the dependent variable and the other is the independent variable.

How Can This Relationship be Established?

In simple words, it goes like this: we are given a basic linear equation, say y = 3x - 1. Here 'y' is the dependent variable (since it depends on the value of x) and 'x' is, trivially, the independent variable. This means that as 'x' changes, the value of 'y' changes according to the equation. Different values of 'x' are supplied, from which the corresponding values of 'y' are calculated. The values of 'x' and 'y' are shown in the table below:

X    Y
1    2
2    5
3    8
4    11
5    14
6    17
7    20
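As a quick check, the table above can be reproduced with a few lines of Python (a minimal sketch of the same equation y = 3x - 1):

import numpy as np

x = np.arange(1, 8)   # independent variable: 1 through 7
y = 3 * x - 1         # dependent variable from y = 3x - 1
for xi, yi in zip(x, y):
    print(xi, yi)     # prints each (x, y) pair from the table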

These values are plotted on a graph, and we try to fit a straight line through all of these points (or as many as possible). The line is chosen so that the vertical distances between the points and the line are as small as possible overall. Some points do not end up on the line; these are the points whose vertical distance from the fitted line is not zero.

The idea is to find the straight line with the minimum total vertical distance to the points in the graph. Below is an example illustrating this:

[Figure: scatter plot of data points with a fitted straight line]

When the number of points that do not fit the line is larger than the number of points that do, the 'prediction error' is considered high. Here, the 'error' for a point refers to its vertical distance from the line.

From the above graph, it can be observed that points 1, 2, 3, and 4, counting from the bottom-left corner, do not really fit the line and do not contribute to forming it.
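To make the fitting step concrete, here is a minimal sketch using NumPy's least-squares fit (np.polyfit with degree 1); the slightly noisy data points are made up for illustration:

import numpy as np

# illustrative points: roughly linear, with a little noise (made-up values)
x = np.array([1, 2, 3, 4, 5, 6, 7])
y = np.array([2.1, 4.9, 8.3, 10.8, 14.2, 16.9, 20.1])

# fit a straight line y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, deg=1)
predicted = slope * x + intercept

# the residuals are the vertical distances between the points and the line
residuals = y - predicted
print(slope, intercept, residuals)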

When such a linear regression model is trained, a 'cost function' is computed that measures the Root Mean Squared Error, or RMSE for short. RMSE captures the difference between the predicted values and the actual values: the differences are squared (so that positive and negative errors do not cancel out), averaged over the total number of observations, and the square root of this average is taken. In plain notation, RMSE = sqrt((1/n) * sum((predicted - actual)^2)).

The result is a single number that indicates how well the regression algorithm has predicted the output for a given input, i.e. how close the prediction is to the actual output. The cost function needs to be minimal, corresponding to a minimum difference between the actual and predicted values.
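Continuing the illustrative sketch above, RMSE can be computed directly from the predicted and actual values:

import numpy as np

y_actual = np.array([2.1, 4.9, 8.3, 10.8, 14.2, 16.9, 20.1])     # made-up actual values
y_predicted = np.array([2.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0])  # the line's predictions

# square the differences, average them, then take the square root
rmse = np.sqrt(np.mean((y_predicted - y_actual) ** 2))
print(rmse)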

Logistic Regression

Logistic regression is a supervised classification algorithm used to differentiate between different events or classes, for example filtering spam emails, or classifying a transaction as legitimate or fraudulent. The variable in question is classified as 0 or 1, True or False, Yes or No, depending on the input.

It is a regression-based technique that builds a model predicting the probability of a data item belonging to a certain category. Logistic regression uses the 'sigmoid' function, defined below:

g(z) = 1 / (1 + e^(-z))

Note: The outcome of a logistic regression lies between 0 and 1; it cannot be greater than 1 and cannot be less than 0.

Logistic regression becomes a classification method once a decision threshold comes into play, as the sketch below illustrates.
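As a small illustration (not part of the article's main implementation below), the sigmoid and a 0.5 decision threshold can be sketched as:

import numpy as np

def sigmoid(z):
    # maps any real value into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(z)                    # probabilities between 0 and 1
labels = (probs >= 0.5).astype(int)   # apply the decision threshold
print(probs)   # values between 0 and 1, e.g. 0.047 for z = -3
print(labels)  # [0 0 1 1 1]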

Other types of regression include:

  1. Polynomial regression
  2. Stepwise regression
  3. Ridge regression
  4. Lasso regression
  5. ElasticNet regression

The sigmoid function (logistic function) has an S-shaped curve, as shown below: [Figure: plot of the sigmoid function rising from 0 to 1]


Logistic Regression from Scratch

Logistic regression can be implemented from scratch, without using the scikit-learn module.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.optimize as opt

def data_loading(path, header):
    marks_data_frame = pd.read_csv(path, header=header)
    return marks_data_frame

def sigmoid(x):
    # Activation function that maps any real value into the interval (0, 1)
    return 1 / (1 + np.exp(-x))

def total_input(theta, x):
    # Computes the weighted sum of the inputs
    return np.dot(x, theta)

def probability(theta, x):
    # Returns the probability after the weighted sum passes through the sigmoid
    return sigmoid(total_input(theta, x))

def cost_function(theta, x, y):
    # Computes the cost over all training samples
    m = x.shape[0]
    total_cost = -(1 / m) * np.sum(
        y * np.log(probability(theta, x))
        + (1 - y) * np.log(1 - probability(theta, x)))
    return total_cost

def gradient(theta, x, y):
    # Computes the gradient of the cost function at the point theta
    m = x.shape[0]
    return (1 / m) * np.dot(x.T, sigmoid(total_input(theta, x)) - y)

def fit(x, y, theta):
    # Minimises the cost function with a truncated Newton method
    opt_weights = opt.fmin_tnc(func=cost_function, x0=theta,
                               fprime=gradient, args=(x, y.flatten()))
    return opt_weights[0]

def predict(x):
    theta = parameters[:, np.newaxis]
    return probability(theta, x)

def accuracy(x, actual_classes, prob_threshold=0.5):
    predicted_classes = (predict(x) >= prob_threshold).astype(int)
    predicted_classes = predicted_classes.flatten()
    return np.mean(predicted_classes == actual_classes) * 100

if __name__ == "__main__":
    # load data from the file
    data = data_loading("path to marks.csv file", None)
    # X = feature values, all columns except the last one
    X_data = data.iloc[:, :-1]
    # y = target values, last column of the data frame
    y_data = data.iloc[:, -1]
    # filter out the applicants who were eligible
    admitted = data.loc[y_data == 1]
    # filter out the applicants who weren't eligible
    not_admitted = data.loc[y_data == 0]
    # plot the insights
    plt.scatter(admitted.iloc[:, 0], admitted.iloc[:, 1], s=10, label='Eligible')
    plt.scatter(not_admitted.iloc[:, 0], not_admitted.iloc[:, 1], s=10, label='Not eligible')
    plt.legend()
    plt.show()
    # add an intercept column of ones and reshape the target to a column vector
    X_data = np.c_[np.ones((X_data.shape[0], 1)), X_data]
    y_data = y_data.values[:, np.newaxis]
    theta = np.zeros((X_data.shape[1], 1))
    parameters = fit(X_data, y_data, theta)
    # plot the decision boundary over the scatter of exam marks
    x_values = [np.min(X_data[:, 1] - 5), np.max(X_data[:, 2] + 5)]
    y_values = -(parameters[0] + np.dot(parameters[1], x_values)) / parameters[2]
    plt.plot(x_values, y_values, label='Decision Boundary')
    plt.xlabel('Marks in 1st Exam')
    plt.ylabel('Marks in 2nd Exam')
    plt.legend()
    plt.show()
    print(accuracy(X_data, y_data.flatten()))

[Figure: scatter plot of exam marks with the fitted decision boundary]

Output:

88.88888888888889

Logistic Regression Using the scikit-learn Module

Under the hood, the model is fit using Maximum Likelihood Estimation (MLE), which is an iterative process. Random initial weights are assigned to the independent variables, and the weights are updated iteratively until optimal weights are reached, after which changing the weights produces little to no change in the output.

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def data_loading(path, header):
    marks_data_frame = pd.read_csv(path, header=header)
    return marks_data_frame

if __name__ == "__main__":
    # load data from the file
    data = data_loading("path-to-marks.csv file", None)
    # X = feature values, all columns except the last one
    X_data = data.iloc[:, :-1]
    # y = target values, last column of the data frame
    y_data = data.iloc[:, -1]
    # filter out applicants who are eligible
    admitted = data.loc[y_data == 1]
    # filter out applicants who aren't eligible
    not_admitted = data.loc[y_data == 0]
    # plot the insights
    plt.scatter(admitted.iloc[:, 0], admitted.iloc[:, 1], s=10, label='Eligible')
    plt.scatter(not_admitted.iloc[:, 0], not_admitted.iloc[:, 1], s=10, label='Not eligible')
    plt.legend()
    plt.show()
    # fit the model and evaluate it on the training data
    model = LogisticRegression()
    model.fit(X_data, y_data)
    predicted_classes = model.predict(X_data)
    accuracy = accuracy_score(y_data, predicted_classes)
    parameters = model.coef_
    print(accuracy)

Output:

[Figure: scatter plot of eligible and non-eligible applicants' exam marks, followed by the computed accuracy]
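If raw probabilities are needed rather than hard 0/1 predictions, scikit-learn's LogisticRegression also exposes predict_proba; a minimal sketch, assuming the fitted model and X_data from the script above:

# column 1 holds the probability of the positive class (eligible)
probabilities = model.predict_proba(X_data)[:, 1]

# apply a custom decision threshold, e.g. 0.6 instead of the default 0.5
custom_predictions = (probabilities >= 0.6).astype(int)
print(custom_predictions[:10])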

Applications of Logistic Regression

  1. Weather forecasting
  2. Stock prediction
  3. Election poll results

Conclusion

In this post, we understood what regression and logistic regression mean, and walked through the Python implementation of logistic regression both from scratch and using the scikit-learn library.
