

In a regression problem, we expect the solution (the fitted mathematical formula) to explain the observed values as well as possible; for a linear regression, the assumption is that most data points lie close to the fitted line.
R-square (the coefficient of determination) is a measure of "goodness of fit": it quantifies the extent to which the input variables explain the variation in the target (predicted) variable. An R-square of 0.75, for example, indicates that 75% of the variation in the target variable is explained by the input variables. In general, the higher the R-square value, the more of the variation in the target the model explains, and hence the better the model fits the data.
A problem arises when we add more input variables: the value of R-square never decreases, and usually increases, with every variable added. If the additional variables have no real influence on the target variable, the higher R-square value is misleading. This is where adjusted R-square is used. Adjusted R-square is a modified version of R-square that penalizes the addition of input variables that do not improve the existing model, that is, variables that do not help explain the variation in the target.
So when adding input variables, we need to ensure they actually influence the target variable; otherwise the gap between R-square and adjusted R-square will widen. Adjusted R-square is always less than or equal to R-square, and with only one input variable the two values are usually very close. When there are multiple input variables, it is advisable to use adjusted R-square to judge the goodness of fit.
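To make the two measures concrete, here is a minimal Python sketch computing R-square as 1 - SS_res/SS_tot and adjusted R-square as 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of input variables. The function names and the sample values are hypothetical, chosen only for illustration.

```python
from __future__ import annotations


def r_squared(y: list[float], y_hat: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # variation left unexplained
    ss_tot = sum((a - mean_y) ** 2 for a in y)             # total variation in the target
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R-square for n observations and p input variables."""
    if n - p - 1 <= 0:
        raise ValueError("need more than p + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical actual vs. predicted values
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(y, y_hat)
print(round(r2, 4))                                # 0.99
print(round(adjusted_r_squared(r2, n=5, p=1), 4))  # 0.9867
print(round(adjusted_r_squared(r2, n=5, p=2), 4))  # 0.98
```

For the same R-square, the adjusted value falls as p grows, which is the penalty for adding input variables that do not earn their place.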
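Tolerance is defined as 1/VIF, where VIF stands for Variance Inflation Factor. As the name suggests, VIF measures how much the variance of a regression coefficient's estimate is inflated because that input variable is correlated with the other input variables; it is therefore used to detect multicollinearity. For input variable j, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the R-square obtained by regressing variable j on all the other input variables, so the tolerance is $1 - R_j^2$. Based on the VIF values, we can decide which variables to remove or keep without compromising the adjusted R-square value. Hence 1/VIF, or tolerance, can be used to gauge which parameters should be included in the model for better performance.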
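A sketch of how VIF and tolerance can be computed, assuming NumPy is available: each input variable is regressed (ordinary least squares with an intercept) on all the others, and VIF_j = 1/(1 - R_j²). The helper name and the synthetic data, in which x2 is deliberately built as a noisy copy of x1, are illustrative assumptions.

```python
from __future__ import annotations

import numpy as np


def vif_and_tolerance(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (VIF, tolerance) for each column of X (rows = observations, columns = inputs)."""
    n, k = X.shape
    vif = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])        # intercept + remaining inputs
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2_j = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vif[j] = 1.0 / (1.0 - r2_j)
    return vif, 1.0 / vif                                 # tolerance = 1 / VIF


# Synthetic data: x2 is almost a copy of x1, x3 is independent of both
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
vif, tol = vif_and_tolerance(np.column_stack([x1, x2, x3]))
print(np.round(vif, 2), np.round(tol, 3))
```

A common rule of thumb treats a VIF above 5 or 10 as a sign of problematic multicollinearity; here x1 and x2 should land far above that while x3 stays close to 1.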
A Type I error is committed when the null hypothesis is true but we reject it; it is also known as a 'False Positive'. A Type II error is committed when the null hypothesis is false but we fail to reject it; it is also known as a 'False Negative'.
In the context of the confusion matrix, a Type I error occurs when we classify an observation as positive (1) when it is actually negative (0), and a Type II error occurs when we classify an observation as negative (0) when it is actually positive (1).
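The mapping between the two error types and the confusion matrix can be shown with a small sketch that tallies outcomes for binary labels coded as 1 (positive) and 0 (negative). The function name and the labels are hypothetical.

```python
from __future__ import annotations


def confusion_counts(y_true: list[int], y_pred: list[int]) -> dict[str, int]:
    """Tally a binary confusion matrix; labels must be 1 (positive) or 0 (negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1  # Type I error: predicted positive, actually negative
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1
        elif actual == 1 and predicted == 0:
            counts["FN"] += 1  # Type II error: predicted negative, actually positive
        else:
            raise ValueError("labels must be 0 or 1")
    return counts


# Hypothetical labels and predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]
print(confusion_counts(y_true, y_pred))  # {'TP': 3, 'FP': 2, 'TN': 2, 'FN': 1}
```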
Logistic Regression models can be evaluated as follows:
Machine learning can be broadly divided into supervised, unsupervised, and other types such as semi-supervised and reinforcement learning.
Which algorithm to select depends primarily on the type of input data and on what we are trying to accomplish with it.
Other types of machine learning are also used in different scenarios.
Generative, graph-based, and heuristic approaches belong to semi-supervised learning, while reinforcement learning can be divided into active and passive categories.
At a high level, this is how different machine learning algorithms, methods, and approaches can be used in different scenarios.
Mathematically, the expected squared error of any model at an input x can be broken down into three major components:
$$\mathrm{Err}(x) = \mathrm{Bias}^2 + \mathrm{Variance} + \text{Irreducible Error}$$
It is important to address the bias and variance errors, because these are within our control; we can do little about the irreducible error, which comes from noise in the data itself.
When building a model for greater accuracy and better performance, it is critical to strike a balance between bias and variance so that the total error is minimized and the gap between actual and predicted outcomes is reduced. Reducing bias (for example, by making the model more complex) usually increases variance, and vice versa.
Hence a balance between bias and variance must be maintained.
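The decomposition can be checked numerically. The sketch below is a Monte Carlo simulation under purely illustrative assumptions (a true function sin(x), inputs uniform on [0, 3], Gaussian noise with standard deviation 0.3, polynomial models fitted with NumPy's polyfit): it refits the model on many fresh training sets, estimates Bias² and Variance of the prediction at one input x0, and compares their sum plus the noise variance with the simulated squared error.

```python
import numpy as np


def bias_variance_at_point(degree, x0=1.0, n_train=30, n_trials=2000, noise_sd=0.3, seed=0):
    """Monte Carlo estimate of (Bias^2, Variance, irreducible error, simulated error) at x0.

    Illustrative assumptions: true function sin(x), inputs uniform on [0, 3],
    Gaussian noise with standard deviation noise_sd, polynomial model of `degree`.
    """
    rng = np.random.default_rng(seed)
    f0 = np.sin(x0)                                     # noise-free true value at x0
    preds = np.empty(n_trials)
    sq_errors = np.empty(n_trials)
    for t in range(n_trials):
        x = rng.uniform(0.0, 3.0, n_train)              # fresh training set each trial
        y = np.sin(x) + rng.normal(0.0, noise_sd, n_train)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x0)
        y0 = f0 + rng.normal(0.0, noise_sd)             # fresh noisy observation at x0
        sq_errors[t] = (y0 - preds[t]) ** 2
    bias_sq = (preds.mean() - f0) ** 2
    variance = preds.var()
    return bias_sq, variance, noise_sd ** 2, sq_errors.mean()


for degree in (1, 5):
    b2, var, noise, total = bias_variance_at_point(degree)
    print(f"degree={degree}: bias^2={b2:.4f} variance={var:.4f} noise={noise:.4f} "
          f"sum={b2 + var + noise:.4f} simulated error={total:.4f}")
```

The straight line (degree 1) should show the larger bias and the degree-5 polynomial the larger variance, while in both cases the three components add up approximately to the simulated error.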
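CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It is a methodology for data science projects and has six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
Some phases are iterative (a project often moves back and forth between Data Preparation and Modeling, for example), and end-to-end data science projects typically follow this methodology.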
Below is a diagrammatic view for better understanding

In univariate analysis, variables are explored one at a time. The method used depends on whether the variable is categorical or continuous.
For a continuous variable, we need to understand its central tendency and spread. Measures of central tendency include the mean, median, and mode; measures of dispersion include the min, max, range, quartiles, IQR, variance, and standard deviation; skewness and kurtosis describe the shape of the distribution; and common visualization methods are the histogram and the boxplot.
Univariate analysis is also used to highlight missing and outlier values.
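A minimal sketch of a univariate summary for one continuous variable, using only Python's standard statistics module. It reports the measures listed above and also flags missing values (None) and outliers using Tukey's 1.5 * IQR rule, which is one common convention rather than the only one. The sample reuses the Weight column from Figure 1 below, plus a hypothetical missing value and a hypothetical outlier (120).

```python
from __future__ import annotations

import statistics


def univariate_summary(values: list[float | None]) -> dict:
    """Summarise one continuous variable; None marks a missing value."""
    data = [v for v in values if v is not None]
    q1, _q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # Tukey's outlier fences
    mean = statistics.mean(data)
    pstd = statistics.pstdev(data)
    return {
        "missing": len(values) - len(data),
        "mean": mean,
        "median": statistics.median(data),
        "modes": statistics.multimode(data),
        "min": min(data),
        "max": max(data),
        "range": max(data) - min(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "variance": statistics.variance(data),      # sample variance (n - 1 denominator)
        "std_dev": statistics.stdev(data),
        # moment-based skewness; a positive value means a longer right tail
        "skewness": sum((x - mean) ** 3 for x in data) / len(data) / pstd ** 3,
        "outliers": [x for x in data if x < low or x > high],
    }


weights = [55, 62, 58, 54, None, 54, 66, 56, 56, 120]
summary = univariate_summary(weights)
print(summary["missing"], summary["median"], summary["outliers"])  # 1 56 [120]
```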
Bivariate analysis is used to determine the relationship between two variables: how they are associated or dissociated, judged at a chosen significance level. Bivariate analysis is typically performed for three combinations of variables: continuous and continuous, categorical and categorical, and categorical and continuous.
Each combination calls for a different method. For two continuous variables, a scatter plot can be used whether the relationship is linear or nonlinear. To measure how loosely or tightly the two variables are linearly related, we compute the correlation coefficient, which ranges from -1 to 1. A value of 0 indicates no linear correlation between the two variables, -1 indicates a perfect negative correlation, and +1 indicates a perfect positive correlation.
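A minimal sketch of the Pearson correlation coefficient in plain Python (the function name is illustrative). The third example shows why a value of 0 means no linear correlation rather than no relationship: y = x² depends perfectly on x, yet r is 0.

```python
from __future__ import annotations

import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # joint variation
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))          # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))          # -1.0 (perfect negative)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0  (y = x**2, nonlinear)
```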
To test whether two categorical variables are significantly associated, the chi-square test is used. Its statistic sums, over every cell of the contingency table, the squared deviation between the observed and expected frequency divided by the expected frequency: $\chi^2 = \sum (O - E)^2 / E$.
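The chi-square statistic can be sketched as follows for an r x c contingency table of observed counts, where the expected count of each cell is (row total * column total) / grand total. The function name and the table are hypothetical. Turning the statistic into a p-value needs the chi-square distribution with (r - 1)(c - 1) degrees of freedom; SciPy's scipy.stats.chi2_contingency, for example, performs the whole test.

```python
from __future__ import annotations


def chi_square_statistic(observed: list[list[float]]) -> tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected  # (O - E)^2 / E for this cell
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof


# Hypothetical 2 x 2 table: rows = group A / group B, columns = outcome yes / no
print(chi_square_statistic([[20, 30], [30, 20]]))  # (4.0, 1)
```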
When one variable is categorical and the other is continuous, the choice of test depends on the sample size and the number of groups. With large samples (n >= 30), the z-test is commonly used, because the t-distribution is then close to the normal distribution. When the means of more than two groups are to be compared, ANOVA is used.
When the sample is small and the population variance is unknown, the t-test is used; typically n < 30, where n is the number of observations (the sample size).
The t-test and z-test statistics are defined below. The difference between the two is subtle: the t-test uses the sample standard deviation, whereas the z-test uses the population standard deviation. In practice, the z-test is mostly used for n >= 30 and the t-test for n < 30.
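$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
where $\bar{x}$ is the sample mean, $\mu$ the population mean under the null hypothesis, $s$ the sample standard deviation, $\sigma$ the population standard deviation, and $n$ the sample size.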
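The two one-sample statistics translate directly into code. In this sketch the sample, the hypothesised mean mu = 5.0, and the "known" population standard deviation sigma = 0.2 are all hypothetical; with only five observations the t-test is the appropriate one, and the z value is shown only for comparison. Converting either statistic into a p-value requires the t or standard normal distribution, which statistics libraries such as SciPy provide.

```python
from __future__ import annotations

import math
import statistics


def t_statistic(sample: list[float], mu: float) -> float:
    """One-sample t statistic, using the sample standard deviation s."""
    s = statistics.stdev(sample)  # n - 1 denominator
    return (statistics.mean(sample) - mu) / (s / math.sqrt(len(sample)))


def z_statistic(sample: list[float], mu: float, sigma: float) -> float:
    """One-sample z statistic, using a known population standard deviation sigma."""
    return (statistics.mean(sample) - mu) / (sigma / math.sqrt(len(sample)))


sample = [5.1, 4.9, 5.3, 5.0, 5.2]                 # hypothetical measurements
print(round(t_statistic(sample, mu=5.0), 4))       # 1.4142
print(round(z_statistic(sample, 5.0, sigma=0.2), 4))  # 1.118
```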
ANOVA stands for analysis of variance; it tests whether the means of several groups differ. For example, consider the following three groups:
| Class 1 | Class 2 | Class 3 |
|---|---|---|
| 8 | 9 | 3 |
| 6 | 2 | 4 |
| 5 | 6 | 3 |
| 8 | 2 | 5 |
| 6 | 7 | 4 |
| 10 | 5 | 4 |
| 6 | 2 | 6 |
| 3 | 8 | 4 |
| 5 | 4 | 5 |
| 7 | 9 | 3 |
Figure ANOVA
In "Figure ANOVA" above, ANOVA is the appropriate analysis because there are more than two sample groups (here, three). Each class could contain many rows; only 10 per class are used here for simplicity. The summary statistics for each class are as follows (the variance is the sample variance, with an n - 1 denominator):
| Class Group | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Class 1 | 10 | 64 | 6.4 | 3.82 |
| Class 2 | 10 | 54 | 5.4 | 8.04 |
| Class 3 | 10 | 41 | 4.1 | 0.99 |
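Carrying the example through, here is a minimal sketch of the one-way ANOVA F statistic for the three classes in Figure ANOVA: F is the between-group mean square, SSB / (k - 1), divided by the within-group mean square, SSW / (N - k). For these 30 values it comes out at roughly 3.10 with (2, 27) degrees of freedom; whether that is significant has to be checked against the F distribution at the chosen significance level. The function name is illustrative.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))  # between groups
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))    # within groups
    df_between, df_within = k - 1, n_total - k
    return (ssb / df_between) / (ssw / df_within), df_between, df_within


class_1 = [8, 6, 5, 8, 6, 10, 6, 3, 5, 7]
class_2 = [9, 2, 6, 2, 7, 5, 2, 8, 4, 9]
class_3 = [3, 4, 3, 5, 4, 4, 6, 4, 5, 3]
f_stat, df_b, df_w = one_way_anova_f([class_1, class_2, class_3])
print(round(f_stat, 4), df_b, df_w)  # 3.1037 2 27
```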
Missing data in the training data set can reduce the power or fit of a model, or lead to a biased model, because the behavior of the variable and its relationship with other variables have not been analyzed correctly. This can lead to incorrect predictions or classifications. Below is a simple example to illustrate this.
| Name | Weight | Gender | Play Golf or Not |
|---|---|---|---|
| AA | 55 | M | Yes |
| BB | 62 | F | Yes |
| CC | 58 | F | No |
| DD | 54 | | No |
| EE | 54 | M | No |
| FF | 66 | F | Yes |
| GG | 56 | | Yes |
| HH | 56 | M | Yes |
Figure 1
| Gender | # Count | # Play Golf | % Play Golf |
|---|---|---|---|
| F | 3 | 2 | 66.67% |
| M | 3 | 2 | 66.67% |
| Missing/Blank | 2 | 1 | 50% |
Figure 2
Note the missing gender values in Figure 1: they were not treated before the analysis in Figure 2. The inference from that data set is that females and males are equally likely to play golf (66.67% each).
On the other hand, Figure 4 shows the results after the missing gender values have been treated (Figure 3). There, females have a higher chance of playing golf than males (66.67% versus 60%).
| Name | Weight | Gender | Play Golf or Not |
|---|---|---|---|
| AA | 55 | M | Yes |
| BB | 62 | F | Yes |
| CC | 58 | F | No |
| DD | 54 | M | No |
| EE | 54 | M | No |
| FF | 66 | F | Yes |
| GG | 56 | M | Yes |
| HH | 56 | M | Yes |
Figure 3
| Gender | # Count | # Play Golf | % Play Golf |
|---|---|---|---|
| F | 3 | 2 | 66.67% |
| M | 5 | 3 | 60% |
Figure 4
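The two summary tables can be reproduced with a short sketch. Missing genders form their own group for Figure 2, and for Figure 4 they are replaced with the values shown in Figure 3. The article does not say which imputation rule produced those values, so the mapping below is simply copied from that table; the function and variable names are illustrative.

```python
from collections import defaultdict

# Figure 1 data: (name, weight, gender, plays golf); None marks a missing gender
ROWS = [
    ("AA", 55, "M", "Yes"), ("BB", 62, "F", "Yes"), ("CC", 58, "F", "No"),
    ("DD", 54, None, "No"), ("EE", 54, "M", "No"), ("FF", 66, "F", "Yes"),
    ("GG", 56, None, "Yes"), ("HH", 56, "M", "Yes"),
]


def golf_rate_by_gender(rows):
    """Return {gender: (count, plays_golf, pct_plays_golf)}; a missing gender is its own group."""
    tally = defaultdict(lambda: [0, 0])
    for _name, _weight, gender, plays in rows:
        key = gender if gender is not None else "Missing/Blank"
        tally[key][0] += 1
        if plays == "Yes":
            tally[key][1] += 1
    return {g: (n, k, round(100 * k / n, 2)) for g, (n, k) in tally.items()}


# Imputed genders copied from Figure 3
IMPUTED = {"DD": "M", "GG": "M"}
treated = [(name, w, g if g is not None else IMPUTED[name], p) for name, w, g, p in ROWS]

print(golf_rate_by_gender(ROWS))     # corresponds to Figure 2
print(golf_rate_by_gender(treated))  # corresponds to Figure 4
```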