
Introduction

Ready to face your next Machine Learning interview? This list of Machine Learning interview questions and answers, curated by industry experts, covers topics such as CRISP-DM, the difference between univariate and bivariate analysis, the chi-square test, the difference between Type 1 and Type 2 errors, and the bias-variance trade-off. The questions are meant to help you prepare for roles such as machine learning engineer or data engineer.

Machine Learning Interview Questions and Answers
Intermediate

1. You are asked to evaluate a regression model using R square, Adjusted R square, and Tolerance. What criteria would you apply?

In a regression problem, we expect the fitted formula to explain the observed values as well as possible. For linear regression, this means most data points should lie close to the fitted line.

R square is a measure of “goodness of fit”. It gives the proportion of the variation in the target variable that is explained by the input variables. If R square is 0.75, then 75% of the variation in the target variable is explained by the input variables. All else being equal, the higher the R-square value, the better the model explains the variation in the target.

A problem arises when we add more input variables. R square never decreases when a variable is added, even if that variable has no real influence on the target variable, so a higher R-square value can be misleading. This is where Adjusted R square is used. Adjusted R square is a modified version of R square that penalizes the addition of input variables that do not improve the existing model or help explain the variation in the target.

So if we add more input variables, we need to ensure they actually influence the target variable; otherwise the gap between R square and Adjusted R square will increase. With a single input variable and a reasonably large sample, the two values are nearly the same. If there are multiple input variables, it is advisable to use the Adjusted R-square value to judge goodness of fit.
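The relationship between the two metrics can be sketched in a few lines of plain Python. This is a minimal illustration, not a full regression fit: the values in `y_true` and `y_pred` are made up, and the function names are our own. It uses the standard formula Adjusted R square = 1 - (1 - R square) * (n - 1) / (n - p - 1), where n is the number of observations and p the number of input variables.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Made-up actual and predicted values, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
y_pred = [2.8, 5.3, 6.9, 9.4, 10.7, 13.1]

r2 = r_squared(y_true, y_pred)

# Pretend the same R^2 came from a model with 1 input variable
# and from a model with 3 input variables.
adj_one_feature = adjusted_r_squared(r2, len(y_true), n_features=1)
adj_three_features = adjusted_r_squared(r2, len(y_true), n_features=3)

print(r2, adj_one_feature, adj_three_features)
```

For the same R square, the penalty grows with the number of input variables, which is why the gap widens when added variables do not improve the fit.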

Tolerance is defined as 1/VIF, where VIF stands for Variance Inflation Factor. As the name suggests, VIF indicates how much the variance of a coefficient estimate is inflated because that input variable is correlated with the other input variables, so it is used to detect multicollinearity. Based on VIF values, we can decide which variables to remove or keep without compromising the Adjusted R-square value. A high VIF (equivalently, a low Tolerance) flags a variable that is largely explained by the other inputs and is a candidate for removal, so Tolerance helps gauge which variables to keep in the model for better performance.
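As a rough sketch of how VIF and Tolerance behave, consider the simplest case of two input variables. In general VIF for a variable is 1 / (1 - R square), where R square comes from regressing that variable on all the other inputs; with only two inputs, that R square equals the squared Pearson correlation between them. The data and function names below are hypothetical and chosen only for illustration.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r^2); valid as written only for the two-predictor case."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_collinear = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]  # roughly 2 * x1
x2_unrelated = [4.0, 1.0, 6.0, 2.0, 5.0, 3.0]   # almost uncorrelated with x1

vif_collinear = vif_two_predictors(x1, x2_collinear)
vif_unrelated = vif_two_predictors(x1, x2_unrelated)

# Tolerance is simply the reciprocal of VIF.
tolerance_collinear = 1.0 / vif_collinear
tolerance_unrelated = 1.0 / vif_unrelated

print(vif_collinear, tolerance_collinear)
print(vif_unrelated, tolerance_unrelated)
```

A commonly quoted rule of thumb treats a VIF above roughly 5 to 10 (Tolerance below 0.2 to 0.1) as a sign of problematic multicollinearity, though the exact cut-off is a judgment call.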

2. What is the difference between Type 1 and Type 2 Error? Explain briefly.

A Type I error is committed when the null hypothesis is true and we reject it; it is also known as a ‘False Positive’. A Type II error is committed when the null hypothesis is false and we fail to reject it; it is also known as a ‘False Negative’.

In the context of the confusion matrix, a Type I error occurs when we classify a value as positive (1) when it is actually negative (0). A Type II error occurs when we classify a value as negative (0) when it is actually positive (1).
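The mapping from confusion matrix cells to the two error types can be sketched as follows. The labels are made up, and `confusion_counts` is a helper written for this illustration rather than a library function.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (0 or 1)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1  # Type I error: predicted positive, actually negative
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1  # Type II error: predicted negative, actually positive
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


# Made-up actual and predicted labels.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 1, 1]

counts = confusion_counts(y_true, y_pred)

# Share of actual negatives wrongly flagged as positive (Type I error rate).
type_1_rate = counts["fp"] / (counts["fp"] + counts["tn"])
# Share of actual positives that were missed (Type II error rate).
type_2_rate = counts["fn"] / (counts["fn"] + counts["tp"])

print(counts)       # {'tp': 3, 'fp': 2, 'tn': 2, 'fn': 1}
print(type_1_rate)  # 0.5
print(type_2_rate)  # 0.25
```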

3. How is the logistic regression model evaluated? Explain at least 3 points.

Logistic Regression models can be evaluated as follows:

  1. The first key metric is AUC-ROC, the Area Under the ROC Curve, where ROC stands for Receiver Operating Characteristic. The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies; each threshold yields a confusion matrix built from the actual and predicted values. For an ideal model, the true positive rate is 1 and the false positive rate is 0. The closer the curve is to the top-left corner, and the closer the AUC is to 1, the better the model.
  2. Another important metric is AIC, which stands for Akaike Information Criterion. It plays a role analogous to Adjusted R square in linear regression: just as Adjusted R square penalizes input variables that are added without improving the explanation of the variation in the target, AIC penalizes model complexity. It is computed from the model's log-likelihood and the number of estimated parameters (AIC = 2k - 2 ln L), so it rewards goodness of fit and penalizes variables that add no value. When comparing models fitted to the same data, the model with the lower AIC is preferred.
  3. Null deviance and residual deviance are also important for evaluating a logistic regression model. Null deviance measures the fit of a model that contains only an intercept, while residual deviance measures the fit once the input variables are included. Lower deviance indicates a better fit, and a residual deviance well below the null deviance indicates that the input variables add explanatory value.
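The three metrics can be computed by hand on a toy example. The sketch below assumes we already have fitted probabilities from some logistic regression model; the values in `probs`, the parameter count `n_params = 2` (an intercept plus one coefficient), and the function names are all hypothetical. AUC is computed as the fraction of positive/negative pairs that the model ranks correctly, which is equivalent to the area under the ROC curve.

```python
import math


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_likelihood(y_true, probs):
    """Bernoulli log-likelihood of the labels under the predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1 - p)
               for y, p in zip(y_true, probs))


def deviance(y_true, probs):
    """Deviance for ungrouped binary data: -2 * log-likelihood."""
    return -2 * log_likelihood(y_true, probs)


def aic(y_true, probs, n_params):
    """AIC = 2k - 2 ln L, where k is the number of estimated parameters."""
    return 2 * n_params - 2 * log_likelihood(y_true, probs)


y_true = [0, 0, 1, 1, 0, 1, 1, 0]
probs = [0.1, 0.3, 0.8, 0.6, 0.4, 0.9, 0.35, 0.2]  # hypothetical fitted values

# The intercept-only (null) model predicts the base rate for every row.
base_rate = sum(y_true) / len(y_true)
null_probs = [base_rate] * len(y_true)

auc_value = auc(y_true, probs)
null_dev = deviance(y_true, null_probs)
residual_dev = deviance(y_true, probs)
model_aic = aic(y_true, probs, n_params=2)

print(auc_value)  # 0.9375
print(null_dev, residual_dev, model_aic)
```

Here the residual deviance falls below the null deviance, which is the pattern we want to see when the input variables carry real information.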

4. There are multiple algorithms available in machine learning – supervised, unsupervised and other learning. How do you determine which one to use?

Machine learning can be divided into supervised learning, unsupervised learning, and other types such as semi-supervised learning and reinforcement learning.

Which algorithm to choose depends primarily on the type of data available and on what we are trying to accomplish with it.

  1. If the target variable is continuous, then we will use regression algorithms (which are part of supervised learning). e.g. Simple Linear Regression, Multiple Linear Regression, etc.
  2. If the target variable is categorical, then we will use classification algorithms (this is also part of supervised learning). e.g. Logistic Regression, Random Forest, Decision Trees, kNN, Neural Network, Support Vector Machine, Naive Bayes, etc.
  3. If the target variable is not available, then we will use unsupervised learning techniques such as clustering, association rule mining, or recommendation algorithms.
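The three rules above amount to a simple lookup on the kind of target variable. The helper below is a hypothetical sketch of that decision, not a substitute for actually comparing models on the data; the function name and the string labels are our own.

```python
def suggest_algorithm_family(target_kind):
    """Map the kind of target variable to a family of algorithms.

    target_kind: "continuous", "categorical", or None when no target exists.
    Returns a (family, example_algorithms) tuple.
    """
    if target_kind is None:
        return ("unsupervised",
                ["Clustering", "Association", "Recommendation Algorithms"])
    if target_kind == "continuous":
        return ("supervised regression",
                ["Simple Linear Regression", "Multiple Linear Regression"])
    if target_kind == "categorical":
        return ("supervised classification",
                ["Logistic Regression", "Random Forest", "Decision Trees",
                 "kNN", "Neural Network", "Support Vector Machine",
                 "Naive Bayes"])
    raise ValueError(f"unknown target kind: {target_kind!r}")


family, examples = suggest_algorithm_family("categorical")
print(family, examples)
```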

Other types of machine learning are also used in different scenarios.

Generative, graph-based and heuristic approaches are part of semi-supervised learning, while reinforcement learning can be divided into active and passive categories.

At a high level, this is how different machine learning algorithms, methods and approaches are matched to different scenarios.

5. What is Bias-Variance trade-off? Explain.

Mathematically, the expected error of any model can be broken down into 3 major components.

Error(X) = Square(Bias) + Variance + Irreducible Error

Bias and variance are the components within our control, so these are the errors to address. Little can be done about the irreducible error.

  • Low bias - indicates fewer assumptions about the form of the target function, so the model is flexible enough to fit the training data closely. High bias indicates strong assumptions about the form of the target function; the model may be too simple to capture the underlying pattern (underfitting), so accuracy is compromised on both training data and new data.
  • High variance - indicates large changes to the estimate of the target function when the training data changes; such a model tends to overfit and may not give the expected results on new data. Low variance indicates smaller changes to the estimate of the target function in a similar context.

Reducing bias usually means making the model more flexible, which tends to increase variance, and vice versa. To build an accurate model, it is therefore critical to strike a balance between bias and variance so that the total error is minimized and the gap between actual and predicted outcomes is reduced.
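The trade-off can be seen in a small simulation. The sketch below is an illustration under assumptions of our own: the true function is f(x) = x squared, the noise is Gaussian, and we compare a rigid model (always predict the mean of the training targets) with a flexible one (predict the target of the nearest training point). Each model is refitted on many noisy training sets, and squared bias and variance are estimated at a single test point.

```python
import random


def true_f(x):
    """Assumed ground-truth function for the simulation."""
    return x ** 2


def predict_mean(x_train, y_train, x0):
    """Rigid model: ignores x0 and predicts the average target."""
    return sum(y_train) / len(y_train)


def predict_nearest(x_train, y_train, x0):
    """Flexible model: copies the target of the closest training point."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x0))
    return y_train[nearest]


def bias_variance(predict, x_train, x0, noise_sd, n_rounds, rng):
    """Estimate squared bias and variance of predict() at the point x0."""
    preds = []
    for _ in range(n_rounds):
        # Draw a fresh noisy training set on the same x grid.
        y_train = [true_f(x) + rng.gauss(0.0, noise_sd) for x in x_train]
        preds.append(predict(x_train, y_train, x0))
    mean_pred = sum(preds) / n_rounds
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_rounds
    return bias_sq, variance


rng = random.Random(0)              # fixed seed for repeatability
x_train = [i / 19 for i in range(20)]  # 20 evenly spaced points on [0, 1]

rigid_bias_sq, rigid_var = bias_variance(
    predict_mean, x_train, 1.0, 0.3, 2000, rng)
flexible_bias_sq, flexible_var = bias_variance(
    predict_nearest, x_train, 1.0, 0.3, 2000, rng)

print("rigid model:    bias^2 =", rigid_bias_sq, " variance =", rigid_var)
print("flexible model: bias^2 =", flexible_bias_sq, " variance =", flexible_var)
```

The rigid model shows high bias and low variance, while the flexible model shows the opposite, which is the pattern the decomposition above describes. The irreducible error corresponds to the noise variance and is the same for both models.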


Description

Machine Learning is the field of study that gives computers the capability to learn without being explicitly programmed. It is one of the most exciting technologies one could come across, and it has become one of the most popular career choices today. According to a report from Gartner, Artificial Intelligence will create more than 2.3 million jobs by 2020.

A LinkedIn study suggests that there are currently 1,829 job openings for Machine Learning Engineering positions. Another study, conducted by Analytics India Magazine, reveals that there are more than 78,000 open Data Science and Machine Learning jobs across India. The demand for Machine Learning skills is growing quickly, and many factors contribute to it: most companies are investing in machine learning and are looking to hire more ML experts.

Jobs in machine learning are increasing rapidly as the industry grows. A report from International Data Corporation estimates that spending on Machine Learning and Artificial Intelligence will increase from $12B in 2017 to $57.6B in 2021. Jobs in machine learning are highly paid because the work is creative and unstructured. According to Glassdoor, the average salary of a machine learning engineer is between INR 4.5 lakhs and INR 7 lakhs for freshers, and it can reach up to INR 16 lakhs for experienced professionals.

If you’re looking for interview questions and answers on machine learning for experienced professionals and freshers, then you are at the right place. There are many opportunities at reputed companies across the globe, and good hands-on knowledge of the concepts will put you ahead in the interview. Our Machine Learning interview questions are designed to help candidates clear interviews. We have tried to cover almost all the main topics related to Machine Learning.

Here, we have categorized the questions based on the level of expertise you’re looking for. Preparing with these Machine Learning interview questions will give you an edge over other interviewees and help you crack the Machine Learning interview. To get in-depth knowledge of Machine Learning, you can also enroll in a Machine Learning course.

All the best!
