Machine Learning Tutorial

By KnowledgeHut.

Whenever a machine learning algorithm is implemented on a specific dataset, the performance is judged based on how well it generalizes, i.e., how it reacts to new, never-before-seen data. In case the performance of the learning algorithm is not satisfactory or there is room for improvement, certain parameters in the algorithm need to be changed/tuned/tweaked. These parameters are known as ‘hyperparameters’ and the process of varying these hyperparameters to better the learning algorithm’s performance is known as ‘hyperparameter tuning’.


Hyperparameter Tuning

Hyperparameters are not learnt from the data during training. Their values are fixed before training begins. Examples include learning_rate, i.e., how quickly the model should learn, and settings that control how complex the model is.

There can be a wide variety of hyperparameters for every learning algorithm. Selecting the right set of hyperparameters so as to gain good performance is an important aspect of machine learning.
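The difference between a hyperparameter and a learned parameter can be seen in a small sketch. The tiny dataset below is made up for illustration, and LogisticRegression with C=1.0 is only one possible choice of estimator and setting:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset (illustrative only): one feature, two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# C is a hyperparameter: we choose it before training starts.
model = LogisticRegression(C=1.0)
print("C before fit:", model.get_params()["C"])

# coef_ and intercept_ are learned parameters: they only exist after fit().
model.fit(X, y)
print("C after fit:", model.get_params()["C"])  # unchanged by training
print("learned coefficients:", model.coef_, model.intercept_)
```

Tuning means repeating this fit with different values of C (or other hyperparameters) and comparing the resulting scores.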

In this post, we will look at the following two hyperparameter tuning strategies:

RandomizedSearchCV
GridSearchCV

Before looking at how these two strategies work, note that the examples tune a stochastic gradient descent classifier (with RandomizedSearchCV) and a logistic regression model (with GridSearchCV).

RandomizedSearchCV

Randomized search samples hyperparameter combinations at random from the candidate values, instead of exhaustively trying every combination as grid search does. Because only a subset of the combinations is evaluated, the search takes less time. The scikit-learn library provides the RandomizedSearchCV class to implement random search. The candidate values for each hyperparameter are defined before the search starts.
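The idea can be sketched in plain Python without scikit-learn. The search space and the toy_score function below are hypothetical stand-ins: in a real search, the score would come from cross-validating a model.

```python
import itertools
import random

# Hypothetical search space (names and values are illustrative).
space = {
    "alpha": [0.0001, 0.001, 0.01, 0.1, 1],
    "penalty": ["l1", "l2", "elasticnet"],
}

def toy_score(params):
    """Stand-in for a cross-validated score; NOT a real model evaluation."""
    bonus = 0.05 if params["penalty"] == "l2" else 0.0
    return 1.0 - abs(params["alpha"] - 0.01) + bonus

# Every combination an exhaustive search would try.
names = list(space)
all_combos = [dict(zip(names, values))
              for values in itertools.product(*space.values())]

# Random search: evaluate only n_iter combinations, drawn without replacement.
n_iter = 6
rng = random.Random(0)  # fixed seed so the run is repeatable
sampled = rng.sample(all_combos, n_iter)

best = max(sampled, key=toy_score)
print(len(all_combos), "combinations in total,", len(sampled), "evaluated")
print("best of the sample:", best)
```

The best combination in the sample is not necessarily the best overall, which is the trade-off random search makes for speed.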

A parameter called ‘n_iter’ specifies how many combinations are randomly tried. If ‘n_iter’ is too small, the search is unlikely to find the best combination, and if ‘n_iter’ is too large, the processing time increases. It is important to find a balanced value for ‘n_iter’:

import pandas as pd
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

# Load the data and separate the features from the target column.
train = pd.read_csv("C:\\Users\\Vishal\\Desktop\\train.csv")
test = pd.read_csv("C:\\Users\\Vishal\\Desktop\\test_1.csv")
X_train = train.drop(['id', 'target'], axis=1)
y_train = train['target']
X_test = test.drop(['id'], axis=1)

# Candidate values for each hyperparameter.
# Note: scikit-learn 1.1 renamed the 'log' loss to 'log_loss'.
loss = ['hinge', 'log', 'modified_huber', 'squared_hinge', 'perceptron']
penalty = ['l1', 'l2', 'elasticnet']
alpha = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000]
learning_rate = ['constant', 'optimal', 'invscaling', 'adaptive']
class_weight = [{1:0.5, 0:0.5}, {1:0.4, 0:0.6}, {1:0.6, 0:0.4}, {1:0.7, 0:0.3}]
eta0 = [1, 10, 100]
param_distributions = dict(loss=loss,
                           penalty=penalty,
                           alpha=alpha,
                           learning_rate=learning_rate,
                           class_weight=class_weight,
                           eta0=eta0)

# Try 1000 randomly chosen combinations and score each by ROC AUC.
sgd = SGDClassifier(loss="hinge", penalty="l2", max_iter=5)
random_search = RandomizedSearchCV(estimator=sgd,
                                   param_distributions=param_distributions,
                                   scoring='roc_auc',
                                   verbose=1, n_jobs=-1,
                                   n_iter=1000)
random_result = random_search.fit(X_train, y_train)
print('Best Score: ', random_result.best_score_)
print('Best Params: ', random_result.best_params_)

Output:

Best Score: 0.7981584905660377
Best Params: {'penalty': 'elasticnet', 'loss': 'log', 'learning_rate': 'optimal', 'eta0': 1, 'class_weight': {1: 0.5, 0: 0.5}, 'alpha': 0.1}
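To get a feel for how ‘n_iter’ relates to the size of the search space, the sketch below counts every combination in the candidate lists used above and draws a sample of the size a random search with n_iter=1000 would evaluate. ParameterGrid and ParameterSampler are scikit-learn utilities. The random_state value is an arbitrary choice made here for repeatability.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same candidate lists as in the example above.
param_distributions = {
    "loss": ["hinge", "log", "modified_huber", "squared_hinge", "perceptron"],
    "penalty": ["l1", "l2", "elasticnet"],
    "alpha": [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "learning_rate": ["constant", "optimal", "invscaling", "adaptive"],
    "class_weight": [{1: 0.5, 0: 0.5}, {1: 0.4, 0: 0.6},
                     {1: 0.6, 0: 0.4}, {1: 0.7, 0: 0.3}],
    "eta0": [1, 10, 100],
}

# Total number of distinct combinations: 5 * 3 * 8 * 4 * 4 * 3 = 5760.
n_total = len(ParameterGrid(param_distributions))

# Draw as many combinations as a random search with n_iter=1000 would try.
n_iter = 1000
sampled = list(ParameterSampler(param_distributions, n_iter=n_iter,
                                random_state=0))

print("combinations in the full space:", n_total)
print("combinations sampled:", len(sampled))
print(f"fraction of the space covered: {len(sampled) / n_total:.1%}")
```

With these lists, n_iter=1000 covers roughly 17% of the 5760 possible combinations, so the reported best parameters are the best of that sample rather than of the whole space.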

GridSearchCV

This approach is considered the traditional way of performing hyperparameter optimization. It exhaustively evaluates every combination of the hyperparameter values in the specified grid and reports the best-scoring one. The scikit-learn library provides the GridSearchCV class to implement this approach.
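What "every combination" means can be sketched with itertools.product. The candidate values are the ones used in the grid search example in this post. The fold count of 5 is an assumption (it is the default in recent scikit-learn versions).

```python
import itertools

# Candidate values taken from the grid search example in this post.
param_grid = {
    "penalty": ["l1", "l2"],
    "C": [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "class_weight": [{1: 0.5, 0: 0.5}, {1: 0.4, 0: 0.6},
                     {1: 0.6, 0: 0.4}, {1: 0.7, 0: 0.3}],
    "solver": ["liblinear", "saga"],
}

# The Cartesian product of all candidate lists is the full grid.
names = list(param_grid)
combinations = [dict(zip(names, values))
                for values in itertools.product(*param_grid.values())]

n_folds = 5  # assumption: 5-fold cross-validation
print(len(combinations))            # 2 * 8 * 4 * 2 = 128
print(len(combinations) * n_folds)  # 640 model fits in total
```

Each of the 128 combinations is trained and scored once per fold, which is why the cost grows quickly as more values are added to the grid.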

Our set of parameters is defined before searching over it.

import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Load the data and separate the features from the target column.
train = pd.read_csv("path to train.csv")
test = pd.read_csv("path to test_1.csv")
X_train = train.drop(['id', 'target'], axis=1)
y_train = train['target']
X_test = test.drop(['id'], axis=1)

# Candidate values for each hyperparameter.
# Both solvers listed here support the 'l1' and 'l2' penalties.
penalty = ['l1', 'l2']
C = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000]
class_weight = [{1:0.5, 0:0.5}, {1:0.4, 0:0.6}, {1:0.6, 0:0.4}, {1:0.7, 0:0.3}]
solver = ['liblinear', 'saga']
param_grid = dict(penalty=penalty,
                  C=C,
                  class_weight=class_weight,
                  solver=solver)

# Evaluate every combination in the grid and score each by ROC AUC.
logistic = LogisticRegression()
grid = GridSearchCV(estimator=logistic,
                    param_grid=param_grid,
                    scoring='roc_auc',
                    verbose=1,
                    n_jobs=-1)
grid_result = grid.fit(X_train, y_train)
print('Best Score: ', grid_result.best_score_)
print('Best Params: ', grid_result.best_params_)

Output:

Best Score: 0.7860030747728861 
Best Params: {'C': 1, 'class_weight': {1: 0.7, 0: 0.3}, 'penalty': 'l1', 'solver': 'liblinear'}

The advantage of grid search is that it is guaranteed to find the best-scoring combination among the values supplied to it.

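The disadvantage is that it is computationally expensive: the number of combinations grows multiplicatively with each added hyperparameter, and every combination is refit for each cross-validation fold, which becomes time consuming when the input dataset is large. This can be mitigated by using RandomizedSearchCV.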
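The trade-off can be tried out on a small scale. The sketch below is not the article's experiment: it uses a synthetic dataset from make_classification in place of train.csv, a deliberately small grid, and arbitrary choices for cv, n_iter, and random_state.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in for the article's train.csv (illustrative only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Small illustrative grid: 6 * 2 = 12 combinations.
param_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
    "class_weight": [None, "balanced"],
}
estimator = LogisticRegression(max_iter=1000)

# Grid search evaluates all 12 combinations.
grid = GridSearchCV(estimator, param_grid, scoring="roc_auc", cv=3)
grid.fit(X, y)

# Random search evaluates only 4 of them, chosen at random.
rand = RandomizedSearchCV(estimator, param_grid, n_iter=4,
                          scoring="roc_auc", cv=3, random_state=0)
rand.fit(X, y)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print("grid:  ", n_grid, "combinations, best score", grid.best_score_)
print("random:", n_rand, "combinations, best score", rand.best_score_)
```

Random search does a third of the work here, at the risk of missing the combination that grid search would have found.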

Conclusion

In this post, we looked at the usage and significance of hyperparameter tuning, along with two important strategies used to tune hyperparameters: RandomizedSearchCV and GridSearchCV.


