Machine Learning Tutorial

Hyperparameter Tuning

Whenever a machine learning algorithm is applied to a dataset, its performance is judged by how well it generalizes, i.e., how it handles new, never-before-seen data. If the performance is unsatisfactory or leaves room for improvement, certain settings of the algorithm need to be tuned. These settings are known as ‘hyperparameters’, and the process of varying them to improve the learning algorithm’s performance is known as ‘hyperparameter tuning’.

Hyperparameters are not learnt from the data during training; their values are fixed before training begins. Examples include learning_rate, i.e., how quickly the model updates what it has learnt, and settings that control how complex the model is.
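The distinction between a hyperparameter and a learnt parameter can be illustrated with a small sketch. The function `fit_slope` and the toy data below are hypothetical and not part of any library: `learning_rate` and `n_epochs` are fixed before training starts, while the weight `w` is what training actually learns.

```python
def fit_slope(xs, ys, learning_rate, n_epochs):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0  # learnt parameter: updated during training
    n = len(xs)
    for _ in range(n_epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# Toy data generated from y = 2 * x, so the ideal weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Same data and algorithm; only the hyperparameter learning_rate differs.
w_slow = fit_slope(xs, ys, learning_rate=0.001, n_epochs=20)
w_good = fit_slope(xs, ys, learning_rate=0.05, n_epochs=20)

print("learning_rate=0.001 ->", round(w_slow, 3))
print("learning_rate=0.05  ->", round(w_good, 3))
```

With the smaller learning rate the weight is still far from 2 after 20 epochs, which is the kind of effect hyperparameter tuning tries to correct.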

Every learning algorithm can have a wide variety of hyperparameters. Selecting a set of values that yields good performance is an important aspect of machine learning.

In this post, we will look at the following hyperparameter tuning strategies:

  • RandomizedSearchCV 
  • GridSearchCV 

Before looking at how these two strategies work, note that we will tune the hyperparameters of a stochastic gradient descent classifier (with RandomizedSearchCV) and of a logistic regression model (with GridSearchCV).

RandomizedSearchCV 

RandomizedSearchCV samples hyperparameter combinations at random from the specified search space, instead of exhaustively trying every combination (as GridSearchCV does). This reduces the time spent searching. The scikit-learn library provides the RandomizedSearchCV class to implement random search. The candidate values for each hyperparameter are defined before the search.
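The difference between the two ways of picking candidates can be sketched with the standard library alone. This is a simplified illustration, not scikit-learn's implementation; the small search space and the names `space`, `all_combos`, and `sampled` are made up for the example.

```python
import itertools
import random

# A deliberately small, hypothetical search space.
space = {"alpha": [0.001, 0.01, 0.1], "penalty": ["l1", "l2"]}
names = list(space)

# Grid search: every combination of the listed values is a candidate.
all_combos = [dict(zip(names, values))
              for values in itertools.product(*space.values())]

# Random search: only a random subset of those combinations is tried.
rng = random.Random(0)  # fixed seed so the sample is reproducible
sampled = rng.sample(all_combos, k=3)

print("grid search would try", len(all_combos), "combinations")
print("random search tries", len(sampled), "of them:")
for combo in sampled:
    print("  ", combo)
```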

A parameter called ‘n_iter’ specifies the number of combinations that are randomly tried. If ‘n_iter’ is too small, finding the best combination is unlikely; if it is too large, the processing time increases. It is important to find a balanced value for ‘n_iter’:

import pandas as pd
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

# Load the data; adjust the paths to your own files.
train = pd.read_csv("C:\\Users\\Vishal\\Desktop\\train.csv")
test = pd.read_csv("C:\\Users\\Vishal\\Desktop\\test_1.csv")

# Separate the features from the target column.
X_train = train.drop(['id', 'target'], axis=1)
y_train = train['target']
X_test = test.drop(['id'], axis=1)

# Candidate values for each hyperparameter.
# Note: newer scikit-learn releases name the 'log' loss 'log_loss'.
loss = ['hinge', 'log', 'modified_huber', 'squared_hinge', 'perceptron']
penalty = ['l1', 'l2', 'elasticnet']
alpha = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000]
learning_rate = ['constant', 'optimal', 'invscaling', 'adaptive']
class_weight = [{1: 0.5, 0: 0.5}, {1: 0.4, 0: 0.6}, {1: 0.6, 0: 0.4}, {1: 0.7, 0: 0.3}]
eta0 = [1, 10, 100]

param_distributions = dict(loss=loss,
                           penalty=penalty,
                           alpha=alpha,
                           learning_rate=learning_rate,
                           class_weight=class_weight,
                           eta0=eta0)

sgd = SGDClassifier(loss="hinge", penalty="l2", max_iter=5)

# Try 1000 randomly chosen combinations, scored by ROC AUC.
random_search = RandomizedSearchCV(estimator=sgd,
                                   param_distributions=param_distributions,
                                   scoring='roc_auc',
                                   verbose=1,
                                   n_jobs=-1,
                                   n_iter=1000)
random_result = random_search.fit(X_train, y_train)

print('Best Score: ', random_result.best_score_)
print('Best Params: ', random_result.best_params_)

Output: 

Best Score: 0.7981584905660377 
Best Params: {'penalty': 'elasticnet', 'loss': 'log', 'learning_rate': 'optimal', 'eta0': 1, 'class_weight': {1: 0.5, 0: 0.5}, 'alpha': 0.1}
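To get a feel for what a "balanced" n_iter means, the sketch below counts the combinations in the search space above and estimates the chance that a random search includes at least one of the k best combinations. It assumes combinations are drawn without replacement (which, as far as we know, is what scikit-learn does when every hyperparameter is given as a list); the helper `prob_hit_top_k` is hypothetical and not part of scikit-learn.

```python
import math


def prob_hit_top_k(n_total, n_iter, k):
    """P(at least one of the k best combinations is among n_iter draws
    taken without replacement from n_total combinations)."""
    n_iter = min(n_iter, n_total)
    # Complement: all n_iter draws come from the n_total - k other combinations.
    return 1.0 - math.comb(n_total - k, n_iter) / math.comb(n_total, n_iter)


# Number of candidate values per hyperparameter in the example above.
sizes = {"loss": 5, "penalty": 3, "alpha": 8,
         "learning_rate": 4, "class_weight": 4, "eta0": 3}
n_total = math.prod(sizes.values())
print("total combinations:", n_total)

for n_iter in (10, 100, 1000):
    p = prob_hit_top_k(n_total, n_iter, k=10)
    print(f"n_iter={n_iter:>4}: P(hit one of the 10 best) = {p:.3f}")
```

The probability grows with n_iter, but so does the number of models that have to be trained, which is the trade-off described above.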

GridSearchCV 

This approach is considered the traditional way of performing hyperparameter optimization. It exhaustively evaluates every combination of the hyperparameter values supplied to it. The scikit-learn library provides the GridSearchCV class to implement this approach.

The grid of candidate values is defined before the search.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Load the data; replace the placeholders with your own paths.
train = pd.read_csv("path to train.csv")
test = pd.read_csv("path to test_1.csv")

# Separate the features from the target column.
X_train = train.drop(['id', 'target'], axis=1)
y_train = train['target']
X_test = test.drop(['id'], axis=1)

# Candidate values for each hyperparameter.
penalty = ['l1', 'l2']
C = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000]
class_weight = [{1: 0.5, 0: 0.5}, {1: 0.4, 0: 0.6}, {1: 0.6, 0: 0.4}, {1: 0.7, 0: 0.3}]
solver = ['liblinear', 'saga']

param_grid = dict(penalty=penalty,
                  C=C,
                  class_weight=class_weight,
                  solver=solver)

logistic = LogisticRegression()

# Evaluate every combination in param_grid, scored by ROC AUC.
grid = GridSearchCV(estimator=logistic,
                    param_grid=param_grid,
                    scoring='roc_auc',
                    verbose=1,
                    n_jobs=-1)
grid_result = grid.fit(X_train, y_train)

print('Best Score: ', grid_result.best_score_)
print('Best Params: ', grid_result.best_params_)

Output: 

Best Score: 0.7860030747728861 
Best Params: {'C': 1, 'class_weight': {1: 0.7, 0: 0.3}, 'penalty': 'l1', 'solver': 'liblinear'} 

The advantage of grid search is that it is guaranteed to find the best-scoring combination among the values supplied to it.
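A minimal sketch of why this guarantee holds: an exhaustive search scores every combination, so the best one in the grid cannot be missed. The scoring function `toy_score` is a made-up stand-in for a cross-validated score, not a real model, and the values in `c_values` are chosen only for illustration.

```python
import itertools
import math


def toy_score(c, penalty):
    """Hypothetical stand-in for a cross-validated score (not a real model)."""
    bonus = 0.05 if penalty == "l1" else 0.0
    return 1.0 - 0.1 * abs(math.log10(c)) + bonus


c_values = [0.01, 0.1, 1, 10, 100]
penalties = ["l1", "l2"]

# Exhaustive search: score every (C, penalty) pair and keep the best one.
candidates = list(itertools.product(c_values, penalties))
best_c, best_penalty = max(candidates, key=lambda combo: toy_score(*combo))

print("tried", len(candidates), "combinations")
print("best combination:", best_c, best_penalty)
```

The guarantee only covers the listed values: if the truly best C lay between two grid points, the search would return the nearest good grid point rather than that value.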

The disadvantage is that it is computationally expensive and time-consuming when the grid or the input dataset is large. RandomizedSearchCV can reduce this cost by evaluating only a sample of the combinations.
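The cost can be estimated before running anything. The sketch below counts the model fits needed for the grid used in the GridSearchCV example, assuming 5-fold cross-validation (to our knowledge the default in recent scikit-learn versions, since the example does not set cv explicitly).

```python
import math

# Number of candidate values per hyperparameter in the grid search example.
grid_sizes = {"penalty": 2, "C": 8, "class_weight": 4, "solver": 2}

n_folds = 5  # assumption: 5-fold cross-validation
n_combos = math.prod(grid_sizes.values())
n_fits = n_combos * n_folds

print(f"{n_combos} combinations x {n_folds} folds = {n_fits} model fits")
```

Because the count is a product, adding one more value to any single hyperparameter multiplies the work, which is why large grids quickly become impractical.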

Conclusion 

In this post, we looked at the purpose and significance of hyperparameter tuning, along with two important strategies for tuning hyperparameters: RandomizedSearchCV and GridSearchCV.
