Machine Learning Tutorial

Confusion matrix

A confusion matrix is one of the simplest tools for checking how correct a model's predictions are. It is used with classification problems, where the output (the class assigned to each data item) can belong to any one of the classes (one of 2 in binary classification, or one of many in multi-class classification). 

The confusion matrix is not a performance measure on its own, but most performance metrics are derived from the counts in this matrix. 

Terms associated with the confusion matrix: 

  • True positives: Consider a binary classification example with 2 classes, namely ‘True’ and ‘False’. A true positive is a case where the predicted class is ‘True’ and the actual class of the data item is also ‘True’. For example, with ‘non-spam’ treated as the ‘True’ class, a non-spam email was correctly identified as ‘non-spam’. 
  • True positive rate, also known as sensitivity or recall, is the ratio of true positives to the sum of true positives and false negatives: 

TPR = True positives / (True positives + False negatives) 

  • True negatives: Consider again a binary classification example with two classes, namely ‘True’ and ‘False’. A true negative is a case where the predicted class is ‘False’ and the actual class of the data item is also ‘False’. For example, with ‘spam’ treated as the ‘False’ class, a spam email was correctly identified as ‘spam’. 
  • True negative rate, also known as specificity or selectivity, is the ratio of true negatives to the sum of true negatives and false positives: 

TNR = True negatives / (True negatives + False positives) 

  • False positives: Consider a binary classification example with two classes, namely ‘True’ and ‘False’. False positives are items that were predicted as belonging to the ‘True’ class but actually belong to the ‘False’ class. With ‘non-spam’ as the ‘True’ class, as in the true positive example above, a spam email was incorrectly identified as a non-spam email. 
  • False positive rate, also known as fall-out, is the ratio of false positives to the sum of false positives and true negatives: 

FPR = False positives / (False positives + True negatives) 

  • False negatives: Consider a binary classification example with two classes, namely ‘True’ and ‘False’. False negatives are items that were predicted as belonging to the ‘False’ class but actually belong to the ‘True’ class. With ‘non-spam’ as the ‘True’ class, as in the true positive example above, a non-spam email was incorrectly identified as a spam email. 
  • False negative rate, also known as miss rate, is the ratio of false negatives to the sum of false negatives and true positives: 

FNR = False negatives / (False negatives + True positives) 
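
The four counts and four rates defined above can be computed directly from paired lists of actual and predicted labels. The following is a minimal sketch, assuming the labels are Python booleans; the function names `count_outcomes`, `ratio` and `rates` and the sample labels are made up for illustration and do not come from any library.

```python
def count_outcomes(actual, predicted):
    """Return (tp, tn, fp, fn) for two equal-length lists of booleans."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # predicted True, actually True
        elif not p and not a:
            tn += 1      # predicted False, actually False
        elif p and not a:
            fp += 1      # predicted True, actually False
        else:
            fn += 1      # predicted False, actually True
    return tp, tn, fp, fn


def ratio(numerator, denominator):
    """Divide, returning None when the denominator is zero (rate undefined)."""
    return numerator / denominator if denominator else None


def rates(tp, tn, fp, fn):
    """Compute the four rates using the formulas given in the list above."""
    return {
        "TPR": ratio(tp, tp + fn),
        "TNR": ratio(tn, tn + fp),
        "FPR": ratio(fp, fp + tn),
        "FNR": ratio(fn, fn + tp),
    }


# Hypothetical labels: 5 items that are actually True, then 5 actually False.
actual = [True] * 5 + [False] * 5
predicted = [True, True, True, True, False, False, False, False, True, True]

tp, tn, fp, fn = count_outcomes(actual, predicted)
result = rates(tp, tn, fp, fn)
print(tp, tn, fp, fn)  # 4 3 2 1
print(result)
```

Because TPR and FNR share the same denominator, they always add up to 1 whenever they are defined, and the same holds for TNR and FPR.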

The ideal situation is a model that produces 0 false positives and 0 false negatives, but this rarely happens in practice. The confusion matrix also contains enough information to calculate precision and recall. 
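
As a small illustration of that last point, the sketch below computes both values from confusion-matrix counts. Recall is the true positive rate defined earlier. Precision is not defined in this post, so the sketch assumes its standard definition, true positives / (true positives + false positives). The function names and the counts are hypothetical.

```python
def precision(tp, fp):
    """Share of items predicted 'True' that really are 'True' (assumed standard definition)."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else None


def recall(tp, fn):
    """Share of actual 'True' items that were predicted 'True' (same as TPR)."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else None


# Hypothetical counts read off a confusion matrix.
tp, fp, fn = 4, 2, 1

print(f"precision = {precision(tp, fp):.3f}")  # precision = 0.667
print(f"recall    = {recall(tp, fn):.3f}")     # recall    = 0.800
```

Both functions return `None` when the denominator is zero, since the value is undefined in that case; this is a design choice of the sketch, not a convention from the post.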

The image below shows what a confusion matrix would look like when classifying an animal as a cat or a dog. 
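
A cat-versus-dog matrix of this kind can also be built and printed as plain text. The sketch below is only an illustration: the labels are invented, the helper `confusion_matrix` is hypothetical (it is not a library function), and the layout with actual classes as rows and predicted classes as columns is an assumed convention.

```python
def confusion_matrix(actual, predicted, labels):
    """Build a square matrix: rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical data, made up for illustration.
labels = ["cat", "dog"]
actual = ["cat", "cat", "cat", "dog", "dog", "dog", "dog", "cat"]
predicted = ["cat", "dog", "cat", "dog", "dog", "cat", "dog", "cat"]

matrix = confusion_matrix(actual, predicted, labels)

# Print the matrix with a header row and one line per actual class.
print(" " * 12 + "".join(("pred " + label).rjust(10) for label in labels))
for label, row in zip(labels, matrix):
    print(("actual " + label).ljust(12) + "".join(str(n).rjust(10) for n in row))
```

The diagonal cells hold the correct predictions (cats predicted as cats, dogs predicted as dogs), and the off-diagonal cells hold the two kinds of mistakes.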

Conclusion

In this post, we learned what a confusion matrix is and how it can be used to determine the performance of a model.
