The confusion matrix is one of the simplest tools for judging how correct and how accurate a classification model is. It is used with classification problems, where the output (the class assigned to each data item) can belong to any of the classes: one of the two classes in binary classification, or one of many in multi-class classification.
It is not a performance measure on its own, but most performance metrics are computed from the counts this matrix contains.
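Because the metrics below all start from these counts, here is a minimal plain-Python sketch of how a confusion matrix can be tallied for any number of classes. It assumes rows are actual classes and columns are predicted classes (a common convention, though some tools transpose it). The function name `confusion_matrix` and the toy labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Build a confusion matrix as a list of rows.

    Row i corresponds to the actual class labels[i] and column j to the
    predicted class labels[j]; each cell counts how many items fall there.
    """
    pair_counts = Counter(zip(actual, predicted))
    # Counter returns 0 for (actual, predicted) pairs that never occurred.
    return [[pair_counts[(a, p)] for p in labels] for a in labels]


# Hypothetical toy labels, for illustration only.
actual = ["cat", "dog", "dog", "cat", "bird", "dog"]
predicted = ["cat", "dog", "cat", "cat", "bird", "bird"]
labels = ["cat", "dog", "bird"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>4}", row)
```

Here the rows come out as `[2, 0, 0]`, `[1, 1, 1]` and `[0, 0, 1]`: the diagonal counts correct predictions, and each off-diagonal cell shows which class was confused with which (for example, one dog predicted as a cat).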
Terminology associated with the confusion matrix:
- True positives: Let us understand this with a binary classification example. There are 2 classes, namely ‘True’ (the positive class) and ‘False’ (the negative class). A true positive is a case where the predicted class is ‘True’ and the actual class of the data item is also ‘True’. For example, if spam is treated as the positive class, a spam email that is correctly identified as spam is a true positive.
- True positive rate, also known as sensitivity or recall, is the ratio of true positives to the sum of true positives and false negatives:
TPR = True positives / (True positives + False negatives)
- True negatives: Consider the same two classes, ‘True’ and ‘False’. A true negative is a case where the predicted class is ‘False’ and the actual class of the data item is also ‘False’. For example, with spam as the positive class, a non-spam email that is correctly identified as non-spam is a true negative.
- True negative rate, also known as specificity or selectivity, is the ratio of true negatives to the sum of true negatives and false positives:
TNR = True negatives / (True negatives + False positives)
- False positives: Using the same two classes, ‘True’ and ‘False’, false positives are items predicted as belonging to the ‘True’ class that actually belong to the ‘False’ class. For example, with spam as the positive class, a non-spam email incorrectly identified as spam is a false positive.
- False positive rate, also known as fall-out, is the ratio of false positives to the sum of false positives and true negatives:
FPR = False positives / (False positives + True negatives)
- False negatives: Using the same two classes, ‘True’ and ‘False’, false negatives are items predicted as belonging to the ‘False’ class that actually belong to the ‘True’ class. For example, with spam as the positive class, a spam email incorrectly identified as non-spam is a false negative.
- False negative rate, also known as miss rate, is the ratio of false negatives to the sum of false negatives and true positives:
FNR = False negatives / (False negatives + True positives)
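The four definitions above can be checked with a short plain-Python sketch that counts TP, TN, FP and FN for a binary problem and then applies the four rate formulas. It assumes ‘spam’ is the positive class; the helper names (`confusion_counts`, `ratio`, `rates`) and the toy labels are hypothetical, and the NaN guard for an empty denominator is a design choice of this sketch, not part of the definitions.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn


def ratio(numerator, denominator):
    # Return NaN instead of raising when a class never appears.
    return numerator / denominator if denominator else float("nan")


def rates(tp, tn, fp, fn):
    return {
        "TPR": ratio(tp, tp + fn),  # sensitivity / recall
        "TNR": ratio(tn, tn + fp),  # specificity / selectivity
        "FPR": ratio(fp, fp + tn),  # fall-out
        "FNR": ratio(fn, fn + tp),  # miss rate
    }


# Hypothetical toy labels, with "spam" as the positive class.
actual = ["spam", "spam", "spam", "spam", "not spam", "not spam", "not spam", "not spam"]
predicted = ["spam", "spam", "spam", "not spam", "spam", "spam", "not spam", "not spam"]

tp, tn, fp, fn = confusion_counts(actual, predicted, positive="spam")
print(tp, tn, fp, fn)
print(rates(tp, tn, fp, fn))
```

With these labels there are 3 true positives, 2 true negatives, 2 false positives and 1 false negative, giving TPR = 0.75, TNR = 0.5, FPR = 0.5 and FNR = 0.25. Note that TPR + FNR = 1 and TNR + FPR = 1, since each pair splits the same set of actual positives or actual negatives.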
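Ideally, a model produces 0 false positives and 0 false negatives, but this rarely happens in practice. The confusion matrix also contains enough information to calculate precision and recall. Recall is the true positive rate defined above, and precision is the ratio of true positives to all predicted positives: Precision = True positives / (True positives + False positives).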
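As a small illustration, precision and recall can be computed directly from the confusion-matrix counts. This is a minimal sketch; the function names and the counts used are hypothetical, and returning NaN when a denominator is zero is an assumption of this sketch.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


def recall(tp, fn):
    """Fraction of actual positives the model found (same as TPR)."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Hypothetical counts: 3 true positives, 2 false positives, 1 false negative.
tp, fp, fn = 3, 2, 1
print(precision(tp, fp))  # 3 / 5
print(recall(tp, fn))     # 3 / 4
```

The two measures pull in different directions: flagging more items as positive tends to raise recall (fewer false negatives) while lowering precision (more false positives).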
The image below shows what a confusion matrix might look like when classifying an animal as a cat or a dog.
In this post, we looked at the confusion matrix and how it can be used to evaluate the performance of a classification model.