Data pre-processing is considered one of the most important steps in any machine learning task.
Data pre-processing simply refers to getting all the data (collected from various sources) into a single format, or into uniform sets of data grouped by type, so that the learning algorithm can learn from it more easily and predict results with higher accuracy.
Real-world data is never ideal: it will have missing data cells, errors, outliers, discrepancies in names, and much more.
Data pre-processing isn't a single task but several different tasks that need to be performed step by step. The output of one step becomes the input of the next, and so on.
The steps are listed below:
Once redundancy has been removed from the data, the relationships between the remaining records are analyzed and matched so that they can be represented in one format.
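As a rough illustration of removing redundancy and bringing records into one format, here is a minimal sketch in plain Python. The record fields (`name`, `city`), the sample data, and the helper functions `normalize_text` and `deduplicate` are all made up for this example, and the matching rule (same name and city after normalization) is only one simple assumption about what counts as a duplicate.

```python
def normalize_text(value: str) -> str:
    """Collapse extra whitespace and unify capitalization."""
    return " ".join(value.split()).title()


def deduplicate(records: list[dict]) -> list[dict]:
    """Normalize each record, then keep the first occurrence of each (name, city) pair."""
    seen = set()
    unique = []
    for record in records:
        cleaned = {
            "name": normalize_text(record["name"]),
            "city": normalize_text(record["city"]),
        }
        key = (cleaned["name"], cleaned["city"])
        if key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


# Hypothetical records gathered from two sources with different conventions.
records = [
    {"name": "alice  smith", "city": "london"},
    {"name": "Alice Smith", "city": "London"},
    {"name": "BOB JONES", "city": "paris"},
]

cleaned_records = deduplicate(records)
```

Here the first two records describe the same person written in two styles, so only one of them survives, and every remaining record uses the same capitalization and spacing.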
When data has been collected from multiple sources (or even a single source), it is never ideal if it is real-world data. It will have some missing values, irrelevant data, or unidentifiable characters as well.
This happens because people do not collect the data properly or label it incorrectly. These missing and irrelevant parts of the data need to be either corrected or removed completely. Failing to do so will make the machine learning algorithm's predictions on new data less accurate, because the learning algorithm will treat the irrelevant and unidentified data (which is considered noise) as if it were relevant.
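The two options mentioned above, removing incomplete data or correcting it, can be sketched in a few lines of plain Python. The sample rows, the column names `age` and `income`, and the helper functions `drop_incomplete` and `fill_with_mean` are hypothetical, and filling with the column mean is just one common way to correct a missing value, not the only one.

```python
from statistics import mean

# Hypothetical rows; None marks a missing cell.
rows = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},
    {"age": 40, "income": None},
    {"age": 31, "income": 56000},
]


def drop_incomplete(rows: list[dict]) -> list[dict]:
    """Option 1: remove every row that has at least one missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def fill_with_mean(rows: list[dict], column: str) -> list[dict]:
    """Option 2: replace missing values in a column with the mean of the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    fill_value = mean(known)
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


dropped = drop_incomplete(rows)
filled = fill_with_mean(fill_with_mean(rows, "age"), "income")
```

Dropping rows is simpler but throws away the values that were present in those rows, while filling keeps every row at the cost of introducing estimated values. Which one is appropriate depends on how much data is missing and how it will be used.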
Noisy data can be handled in many different ways:
In this post, we looked at the significance of pre-processing data and a few of the methods involved in it.