Machine Learning Tutorial

By KnowledgeHut.

Data that is fed to a learning algorithm as input should be consistent and structured, and its features should ideally be on comparable scales. In the real world, however, data is often unstructured, and most of the time its features are not on the same scale. This is where normalization comes into play.

Normalization

What is normalization?

Normalization is one of the most important data-preparation steps. It changes the values of the numerical columns in the input dataset so that they share a common scale, while making sure that the differences in the ranges of values are not distorted.

Note: Not all machine learning input datasets need to be normalized. Normalization is required only when different features in a dataset have very different ranges of values.

Consider this example: a person’s height and weight. The two are not necessarily proportional, and they are measured on different scales. When predicting weight from height, normalized data lets the model learn the underlying pattern and produce predictions based on it. If the data is not normalized, unusual heights and their corresponding weights can have a disproportionate influence on the predictions, which may make them inaccurate.

There are different kinds of normalization and some of them have been listed below:

Min-max normalization
Z Normalization
Unit vector normalization

Min-max Normalization: It rescales the data so that it falls in the range 0 to 1, using x' = (x - min(x)) / (max(x) - min(x)). Most of the time, it is applied to a specific feature or a set of features.
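As a minimal sketch of the min-max formula, the NumPy function below rescales each column of a 2-D array independently. The function name `min_max_normalize` and the height/weight values are hypothetical, chosen only for illustration; the zero-range guard is an added assumption so that a constant column does not cause a division by zero.

```python
import numpy as np


def min_max_normalize(x):
    """Rescale each column of x to [0, 1] via (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    # A constant column has range 0; use 1 instead to avoid dividing by zero
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / col_range


# Hypothetical data: height (cm) and weight (kg) for four people
data = np.array([[150.0, 50.0],
                 [160.0, 60.0],
                 [170.0, 65.0],
                 [190.0, 90.0]])
scaled = min_max_normalize(data)
print(scaled)
```

Each column of `scaled` now runs from 0 to 1; the heights, for example, become 0, 0.25, 0.5 and 1. scikit-learn's `MinMaxScaler` performs the same column-wise rescaling and remembers the minimum and maximum learned in `fit`, so the identical transform can later be applied to new data.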

Z Normalization: Also known as standardization, it does not change the shape of the dataset’s distribution. It transforms the data so that its mean becomes 0 and its standard deviation becomes 1, using z = (x - μ) / σ, where μ is the mean and σ is the standard deviation. It can be applied to a single feature or a set of features.
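The z = (x - μ) / σ formula can be sketched in a few lines of NumPy, again working column by column. The function name `z_normalize` and the sample data are hypothetical; the sketch assumes the population standard deviation (`ddof=0`) and leaves constant columns at 0 rather than dividing by zero.

```python
import numpy as np


def z_normalize(x):
    """Standardize each column of x: z = (x - mean) / std."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)  # population standard deviation (ddof=0)
    # A constant column has std 0; use 1 so it maps to all zeros
    std = np.where(std == 0, 1.0, std)
    return (x - mean) / std


# Hypothetical data: height (cm) and weight (kg) for four people
data = np.array([[150.0, 50.0],
                 [160.0, 60.0],
                 [170.0, 65.0],
                 [190.0, 90.0]])
z = z_normalize(data)
print(z.mean(axis=0))  # approximately 0 for each column
print(z.std(axis=0))   # approximately 1 for each column
```

Because the transform only shifts and rescales, the order and relative spacing of the values are preserved; unlike min-max scaling, though, the results are not confined to a fixed range. scikit-learn's `StandardScaler` also uses the population standard deviation.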

Unit vector Normalization: Every row of data can be visualized as an n-dimensional vector. Unit vector normalization scales each row, shrinking or stretching it, so that its length (norm) becomes 1 while its direction stays the same. When this is applied to the entire dataset, the transformed data can be visualized as a set of unit-length vectors pointing in different directions.
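A minimal NumPy sketch of unit vector normalization, assuming the Euclidean (L2) norm: each row is divided by its own length. The function name `unit_vector_normalize` and the example rows are hypothetical, and all-zero rows are deliberately left unchanged because they have no direction.

```python
import numpy as np


def unit_vector_normalize(x):
    """Scale each row of x to unit Euclidean (L2) length, keeping its direction."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # All-zero rows have norm 0; use 1 so they stay zero
    norms = np.where(norms == 0, 1.0, norms)
    return x / norms


rows = np.array([[3.0, 4.0],
                 [1.0, 0.0],
                 [6.0, 8.0]])
unit_rows = unit_vector_normalize(rows)
print(unit_rows)
print(np.linalg.norm(unit_rows, axis=1))  # every row now has length 1
```

The first and third rows point in the same direction, so both become (0.6, 0.8) after normalization: only the direction survives, not the magnitude. This row-wise L2 scaling is what `sklearn.preprocessing.normalize` does with its default settings.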

Let us look at an example of normalizing data:

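```python
from sklearn import preprocessing
import numpy as np
import pandas as pd

# Obtain the dataset (change the path to wherever train.csv is stored)
df = pd.read_csv("C:\\Users\\Vishal\\Desktop\\train.csv", sep=",")

# Normalize a single column (the column named '0' in this file)
x_array = np.array(df['0'])
# normalize() expects a 2-D array; [x_array] turns the column into one row,
# which is then scaled to unit L2 norm (the default norm='l2')
normalized_X = preprocessing.normalize([x_array])
```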

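Compare the values of column ‘0’ in the ‘train.csv’ file with the normalized_X array to see how that column’s data was normalized. Note that normalized_X is a 2-D NumPy array, not a DataFrame. Because preprocessing.normalize works row by row, passing [x_array] makes the whole column a single row, so every value is divided by the same number: the column’s L2 norm (its Euclidean length).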
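To see what the `preprocessing.normalize([x_array])` call computes without needing the CSV file or scikit-learn, here is a NumPy-only sketch. The three values are hypothetical stand-ins for `df['0']`; the sketch assumes the function's defaults (`norm='l2'`, rows normalized independently).

```python
import numpy as np

# Hypothetical stand-in for df['0'] (the real values come from train.csv)
x_array = np.array([2.0, 4.0, 4.0])

# [x_array] is a 1 x n matrix; normalizing its single row to unit L2 norm
# means dividing every value by the column's Euclidean length (here 6.0)
normalized_X = np.array([x_array]) / np.linalg.norm(x_array)

print(normalized_X.shape)             # (1, 3): one row, not a column
print(np.linalg.norm(normalized_X))   # approximately 1.0
```

If the goal is instead to rescale each value relative to the column's own range or spread, the min-max or z normalization described above is the better fit, for example `MinMaxScaler().fit_transform(df[['0']])`, which expects the column as a 2-D, one-column input.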

Why should data be normalized?

Normalized data makes training less sensitive to the scale of each feature, which means the values of the coefficients can be found more efficiently and effectively.
Once data is normalized, it is also easier to compare and analyze different machine learning models when deciding which one yields good results.
Optimization becomes easier: gradient-based methods usually converge faster when features are on similar scales, because no single feature dominates the updates.
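The scale-sensitivity point is easiest to see with a distance-based method such as k-nearest neighbours. The sketch below uses hypothetical data with two features on very different ranges (annual income in dollars and age in years) and compares Euclidean distances before and after min-max normalization.

```python
import numpy as np

# Hypothetical data: annual income (dollars) and age (years) for four people
people = np.array([[50_000.0, 25.0],
                   [52_000.0, 65.0],
                   [58_000.0, 26.0],
                   [100_000.0, 70.0]])


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))


# Raw features: income differences (thousands) swamp age differences (tens)
raw_d01 = distance(people[0], people[1])  # about 2000, driven almost entirely by income
raw_d02 = distance(people[0], people[2])  # about 8000

# Min-max normalize each column to [0, 1] so both features share one scale
col_min, col_max = people.min(axis=0), people.max(axis=0)
scaled = (people - col_min) / (col_max - col_min)
scaled_d01 = distance(scaled[0], scaled[1])  # large: the 40-year age gap now counts
scaled_d02 = distance(scaled[0], scaled[2])  # small: similar income and age

print("nearest to person 0 (raw):", 1 if raw_d01 < raw_d02 else 2)
print("nearest to person 0 (scaled):", 1 if scaled_d01 < scaled_d02 else 2)
```

Run as written, the nearest neighbour of person 0 switches from person 1 (raw data, where age is effectively ignored) to person 2 (normalized data). Which answer is "right" depends on the problem, but after normalization both features at least contribute to the result.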

Conclusion

In this post, we looked at why normalization is important, how it affects the input dataset, and how it can be applied to a simple CSV file.
