One of the world’s most popular programming languages today, Python is a great tool for Machine Learning (ML) and Artificial Intelligence (AI). It is an open-source, general-purpose, object-oriented, interpreted programming language. Python’s key design goals are code readability, ease of use, and high productivity. Recent trend data shows that interest in Python has grown significantly over the past five years, and Python is a top choice for ML/AI enthusiasts when compared with other programming languages.
Image source: Google Trends - comparing Python with other tools in the market
Python can be used to write Machine Learning algorithms, and its concise, readable syntax makes it possible to write reliable code very quickly. Another reason for its popularity is the availability of versatile, ready-to-use libraries.
It has an excellent library ecosystem and is a great tool for developing prototypes. Unlike R, Python is a general-purpose programming language that can also be used to build web applications and enterprise applications.
The Python community has developed libraries that target particular areas of data science. For instance, there are libraries for handling arrays, performing numerical computation with matrices, statistical computing, machine learning, data visualization, and many more. These libraries are highly efficient and make coding much easier, with fewer lines of code.
Let us have a brief look at some of the important Python libraries that are used for developing machine learning models.
One should also take into account the importance of IDEs specially designed for Python for Machine Learning.
The Jupyter Notebook is an open-source, web-based application that enables ML enthusiasts to create, share, visualize, and live-code their projects.
Various other IDEs and editors can also be used, such as PyCharm, Spyder, Vim, and Visual Studio Code. For beginners, there is a simple online compiler available: Programiz.
Different algorithms have different tasks. It is advisable to understand the context and select the right algorithm for the right task.
|Type of ML Problem|Description|Examples|
|---|---|---|
|Classification|Pick one of N labels|Predict whether a loan will default|
|Regression|Predict numerical values|Predict property price|
|Clustering|Group similar examples|Most relevant documents|
|Association rule learning|Infer likely association patterns in data|If you buy butter, you are likely to buy bread (unsupervised)|
|Structured Output|Create complex output|Natural language parse trees, image recognition bounding boxes|
|Ranking|Identify position on a scale or status|Search result ranking|
A. Regression (Prediction): Regression algorithms are used for predicting numeric values. For example, predicting property price, vehicle mileage, stock prices and so on.
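B. Linear Regression – predicts a numeric response variable using one or more features (variables). A linear regression model is mathematically represented as: y = β0 + β1·x1 + β2·x2 + … + βn·xn + ε, where y is the response, x1 to xn are the features, β0 is the intercept, β1 to βn are the coefficients, and ε is the error term.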
Various regression algorithms include:
As a note to new learners, it is suggested to understand the following concepts: regression assumptions, the Ordinary Least Squares method, dummy variables (n-1 dummy encoding, one-hot encoding), and performance evaluation metrics (RMSE, MSE, MAD).
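To make Ordinary Least Squares and dummy encoding concrete, here is a minimal sketch in plain Python. The function names (`fit_simple_ols`, `one_hot`, `dummy_encode`) and the tiny property dataset are invented for illustration and do not come from any library.

```python
# Minimal sketch: ordinary least squares with a single feature, plus
# one-hot and n-1 dummy encoding. All names and numbers are illustrative.

def fit_simple_ols(xs, ys):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def one_hot(values):
    """One-hot encoding: one 0/1 column per category (sorted by name)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def dummy_encode(values):
    """n-1 dummy encoding: same as one-hot, minus the first category's column."""
    return [row[1:] for row in one_hot(values)]

# Made-up data in which price is exactly 3 * area.
area = [50.0, 60.0, 80.0, 100.0, 120.0]
price = [150.0, 180.0, 240.0, 300.0, 360.0]
intercept, slope = fit_simple_ols(area, price)
predicted_price = intercept + slope * 90.0

cities = ["delhi", "mumbai", "pune", "delhi"]
encoded_full = one_hot(cities)        # three columns
encoded_dummy = dummy_encode(cities)  # two columns, "delhi" is the baseline
```

Dropping one column in `dummy_encode` avoids feeding the regression a set of columns that always sum to 1, which is why n-1 encoding is usually paired with linear regression.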
Various classification algorithms include:
Some of the classification algorithms are explained here:
Other distance measures include Hamming distance, Manhattan distance, and Minkowski distance.
Example of K-NN classification: the test sample (green dot) should be classified either with the blue squares or with the red triangles. If k = 3 (solid line circle), it is assigned to the red triangles because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed line circle), it is assigned to the blue squares (3 squares vs. 2 triangles inside the outer circle). Note that in a two-class problem, k should be odd rather than even to avoid tied votes.
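The voting example above can be reproduced with a short sketch. The coordinates are invented so that the three nearest neighbours contain 2 triangles and 1 square, and the five nearest contain 3 squares and 2 triangles; `knn_predict` is a hypothetical helper, not a library function.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k, distance=euclidean):
    """train is a list of (point, label) pairs; return the majority label
    among the k training points closest to query."""
    neighbours = sorted(train, key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Invented points around a query at the origin.
train = [
    ((1.0, 0.0), "triangle"),
    ((0.0, 1.2), "triangle"),
    ((-1.5, 0.0), "square"),
    ((0.0, -2.5), "square"),
    ((2.8, 0.0), "square"),
    ((5.0, 5.0), "triangle"),
]
query = (0.0, 0.0)

label_k3 = knn_predict(train, query, k=3)  # 2 triangles vs. 1 square
label_k5 = knn_predict(train, query, k=5)  # 3 squares vs. 2 triangles
```

Passing `distance=manhattan` swaps the distance measure without touching the voting logic.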
If the output of the sigmoid function is more than 0.5, we can classify the outcome as 1 or YES, and if it is less than 0.5, we can classify it as 0 or NO.
For instance, let us take cancer prediction. If the output of the Logistic Regression is 0.75, we can say in terms of probability that, “There is a 75 percent chance that the patient will suffer from cancer.”
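As a rough sketch of the thresholding described above: the sigmoid maps any real-valued score z to a value between 0 and 1, which is then compared with 0.5. The score used here is chosen only so that the output is 0.75, and treating an output of exactly 0.5 as class 0 is an assumption of this sketch.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability-like value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(z, threshold=0.5):
    """Return 1 (YES) if the sigmoid output exceeds the threshold, else 0 (NO).
    An output exactly equal to the threshold is treated as 0 (assumption)."""
    return 1 if sigmoid(z) > threshold else 0

# A score of ln(3) gives a sigmoid output of 0.75, matching the example above.
score = math.log(3)
probability = sigmoid(score)
label = predict_label(score)
```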
Decision Tree – a type of supervised learning algorithm that is most commonly used for classification problems. Decision Tree algorithms can also be used for regression problems, i.e. to predict a numerical response variable. In other words, Decision Trees work with both categorical and continuous input and output variables.
Each branch node of the decision tree represents a choice between some alternatives and each leaf node represents a decision.
As an early learner, it is suggested to understand the concept of ID3 algorithm, Gini Index, Entropy, Information Gain, Standard Deviation and Standard Deviation Reduction.
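Entropy, the Gini index, and information gain are easy to compute by hand, and a small sketch may help. The helper names and the yes/no labels below are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# A perfectly mixed node, and two candidate splits of it.
parent = ["yes"] * 5 + ["no"] * 5
perfect_split = [["yes"] * 5, ["no"] * 5]
useless_split = [["yes", "yes", "no", "no"],
                 ["yes", "yes", "yes", "no", "no", "no"]]

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A tree-building algorithm such as ID3 picks, at each node, the split with the highest information gain, so the perfect split would be preferred here.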
As a new learner it is important to understand the concept of bootstrapping.
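Bootstrapping means drawing a sample of the same size as the original data, with replacement, so some rows appear more than once and others are left out. A minimal sketch, with made-up numbers and a hypothetical `bootstrap_sample` helper:

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [12, 15, 9, 20, 17, 11, 14, 18]

# Each bootstrap sample could be used to train one model of an ensemble.
samples = [bootstrap_sample(data, rng) for _ in range(200)]
sample_means = [sum(s) / len(s) for s in samples]
mean_of_means = sum(sample_means) / len(sample_means)

# Items never drawn into a given sample are that sample's "out-of-bag" items.
out_of_bag = [x for x in data if x not in samples[0]]
```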
The value of each feature is the value of a particular coordinate.
Classification is performed by finding hyperplanes that differentiate the two classes.
It is important to understand the concepts of margin, support vectors, and hyperplanes, and how to tune the hyper-parameters (kernel, regularization, gamma, margin). It also helps to know the various types of kernels, such as the linear kernel, the radial basis function (RBF) kernel, and the polynomial kernel.
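The three kernels named above are just functions that score how similar two points are. The sketch below writes them out in plain Python, together with the sign test against a hyperplane. The parameter names `degree`, `coef0`, and `gamma` and their default values are assumptions of this sketch, as are the weights `w` and offset `b`, which a real SVM would learn from data.

```python
import math

def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) raised to the given degree."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * squared distance); larger gamma means a narrower kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def classify(w, b, x):
    """Which side of the hyperplane w . x + b = 0 the point x falls on."""
    return 1 if linear_kernel(w, x) + b >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 3 = 0.
w, b = (1.0, 1.0), -3.0
side_a = classify(w, b, (3.0, 2.0))  # above the line
side_b = classify(w, b, (0.5, 1.0))  # below the line
```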
Naive Bayes – a supervised learning classifier that assumes the features are independent of one another given the class, i.e. that there is no correlation between them. The algorithm is based on Bayes' theorem.
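Here is a small sketch of how the independence assumption turns Bayes' theorem into a product of per-feature probabilities. The weather-style data is invented, the function names are hypothetical, and add-one (Laplace) smoothing is an extra assumption added so that unseen values do not produce a zero probability.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count classes, and feature values per (class, feature position)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict_proba(model, row):
    """Posterior P(class | row), assuming features are independent given the class."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) with add-one smoothing
            seen = value_counts[(label, i)][value]
            score *= (seen + 1) / (count + len(feature_values[i]))
        scores[label] = score
    normaliser = sum(scores.values())
    return {label: s / normaliser for label, s in scores.items()}

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes", "no", "yes"]

model = train_naive_bayes(rows, labels)
posterior = predict_proba(model, ("rainy", "cool"))
```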
Clustering algorithms are unsupervised algorithms that are used for dividing data points into groups such that the data points in each group are similar to each other and very different from other groups.
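To illustrate the grouping idea, here is a bare-bones k-means sketch. K-means is used here as one common example of a clustering algorithm; the points, the `kmeans` helper, and its parameters are all invented for illustration.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated groups of made-up points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
```

On data this cleanly separated the two groups are recovered regardless of the starting centroids; on real data the result can depend on the initialisation and on the choice of k.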
Some of the clustering algorithms include:
Other types of clustering algorithms:
Association algorithms, which form part of unsupervised learning algorithms, are for associating co-occurring items or events. Association algorithms are rule-based methods for finding out interesting relationships in large sets of data. For example, find out a relationship between products that are being bought together – say, people who buy butter also buy bread.
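The butter-and-bread rule can be quantified with three standard measures: support, confidence, and lift. The sketch below computes them over a handful of invented shopping baskets.

```python
# Made-up shopping baskets for illustration.
transactions = [
    {"butter", "bread", "milk"},
    {"butter", "bread"},
    {"bread", "jam"},
    {"butter", "bread", "eggs"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """How often the consequent appears in baskets that contain the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is overall.
    Values above 1 suggest the items appear together more than by chance."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

rule_support = support({"butter", "bread"}, transactions)
rule_confidence = confidence({"butter"}, {"bread"}, transactions)
rule_lift = lift({"butter"}, {"bread"}, transactions)
```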
Some of the association algorithms are:
We recommend the use of anomaly detection for discovering abnormal activities and unusual cases like fraud detection.
An algorithm that can be used for anomaly detection:
Isolation Forest - an unsupervised algorithm that isolates anomalies within a huge volume of data, thereby enabling anomaly detection.
We use sequential pattern mining for predicting the next data event in a sequence of data examples.
For example, predicting the next dose of medicine for a patient.
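The intuition behind isolation can be shown with a heavily simplified sketch: split the data at random again and again, and unusual points tend to end up alone after only a few splits. This is not a full Isolation Forest (it skips the subsampling and the path-length correction used by the real algorithm), and every name and number below is an assumption made for illustration.

```python
import random

def build_tree(points, rng, depth=0, max_depth=8):
    """Recursively split on a random feature at a random value."""
    if len(points) <= 1 or depth >= max_depth:
        return {"leaf": True}
    dim = rng.randrange(len(points[0]))
    low = min(p[dim] for p in points)
    high = max(p[dim] for p in points)
    if low == high:
        return {"leaf": True}
    split = rng.uniform(low, high)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "leaf": False, "dim": dim, "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }

def path_length(tree, point, depth=0):
    """Number of splits needed before the point lands in a leaf."""
    if tree["leaf"]:
        return depth
    branch = "left" if point[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)

def average_depth(forest, point):
    return sum(path_length(tree, point) for tree in forest) / len(forest)

# Thirty made-up points near the origin, plus one obvious outlier.
data_rng = random.Random(42)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(30)]
outlier = (10.0, 10.0)

tree_rng = random.Random(0)
forest = [build_tree(inliers + [outlier], tree_rng) for _ in range(100)]

outlier_depth = average_depth(forest, outlier)
typical_depth = sum(average_depth(forest, p) for p in inliers) / len(inliers)
```

A shorter average path means the point was easier to isolate, which is the signal used to flag it as an anomaly.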
Dimensionality reduction is used for reducing the dimension of the original data. The idea is to reduce the set of random features by obtaining a set of principal components or features. The key thing to understand in this is that the components retain or represent some meaningful properties of the original data. It can be divided into feature extraction and selection.
Algorithms that can be used for dimensionality reduction are:
Principal Component Analysis (PCA) - a dimensionality reduction algorithm used to reduce the number of dimensions or variables in large datasets that have a very high number of variables. Although PCA transforms a very large set of features into a smaller set, it retains most of the information in the dataset. The reduction in dimensions can come at some cost in model accuracy, but the idea is to simplify the model by reducing the number of variables or dimensions.
Recommender systems are used to build recommendation engines. Recommender algorithms are used in various business areas: online stores such as Amazon use them to recommend the right products to buyers, online video and music services such as Netflix and Amazon Prime Music use them to recommend content, and so do social media platforms such as Facebook and Twitter.
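A small sketch of the idea: center the data, build the covariance matrix, and find the direction of greatest variance. Power iteration is used here only to keep the example free of dependencies (a numerical library would normally do the eigen-decomposition), and the data and helper names are invented.

```python
import math

def center(rows):
    """Subtract each column's mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by cov and renormalise to
    approach the eigenvector with the largest eigenvalue."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Coordinates of each row along the component (2 features become 1)."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]

# Made-up 2-D data lying close to the line y = 2x.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
centered = center(rows)
cov = covariance_matrix(centered)
component = first_component(cov)
reduced = project(centered, component)

# Share of the total variance kept by the single component.
kept = sum(x * x for x in reduced) / (len(rows) - 1)
explained_ratio = kept / (cov[0][0] + cov[1][1])
```

Because the two features move together almost perfectly, one component keeps nearly all of the variance, which is the sense in which PCA retains most of the information.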
Recommender Engines can be broadly categorized into the following types:
5. Choose the algorithm: Several machine learning models can be used in a given context. The model is chosen depending on the type of data (images, numerical values, text, sound) and the data distribution.
6. Train the model: Training is the process in which the machine learns from historical data and produces a mathematical model that can be used for prediction. Different algorithms use different computation methods to compute the weights for each of the variables. Some algorithms, such as neural networks, initialize the weights of the variables at random. These weights are the values that affect the relationship between the actual and the predicted values.
7. Evaluate the model: The evaluation process comprises understanding the output model and evaluating its accuracy. There are various metrics for evaluating model performance. Key evaluation metrics for regression problems include MSE, RMSE, MAD, and MAPE, while classification problems use metrics such as the Confusion Matrix, Accuracy, Sensitivity (True Positive Rate), Specificity (True Negative Rate), AUC (Area under the ROC Curve), the Kappa value, and so on.
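Step 6 above (training) can be sketched with plain gradient descent on a one-feature linear model. The weights start at random values and are nudged repeatedly to shrink the gap between actual and predicted values. The learning rate, epoch count, data, and function name are all assumptions of this sketch, and other algorithms compute their weights in quite different ways.

```python
import random

def train_linear_model(xs, ys, learning_rate=0.05, epochs=1000, seed=0):
    """Fit y ~ weight * x + bias by gradient descent on mean squared error."""
    rng = random.Random(seed)
    weight = rng.uniform(-1, 1)  # random initialisation
    bias = rng.uniform(-1, 1)
    n = len(xs)
    for _ in range(epochs):
        errors = [(weight * x + bias) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_weight = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_bias = 2 * sum(errors) / n
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias

# Made-up historical data following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
weight, bias = train_linear_model(xs, ys)
prediction = weight * 5.0 + bias
```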
Only after evaluation can the model be improved or fine-tuned to get more accurate predictions. It is important to know a few more concepts like:
When we talk about regression the most commonly used regression metrics are:
We must know when to use which metric. It depends on the kind of data and the target variable you have.
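Several of the metrics named above fit in a few lines each. In this sketch MAD is read as the mean absolute deviation between actual and predicted values, Kappa is computed as Cohen's kappa, and AUC is left out because it needs predicted scores rather than labels. The data and function names are invented.

```python
import math

def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAD": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when an actual value is zero.
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

def classification_metrics(actual, predicted, positive=1):
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    n = len(pairs)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return {
        "confusion": {"TP": tp, "FP": fp, "TN": tn, "FN": fn},
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (accuracy - expected) / (1 - expected),
    }

# Made-up property prices and loan-default labels.
reg = regression_metrics([100.0, 200.0, 300.0, 400.0],
                         [110.0, 190.0, 330.0, 380.0])
clf = classification_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                             [1, 1, 1, 0, 0, 0, 0, 0, 1, 1])
```

RMSE penalises large errors more heavily than MAD, while MAPE expresses the error as a percentage, which is one reason the right metric depends on the data and the target variable.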