An old saying, "Unity is Strength," sums up the basic idea behind the very powerful "ensemble method" in machine learning. Broadly speaking, ensemble learning techniques, which often appear among the top-ranked solutions in machine learning competitions (including Kaggle competitions), are based on the hypothesis that combining multiple models can often produce a more powerful model.
The idea behind model stacking, one of the ensemble methods in machine learning, is to handle a machine learning problem by using different models. We use these models to make intermediate predictions and then add a new model that learns from the intermediate predictions to produce the final prediction. You can check the Data Science Bootcamp syllabus if you are interested in learning more about stacked models.
The bagging method works by training multiple high-variance models on different samples of the data and averaging their predictions to reduce the variance, whereas boosting builds models incrementally to reduce bias.
Boosting is an ensemble learning method that combines a group of weak learners into one strong learner to minimize training errors. It selects a random sample of data, fits a model, and then trains further models sequentially. That is, each model tries to compensate for the weaknesses of the previous model. At each iteration, weak rules from individual classifiers are combined into strong predictive rules.
What is a Machine Learning Stack?
Stacking is a powerful way to improve the performance of your model. It is a machine learning paradigm where multiple models are trained to solve the same problem and combined to achieve better results. It is also the art of bringing a diverse group of learners together to improve the stability and predictive power of a model. Machine learning is a trending skill and is here to stay.
When it comes to stacking, the topic is commonly divided into the following parts:
- Scikit-learn API
- Stacking for Classification
- Stacking for Regression
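As a minimal sketch of these parts, the snippet below uses scikit-learn's StackingClassifier and StackingRegressor. The synthetic datasets, the choice of base models, and all hyperparameters are assumptions made for illustration and do not come from the article.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import StackingClassifier, StackingRegressor
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Synthetic data stands in for a real dataset (assumption for illustration).
Xc, yc = make_classification(n_samples=400, n_features=10, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, random_state=0)

# Stacking for classification: two base models plus a logistic regression meta-model.
clf = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,  # the meta-model is trained on cross-validated base predictions
)
clf.fit(Xc_train, yc_train)
clf_accuracy = clf.score(Xc_test, yc_test)

Xr, yr = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=0)

# Stacking for regression: same idea with regressors and a linear meta-model.
reg = StackingRegressor(
    estimators=[
        ("knn", KNeighborsRegressor()),
        ("tree", DecisionTreeRegressor(max_depth=5, random_state=0)),
    ],
    final_estimator=Ridge(),
    cv=5,
)
reg.fit(Xr_train, yr_train)
reg_r2 = reg.score(Xr_test, yr_test)

print(f"stacked classifier accuracy: {clf_accuracy:.3f}")
print(f"stacked regressor R^2: {reg_r2:.3f}")
```

Both estimators follow the usual fit, predict, and score interface, so a stacked model can be dropped in wherever a single scikit-learn model is used.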
The basic technique of Stacking in Machine Learning:
- Divide the training data into 2 disjoint sets.
- Train the base learners on the first set.
- Test the base learners on the second set and record their predictions.
- Using these predictions as inputs and the correct responses as outputs, train a higher-level learner.
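The four steps above can be sketched roughly as follows. This is only an illustration under assumptions: the synthetic dataset, the two base learners, the 50/50 split, and the helper name stacked_predict are all hypothetical choices, not part of the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=1)

# Step 1: divide the training data into two disjoint sets.
X_a, X_b, y_a, y_b = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=1
)

# Step 2: train the base learners on the first set.
base_learners = [
    KNeighborsClassifier(),
    DecisionTreeClassifier(max_depth=4, random_state=1),
]
for model in base_learners:
    model.fit(X_a, y_a)

# Step 3: test the base learners on the second set and keep their predictions
# (here, the predicted probability of class 1 from each learner).
meta_features = np.column_stack(
    [model.predict_proba(X_b)[:, 1] for model in base_learners]
)

# Step 4: train a higher-level learner with those predictions as inputs
# and the correct responses as outputs.
meta_learner = LogisticRegression()
meta_learner.fit(meta_features, y_b)


def stacked_predict(X_new):
    """Hypothetical helper: base predictions first, then the meta-learner."""
    features = np.column_stack(
        [model.predict_proba(X_new)[:, 1] for model in base_learners]
    )
    return meta_learner.predict(features)


print(stacked_predict(X[:5]))
```

Because the meta-learner only ever sees predictions on data the base learners were not trained on, it learns how reliable each base learner is on unseen data.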
Approaches to Building a Machine Learning Stack
Organizations depend on third-party tools and APIs to implement machine learning technology stack features and optimize their ML products. These integrations mainly include the following two approaches to complete the tech stack for machine learning:
1. Vertical Integration of Tools
Vertically integrated tools support all three levels of machine learning model management: data, model, and deployment. The user provides data to these tools as input. By processing this input data, the tools generate predictions and, subsequently, results. Vertically integrated tools either specialize in a specific type of data or serve specific use cases or industries. For example, Clarifai is a widely used machine learning tool that specializes in images.
Vertically Integrated Tools Forming ML Stack
Pros:
- No need to develop from scratch, ensuring faster implementation
- Access to aggregated datasets helps ensure better performance
Cons:
- Lack of customization
- Tools are targeted at specific use cases or data
2. Horizontal Integration of ML Tools
These tools do not cover the full three-layer stack, but only target a specific layer of the ML stack. For example, an ML model layer supported by TensorFlow or ML monitoring supported by Censius.
Horizontally Integrated Tools Forming ML Stack
Pros:
- Freedom to select tools and use them according to need and preference
- Third-party application ecosystem supported
Cons:
- Considerable resources are needed to build an ML stack: knowledge and cost of tools
- Limited access to datasets may not yield results as good as those of vertically integrated stacks
When Do You Need to Implement ML Tech Stack?
Given many machine learning models that are skilled at solving a problem, but in different ways, how do you choose which model to use (trust)? This is the central question that stacking tries to answer.
The approach to this question is to use another machine learning model that learns when to use or trust each model in the ensemble.
The simplest example of the stacking method is a classification problem: you can choose a KNN classifier, logistic regression, and an SVM as weak learners, and use a neural network as the meta-model. The neural network then learns to take the outputs of the three weak learners as inputs and return a final prediction based on them.
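A rough sketch of this setup with scikit-learn might look like the following. The synthetic data, the feature scaling, and the small MLPClassifier used as the neural network are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=500, n_features=12, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Three heterogeneous weak learners, each scaled because they are
# sensitive to feature ranges.
weak_learners = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=42))),
]

# A small neural network as the meta-model (architecture is an assumption).
meta_model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=42)

stack = StackingClassifier(estimators=weak_learners, final_estimator=meta_model, cv=5)
stack.fit(X_train, y_train)

# transform() exposes the weak learners' outputs that the meta-model receives.
meta_inputs = stack.transform(X_test)
test_accuracy = stack.score(X_test, y_test)
print(f"stack accuracy: {test_accuracy:.3f}")
```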
Stacking differs from bagging and boosting in two main ways. First, stacking often considers heterogeneous weak learners (different learning algorithms are combined), whereas bagging and boosting mainly consider homogeneous weak learners. Second, stacking learns to combine the base models using a meta-model, while bagging and boosting combine weak learners according to deterministic algorithms.
Implementation of ML Stack in 8 Steps
Implementation includes the following steps:
Step 1: The original training data is divided into n folds using RepeatedStratifiedKFold.
Let us first understand what RepeatedStratifiedKFold is.
Suppose you want to run a survey and decide to call 100 people from a particular company to ask their opinion on a particular movie. If you pick them at random, you might end up with 100 males, 100 females, or 90 females and 10 males, and from these 100 opinions you cannot infer the opinion of the entire company. This is Random Sampling.
In Stratified Sampling, by contrast, suppose the company's population is 51.3% male and 48.7% female. If you choose 51 males (51% of 100) and 49 females (49% of 100), for a total of 100 people, this group represents the entire company. This is called Stratified Sampling. Check the Data Science course online for a better understanding of the topic.
So RepeatedStratifiedKFold is a class in scikit-learn, a Python library, that repeats stratified k-fold splitting several times, with a different randomization of the data in each repetition.
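To make this concrete, here is a small sketch that applies RepeatedStratifiedKFold to 100 labels split 51/49, mirroring the survey example. The placeholder feature matrix and the choice of 5 folds and 2 repetitions are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# 100 people: 51 labelled 0 ("male") and 49 labelled 1 ("female").
y = np.array([0] * 51 + [1] * 49)
X = np.arange(100).reshape(-1, 1)  # placeholder features, not used for splitting

# 5 stratified folds, repeated twice with different shuffles: 10 splits in total.
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

fold_counts = []
for train_idx, test_idx in rskf.split(X, y):
    # Count how many members of each class landed in the held-out fold.
    counts = np.bincount(y[test_idx], minlength=2)
    fold_counts.append((int(counts[0]), int(counts[1])))

print(len(fold_counts))
for males, females in fold_counts:
    print(males, females)
```

Each held-out fold keeps roughly the same 51/49 class ratio as the full group, which is the point of stratification.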
Step 2: The base learner (Model 1) is fitted on n-1 folds, and predictions are made for the remaining fold.
Step 3: This prediction is added to the x1_train list.
Step 4: Steps 2 and 3 are repeated so that each of the n folds is held out once, and we get the array x1_train of size n,
where x1_train[i] is the prediction on the (i+1)th fold when Model 1 is fitted on folds 1, 2, ..., i, i+2, ..., n.
Step 5: Now, train the model on all n folds and make predictions for the test data. Store this prediction in y1_test.
Step 6: Similarly, we get x2_train, y2_test, x3_train, and y3_test using Models 2 and 3, which completes the level 1 predictions.
Step 7: We now train the Meta Learner on the level 1 predictions (we use these predictions as features for the model).
Step 8: The Meta Learner is now used to predict on the test data.
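The eight steps can be sketched by hand as shown below. Several things here are assumptions for illustration: the synthetic data, the three base models, and the use of a single StratifiedKFold pass instead of RepeatedStratifiedKFold, so that every training row receives exactly one out-of-fold prediction. The predictions are also stored per row rather than per fold.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=7
)

models = [
    KNeighborsClassifier(),
    DecisionTreeClassifier(max_depth=4, random_state=7),
    GaussianNB(),
]

# Step 1: divide the training data into n folds (n = 5 here).
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
folds = list(splitter.split(X_train, y_train))

train_columns, test_columns = [], []
for model in models:
    xk_train = np.zeros(len(X_train))
    # Steps 2-4: fit on n-1 folds, predict the held-out fold, repeat for each fold.
    for fit_idx, holdout_idx in folds:
        fold_model = clone(model).fit(X_train[fit_idx], y_train[fit_idx])
        xk_train[holdout_idx] = fold_model.predict_proba(X_train[holdout_idx])[:, 1]
    # Step 5: fit on all folds and predict the test data.
    full_model = clone(model).fit(X_train, y_train)
    yk_test = full_model.predict_proba(X_test)[:, 1]
    # Step 6: collect the level 1 predictions of every base model.
    train_columns.append(xk_train)
    test_columns.append(yk_test)

level1_train = np.column_stack(train_columns)
level1_test = np.column_stack(test_columns)

# Step 7: train the Meta Learner on the level 1 predictions.
meta_learner = LogisticRegression().fit(level1_train, y_train)

# Step 8: use the Meta Learner to predict on the test data.
final_predictions = meta_learner.predict(level1_test)
accuracy = float(np.mean(final_predictions == y_test))
print(f"stacked accuracy: {accuracy:.3f}")
```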
Full (Basic) ML Stack Architecture
- Original Data: The original data is split into n folds
- Base Models: Level 1 individual Models
- Level 1 Predictions: Predictions generated by base models on original data
- Level 2 Model: Meta-Learner, the model which combines Level 1 predictions to generate best final Predictions
A stacking model architecture includes two or more base models, often referred to as level 0 models, and a meta-model that combines the predictions of the base models, referred to as a level 1 model. (This numbering starts one level lower than the Level 1 and Level 2 labels used in the list above.)
- Level 0 Models (Base Models): Models that are fit on the training data and whose predictions are compiled.
- Level 1 Model (Meta-Model): A model that learns how best to combine the predictions of the base models.
The meta-model is trained on the predictions made by the base models on out-of-sample data. That is, data not used to train the base models is fed into the base models, predictions are made, and these predictions, along with the expected outputs, provide the input and output pairs used to fit the meta-model.
The outputs from the underlying models used as input to the meta-model can have a real value in the case of regression and a probability value, a probability-like value, or a class label in the case of classification.
The most common approach to preparing a training data set for a meta-model is k-fold cross-validation of the base models, where the out-of-fold predictions are used as the basis for the training data set for the meta-model.
The training data for the meta-model may also include the inputs to the base models, e.g., the input elements of the training data. This can give the meta-model additional context on how best to combine the predictions from the base models.
Once the training data set is ready for the meta-model, the meta-model can be trained in isolation on this data set, and the base models can be trained on the entire original training data set.
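A minimal sketch of this preparation, assuming a synthetic regression dataset and two arbitrary base models, could use scikit-learn's cross_val_predict to collect the out-of-fold predictions. The helper name predict_stacked is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=3)

base_models = [
    KNeighborsRegressor(),
    DecisionTreeRegressor(max_depth=5, random_state=3),
]
cv = KFold(n_splits=5, shuffle=True, random_state=3)

# Out-of-fold predictions: each row is predicted by a model that never saw it.
out_of_fold = np.column_stack(
    [cross_val_predict(model, X, y, cv=cv) for model in base_models]
)

# Optionally append the original inputs so the meta-model gets extra context.
meta_X = np.hstack([out_of_fold, X])

# The meta-model is trained in isolation on the prepared data set.
meta_model = LinearRegression().fit(meta_X, y)

# The base models are then trained on the entire original training data set.
for model in base_models:
    model.fit(X, y)


def predict_stacked(X_new):
    """Hypothetical helper: base predictions plus raw inputs, then the meta-model."""
    base_predictions = np.column_stack([m.predict(X_new) for m in base_models])
    return meta_model.predict(np.hstack([base_predictions, X_new]))


print(predict_stacked(X[:3]))
```

Dropping the np.hstack step and training the meta-model on out_of_fold alone gives the plain variant without the original inputs.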
Stacking is appropriate when several different machine learning models are skillful on a dataset, but skillful in different ways. Another way of saying this is that the predictions made by the models, or the errors in the predictions made by the models, are uncorrelated or have a low correlation.
The base models are often complex and diverse. As such, it is often a good idea to use a variety of models that make very different assumptions about how to solve a predictive modeling task, such as linear models, decision trees, support vector machines, neural networks, and others. Other ensemble algorithms, such as random forests, can also be used as base models.
- Base Models: Use a diverse range of models that make different assumptions about the prediction task.
The meta-model is often simple and provides a smooth interpretation of the predictions made by the base models. As such, linear models such as linear regression for regression tasks (predicting a numerical value) and logistic regression for classification tasks (predicting a class label) are often used as the meta-model. Although this is common, it is not necessary.
- Regression Meta-Model: Linear Regression
- Classification Meta-Model: Logistic Regression
Using a simple linear model as the meta-model often gives stacking the colloquial name "blending", because the final prediction is a weighted average, or blend, of the predictions made by the base models. The super learner can be considered a specialized type of stacking.
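The arithmetic behind a linear meta-model can be shown with a tiny sketch. The function name blend, the three base predictions, and the weights are hypothetical values chosen for illustration. In practice the weights would be learned by the meta-model.

```python
def blend(base_predictions, weights, intercept=0.0):
    """Weighted sum of base-model predictions, as a linear meta-model computes it."""
    if len(base_predictions) != len(weights):
        raise ValueError("need exactly one weight per base prediction")
    return intercept + sum(w * p for w, p in zip(weights, base_predictions))


# Hypothetical predictions from three base regressors for one input.
base_predictions = [10.0, 12.0, 11.0]
# Hypothetical learned weights; they sum to 1, so the blend is a weighted average.
weights = [0.5, 0.25, 0.25]

blended = blend(base_predictions, weights)
print(blended)  # 10.75
```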
Stacking in machine learning is designed to improve modeling performance, although it is not guaranteed to lead to improvement in all cases.
Achieving performance improvement depends on the complexity of the problem and whether it is well-represented by the training data and complex enough to learn more by combining predictions. It also depends on the choice of the underlying models and whether they are sufficiently skillful and sufficiently uncorrelated in their predictions (or errors).
If a base model performs as well as or better than the stacking ensemble, the base model should be used instead because it is less complex (for example, easier to describe, train, and maintain).
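One way to check this in practice is to score every base model and the stack with the same cross-validation and keep the stack only if it is strictly better. The sketch below assumes a synthetic dataset, two arbitrary base models, and accuracy as the metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=5)

base = {
    "tree": DecisionTreeClassifier(max_depth=4, random_state=5),
    "nb": GaussianNB(),
}
candidates = dict(base)
candidates["stack"] = StackingClassifier(
    estimators=list(base.items()),
    final_estimator=LogisticRegression(),
    cv=5,
)

# Evaluate every candidate with the same 5-fold cross-validation.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Prefer the simpler base model unless the stack is strictly better.
best_base = max(base, key=scores.get)
chosen = "stack" if scores["stack"] > scores[best_base] else best_base

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("chosen:", chosen)
```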
Machine Learning Stack Resources
Dive deeper into the machine learning engineering stack to understand how and where it is used. The following resources are a good starting point:
1. Comet.ml: Comet.ml is a machine learning platform for data scientists and researchers that helps them seamlessly track performance, code changes, history, models, and datasets.
It is somewhat similar to GitHub: it lets you track model training, follow code changes, and graph the dataset. Comet.ml can be easily integrated with other machine learning libraries to maintain the workflow and develop insights for your data. Comet.ml can work with GitHub and other Git services, and developers can easily merge pull requests into a GitHub repository. The official Comet.ml website provides documentation, downloads, installation instructions, and a cheat sheet.
2. GitHub: GitHub is an internet hosting service for software development and version control using Git. Both businesses and open-source communities can use it to host and manage their projects, review code, and deploy software. More than 31 million users actively deploy their software and projects on GitHub. The GitHub platform was created in 2007, and in 2020 GitHub made all the core features free to use for everyone. You can create private repositories with unlimited collaborators. You can get help from the official GitHub website, or you can learn the basics of GitHub from websites like FreeCodeCamp or the GitHub documentation.
3. Hadoop: Hadoop lets you store data and run applications on a cluster of commodity hardware. Hadoop is an Apache project that can be described as a software library or framework for processing large datasets. A Hadoop environment can be scaled from a single machine to thousands of commodity machines, each providing computing power and local storage capacity.
To conclude, the purpose of stacking in machine learning is to create more accurate predictive models. Stacking is a generic technique for converting good models into great models. It trains a higher-level model to combine the lower-level models and compensate for their errors. In stacking, the predictions of the first-level models become the input of the second-level model, and so on. You can check KnowledgeHut's Data Science Bootcamp syllabus to understand the topics underlying Stacking in Machine Learning and gain a deep understanding of the ML tech stack.
Frequently Asked Questions (FAQs)
1. What is a target function in the machine learning stack?
In machine learning, we have training data and test data. We fit our algorithm on the training dataset and then run it on the test dataset to make actual predictions. The target function is the mapping from inputs to outputs that the model learned on the training dataset tries to approximate, and its quality is checked on the test dataset.
2. What are the three layers of the AWS machine learning stack?
Amazon provides machine learning resources in three "layers of the AI stack": the first is framework tools, the second is API-driven services, and the third is machine learning platforms.
3. What are examples of a Machine Learning Stack?
Let’s say you’re modeling a large data set with an ensemble of simple base models (e.g., KNN, single decision tree, linear regression, etc.). The individual base models may do a relatively poor job of fitting the data. So, when you stack them together, the potential for improvement over the best individual base model could be relatively large.
4. How important is learning the machine learning stack for a programmer?
It is very important because stacking lets you tackle complex machine learning problems that a single model may handle poorly.