
Scikit-learn is a Python module used for data analysis and data mining. It can be installed with pip:
pip install scikit-learn
The following are the steps in implementing a learning algorithm using scikit-learn:
A dataset can be collected and loaded, or a pre-defined dataset can be used. Every dataset has two components: features and responses.
Features are the attributes or variables present in the dataset. Collectively they are represented as a 'feature matrix'.
The response, also known as the target variable or label, is the output that depends on the feature variables. The single column of responses is known as the 'response vector'.
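To make these two components concrete, here is a minimal sketch (with made-up numbers) of what a feature matrix and a response vector look like as numpy arrays:
import numpy as np

#hypothetical feature matrix: 3 samples, 2 features each
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 2.9]])
#response vector: one label per sample
y = np.array([0, 0, 1])

print(X.shape)  #(3, 2) -> (n_samples, n_features)
print(y.shape)  #(3,)   -> (n_samples,)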
Data can be loaded in different ways; some of them are demonstrated below.
Using the csv module
Python's built-in 'csv' module provides a reader function that can be used to read the data present in a CSV file. The file is opened in read mode, and the reader function is applied to it. Below is an example demonstrating the same:
import numpy as np
import csv

path = "path to csv file"  #placeholder: replace with the actual path to the CSV file
with open(path, 'r') as infile:
    reader = csv.reader(infile, delimiter=',')
    headers = next(reader)  #the first row contains the column names
    data = list(reader)     #the remaining rows contain the values
data = np.array(data).astype(float)
The headers or the column names can be printed using the following line of code:
print(headers)
The dimensions of the dataset can be determined using the shape attribute as shown in the following line of code:
print(data.shape)
Output:
(250, 302)
The nature of the data can be determined by examining the first few rows of the dataset using the line of code below:
data[:2]
Using the numpy package
The numpy package has a function named ‘loadtxt’ that can be used to read CSV data. Below is an example demonstrating the same using StringIO.
from numpy import loadtxt
from io import StringIO
c = StringIO("0 1 2 \n3 4 5")
data = loadtxt(c)
print(data.shape)
Output:
(2, 3)
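Note that loadtxt splits on whitespace by default, which is why the space-separated example above works. For an actual comma-separated file, the delimiter parameter can be passed explicitly, as in this small sketch:
from numpy import loadtxt
from io import StringIO

c = StringIO("0,1,2\n3,4,5")
#delimiter must match the separator used in the file
data = loadtxt(c, delimiter=',')
print(data.shape)
Output:
(2, 3)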
Using the pandas package
The pandas package reads a CSV file directly into a DataFrame; its read_csv function takes the file path and a sep parameter specifying the delimiter used in the file.
Let us look at an example to understand how a CSV file is read as a DataFrame.
import numpy as np
import pandas as pd
#Obtain the dataset
df = pd.read_csv("path to csv file", sep=",")
df[:5]
Output:
   id  target      0      1      2  ...    295    296    297    298    299
0   0     1.0 -0.098  2.165  0.681  ... -2.097  1.051 -0.414  1.038 -1.065
1   1     0.0  1.081 -0.973 -0.383  ... -1.624 -0.458 -1.099 -0.936  0.973
2   2     1.0 -0.523 -0.089 -0.348  ... -1.165 -1.544  0.004  0.800 -1.211
3   3     1.0  0.067 -0.021  0.392  ...  0.467 -0.562 -0.254 -0.533  0.238
4   4     1.0  2.347 -0.831  0.511  ...  1.378  1.246  1.478  0.428  0.253
Loading a pre-defined dataset
It can be done using the code below.
from sklearn.datasets import load_iris
iris = load_iris()
#the feature matrix and the response vector are stored in two variables
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names
#feature names and targets are printed
print("Feature names:", feature_names)
print("Target names:", target_names)
#X and y are numpy arrays
print("\nType of X is:", type(X))
#first 5 input rows are printed to understand the type of data present in the dataset
print("\nFirst 5 rows of X:\n", X[:5])
The next important step in implementing a learning algorithm is to split the dataset into training, testing, and validation datasets.
Data is split into different sets so that the model can be trained on one part, validated on another, and tested on a third.
Training data: This is the input dataset that is fed to the learning algorithm once it has been pre-processed and cleaned. Predefined datasets are readily available on many websites and can be downloaded and used; some need to be cleaned and verified, while others come cleaned beforehand. The machine learning model learns from this data and tries to fit a model to it.
Validation data: This is similar to the test set, but it is used on the model frequently so as to know how well the model performs on never-before-seen data. Based on the results obtained by passing the validation set to the learning algorithm, decisions can be made about how the algorithm can learn better: the hyperparameters can be tweaked so that the model gives better results on the validation set in the next run, or features can be combined or new features created that better describe the data, thereby yielding better results.
Test data: This is the data on which the model's performance, i.e. its ability to generalize, is judged. In the end, the model's performance is determined by how well it reacts to never-before-seen data. It is a way of knowing whether the model actually understood and learnt the patterns, or simply overfit or underfit the data.
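Since train_test_split (shown below) produces only two partitions per call, one common way to obtain all three sets is to apply it twice. The sketch here uses illustrative 60/20/20 proportions, which are an assumption rather than something prescribed above:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
#first split: hold out 20% of the data as the test set
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
#second split: 25% of the remaining 80% (i.e. 20% of the total) becomes the validation set
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=1)
print(len(X_train), len(X_val), len(X_test))
Output:
90 30 30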
It is important to understand that good-quality data (little to no noise, redundancy, or discrepancy) in large amounts yields great results when the right learning algorithm is applied to it.
The dataset needs to be split into training and test datasets so that, once training is completed on the training dataset, the performance of the learning model can be tested on the test dataset. Usually, 80 percent of the data is used for training and 20 percent for testing. This can be achieved using the scikit-learn library, which provides a function named train_test_split; its test_size parameter controls how the dataset is divided.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Feature scaling
This is one of the most important steps in data pre-processing. It refers to standardizing the range of the independent variables or features present in the dataset. When all the variables are on the same scale, it is easier to work with machine learning equations. This can be achieved using the 'StandardScaler' class present in the scikit-learn library. The scaler is first fit on the training dataset and used to transform it; the test dataset is then only transformed, using the parameters learned from the training data.
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)  #fit the scaler on the training data, then transform it
X_test = sc_X.transform(X_test)  #transform the test data with the same learned parameters
Model training
Let us look at how a model can be trained using the KNN algorithm.
from sklearn.datasets import load_iris
iris = load_iris() #loading the iris dataset
#the feature matrix and response vector are stored
X = iris.data
y = iris.target
#X and y are split into training and testing datasets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
#model is trained on the training data
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
#predictions are made on the test data
y_pred = knn.predict(X_test)
#the actual and predicted responses are compared
from sklearn import metrics
print("kNN model accuracy:", metrics.accuracy_score(y_test, y_pred))
#predictions for sample data
sample = [[3, 5, 4, 2], [2, 3, 5, 4]]
preds = knn.predict(sample)
pred_species = [iris.target_names[p] for p in preds]
print("Predictions:", pred_species)
#the model is saved
import joblib  #in recent scikit-learn versions; older versions used 'from sklearn.externals import joblib'
joblib.dump(knn, 'iris_knn.pkl')
Output:
kNN model accuracy: 0.9833333333333333
Predictions: ['versicolor', 'virginica']
['iris_knn.pkl']
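A saved model can later be restored with joblib.load; here is a short sketch, assuming the 'iris_knn.pkl' file produced above exists:
import joblib

#reload the persisted model and reuse it for prediction
knn_loaded = joblib.load('iris_knn.pkl')
print(knn_loaded.predict([[3, 5, 4, 2]]))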
Conclusion
In this post, we saw how scikit-learn can be used to load data, split it into training and test sets, scale features, and train and persist machine learning models with ease.