Data Harvesting and R Essentials training
Rated 3.0/5 based on 180 customer reviews

Give your business the competitive edge with effective data harvesting tools and techniques that increase efficiency and improve results!

Modes of Delivery

Classroom

Our classroom training provides you the opportunity to interact with instructors face-to-face.

Online Classroom

Collaborative, enriching virtual sessions, led by world-class instructors at time slots that suit your convenience.

Description

The need for data harvesting and mining is growing across a broad range of areas, from banking and insurance to retail, telecom, medicine, research, and government. This course provides aspiring data engineers and data scientists with the knowledge and tools they need for data ingestion and analytics using R. R is a free software environment for statistical computing and graphics that is growing in popularity, as it gives quick results and is well supported by a worldwide community of users and developers.

By scouring databases for hidden patterns, data mining experts can find predictive information that helps to make informed business decisions and enhance profitability. You will understand practical and efficient methods for using R in applications from academia to industry to extract knowledge from vast amounts of data. Participants will be introduced to data scraping, APIs, and parsing along with various topics in data analytics such as statistical analysis, plotting, linear regression/ANOVA, PCA (Principal Component Analysis), Clustering, Factor Analysis, and time series/ARIMA. An overview of data scraping and parsing in Python using BeautifulSoup is also covered.

On successful completion of the course, you will receive a Course Completion Certificate from KnowledgeHut.

What you will learn:
  • Use of rvest and BeautifulSoup for data scraping
  • Connecting to APIs and feeds
  • ETL, Data preparation, cleansing, and structuring in R
  • EDA (Exploratory Data Analysis) in R
  • Graphing in R
  • Linear regression/ANOVA
  • Time Series/ARIMA
You will also get:
  • Comprehensive, downloadable courseware
  • In-depth case studies for better retention
  • Course completion certificate
  • 1 credit per hour of learning

Key Features

5 days of quality interactive learning
Course completion certificate
Get deep insights into best practices in data harvesting and analytics using R
Covers practical and efficient methods for applying R to extract data
Downloadable comprehensive courseware
Hands-on exercises to cement your learning

Curriculum

What is Data Harvesting? (30 min)
Introduction to ETL and Data Harvesting Tools (60 min)
  • rvest, BeautifulSoup, tidyr, stringr, etc.
  • Scraping static pages, APIs, feeds
  • JSON/XML extraction
BREAK (15 min)
Data extraction and structuring (60 min)
Data cleansing and preparation (60 min)
LUNCH (60 min)
Hands-On Exercises (60 min)
  • Participants will be asked to extract data from various online data sources, such as static web pages and APIs
  • Participants will be asked to structure that data in a tabular format
Data structures and String Manipulation in R (60 min)
Hands-on Exercise (45 min)
  • Participants will be asked to work with data structures in R and manipulate the data to get it into the right format
Curating datasets (60 min)
Where to find open data and publicly available datasets (60 min)
BREAK (15 min)
Input/Output in R and working with packages and libraries (60 min)
LUNCH (60 min)
Hands-On Exercises (60 min)
  • Participants will be asked to read in and manipulate different types of files, such as JSON, XML, CSV, TSV, etc.
  • Participants will be asked to download open datasets and perform basic manipulation on the data
BREAK (15 min)
Performing Data Transformations and working with Dates and Probabilities (60 min)
Hands-on Exercises (45 min)
  • Participants will be asked to convert between data structures in R and apply functions
  • Participants will be asked to perform Date manipulation on sample data
What is R? (30 min)
Introduction to R and RStudio (60 min)
BREAK (15 min)
Basics of working with R and RStudio (60 min)
  • Printing output
  • Variables
  • Vectors
  • Functions
Data Structures in R (60 min)
  • Vectors
  • Lists
  • Matrices
  • Arrays
  • Data Frames
  • Factors and Tables
LUNCH (60 min)
Hands-On Exercise (60 min)
  • Participants will be asked to work with basic data structures in R
BREAK (15 min)
Data Structures in R contd... (60 min)
  • R programming and functions
  • Math functions and statistics
  • Sorting, linear algebra, and set operations
Hands-on Exercise (45 min)
  • Participants will be asked to practice creating functions in R and using some of the built-in functions for statistics, sorting, and set operations
Data Cleansing and Dealing with Missing Data (30 min)
Introduction to EDA – Exploratory Data Analysis (60 min)
BREAK (15 min)
EDA and Statistical Analysis (60 min)
  • Data formatting and conversion
  • Graphing/plotting using scatterplots, boxplots, histograms, etc.
LUNCH (60 min)
Hands-On Exercises (60 min)
  • Participants will be asked to work on an existing dataset and perform basic statistical analysis on it
BREAK (15 min)
Linear Regression, Multivariate Regression, and ANOVA (90 min)
Hands-on Exercise (45 min)
  • Participants will be asked to perform regression on an existing dataset and generate ANOVA
Principal Component Analysis and Clustering Data (60 min)
Logistic Regression (30 min)
BREAK (15 min)
Factor Analysis (60 min)
Introduction to Time Series (60 min)
LUNCH (60 min)
Hands-On Exercise (60 min)
  • Participants will be asked to cluster a dataset
  • Participants will be asked to do logistic regression on a dataset
BREAK (15 min)
ARIMA modeling (60 min)
Hands-on Exercise (45 min)
  • Participants will be asked to construct an ARIMA model on existing time series data
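
As a rough illustration of the extraction and structuring exercises listed above (pulling records from an API-style JSON response and arranging them in a tabular format), here is a minimal sketch in Python using only the standard library. It is not taken from the courseware: the sample data, the field names, and the helper functions flatten, to_table, and to_csv are all hypothetical and exist only for this example.

```python
import csv
import io
import json

# Hypothetical API response; the records and field names are invented
# purely for illustration and do not come from the course material.
SAMPLE_JSON = """
[
  {"id": 1, "name": "Alice", "address": {"city": "Pune", "zip": "411001"}},
  {"id": 2, "name": "Bob", "address": {"city": "Austin"}}
]
"""


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_table(json_text):
    """Return (columns, rows) for a JSON array of objects."""
    records = [flatten(r) for r in json.loads(json_text)]
    # Collect column names in first-seen order so output is stable.
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    # Missing fields become empty strings so every row has equal width.
    rows = [[rec.get(col, "") for col in columns] for rec in records]
    return columns, rows


def to_csv(columns, rows):
    """Serialize the table to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


columns, rows = to_table(SAMPLE_JSON)
csv_text = to_csv(columns, rows)
print(csv_text)
```

The same pattern (parse, flatten, fill missing fields, write out) carries over to R with its own JSON and data-frame tooling; this sketch only shows the general shape of the task.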

Our Students

"The course content covered most of the basics and went deeper into details when required. Good hands-on exercises with practical examples."

"Excellent trainer and with confidence I can handle all sorts of PM scenarios and can challenge your mindset. Very good customer service from KnowledgeHut."

"I learned much from this training session, the faculty had good knowledge of the subject matter and provided good learning examples."

"2days PMP training was very good, I got lot of inspiration from this training."

Shreerang Bhawalkar

ADP Dealer Services

Milind Gawaskar

Design Manager at NEC

Jan Miko

Senior Digital Manager

Ada Lee

Marketing Director

Frequently Asked Questions

To attend this course, candidates must have prior programming knowledge and knowledge of RDBMS.

  • Basic programming knowledge
  • Basic RDBMS knowledge


Data mining and analytics come with great potential to help companies, as they can extract hidden predictive data from large databases and use it to predict future trends and behaviors. Data mining tools are able to find answers to questions that were earlier considered too time-consuming to resolve. This allows businesses to make proactive, data-driven decisions and enhance viable business opportunities.

Our expert facilitators will help you understand how to collect and refine massive amounts of information, and discover hidden value in your warehouse of data. You will increase your employment opportunities, as you can put yourself on the radar of corporate businesses looking for data experts.

On successful completion of the course, you will receive a Course Completion Certificate from KnowledgeHut with Credits (1 credit per hour of training).

Any registration cancelled within 48 hours of the initial registration will be refunded in FULL (please note that all cancellations will incur a 5% deduction in the refunded amount due to transactional costs applicable while refunding). Refunds will be processed within 30 days of receipt of written request for refund. Kindly go through our Refund Policy for more details: http://www.knowledgehut.com/refund

Please send an email to support@knowledgehut.com, and we will answer any queries you may have!

To attend this course, candidates must have prior programming knowledge and knowledge of RDBMS. This course is suggested for:

  • Data Analysts
  • EAI specialists
  • Data Engineers
  • Data Scientists, among others.
