Data Harvesting and R Essentials training
Rated 3.0/5 based on 180 customer reviews


Give your business a competitive edge with effective data harvesting tools and techniques that increase efficiency and improve results!


Modes of Delivery


Classroom

Our classroom training gives you the opportunity to interact with instructors and benefit from face-to-face instruction.

Online Classroom

Collaborative, enriching virtual sessions led by world-class instructors, at time slots that suit your convenience.

Team/Corporate Training

Our corporate training is carefully structured to help executives stay ahead of rapidly evolving business environments.


The need for data harvesting and mining is growing across a broad range of areas, from banking and insurance to retail, telecom, medicine, research, and government. This course gives aspiring data engineers and data scientists the knowledge and tools they need for data ingestion and analytics using R. R is a free software environment for statistical computing and graphics that is growing in popularity: it gives quick results and is well supported by a worldwide community of users and developers.

By scouring databases for hidden patterns, data mining experts can find predictive information that supports informed business decisions and enhances profitability. You will learn practical and efficient methods for using R, in applications from academia to industry, to extract knowledge from vast amounts of data. Participants will be introduced to data scraping, APIs, and parsing, along with topics in data analytics such as statistical analysis, plotting, linear regression/ANOVA, PCA (Principal Component Analysis), clustering, factor analysis, and time series/ARIMA. An overview of data scraping and parsing in Python using BeautifulSoup is also included.

On successful completion of the course, you will receive a Course Completion Certificate from KnowledgeHut.

What you will learn:
  • Use of rvest and BeautifulSoup for data scraping
  • Connecting to APIs and feeds
  • ETL, Data preparation, cleansing, and structuring in R
  • EDA (Exploratory Data Analysis) in R
  • Graphing in R
  • Linear regression/ANOVA
  • Time Series/ARIMA
You will also get:
  • Comprehensive, downloadable courseware
  • In-depth case studies for better retention
  • Course completion certificate
  • 1 credit per hour of learning

Key Features

5 days of quality interactive learning
Course completion certificate
Get deep insights into best practices in data harvesting and analytics using R
Covers practical and efficient methods for applying R to extract data
Downloadable comprehensive courseware
Hands-on exercises to cement your learning


What is Data Harvesting? (30 min)
Introduction to ETL and Data Harvesting Tools (60 min)
  • rvest, BeautifulSoup, tidyr, stringr, etc.
  • Scraping static pages, APIs, feeds
  • JSON/XML extraction
BREAK (15 min)
Data extraction and structuring (60 min)
Data cleansing and preparation (60 min)
LUNCH (60 min)
Hands-On Exercises (60 min)
  • Participants will be asked to extract data from various online data sources, such as static web pages and APIs
  • Participants will be asked to structure that data in a tabular format
Data structures and String Manipulation in R (60 min)
Hands-on Exercise (45 min)
  • Participants will be asked to work with data structures in R and manipulate the data to get it into the right format
Curating datasets (60 min)
Where to find open data and publicly available datasets (60 min)
BREAK (15 min)
Input/Output in R and working with packages and libraries (60 min)
LUNCH (60 min)
Hands-On Exercises (60 min)
  • Participants will be asked to read in and manipulate different types of files, such as JSON, XML, CSV, TSV, etc.
  • Participants will be asked to download open datasets and perform basic manipulation on the data
BREAK (15 min)
Performing Data Transformations and working with Dates and Probabilities (60 min)
Hands-on Exercises (45 min)
  • Participants will be asked to convert between data structures in R and apply functions
  • Participants will be asked to perform Date manipulation on sample data
What is R? (30 min)
Introduction to R and RStudio (60 min)
BREAK (15 min)
Basics of working with R and RStudio (60 min)
  • Printing output
  • Variables
  • Vectors
  • Functions
Data Structures in R (60 min)
  • Vectors
  • Lists
  • Matrices
  • Arrays
  • Data Frames
  • Factors and Tables
LUNCH (60 min)
Hands-On Exercise (60 min)
  • Participants will be asked to work with basic data structures in R
BREAK (15 min)
Data Structures in R contd... (60 min)
  • R programming and functions
  • Math functions and statistics
  • Sorting, linear algebra, and set operations
Hands-on Exercise (45 min)
  • Participants will be asked to practice creating functions in R and using some of the built-in functions for statistics, sorting, and set operations
Data Cleansing and Dealing with Missing Data (30 min)
Introduction to EDA – Exploratory Data Analysis (60 min)
BREAK (15 min)
EDA and Statistical Analysis (60 min)
  • Data formatting and conversion
  • Graphing/plotting using scatterplots, boxplots, histograms, etc.
LUNCH (60 min)
Hands-On Exercises (60 min)
  • Participants will be asked to work on an existing dataset and perform basic statistical analysis on it
BREAK (15 min)
Linear Regression, Multivariate Regression, and ANOVA (90 min)
Hands-on Exercise (45 min)
  • Participants will be asked to perform regression on an existing dataset and generate ANOVA
Principal Component Analysis and Clustering Data (60 min)
Logistic Regression (30 min)
BREAK (15 min)
Factor Analysis (60 min)
Introduction to Time Series (60 min)
LUNCH (60 min)
Hands-On Exercise (60 min)
  • Participants will be asked to cluster a dataset
  • Participants will be asked to perform logistic regression on a dataset
BREAK (15 min)
ARIMA modeling (60 min)
Hands-on Exercise (45 min)
  • Participants will be asked to construct an ARIMA model on existing time series data

Our Students

Very well conducted. Proactive management from the moderator, many exercises, hence quite practical, minimizing theoretical overload. Very pleased with the learning, for new starters as well as more experienced participants. Great experience sharing.

Attended workshop in February 2018

The experience over the two-day course was fantastic. It was not a dry explanation of the Scrum framework. Instead, a practical, example driven 2 days where we worked through all elements of the Scrum framework. I would have no hesitation in recommending Marco Mulder as a Scrum Master trainer or as a business Scrum coach within a commercial environment. His experience in this field was clear and impressive.

Attended workshop in April 2018

Knowledge Hut experience has been very good, this is my second registration with them directly and had total 3 registrations with them. The Trainer, Venue, Course material, Food and other logistics are well organized and managed. Special thanks to Lalit who takes extra effort to help and coordinate on everything. I am happy to take up other courses offered with Knowledge Hut and recommend others also to consider.

Attended workshop in April 2018

Marco is a brilliant Scrum coach. I had the pleasure to attend Certified Scrum Master training with Marco as a trainer. I was particularly impressed by Marco's hands on experience that sets him as a captivating knowledgeable trainer. Marco has a very unique approach as a trainer in which he actively engages all participants regardless of their background. And I still keep the really inspiring book on SCRUM he gave to me along with the planning poker cards we used to do task estimation! As a Scrum coach, Marco earns my highest recommendation.

Attended workshop in October 2017

Sebastian Walter

Expert Vice President at Bain & Company from Berlin, Germany

Tim Parks

Head of Engineering Systems & Services at Vanderbilt International (IRL) Ltd from Dublin, Ir

Santosh Setty

Business Analyst at IBM from Bangalore, India

Islam Shalaby

Software Engineer at HERE from Berlin, Germany

Frequently Asked Questions

To attend this course, candidates must have prior programming knowledge and knowledge of RDBMS (relational database management systems).

  • Basic programming knowledge
  • Basic RDBMS knowledge

Data mining and analytics come with great potential to help companies, as they can extract hidden predictive data from large databases and use it to predict future trends and behaviors. Data mining tools can answer questions that were previously considered too time-consuming to resolve. This allows businesses to make proactive, data-driven decisions and pursue viable business opportunities.

Our expert facilitators will help you understand how to collect and refine massive amounts of information, and discover hidden value in your warehouse of data. You will increase your employment opportunities, as you can put yourself on the radar of corporate businesses looking for data experts.

On successful completion of the course, you will receive a Course Completion Certificate from KnowledgeHut with Credits (1 credit per hour of training).

Any registration cancelled within 48 hours of the initial registration will be refunded in FULL (please note that all cancellations will incur a 5% deduction in the refunded amount due to transactional costs applicable while refunding). Refunds will be processed within 30 days of receipt of written request for refund. Kindly go through our Refund Policy for more details:

Please send in an email to, and we will answer any queries you may have!

To attend this course, candidates must have prior programming knowledge and knowledge of RDBMS. This course is suggested for:

  • Data Analysts
  • EAI specialists
  • Data Engineers
  • Data Scientists, among others.
