
A data structure is a specific way of organizing and storing data. R supports five basic data structures: vector, matrix, list, data frame, and factor. This tutorial covers each of them in turn.
R's base data structures can be organized by their dimensionality (1-D, 2-D, or N-D) and by whether they are homogeneous (all contents of the same type) or heterogeneous (contents may be of different types).
Dimensions | Homogeneous | Heterogeneous |
|---|---|---|
1-D | Atomic Vector | List |
2-D | Matrix | Data Frame |
N-D | Array | |
Given an object, the best way to understand what data structures it’s composed of is to use str(). str() is short for structure and it gives a compact, human-readable description of any R data structure.
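As a minimal sketch of how str() summarizes different objects, the snippet below builds three small structures and inspects each one. The variable names and values are illustrative assumptions, not taken from the tutorial.

```r
# Three small objects, one per common structure (names are illustrative)
v <- c(1.5, 2.5, 3.5)                            # atomic vector of doubles
l <- list(1:3, "a", TRUE)                        # list with mixed element types
d <- data.frame(id = 1:2, score = c(9.5, 7.25))  # data frame with two columns

# str() prints the type, the dimensions, and the first few values
str(v)
str(l)
str(d)
```

Each call prints one compact summary, which makes str() a quick first check on an unfamiliar object.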
One of the basic data structures in R is the vector. Vectors come in two flavors: atomic vectors and lists. They have three common properties: type (typeof()), length (length()), and attributes (attributes()).
They differ in the types of their elements: all elements of an atomic vector must be the same type, whereas the elements of a list can have different types.
NB: is.vector() does not test if an object is a vector. Instead, it returns TRUE only if the object is a vector with no attributes apart from names. Use is.atomic(x) || is.list(x) to test whether an object is actually a vector.
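The following sketch illustrates the caveat about is.vector(). The attribute name "unit" is an arbitrary choice made for this example.

```r
x <- 1:3
is.vector(x)                  # TRUE: x has no attributes at all

attr(x, "unit") <- "cm"       # attach an attribute other than names
is.vector(x)                  # FALSE, although x is still an atomic vector
is.atomic(x) || is.list(x)    # TRUE: the more reliable vector test

# Type, length and attributes can still be queried as usual
typeof(x)                     # "integer"
length(x)                     # 3
attributes(x)                 # a list holding the "unit" attribute
```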
There are four basic types of atomic vectors that we will talk about in detail: logical, integer, double (often called numeric), and character. There are two rare types which we will skip for now: complex and raw.
Atomic vectors are usually created with c(), short for combine:
Examples:
var <- c(1.9, 2.0, 7.5)
var
#Result
[1] 1.9 2.0 7.5
# With the L suffix, you get an integer rather than a double
int_var <- c(2L, 8L, 100L)
int_var
#Result
[1] 2 8 100
# Use TRUE and FALSE (or T and F) to create logical vectors
logical_var <- c(TRUE, FALSE, T, F)
logical_var
#Result
[1] TRUE FALSE TRUE FALSE
chr_var <- c("example of","some strings")
chr_var
#Result
[1] "example of"   "some strings"
Atomic vectors are always flat, even if you nest c()’s:
c(1, c(2.96, c(3.75, 9)))
#Result
[1] 1.00 2.96 3.75 9.00
Missing values are specified with NA, which is a logical vector of length 1. NA will always be coerced to the correct type if used inside c(), or you can create NAs of a specific type with NA_real_ (a double vector), NA_integer_ and NA_character_.
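A short sketch of how NA adapts to its surroundings is shown below. The scores vector is a made-up example.

```r
typeof(NA)               # "logical"
typeof(c(NA, 2L))        # "integer": NA is coerced to match its neighbours
typeof(c(NA, 1.5))       # "double"
typeof(c(NA, "a"))       # "character"

# Typed missing values
typeof(NA_real_)         # "double"
typeof(NA_integer_)      # "integer"
typeof(NA_character_)    # "character"

# Missing values propagate through arithmetic unless removed
scores <- c(4.5, NA, 3)
is.na(scores)            # FALSE TRUE FALSE
mean(scores)             # NA
mean(scores, na.rm = TRUE)   # 3.75
```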
Given a vector, you can determine its type with typeof(), or check if it’s a specific type with an “is” function: is.character(), is.double(), is.integer(), is.logical(), or, more generally, is.atomic().
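Examples:
# 1.05 is not a whole number, so the integers are coerced and the vector is stored as double
# (writing 1.05L only triggers a warning and still yields a double)
num_var <- c(1.05, 8L, 10L)
typeof(num_var)
#Result
[1] "double"
is.integer(num_var)
#Result
[1] FALSE
is.atomic(num_var)
#Result
[1] TRUE
is.double(num_var)
#Result
[1] TRUE
# is.numeric() is TRUE for both integer and double vectors
is.numeric(num_var)
#Result
[1] TRUE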
All elements of an atomic vector must be of the same type, so when you attempt to combine different types they will be coerced to the most flexible type. Types from least to most flexible are: logical, integer, double, and character.
For example, combining a character and an integer yields a character:
Examples:
str(c("a", 1L, 0.95))
#Result
chr [1:3] "a" "1" "0.95"
#When a logical vector is coerced to an integer or double,
#TRUE becomes 1 and FALSE becomes 0. This is very useful in conjunction
#with sum() and mean()
x <- c(FALSE, FALSE, TRUE)
as.numeric(x)
#Result
[1] 0 0 1
# Total number of TRUEs
sum(x)
#Result
[1] 1
mean(x)
#Result
[1] 0.3333333
Coercion can often happen automatically. Most mathematical functions (+, log, abs, etc.) will coerce to a double or integer, and most logical operations (&, |, any, etc) will coerce to a logical. One will usually get a warning message if the coercion might lose information. If confusion is likely, explicitly coerce with as.character(), as.double(), as.integer(), or as.logical().
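The sketch below shows implicit and explicit coercion side by side. The variable names are illustrative. Note that not every lossy conversion warns: as.integer() silently truncates a double.

```r
flags <- c(TRUE, FALSE, TRUE)

# Arithmetic coerces logicals to numbers
shifted <- flags + 1L         # 2 1 2
typeof(shifted)               # "integer"
typeof(flags + 0.5)           # "double"

# Logical operators coerce numbers to logicals (0 is FALSE, non-zero is TRUE)
masks <- c(0, 2) & TRUE       # FALSE TRUE

# Explicit coercion
truncated <- as.integer(3.9)  # 3, truncated without a warning
parsed <- as.integer(c("1", "2", "three"))  # 1 2 NA, with a warning about NAs
```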
Some key properties of Vectors:
A few examples:
v <- c(10, 20, 30)
names(v) <- c("John", "Tracey", "Harry")
print(v)
#Result
  John Tracey  Harry
    10     20     30
# Elements can be selected by name
v["Tracey"]
#Result
Tracey
    20
Lists are quite different from atomic vectors as their elements can be of any type, including lists. One can construct lists by using list() instead of c():
Examples:
# Lists
x <- list(1:5, "a", c(TRUE, FALSE, T, F), c(2.9, 5.3))
str(x)
#Result
List of 4
$ : int [1:5] 1 2 3 4 5
$ : chr "a"
$ : logi [1:4] TRUE FALSE TRUE FALSE
$ : num [1:2] 2.9 5.3
x <- list(list(list(list())))
str(x)
#Result
List of 1
$ :List of 1
..$ :List of 1
.. ..$ : list()
is.recursive(x)
#Result
[1] TRUE
Lists are sometimes called recursive vectors, because a list can contain other lists. This is what makes them fundamentally different from atomic vectors.
c() will combine several lists into one. If given a combination of atomic vectors and lists, c() will coerce the vectors to lists before combining them. Compare the results of a list() and c():
Examples:
x <- list(list(1:9), c(3, 4))
y <- c(list(1, 2), c(3, 4))
str(x)
#Result
List of 2
$ :List of 1
..$ : int [1:9] 1 2 3 4 5 6 7 8 9
$ : num [1:2] 3 4
str(y)
#Result
List of 4
$ : num 1
$ : num 2
$ : num 3
$ : num 4
The typeof() of a list is "list". You can test for a list with is.list() and coerce to a list with as.list(). You can turn a list into an atomic vector with unlist(). If the elements of a list have different types, unlist() uses the same coercion rules as c().
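A minimal sketch of these list helpers follows. The list contents are an arbitrary example chosen to force coercion.

```r
mixed <- list(1L, 2.5, "three")
typeof(mixed)              # "list"
is.list(mixed)             # TRUE

# unlist() flattens to an atomic vector using the same coercion rules as c()
flat <- unlist(mixed)
typeof(flat)               # "character"
flat                       # "1" "2.5" "three"

# as.list() goes the other way: one list element per vector element
as_list <- as.list(1:3)
length(as_list)            # 3
```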
Lists are used to build many of the more complicated data structures in R. For example, both data frames and linear model objects (as produced by lm()) are lists:
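One way to check this is sketched below. It assumes the built-in mtcars dataset, and the model formula is chosen only for illustration.

```r
# A data frame is a list under the hood
is.list(mtcars)                       # TRUE
typeof(mtcars)                        # "list"

# So is a fitted linear model
mod <- lm(mpg ~ wt, data = mtcars)
is.list(mod)                          # TRUE
"coefficients" %in% names(mod)        # TRUE: components are named list elements
```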
Some key properties of Lists:
In R, every object has a mode, which indicates how it is stored in memory: as a number, as a character string, as a list of pointers to other objects, as a function, and so forth:
Object | Example | Mode |
|---|---|---|
Number | 2.171 | numeric |
Vector of numbers | c(2.7182, 3.1415) | numeric |
Character string | "John" | character |
Vector of character strings | c("John", "Tracey", "Harry") | character |
Factor | factor(c("NY", "CA", "IL")) | numeric |
List | list("John", "Tracey", "Harry") | list |
Data frame | data.frame(x=1:3, y=c("NY", "CA", "IL")) | list |
Function | | function |
The mode() function gives us this information.
Example:
mode(2.171)
#Result
[1] "numeric"
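The remaining rows of the table can be checked the same way. This is a quick sketch; print is used only as a convenient example of a function object.

```r
mode(c(2.7182, 3.1415))                             # "numeric"
mode("John")                                        # "character"
mode(c("John", "Tracey", "Harry"))                  # "character"
mode(factor(c("NY", "CA", "IL")))                   # "numeric": factors store integer codes
mode(list("John", "Tracey", "Harry"))               # "list"
mode(data.frame(x = 1:3, y = c("NY", "CA", "IL")))  # "list"
mode(print)                                         # "function"
```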
(Please refer to the write up attached on Array and Matrices)
A factor looks like a vector, but it has special properties. R keeps track of the unique values in a vector, and each unique value is called a level of the associated factor. R uses a compact representation for factors, which makes them efficient for storage in data frames. In other programming languages, a factor would be represented by a vector of enumerated values. In simple terms: “A factor is a vector that can contain only predefined values, and is used to store categorical data. Factors are built on top of integer vectors using two attributes: the class, “factor”, which makes them behave differently from regular integer vectors, and the levels, which defines the set of allowed values.”
There are two key uses for factors:
Examples:
x <- factor(c("a", "b", "c", "d"))
x
#Result
[1] a b c d
Levels: a b c d
class(x)
#Result
[1] "factor"
levels(x)
#Result
[1] "a" "b" "c" "d"
# You can't use values that are not in the levels
x[2] <- "e"
#Result
Warning in `[<-.factor`(`*tmp*`, 2, value = "e"): invalid factor level, NA generated
# NB: before R 4.1.0, c() dropped the factor class and returned the integer codes
c(factor("a"), factor("b"))
#Result (R < 4.1.0)
[1] 1 1
#Result (R >= 4.1.0, where c() combines the factors and their levels)
[1] a b
Levels: a b
Factors are quite useful when you know the possible values a variable may take, even if you don’t see all values in a given dataset. Using a factor instead of a character vector makes it obvious when some groups contain no observations:
# No "f" values occur in the data, but the factor still knows that "f" is possible
gen_char <- c("m", "m", "m")
gen_factor <- factor(gen_char, levels = c("m", "f"))
table(gen_char)
#Result
gen_char
m
3
table(gen_factor)
#Result
gen_factor
m f
3 0
Sometimes when a data frame is read directly from a file, a column you expected to be numeric comes back as a factor (or, from R 4.0.0 onward, as a character vector). This is caused by a non-numeric value in the column, often a missing value encoded in a special way such as "." or "-". To remedy the situation, coerce the vector from a factor to a character vector, and then from a character vector to a double vector. (Be sure to check for missing values after this process.) A much better plan is to discover what caused the problem in the first place and fix that; the na.strings argument to read.csv() is often a good place to start.
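Both remedies can be sketched as follows. The inline CSV text and the column names are made up for illustration, with "." standing in for a missing value.

```r
# A numeric column polluted by "." and stored as a factor
raw <- factor(c("1.5", "2", ".", "3"))

# Converting directly returns the internal level codes, not the values
codes <- as.numeric(raw)

# Correct route: factor -> character -> double ("." becomes NA, with a warning)
values <- as.numeric(as.character(raw))
values                     # 1.5 2.0 NA 3.0
sum(is.na(values))         # 1 missing value to follow up on

# Better: declare the missing-value marker when reading the data
csv_text <- "id,value\n1,1.5\n2,.\n3,3"
fixed <- read.csv(text = csv_text, na.strings = ".")
str(fixed)
```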
A data frame is a very powerful and flexible data structure, and most serious R applications involve data frames. It is the most common way of storing data in R and, if used systematically, makes data analysis easier. Under the hood, a data frame is a list of equal-length vectors. This makes it a 2-dimensional structure, so it shares properties of both the matrix and the list. A data frame therefore has names(), colnames(), and rownames(), although names() and colnames() are the same thing. The length() of a data frame is the length of the underlying list and so is the same as ncol(); nrow() gives the number of rows.
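These relationships are easy to confirm on a small data frame. The sketch below uses made-up column names and values.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

names(df)                               # "x" "y"
identical(names(df), colnames(df))      # TRUE: same thing for a data frame
rownames(df)                            # "1" "2" "3"

length(df)                              # 2: the length of the underlying list
ncol(df)                                # 2
nrow(df)                                # 3
```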
A data frame is a tabular (rectangular) data structure, which means that it has rows and columns. It is not implemented by a matrix, however. Rather, a data frame is a list:
Few important points to remember when you are dealing with a data frame:
Because a data frame is both a list and a rectangular structure, R provides two different paradigms for accessing its contents:
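The two access styles can be sketched as list-style access (by column) and matrix-style access (by row and column). The people data frame and its ages are hypothetical values invented for this example.

```r
people <- data.frame(
  name = c("John", "Tracey", "Harry"),
  age = c(31, 45, 28),
  stringsAsFactors = FALSE
)

# List-style access: a data frame is a list of columns
people$age                  # 31 45 28
people[["name"]]            # "John" "Tracey" "Harry"
people["age"]               # still a data frame, with one column

# Matrix-style access: [row, column]
people[2, "name"]           # "Tracey"
people[people$age > 30, ]   # the rows for John and Tracey
people[, "age"]             # 31 45 28
```

List-style access is convenient for whole columns, while matrix-style access is the natural choice for filtering rows.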
Examples:
#Create a data frame
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"))
str(df)
#Result (R < 4.0.0, where strings become factors by default)
'data.frame': 5 obs. of 2 variables:
$ x: int 1 2 3 4 5
$ y: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
One key point to remember while working with data frames is that, before R 4.0.0, data.frame() turned strings into factors by default. Use stringsAsFactors = FALSE to suppress this behaviour (it is the default from R 4.0.0 onward):
df <- data.frame(
x = 1:5,
y = c("a", "b", "c", "d", "e"),
stringsAsFactors = FALSE)
str(df)
#Result
'data.frame': 5 obs. of 2 variables:
$ x: int 1 2 3 4 5
$ y: chr "a" "b" "c" "d" ...
typeof(df)
#Result
[1] "list"
>cbind(df, data.frame( z = 5:1))
#Result
x y z
1 1 a 5
2 2 b 4
3 3 c 3
4 4 d 2
5 5 e 1
> rbind(df, data.frame(x = 10, y = "z"))
#Result
x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 10 z
When combining column-wise, the number of rows must match, but row names are ignored. When combining row-wise, both the number and names of columns must match. Use plyr::rbind.fill() to combine data frames that don’t have the same columns.
It’s a common mistake to try to create a data frame by cbind()-ing vectors together. This doesn’t work because cbind() will create a matrix unless one of the arguments is already a data frame. Instead, use data.frame() directly:
correct_arg <- data.frame(a = 1:2, b = c("a", "b"),
stringsAsFactors = FALSE)
str(correct_arg)
#Result
'data.frame': 2 obs. of 2 variables:
$ a: int 1 2
$ b: chr "a" "b"
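For contrast, here is a sketch of what the cbind() mistake produces. The names wrong and mixed_ok are illustrative.

```r
# cbind() on plain vectors builds a matrix, so everything is coerced to one type
wrong <- cbind(a = 1:2, b = c("a", "b"))
is.matrix(wrong)            # TRUE
is.data.frame(wrong)        # FALSE
typeof(wrong)               # "character": the integers became strings

# If one argument is already a data frame, cbind() returns a data frame
mixed_ok <- cbind(data.frame(a = 1:2), b = c("a", "b"))
is.data.frame(mixed_ok)     # TRUE
typeof(mixed_ok$a)          # "integer": the column type is preserved
```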
It’s also quite possible to have a column of a data frame that’s a matrix or array, as long as the number of rows matches the data frame:
dfm <- data.frame(x = 1:5, y = I(matrix(1:25, nrow = 5)))
str(dfm)
#Result
'data.frame': 5 obs. of 2 variables:
$ x: int 1 2 3 4 5
$ y: 'AsIs' int [1:5, 1:5] 1 2 3 4 5 6 7 8 9 10 ...
> dfm[5, "y"]
#Result
[,1] [,2] [,3] [,4] [,5]
[1,] 5 10 15 20 25
We need to take extra care with the list and array columns: many functions that work with data frames assume that all columns are atomic vectors.
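A list column can be created in a similar way to the matrix column, by wrapping the list in I(). The sketch below uses made-up contents and shows one way to check which columns are atomic before passing a data frame to other functions.

```r
# Each row of y holds a vector of a different length
dfl <- data.frame(x = 1:3, y = I(list(1:2, 1:3, 1:4)))

is.atomic(dfl$x)                    # TRUE
is.atomic(dfl$y)                    # FALSE: y is a list column
sapply(dfl$y, length)               # 2 3 4

# Flag the columns that are plain atomic vectors
atomic_cols <- vapply(dfl, is.atomic, logical(1))
atomic_cols                         # x: TRUE, y: FALSE
```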
We hope you enjoyed this tutorial on the various data structures in R. The next step is to experiment with each of them yourself.