Data Structures in R

Updated on Aug 29, 2025

Data Structures

A data structure is a specific way of organizing and storing data. R supports five basic data structures: vector, matrix, list, data frame, and factor. This tutorial covers each of them in turn.

R's base data structures can be organized by their dimensionality (1-D, 2-D, or N-D) and by whether they are homogeneous (all contents of the same type) or heterogeneous (contents of different types).

	Homogeneous	Heterogeneous
1-D	Atomic Vector	List
2-D	Matrix	Data Frame
N-D	Array

Given an object, the best way to understand what data structures it’s composed of is to use str(). str() is short for structure and it gives a compact, human-readable description of any R data structure.
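As a quick illustration, here is a minimal sketch of str() applied to two small objects. The variable names are made up for this example.

```r
# Hypothetical objects, used only to illustrate str()
num_vec <- c(1.5, 2.5, 3.5)
mixed_list <- list(id = 1L, label = "a")

str(num_vec)     # compact summary of an atomic vector
str(mixed_list)  # summary of a list, one line per element

# capture.output() stores what str() printed, so it can be inspected
str_lines <- capture.output(str(num_vec))
str_lines
```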

Vectors

One of the basic data structures in R is the vector. Vectors have two different flavors: atomic vectors and lists. They have three common properties:

Type – Describes what it is (typeof())
Length – Tells how many elements it contains (length())
Attributes – Gives us information about additional arbitrary metadata (attributes())

They differ in the types of their elements: all elements of an atomic vector must be the same type, whereas the elements of a list can have different types.

NB: is.vector() does not test whether an object is a vector. Instead, it returns TRUE only if the object is a vector with no attributes apart from names. Use is.atomic(x) || is.list(x) to test whether an object is actually a vector.
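The difference can be seen in a small sketch. The attribute name "unit" is arbitrary and chosen only for illustration.

```r
x <- 1:3
is.vector(x)                # TRUE: x has no attributes at all

attr(x, "unit") <- "cm"     # "unit" is an arbitrary attribute for this sketch
is.vector(x)                # FALSE: x now carries an attribute other than names
is.atomic(x) || is.list(x)  # TRUE: x is still a vector
```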

Atomic Vectors

There are four basic types of atomic vectors that we will talk about in detail: logical, integer, double (often called numeric), and character. There are two rare types which we will skip for now: complex and raw.

Atomic vectors are usually created with c(), short for combine:

Examples:

var <- c(1.9, 2.0, 7.5)
var
#Result
[1] 1.9 2.0 7.5
# With the L suffix, you get an integer rather than a double
int_var <- c(2L, 8L, 100L)
int_var
#Result
[1]   2   8 100
# Use TRUE and FALSE (or T and F) to create logical vectors
logical_var <- c(TRUE, FALSE, T, F)
logical_var
#Result
[1]  TRUE FALSE  TRUE FALSE
# Use quotes to create character vectors
chr_var <- c("example of", "some strings")
chr_var
#Result
[1] "example of"   "some strings"

Atomic vectors are always flat, even if you nest c()’s:

c(1, c(2.96, c(3.75, 9)))
#Result
[1] 1.00 2.96 3.75 9.00

Missing values are specified with NA, which is a logical vector of length 1. NA will always be coerced to the correct type if used inside c(), or you can create NAs of a specific type with NA_real_ (a double vector), NA_integer_ and NA_character_.
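A short sketch of how NA takes on the type of its neighbours inside c(), and of the typed NA constants:

```r
typeof(NA)             # "logical"
typeof(c(NA, 1L))      # "integer": NA is coerced to match the other element
typeof(c(NA, "a"))     # "character"
typeof(NA_real_)       # "double"
typeof(NA_integer_)    # "integer"
typeof(NA_character_)  # "character"
```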

Types and Tests

Given a vector, you can determine its type with typeof(), or check if it’s a specific type with an “is” function: is.character(), is.double(), is.integer(), is.logical(), or, more generally, is.atomic().

Examples:

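# 1.05L is not a valid integer literal: R warns and keeps the numeric value 1.05,
# so the whole vector is coerced to double
int_var <- c(1.05L, 8L, 10L)
typeof(int_var)
#Result
[1] "double"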
is.integer(int_var)
#Result
[1] FALSE
is.atomic(int_var)
#Result
[1] TRUE
is.double(int_var)
#Result
[1] TRUE
is.numeric(int_var)
#Result
[1] TRUE

Coercion

All elements of an atomic vector must be of the same type, so when you attempt to combine different types they will be coerced to the most flexible type. Types from least to most flexible are: logical, integer, double, and character.

For example, combining a character and an integer yields a character:

Examples:

str(c("a", 1L, 0.95))
#Result
chr [1:3] "a" "1" "0.95"
#When a logical vector is coerced to an integer or double,
#TRUE becomes 1 and FALSE becomes 0. This is very useful in conjunction
#with sum() and mean()
x <- c(FALSE, FALSE, TRUE)
as.numeric(x)
#Result
[1] 0 0 1
# Total number of TRUEs
sum(x)
#Result
[1] 1
mean(x)
#Result
[1] 0.3333333

Coercion can often happen automatically. Most mathematical functions (+, log, abs, etc.) will coerce to a double or integer, and most logical operations (&, |, any, etc) will coerce to a logical. One will usually get a warning message if the coercion might lose information. If confusion is likely, explicitly coerce with as.character(), as.double(), as.integer(), or as.logical().
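The following sketch shows automatic coercion in arithmetic and logical operations, and an explicit coercion that loses information. The input values are made up for illustration.

```r
TRUE + TRUE            # 2: arithmetic coerces logicals to integers
typeof(TRUE + TRUE)    # "integer"
c(0, 2) & TRUE         # FALSE TRUE: nonzero numbers are treated as TRUE

# Explicit coercion; "x" cannot be converted, so R warns and returns NA for it
parsed <- as.integer(c("1", "2", "x"))
parsed
```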

Some key properties of Vectors:

Vectors are homogeneous
Vectors can be indexed by positions
Vectors can be indexed by multiple positions
Vector elements can have names
If vector elements have names then you can select them by name

A few examples:

> v <- c(10, 20, 30)
> names(v) <- c("John", "Tracey", "Harry")
> print(v)
##   John Tracey  Harry 
##     10     20     30 
> v["Tracey"]
## Tracey 
##     20 
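The remaining properties in the list above, indexing by position and by several positions or names at once, can be sketched with the same vector:

```r
v <- c(John = 10, Tracey = 20, Harry = 30)

v[2]                    # a single position
v[c(1, 3)]              # multiple positions
v[c("John", "Harry")]   # multiple names
v[-1]                   # everything except the first element
```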

Lists

Lists are quite different from atomic vectors as their elements can be of any type, including lists. One can construct lists by using list() instead of c():

Examples:

x <- list(1:5, "a", c(TRUE, FALSE, T, F), c(2.9, 5.3))
str(x)
#Result
List of 4
 $ : int [1:5] 1 2 3 4 5
 $ : chr "a"
 $ : logi [1:4] TRUE FALSE TRUE FALSE
 $ : num [1:2] 2.9 5.3
# Lists can contain other lists
x <- list(list(list(list())))
str(x)
#Result
List of 1
 $ :List of 1
  ..$ :List of 1
  .. ..$ : list()
is.recursive(x)
#Result
[1] TRUE

Lists are sometimes called recursive vectors, because a list can contain other lists. This makes them fundamentally different from atomic vectors.

c() will combine several lists into one. If given a combination of atomic vectors and lists, c() will coerce the vectors to lists before combining them. Compare the results of a list() and c():

Examples:

x <- list(list(1:9), c(3, 4))
y <- c(list(1, 2), c(3, 4))
str(x)
#Result
List of 2
$ :List of 1
..$ : int [1:9] 1 2 3 4 5 6 7 8 9
$ : num [1:2] 3 4
str(y)
#Result
List of 4
$ : num 1
$ : num 2
$ : num 3
$ : num 4

The typeof() of a list is "list". You can test for a list with is.list() and coerce to a list with as.list(). You can turn a list into an atomic vector with unlist(). If the elements of a list have different types, unlist() uses the same coercion rules as c().
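A minimal sketch of these functions, using a made-up list with mixed types:

```r
lst <- list(1L, 2.5, "a")   # hypothetical list with three different types
typeof(lst)                 # "list"
is.list(lst)                # TRUE

flat <- unlist(lst)         # coerced to the most flexible type, character
flat

as.list(1:3)                # an atomic vector turned into a list of three elements
```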

Lists are used to build many of the more complicated data structures in R. For example, both data frames and linear model objects (as produced by lm()) are lists.
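Both claims can be checked directly. The sketch below uses the mtcars dataset that ships with R; the model formula is arbitrary.

```r
is.list(mtcars)                     # TRUE: a data frame is a list of columns

mod <- lm(mpg ~ wt, data = mtcars)  # arbitrary model, fitted only for illustration
is.list(mod)                        # TRUE
typeof(mod)                         # "list"
```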

Some key properties of Lists:

Lists are heterogeneous
Lists can be indexed by positions
Lists allow you to extract sub-lists (for example, lst[c(2, 3)] is a sub-list of lst that consists of its 2nd and 3rd elements)
List elements can have names
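These properties can be sketched with a small named list. The contents are made up for illustration.

```r
lst <- list(name = "John", scores = c(90, 85), passed = TRUE)  # made-up data

sub <- lst[c(2, 3)]   # a sub-list holding the 2nd and 3rd elements
str(sub)

lst[[2]]              # the 2nd element itself, a numeric vector
lst$name              # an element selected by name
```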

Mode and Physical Type

In R, every object has a mode, which indicates how it is stored in memory: as a number, as a character string, as a list of pointers to other objects, as a function, and so forth:

Object	Example	Mode
Number	2.171	numeric
Vector of Numbers	c(2.7182, 3.1415)	numeric
Character String	"John"	character
Vector of Character Strings	c("John", "Tracey", "Harry")	character
Factor	factor(c("NY", "CA", "IL"))	numeric
List	list("John", "Tracey", "Harry")	list
Data Frame	data.frame(x=1:3, y=c("NY", "CA", "IL"))	list
Function	print	function

The mode() function gives us this information:

Example:

> mode(2.171)
## [1] "numeric"
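The mode is coarser than the physical type reported by typeof(). A short sketch comparing the two on a few objects:

```r
mode(2L)                           # "numeric"
typeof(2L)                         # "integer"
mode(2.171)                        # "numeric"
typeof(2.171)                      # "double"
mode(factor(c("NY", "CA", "IL")))  # "numeric": factors are stored as integer codes
mode(print)                        # "function"
typeof(print)                      # "closure"
```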

Arrays and Matrices

(Please refer to the write up attached on Array and Matrices)

Factors

A factor looks like a vector, but it has special properties. R keeps track of the unique values in a vector, and each unique value is called a level of the associated factor. R uses a compact representation for factors, which makes them efficient for storage in data frames. In other programming languages, a factor would be represented by a vector of enumerated values. In simple terms: “A factor is a vector that can contain only predefined values, and is used to store categorical data. Factors are built on top of integer vectors using two attributes: the class, "factor", which makes them behave differently from regular integer vectors, and the levels, which defines the set of allowed values.”
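The integer representation and the two attributes can be inspected directly. The values below are made up for illustration.

```r
f <- factor(c("NY", "CA", "NY"))

typeof(f)       # "integer": the underlying storage
attributes(f)   # the levels and the class
levels(f)       # "CA" "NY"
as.integer(f)   # 2 1 2: the position of each value in levels(f)
```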

There are two key uses for factors:

Categorical Variables: A factor can represent a categorical variable. Categorical variables are used in contingency tables, linear regression, analysis of variance (ANOVA), logistic regression, and many other areas.
Groupings: This is a technique for labeling or tagging your data items according to their group.

Examples:

> x <- factor(c("a", "b", "c", "d"))
> x
## [1] a b c d
## Levels: a b c d
> class(x)
## [1] "factor"
> levels(x)
## [1] "a" "b" "c" "d"
# You can't use values that are not in the levels
> x[2] <- "e"
## Warning in `[<-.factor`(`*tmp*`, 2, value = "e"): invalid factor level, NA
## generated
# NB: before R 4.1.0, c() on factors returns the underlying integer codes
> c(factor("a"), factor("b"))
## [1] 1 1
# Since R 4.1.0, the same call returns a factor with the union of the levels
## [1] a b
## Levels: a b

Factors are quite useful when you know the possible values a variable may take, even if you don’t see all values in a given dataset. Using a factor instead of a character vector makes it obvious when some groups contain no observations:

gen_char <- c("m", "m", "m")
gen_factor <- factor(gen_char, levels = c("m", "f"))
table(gen_char)
#Result
## gen_char
## m 
## 3 
table(gen_factor)
#Result
## gen_factor
## m f 
## 3 0 

Sometimes when a data frame is read directly from a file, a column you expected to be a numeric vector instead comes back as a factor (or, since R 4.0.0, where strings are no longer converted to factors by default, as a character vector). This is caused by a non-numeric value in the column, often a missing value encoded in a special way such as "." or "-". To remedy the situation, coerce the vector from a factor to a character vector, and then from a character to a double vector. (Be sure to check for missing values after this process.) Of course, a much better plan is to discover what caused the problem in the first place and fix that; using the na.strings argument to read.csv() is often a good place to start.
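Both remedies can be sketched as follows. The column values are made up, and the text argument of read.csv() stands in for a real file.

```r
# Made-up column in which "-" marks a missing value
f <- factor(c("1.5", "2", "-", "4"))

as.numeric(f)   # wrong: returns the integer codes, not the values

# Factor -> character -> double; "-" cannot be converted and becomes NA
nums <- suppressWarnings(as.numeric(as.character(f)))
nums
sum(is.na(nums))   # check for missing values afterwards

# Fixing the problem at the source with na.strings
dat <- read.csv(text = "value\n1.5\n2\n-\n4", na.strings = "-")
dat$value
```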

Data Frame

A data frame is a very powerful and flexible data structure, and most serious R applications involve data frames. It is the most common way of storing data in R and, if used systematically, makes data analysis easier. Under the hood, a data frame is a list of equal-length vectors. This makes it a 2-dimensional structure, so it shares properties of both the matrix and the list. A data frame therefore has names(), colnames(), and rownames(), although names() and colnames() are the same thing. The length() of a data frame is the length of the underlying list and so is the same as ncol(); nrow() gives the number of rows.
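A minimal sketch of these functions on a small, made-up data frame:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))  # made-up data

names(df)      # "x" "y"
colnames(df)   # same as names(df)
rownames(df)   # "1" "2" "3"
length(df)     # 2, the number of columns
ncol(df)       # 2
nrow(df)       # 3
```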

A data frame is a tabular (rectangular) data structure, which means that it has rows and columns. It is not implemented by a matrix, however. Rather, a data frame is a list.

A few important points to remember when you are dealing with a data frame:

A data frame can be built from a mixture of vectors, factors, and matrices. The columns of the matrices become columns in the data frame. The number of rows in each matrix must match the length of the vectors and factors.
The vectors and factors must all have the same length; in other words, all columns must have the same height.
The equal-height columns give a rectangular shape to the data frame.
The columns must have names.

Because a data frame is both a list and a rectangular structure, R provides two different paradigms for accessing its contents:

You can use list operators to extract columns from a data frame, such as df[i], df[[i]], or df$name.
You can use matrix-like notation, such as df[i, j], df[i, ], or df[, j].
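The two access styles can be compared on a small, made-up data frame:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))  # made-up data

# List-style access
df["x"]      # a one-column data frame
df[["x"]]    # the column itself, an integer vector
df$y         # a column selected by name

# Matrix-style access
df[2, "y"]   # a single cell
df[2, ]      # the second row, as a one-row data frame
df[, "x"]    # the x column, as a vector
```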

Examples:

#Create a data frame

df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"))
str(df)
#Result (R versions before 4.0.0; from R 4.0.0 on, y is a chr column)
'data.frame': 5 obs. of  2 variables:
 $ x: int  1 2 3 4 5
 $ y: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5

One key point to remember while working with data frames is that, in R versions before 4.0.0, data.frame() turns strings into factors by default. Use stringsAsFactors = FALSE to suppress this behaviour (since R 4.0.0, this is already the default):

df <- data.frame(
  x = 1:5,
  y = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE)
str(df)
#Result
'data.frame': 5 obs. of  2 variables:
 $ x: int  1 2 3 4 5
 $ y: chr  "a" "b" "c" "d" ...
typeof(df)
#Result
[1] "list"

Combining Data Frames

> cbind(df, data.frame(z = 5:1))
#Result
  x y z
1 1 a 5
2 2 b 4
3 3 c 3
4 4 d 2
5 5 e 1
> rbind(df, data.frame(x = 10, y = "z"))
#Result
   x y
1  1 a
2  2 b
3  3 c
4  4 d
5  5 e
6 10 z

When combining column-wise, the number of rows must match, but row names are ignored. When combining row-wise, both the number and names of columns must match. Use plyr::rbind.fill() to combine data frames that don’t have the same columns.

It’s a common mistake to try to create a data frame by cbind()-ing vectors together. This doesn’t work because cbind() will create a matrix unless one of the arguments is already a data frame. Instead use data.frame() directly:

correct_arg <- data.frame(a = 1:2, b = c("a", "b"),
  stringsAsFactors = FALSE)
str(correct_arg)
#Result
'data.frame': 2 obs. of  2 variables:
 $ a: int  1 2
 $ b: chr  "a" "b"
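For contrast, the cbind() mistake described above can be sketched as follows. The name bad is chosen only for this illustration.

```r
bad <- cbind(a = 1:2, b = c("a", "b"))

is.matrix(bad)       # TRUE: cbind() on plain vectors builds a matrix
is.data.frame(bad)   # FALSE
typeof(bad)          # "character": the integers were coerced to strings
bad
```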

It’s also quite possible to have a column of a data frame that’s a matrix or array, as long as the number of rows matches the data frame:

dfm <- data.frame(x = 1:5, y = I(matrix(1:25, nrow = 5)))
str(dfm)
#Result
'data.frame': 5 obs. of  2 variables:
 $ x: int  1 2 3 4 5
 $ y: 'AsIs' int [1:5, 1:5] 1 2 3 4 5 6 7 8 9 10 ...
> dfm[5, "y"]
#Result
     [,1] [,2] [,3] [,4] [,5]
[1,]    5   10   15   20   25

Take extra care with list and array columns: many functions that work with data frames assume that all columns are atomic vectors.

We hope you enjoyed this tutorial on the data structures in R. The next step is to experiment with each of them yourself.

References

R Cookbook by Paul Teetor
Advanced R by Hadley Wickham
Learning R by Richard Cotton
