Data Visualization in R

Updated on Aug 29, 2025

Visualization: Overview

The objective of this tutorial is to share with you a general overview of the plotting environments in R and of the most efficient way of coding your graphs in it. We will talk about the most important Integrated Development Environment (IDE) available for R as well as the most relevant packages available for plotting your data.

Four Graphics Systems in R

There are currently 4 graphical systems available in R.

The base graphics system, written by Ross Ihaka, is included in every R installation.

The ‘grid’ graphics system, developed by Paul Murrell (2011), is implemented through the ‘grid’ package in R. ‘grid’ graphics provides a lower-level alternative to the standard graphics system. One key point to note is that ‘grid’ offers software developers a lot of flexibility, but it does not provide functions for producing statistical graphics or complete plots.

The lattice package, developed by Deepayan Sarkar (2008), implements trellis graphs, as outlined by Cleveland (1985, 1993). Trellis graphs display the distribution of a variable or the relationship between variables, separately for each level of one or more other variables. Built using the grid package, the lattice package provides a robust framework for visualizing multivariate data and a comprehensive alternative system for creating statistical graphics in R. Many other packages (such as effects, flexclust, Hmisc, mice, and odfWeave) use functions in the ‘lattice’ package to produce graphs.

Finally, the ggplot2 package, developed by Hadley Wickham (2009a), provides a system for creating graphs based on the grammar of graphics described by Wilkinson (2005) and expanded by Wickham (2009b). The intention of the ggplot2 package is to provide a comprehensive, grammar-based system for generating graphs in a coherent manner, allowing users to create new and innovative data visualizations. ggplot2 is one of the most celebrated packages in the realm of data visualisation because of the above-stated functionalities.

The lattice and ggplot2 packages overlap in functionality but approach the creation of graphs differently. Analysts tend to rely on one package or the other when plotting multivariate data. Given its power and popularity, the remainder of this tutorial will focus on ggplot2.

Let’s explore the ‘graphics’ package with some examples:

To generate a scatter plot using graphics, use the following code:

plot(age~circumference, data=Orange)

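A similar graph can be generated using the qplot() function in ggplot2 (note that qplot() is deprecated in recent ggplot2 releases in favor of ggplot(), which the rest of this tutorial uses):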

qplot(circumference, age, data=Orange)

A box plot can be generated with both graphics and ggplot2. Using graphics:

boxplot(circumference~Tree, data=Orange)

To generate the plot using ggplot2, use the following code:

qplot(Tree, circumference, data=Orange, geom="boxplot")

‘ggplot2’ – An Introduction

As highlighted earlier, the ggplot2 package implements a system for creating graphics in R based on a comprehensive and coherent grammar. In ggplot2, graphs are created by chaining functions together with the “+” sign. Each function modifies the plot created up to that point.

Let’s have a quick look at the following example:

ggplot(data=mtcars, aes(x=wt, y=mpg)) +
  geom_point(pch=20, color="blue", size=2) +
  geom_smooth(method="lm", color="purple", linetype=3) +
  labs(title="Automobile Data", x="Weight", y="Miles Per Gallon")

Let’s try to understand what ggplot does when it generates the graphics.

The ggplot() function first initializes the plot and specifies the data source (mtcars – in our example) and variables (wt, mpg) to be used. The options in the aes() function specify what role each variable will play. (aes stands for aesthetics, or how information is represented visually.) Here, the wt values are mapped along the x-axis, and mpg values are mapped along the y-axis. The ggplot() function here sets up the graph but produces no visual output on its own. Geometric objects (called geoms for short), which include points, lines, bars, box plots, and shaded regions, are added to the graph using one or more geom functions. In this example, the geom_point() function draws points on the graph, creating a scatter plot. The labs() function is optional and used for adding annotations (axis labels and a title).

Options to geom_point() set the point shape to small filled circles (pch=20), set the point size (size=2), and render the points in blue (color="blue"). The geom_smooth() function adds a “smoothed” line. Here a linear fit is requested (method="lm") and drawn as a purple dotted line (color="purple", linetype=3). By default, the line includes 95% confidence intervals (the darker band).
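
The layering described above can be checked by building the same plot one step at a time. The following is a minimal sketch: the object names (base, with_points, full) are illustrative only, and reading the layers field with $ is an assumption about how ggplot2 stores a plot internally, not part of its documented plotting interface.

```r
library(ggplot2)

# ggplot() alone records the data and aesthetic mappings but has no layers yet
base <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# each "+" returns a new plot object with one more layer or setting
with_points <- base + geom_point(pch = 20, color = "blue", size = 2)
full <- with_points +
  geom_smooth(method = "lm", formula = y ~ x, color = "purple", linetype = 3) +
  labs(title = "Automobile Data", x = "Weight", y = "Miles Per Gallon")

# number of layers held by each intermediate object
# (assumption: the plot object exposes its layers as a list named "layers")
n_layers <- c(base = length(base$layers),
              with_points = length(with_points$layers),
              full = length(full$layers))
print(n_layers)
```

Only the geom functions add layers; labs() changes the labels without adding one, which is why the last two steps differ by a single layer.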

The ggplot2 package provides methods for grouping and faceting. Grouping displays two or more groups of observations in a single plot. Groups are usually differentiated by color, shape, or shading. Faceting on the other hand displays groups of observations in separate, side-by-side plots. The ggplot2 package uses factors when they define groups or facets.

Plot Types in geoms

Whereas the ggplot() function specifies the data source and variables to be plotted, the geom functions decide how these variables are represented visually (using points, bars, lines, and shaded regions). The source counts 37 available geoms; the exact number depends on the ggplot2 version. The following table lists the most popular ones:

Function	Adds	Options
geom_bar()	Bar Chart	color, fill, alpha
geom_boxplot()	Box Plot	color, fill, alpha, notch, width
geom_density()	Density Plot	color, fill, alpha, linetype
geom_histogram()	Histogram	color, fill, alpha, linetype, binwidth
geom_jitter()	Jittered Points	color, size, alpha, shape
geom_line()	Line Graph	color, alpha, linetype, size
geom_smooth()	Fitted Line	method, formula, color, fill, linetype, size
geom_text()	Text Annotations	Many; see the help for this function
geom_violin()	Violin Plot	color, fill, alpha, linetype
geom_point()	Scatter Plot	color, alpha, shape, size

Let’s look at one such example, which explores various options as stated above:

data(singer, package="lattice")

ggplot(singer, aes(x=voice.part, y=height)) +
  geom_violin(fill="lightblue") +
  geom_boxplot(fill="lightgreen", width=.1)

The above code snippet shows how you can combine two different graph types (box plot and violin plot) to create a new one. The box plots show the 25th, 50th, and 75th percentiles of height for each voice part in the singer data frame, along with any outliers. The violin plots provide more visual cues as to the distribution of heights within each voice part.
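
The percentiles that the boxes represent can also be computed directly. This is a small sketch using base R only; the variable name quartiles is illustrative, and it assumes the default quantile definition in R, which may differ slightly from other software.

```r
# singer ships with the lattice package
data(singer, package = "lattice")

# 25th, 50th, and 75th percentiles of height for each voice part:
# the three values that the lower edge, middle line, and upper edge
# of each box correspond to
quartiles <- sapply(split(singer$height, singer$voice.part),
                    quantile, probs = c(0.25, 0.50, 0.75))
print(round(quartiles, 1))
```

The result is a matrix with one column per voice part, which makes it easy to compare against the boxes in the plot.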

Grouping

In order to develop a better understanding of the data, it is often required to plot two or more groups of observations together in the same graph. Grouping is accomplished in ggplot2 graphs by associating one or more grouping variables with visual characteristics such as shape, color, fill, size, and line type.

Let’s use the grouping functionality to explore the Salaries dataset. The data frame, which ships with the carData package, contains information on the nine-month salaries of university professors for the 2008–2009 academic year. Variables include rank (AsstProf, AssocProf, Prof), sex (Female, Male), yrs.since.phd (years since Ph.D.), yrs.service (years of service), and salary (nine-month salary in dollars), among others.

library(ggplot2)
data(Salaries, package="carData") # load the data without attaching carData
ggplot(data=Salaries, aes(x=salary, fill=rank)) +
  geom_density(alpha=.7) # semi-transparent fills keep overlapping densities visible

One can also visualize the number of professors by their rank and some other attributes (sex) using a grouped bar chart. For example:

ggplot(Salaries, aes(x=rank, fill=sex)) +
  geom_bar(position="stack") +
  labs(title='position="stack"')

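Alternatively, you can use other position values: position="dodge" places the bars for each group side by side, and position="fill" rescales each bar to the same height so that it shows proportions.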

For example:

ggplot(Salaries, aes(x=rank, fill=sex)) +
  geom_bar(position="fill") +
  labs(y="Proportion", title='position="fill"')

Each of the plots emphasizes different aspects of the data. For example, the stacked chart shows that there are more female full professors than female assistant or associate professors. The filled chart shows that the proportion of women among full professors is lower than in the other two ranks, even though the number of women at that rank is greater.
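
The counts and proportions behind the two bar charts can be tabulated directly, which is a quick way to confirm what the plots suggest. This sketch uses base R only; the names counts and props are illustrative.

```r
data(Salaries, package = "carData")

# counts of professors by rank and sex (what position="stack" displays)
counts <- table(Salaries$rank, Salaries$sex)
print(counts)

# proportions within each rank (what position="fill" displays)
props <- prop.table(counts, margin = 1)
print(round(props, 3))
```

Each row of props sums to 1, mirroring the equal-height bars of the filled chart.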

Faceting

Sometimes it becomes easier to demonstrate the relationships if the groups appear in side-by-side graphs (called faceted graphs in ggplot2). You can create faceted graphs by using facet_wrap() and facet_grid() functions.

The table below shows a list of the facet functions in ggplot2:

Syntax	Results
facet_wrap(~var, ncol=n)	Separate plots for each level of var arranged into n columns
facet_wrap(~var, nrow=n)	Separate plots for each level of var arranged into n rows
facet_grid(rowvar~.)	Separate plots for each level of rowvar, arranged as a single column
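facet_grid(.~colvar)	Separate plots for each level of colvar, arranged as a single row
facet_grid(rowvar~colvar)	Separate plots for each combination of rowvar and colvar, where rowvar represents rows and colvar represents columns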

Let’s look at one example:

data(singer, package="lattice")

library(ggplot2)

ggplot(data=singer, aes(x=height)) +
  geom_histogram() +
  facet_wrap(~voice.part, nrow=4)

The resulting plot displays the distribution of singer heights by voice part. Separating the eight distributions into their own small, side-by-side plots makes them easier to compare.

Another example:

data(singer, package="lattice")

library(ggplot2)

ggplot(data=singer, aes(x=height, fill=voice.part)) +
  geom_density() +
  facet_grid(voice.part~.)

This chart displays the height distribution of choral members in the singer dataset separately for each voice part, using kernel-density plots stacked in a single column (one row per voice part).
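
For comparison, moving the facet variable to the other side of the formula arranges the panels in a single row instead. The sketch below is one way to try this; the names p_row and n_panels are illustrative, and layer_data() is used only to count the panels that ggplot2 builds.

```r
library(ggplot2)
data(singer, package = "lattice")

# same densities, but with the facet variable on the column side of the
# formula, so the panels sit side by side in one row
p_row <- ggplot(data = singer, aes(x = height, fill = voice.part)) +
  geom_density() +
  facet_grid(. ~ voice.part)

# layer_data() returns the data computed for a layer;
# its PANEL column identifies the facet each row belongs to
n_panels <- length(unique(layer_data(p_row, 1)$PANEL))
print(n_panels)
```

With eight voice parts the single-row layout becomes narrow, which is one reason the column layout can be easier to read for this dataset.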

Let’s look at a few other examples of the application of ggplot2:

set.seed(321) # for reproducibility
x <- data.frame(x=rnorm(10000)) # generate 10,000 random data points
ggplot(data=x, aes(x=x)) +
  geom_histogram(aes(y=..density.., fill=..density..)) +
  geom_density()

In this example, we drew 10,000 values from a normal distribution with the default parameters (0 as the mean and 1 as the standard deviation) using the rnorm() function, and then plotted a histogram of them. The fill color can be mapped to the number of observations in each bin, which is available in the count variable computed by the stat_bin() function. To avoid clashes with variables of the same name in the original dataset, computed variables must be surrounded by two dots on each side, so the count variable is written as ..count.. in the code. Recent ggplot2 releases (3.4.0 and later) deprecate this notation in favor of after_stat(count), although the older form still works with a warning.

Used in an aesthetic mapping, a computed variable such as the count is shown on a continuous color scale. This approach does not work for geometries that draw a single continuous area, such as geom_density(), which produces a smooth kernel density estimate with one fill color. It does work for a histogram, where each bin is drawn separately. The density variable computed by stat_bin() can serve as the y value of each bin while the fill color is mapped to a computed variable as well, so that the color varies with the number of observations in the bin. The code snippet above does exactly this, using ..density.. for both the y value and the fill.
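
To see the variables that stat_bin() computes, the built layer data can be inspected. This is a hedged sketch: it uses after_stat(), which assumes ggplot2 3.3.0 or later, sets bins=30 explicitly to avoid the default-bins message, and the names p, binned, total_count, and total_area are illustrative.

```r
library(ggplot2)

set.seed(321)  # for reproducibility
x <- data.frame(x = rnorm(10000))

# y uses the computed density, fill uses the computed count
p <- ggplot(data = x, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density), fill = after_stat(count)),
                 bins = 30)

# one row per bin, including the computed count and density columns
binned <- layer_data(p, 1)
print(head(binned[, c("xmin", "xmax", "count", "density")]))

# the counts add up to the number of observations, and the
# density-scaled bars cover a total area of 1
total_count <- sum(binned$count)
total_area <- sum(binned$density * (binned$xmax - binned$xmin))
print(c(total_count = total_count, total_area = total_area))
```

Because density is the count divided by the bin width and the total count, the two variables give the same color ordering across bins of equal width.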

ggplot(data=x, aes(x=x)) + geom_histogram(aes(alpha=..count..))

This is a histogram of a normally distributed random variable representing the data count with transparency value (alpha) mapped to the data count.

ggplot(data=x, aes(x=x)) +
  geom_histogram(aes(alpha=..count.., fill=..count..))

This is exactly the same plot as the previous one but also includes a filling mapping to the data count.

We can also add text and reference lines to a graph:

Example code:

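ggplot(x, aes(x=x)) +
  geom_histogram(alpha=0.7) +
  # dashed green vertical line at the median
  geom_vline(aes(xintercept=median(x)), color="green", linetype="dashed", size=1) +
  # solid red horizontal reference line at a count of 40
  geom_hline(aes(yintercept=40), col="red", linetype="solid") +
  # "Median" label to the left of the line, its rounded value to the right
  geom_text(aes(x=median(x), y=90), label="Median", hjust=1) +
  geom_text(aes(x=median(x), y=90, label=round(median(x), digits=3)), hjust=-0.7)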

Adding a Smoothed Line

The ggplot2 package offers a wide range of functions for calculating various statistical summaries that can be added to graphs. These include functions for binning data and calculating densities, contours, and quantiles. This section looks at methods for adding smoothed lines (linear, nonlinear, and nonparametric) to scatter plots.

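For example, you can use the geom_smooth() function to add a variety of smoothed lines and confidence regions. An example of a linear regression with confidence limits was given earlier in the Automobile Data plot. The following example instead uses the default smoother, which for a dataset of this size is a nonparametric loess fit with a 95% confidence band: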

data(Salaries, package="carData")

library(ggplot2)

ggplot(data=Salaries, aes(x=yrs.since.phd, y=salary)) +
  geom_smooth() +
  geom_point()

The plot suggests that the relationship between experience and salary isn’t linear, at least when considering faculty who graduated many years ago. As an alternative approach, next, let’s fit a quadratic polynomial regression (one bend) separately by gender:

ggplot(data=Salaries, aes(x=yrs.since.phd, y=salary,
                          linetype=sex, shape=sex, color=sex)) +
  geom_smooth(method=lm, formula=y~poly(x,2),
              se=TRUE, size=1) +
  geom_point(size=1)

The confidence limits are displayed here (se=TRUE); set se=FALSE to suppress them and simplify the graph. Genders are differentiated by color, symbol shape, and line type.
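
The curve that geom_smooth() draws with method=lm and formula=y~poly(x,2) corresponds to an ordinary lm() fit, so it can be examined outside the plot. The sketch below compares a straight-line fit with the quadratic one. For brevity it pools both sexes into a single model, which is a simplification of the plot above, and the names linear_fit, quadratic_fit, and r2 are illustrative.

```r
data(Salaries, package = "carData")

# straight-line fit versus a quadratic polynomial fit (one bend)
linear_fit <- lm(salary ~ yrs.since.phd, data = Salaries)
quadratic_fit <- lm(salary ~ poly(yrs.since.phd, 2), data = Salaries)

# share of salary variance explained by each model
r2 <- c(linear = summary(linear_fit)$r.squared,
        quadratic = summary(quadratic_fit)$r.squared)
print(round(r2, 3))
```

The quadratic model can never explain less variance than the linear one because it contains it as a special case, so the size of the gain is what indicates whether the bend is worth modeling.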

Apart from these, there are many other features you can use to refine your graphs, such as axes, legends, scales, and themes.

ggplot2 is a very rich package with a large number of functions and options to explore. Fortunately, a wealth of material is available to help you. A list of all ggplot2 functions, along with examples, can be found at http://docs.ggplot2.org.

In this tutorial, we tried to cover the major aspects of R graphics, with a key focus on ggplot2.
