top

Search

R Programming Tutorial

Reading Datafiles in RIn this tutorial, we will explore various options for loading and saving data into R from plain-text files, R files, and excel spreadsheets. It will also demonstrate the R packages that you can use to load data from databases and other common programs, like SAS and MATLAB.Data sets in Base RR comes with many data sets pre-loaded in the datasets package, which comes with base R. One can list the pre-loaded datasets as well as the short descriptions of each by running:> help(package=”dataset name”)You can load/source your own dataset in R by first setting you work directory:> setwd(“/location or directory path/”) #This is to set the working directory at some specified location.To determine which directory R is using as your working directory, run:> getwd()[1] "/Users/ssg"you can place the data files straight into the folder that is the working directory, or you can move your working directory to where your data files are. You can move your working directory to any folder on your computer with the function setwd. Just give setwd the file path to your new working directory.If the file path does not begin with your root directory, R will assume that it begins at your current working directory.You can also change your working directory by clicking on Session > Set Working Directory > Choose Directory in the RStudio menu bar. The Windows and Mac GUIs have similar options. If you start R from a UNIX command line (as on Linux machines), the working directory will be whichever directory you were in when you called R.You can see what files are in your working directory with list.files(). If you see the file that you would like to open in your working directory, then you are ready to proceed. How you open files in your working directory will depend on which type of file you would like to open.Plain-Text FilesPlain-text files are one of the most common ways to save data. They are very simple and can be read by many different computer programs—even the most basic text editors. For this reason, public data often comes as plain-text files.You can use load.table() functionalities to read/load a plain-test file.The following table shares a quick overview of the R’s read functions in which you can always overwrite any of the default arguments as necessary for your data type:FunctionsDefaultsUseread.tablesep = " “, header = FALSEGeneral read functionread.csvsep = “,”, header = TRUETo read the csv filesread.delimsep = “\t”, header = TRUETo read the tab-delimited fileread.csv2sep = “;”, header = TRUE,dec = “,”To read CSV file with European decimal formatread.delim2sep = “\t”, header = TRUE,dec = “,”To read the tab-delimited files with European decimal format.There is one specific type of plain-text file that defies the pattern by using its layout to separate data cells. Each row is placed in its own line (as with other plain-text files), and then each column begins at a specific number of characters from the left-hand side of the document. To achieve this, an arbitrary number of character spaces is added to the end of each entry to correctly position the next entry. These documents are known as fixed-width files and usually, end with the extension “.fwf”. R has its own function to read such file – read.fwf.HTML LinksThere are many data files which are available on the Internet at their own web address. If you are connected to the Internet, you can easily open these files straight into R with read.table, read.csv, etc. You need to pass a web address into the file name argument for any of R’s data-reading functions. As a result, you could read in the iris data set from a web address like http://.../poker.csv with:poker <- read.csv("http://.../iris.csv")Saving the plain-text fileOnce your data is in R, you can save it to any file format that R supports. If you’d like to save it as a plain-text file, you can use the ‘write’ family of functions. R saves data sets to plain-text files with the write family of functions:File FormatFunction and Syntax.csvwrite.csv(r_object, file = filepath, row.names = FALSE).csv (with european decimal notation)write.csv2(r_object, file = filepath, row.names = FALSE)tab delimitedwrite.table(r_object, file = filepath, sep = "\t", row.names=FALSE)The first argument of each function is the R object that contains your data set. The file argument is the file name (including extension) that you wish to give the saved data. By default, each function will save your data into your working directory.Compressing fileSometimes it becomes quite important to save a big output file and share with others. Compression is an effective way to deal in such a scenario.To compress a plain-text file, surround the file name or file path with the function bzfile, gzfile, or xzfile. For example:> write.csv(poker, file = bzfile("data/poker.csv.bz2"), row.names = FALSE)There are 2 more functions that will also compress the file with two different types of compression format.FunctionCompression Typebzfilebzip2gzfilegnu zip(gzip)xzfilexz compressionR FilesR provides two file formats of its own for storing data, .RDS and .RData. RDS files can store a single R object, and RData files can store multiple R objects.You can open an RDS file with readRDS. For example, if the iris data was saved as iris.RDS, you could open it with:poker <- readRDS("iris.RDS")Opening RData files is even easier. Simply run the function load with the file:load("file.RData")There’s no need to assign the output to an object. The R objects in your RData file will be loaded into your R session with their original names. RData files can contain multiple R objects, so loading one may read in multiple objects. load doesn’t tell you how many objects it is reading in, nor what their names are, so it pays to know a little about the RData file before you load it.Saving R filesTo save data as an RData object, one can use the save function. To save data as an RDS object, use the saveRDS function. In each case, the first argument should be the name of the R object you wish to save. You should then include a file argument that has the file name or file path you want to save the data set to.For example, if you have three R objects, a, b, and c, you could save them all in the same RData file and then reload them in another R session:a<-5 b<-90 c<-100 save(a, b, c, file = "test.RData") load("test.RData")XLConnectThere are quite a few numbers of packages to help you read Excel files directly into R. Unfortunately, many of these packages do not work on all operating systems. Others have been made out of date by the .xlsx file format. One package that does work on all file systems (and gets good reviews) is the XLConnect package. To use it, you’ll need to install and load the package:    install.packages("XLConnect")     library(XLConnect)XLConnect relies on Java to be platform-independent. So when you first open XLConnect, RStudio may ask to download a Java Runtime Environment if you do not already have one.Loading files from other programsA good number of functions are available in R, which attempt to read the file types of other data- analysis programs:File FormatFunctionLibraryMATLABreadMatR.matlabminitabread.mtpforeignSASread.ssdforeignSAS (XPORT format)read.xportforeignSPSSread.spssForeignStataread.dtaForeignSystatread.systatforeignERSI ArcGISread.shapefileshapefilesConnecting to DatabasesOne great aspect of R is that you can also use R to connect to a database and read in data. How you do this will depend on the database management system that you use. Working with a database will usually require experience that goes beyond the skill set of a typical R user. However, if you are interested in doing this, the best place to start is by downloading these R packages and reading their documentation.Use the RODBC package to connect to databases through an ODBC connection.Use the DBI package to connect to databases through individual drivers. The DBI package provides a common syntax for working with different databases. You will have to download a database-specific package to use in conjunction with DBI. These packages provide the API for the native drivers of different database programs. For MySQL use RMySQL, for SQLite use RSQLite, for Oracle use ROracle, for PostgreSQL use RPostgreSQL, and for databases that use drivers based on the Java Database Connectivity (JDBC) API use RJDBC. Once you have loaded the appropriate driver package, you can use the commands provided by DBI to access your database.Hope you enjoyed this tutorial and in our next tutorial, we will discuss how to generate visualization using R.
Rated 5/5 based on 19 customer reviews
logo

R Programming Tutorial

Working with Data in R

Reading Datafiles in R

In this tutorial, we will explore various options for loading and saving data into R from plain-text files, R files, and excel spreadsheets. It will also demonstrate the R packages that you can use to load data from databases and other common programs, like SAS and MATLAB.

Data sets in Base R

R comes with many data sets pre-loaded in the datasets package, which comes with base R. One can list the pre-loaded datasets as well as the short descriptions of each by running:

> help(package=”dataset name”)

  • You can load/source your own dataset in R by first setting you work directory:

> setwd(“/location or directory path/”) #This is to set the working directory at some specified location.

  • To determine which directory R is using as your working directory, run:

> getwd()

  • [1] "/Users/ssg"

you can place the data files straight into the folder that is the working directory, or you can move your working directory to where your data files are. You can move your working directory to any folder on your computer with the function setwd. Just give setwd the file path to your new working directory.

If the file path does not begin with your root directory, R will assume that it begins at your current working directory.

You can also change your working directory by clicking on Session > Set Working Directory > Choose Directory in the RStudio menu bar. The Windows and Mac GUIs have similar options. If you start R from a UNIX command line (as on Linux machines), the working directory will be whichever directory you were in when you called R.

You can see what files are in your working directory with list.files(). If you see the file that you would like to open in your working directory, then you are ready to proceed. How you open files in your working directory will depend on which type of file you would like to open.

Plain-Text Files

Plain-text files are one of the most common ways to save data. They are very simple and can be read by many different computer programs—even the most basic text editors. For this reason, public data often comes as plain-text files.

You can use load.table() functionalities to read/load a plain-test file.

The following table shares a quick overview of the R’s read functions in which you can always overwrite any of the default arguments as necessary for your data type:

FunctionsDefaultsUse
read.tablesep = " “, header = FALSEGeneral read function
read.csvsep = “,”, header = TRUETo read the csv files
read.delimsep = “\t”, header = TRUETo read the tab-delimited file
read.csv2sep = “;”, header = TRUE,dec = “,”To read CSV file with European decimal format
read.delim2sep = “\t”, header = TRUE,dec = “,”To read the tab-delimited files with European decimal format.

There is one specific type of plain-text file that defies the pattern by using its layout to separate data cells. Each row is placed in its own line (as with other plain-text files), and then each column begins at a specific number of characters from the left-hand side of the document. To achieve this, an arbitrary number of character spaces is added to the end of each entry to correctly position the next entry. These documents are known as fixed-width files and usually, end with the extension “.fwf”. R has its own function to read such file – read.fwf.

HTML Links

There are many data files which are available on the Internet at their own web address. If you are connected to the Internet, you can easily open these files straight into R with read.table, read.csv, etc. You need to pass a web address into the file name argument for any of R’s data-reading functions. As a result, you could read in the iris data set from a web address like http://.../poker.csv with:

poker <- read.csv("http://.../iris.csv")

Saving the plain-text file

Once your data is in R, you can save it to any file format that R supports. If you’d like to save it as a plain-text file, you can use the ‘write’ family of functions. 

R saves data sets to plain-text files with the write family of functions:

File FormatFunction and Syntax
.csvwrite.csv(r_object, file = filepath, row.names = FALSE)
.csv (with european decimal notation)write.csv2(r_object, file = filepath, row.names = FALSE)
tab delimitedwrite.table(r_object, file = filepath, sep = "\t", row.names=FALSE)

The first argument of each function is the R object that contains your data set. The file argument is the file name (including extension) that you wish to give the saved data. By default, each function will save your data into your working directory.

Compressing file

Sometimes it becomes quite important to save a big output file and share with others. Compression is an effective way to deal in such a scenario.

To compress a plain-text file, surround the file name or file path with the function bzfile, gzfile, or xzfile. For example:

> write.csv(poker, file = bzfile("data/poker.csv.bz2"), row.names = FALSE)

There are 2 more functions that will also compress the file with two different types of compression format.

FunctionCompression Type
bzfilebzip2
gzfilegnu zip(gzip)
xzfilexz compression

R Files

R provides two file formats of its own for storing data, .RDS and .RData. RDS files can store a single R object, and RData files can store multiple R objects.

You can open an RDS file with readRDS. For example, if the iris data was saved as iris.RDS, you could open it with:

poker <- readRDS("iris.RDS")

Opening RData files is even easier. Simply run the function load with the file:

load("file.RData")

There’s no need to assign the output to an object. The R objects in your RData file will be loaded into your R session with their original names. RData files can contain multiple R objects, so loading one may read in multiple objects. load doesn’t tell you how many objects it is reading in, nor what their names are, so it pays to know a little about the RData file before you load it.

Saving R files

To save data as an RData object, one can use the save function. To save data as an RDS object, use the saveRDS function. In each case, the first argument should be the name of the R object you wish to save. You should then include a file argument that has the file name or file path you want to save the data set to.

For example, if you have three R objects, a, b, and c, you could save them all in the same RData file and then reload them in another R session:

a<-5
b<-90
c<-100
save(a, b, c, file = "test.RData") load("test.RData")

XLConnect

There are quite a few numbers of packages to help you read Excel files directly into R. Unfortunately, many of these packages do not work on all operating systems. Others have been made out of date by the .xlsx file format. One package that does work on all file systems (and gets good reviews) is the XLConnect package. To use it, you’ll need to install and load the package:

    install.packages("XLConnect")
    library(XLConnect)

XLConnect relies on Java to be platform-independent. So when you first open XLConnect, RStudio may ask to download a Java Runtime Environment if you do not already have one.

Loading files from other programs

A good number of functions are available in R, which attempt to read the file types of other data- analysis programs:

File FormatFunctionLibrary
MATLABreadMatR.matlab
minitabread.mtpforeign
SASread.ssdforeign
SAS (XPORT format)read.xportforeign
SPSSread.spssForeign
Stataread.dtaForeign
Systatread.systatforeign
ERSI ArcGISread.shapefileshapefiles

Connecting to Databases

One great aspect of R is that you can also use R to connect to a database and read in data. How you do this will depend on the database management system that you use. Working with a database will usually require experience that goes beyond the skill set of a typical R user. However, if you are interested in doing this, the best place to start is by downloading these R packages and reading their documentation.

Use the RODBC package to connect to databases through an ODBC connection.

Use the DBI package to connect to databases through individual drivers. The DBI package provides a common syntax for working with different databases. You will have to download a database-specific package to use in conjunction with DBI. These packages provide the API for the native drivers of different database programs. For MySQL use RMySQL, for SQLite use RSQLite, for Oracle use ROracle, for PostgreSQL use RPostgreSQL, and for databases that use drivers based on the Java Database Connectivity (JDBC) API use RJDBC. Once you have loaded the appropriate driver package, you can use the commands provided by DBI to access your database.

Hope you enjoyed this tutorial and in our next tutorial, we will discuss how to generate visualization using R.

Leave a Reply

Your email address will not be published. Required fields are marked *

Suggested Tutorials

Swift Tutorial

By KnowledgeHut

Introduction to Swift Tutorial
Rated 5/5 based on 44 customer reviews
Swift Tutorial

Introduction to Swift Tutorial

Read More

C# Tutorial

By KnowledgeHut

C# is an object-oriented programming developed by Microsoft that uses the .Net Framework. It utilizes the Common Language Interface (CLI) that describes the executable code as well as the runtime environment. C# can be used for various applications such as web applications, distributed applications, database applications, window applications etc.For greater understanding of this tutorial, a basic knowledge of object-oriented languages such as C++, Java etc. would be beneficial.
Rated 5/5 based on 12 customer reviews
C# Tutorial

C# is an object-oriented programming developed by Microsoft that uses ...

Read More

Python Tutorial

By KnowledgeHut

Python Tutorial 
Rated 5/5 based on 11 customer reviews