R Programming Tutorial

Control and Loop Statements in R & Loop functions

Looping (also called cycling or iterating) automates a multi-step process by grouping the steps that need to be repeated. R has three types of loops: ‘repeat’, ‘while’ and ‘for’.

‘repeat’ Loops

The ‘repeat’ loop is the simplest of the three. All it does is execute the same code over and over again until you tell it to stop. In other languages, the closest equivalent is a do-while loop or an infinite loop such as while (true). Since we want our code to finish eventually, we break out of the otherwise infinite loop by including a ‘break’ statement. (Sometimes, rather than leaving the loop, we only want to skip the rest of the current iteration and start the next one; that is the job of ‘next’, covered later.) The following example prints the numbers 20 to 29 and stops once x reaches 30:

x <- 20
repeat {
  print(x)
  x <- x + 1
  if (x == 30) {
    break
  }
}
#Result
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29

Sample Code with ‘repeat’ loop

As the snippet above shows, the block of a ‘repeat’ loop is executed at least once, and the loop terminates as soon as the ‘if’ condition is true. The ‘break’ statement is what lets us exit the loop.

‘while’ Loops

‘while’ loops are like backward ‘repeat’ loops. Instead of executing some code and then checking whether the loop should end, a ‘while’ loop checks first and then (maybe) executes. Since the check happens at the very beginning, it is possible that the contents of the loop are never executed (unlike in a ‘repeat’ loop).

Sample Code with ‘while’ loop

The ‘while’ loop below produces the same output as the ‘repeat’ example above.

i <- 20
while (i < 30) {
  print(i)
  i = i+1
}

In general, it is always possible to convert a ‘repeat’ loop to a ‘while’ loop or a ‘while’ loop to a ‘repeat’ loop, but usually the syntax is much cleaner one way or the other. If you know that the contents must execute at least once, use repeat; otherwise, use while. 
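The difference between the two loops can be sketched by starting both from a value that already fails the condition. This is a minimal illustration; the counters while_runs and repeat_runs are names introduced here, not part of the original examples.

```r
# A while loop checks its condition first, so the body may never run.
i <- 30
while_runs <- 0
while (i < 30) {
  while_runs <- while_runs + 1
  i <- i + 1
}

# A repeat loop runs its body before the exit check is reached,
# so the body executes at least once.
j <- 30
repeat_runs <- 0
repeat {
  repeat_runs <- repeat_runs + 1
  j <- j + 1
  if (j >= 30) break
}

print(while_runs)   # 0
print(repeat_runs)  # 1
```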

‘for’ Loops

The third type of loop, ‘for’, is used when you know exactly how many times you want the code to repeat. The for loop accepts an iterator variable and a vector. It repeats the loop, giving the iterator each element from the vector in turn. In the simplest case, the vector contains numbers:

Sample Code with ‘for’ loop

x <- c(1,9,3,5,8,7,2)
count <- 0
for (val in x) {
  if(val %% 2 == 0)  count = count+1
}
print(count)
[1] 2

In the above example, the loop iterates 7 times because the vector x has 7 elements. In each iteration, val takes on the value of the corresponding element of x. Here we use a counter to count the even numbers in x, and the result shows that x contains 2 of them (8 and 2).
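The vector does not have to contain numbers, and it is often convenient to loop over positions rather than values. The sketch below assumes a small character vector named fruits (hypothetical data) and uses seq_along() to generate the index for each element.

```r
# Hypothetical data: a character vector instead of a numeric one
fruits <- c("apple", "banana", "cherry")

# Pre-allocate the result, then fill it one position at a time
name_lengths <- integer(length(fruits))
for (i in seq_along(fruits)) {
  name_lengths[i] <- nchar(fruits[i])
}

print(name_lengths)  # 5 6 6
```

Looping over seq_along(fruits) rather than 1:length(fruits) also behaves sensibly when the vector is empty, because the loop body is then skipped entirely.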

Breaking a Loop

When the R interpreter encounters a ‘break’, it passes control to the instruction immediately after the end of the loop (if any). In the case of nested loops, ‘break’ exits only the innermost loop that contains it.

# Make a lower triangular matrix (zeroes in upper right corner)
m=5
n=5
# A counter to count the assignment
ctr=0
# Create a 5 x 5 matrix with zeroes
mat = matrix(0,m,n)
for(i in 1:m) {
  for(j in 1:n) {
    if(i==j) {
      break;
    } else {
      # assign a value only while j < i (below the main diagonal)
      mat[i,j] = i*j
      ctr=ctr+1
    }
  }
  print(i*j)
}
# Result
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
# Print how many matrix cell were assigned
print(ctr)
#Result
[1] 10

The code snippet above defines an m x n (5 x 5) matrix of zeros and then enters a nested for loop to fill in its cells, but only while the column index is smaller than the row index. The purpose is to create a lower triangular matrix, that is, a matrix whose non-zero elements all lie below the main diagonal (here the diagonal itself is also left at zero). The other elements keep their initial value of zero. The inner loop runs over the column index ‘j’. When the two indexes are equal, the ‘if’ condition is fulfilled, ‘break’ is executed, and the innermost loop is interrupted with a direct jump to the instruction following it, which is the call to print(). Because j equals i at that point, print(i*j) outputs the squares 1, 4, 9, 16 and 25. Control then returns to the outer for loop (over the rows, index ‘i’), which moves on to the next row. While the indexes differ, the assignment is performed and the counter is incremented by 1. In the end, the program prints the counter ‘ctr’, which contains the number of elements that were assigned (10).
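As a cross-check on the nested loop, the same matrix can be built without explicit loops. This is only a sketch using base R's outer() and lower.tri(); the names mat_loop and mat_vec are illustrative and do not appear in the original example.

```r
m <- 5

# Loop version, as in the example above (break leaves only the inner loop)
mat_loop <- matrix(0, m, m)
for (i in 1:m) {
  for (j in 1:m) {
    if (i == j) break
    mat_loop[i, j] <- i * j
  }
}

# Loop-free version: compute i * j for every cell, then zero out
# everything that is not strictly below the main diagonal
mat_vec <- outer(1:m, 1:m)
mat_vec[!lower.tri(mat_vec)] <- 0

print(all.equal(mat_loop, mat_vec))  # TRUE
print(sum(lower.tri(mat_vec)))       # 10 cells lie below the diagonal
```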

Use of ‘next’ in loops

‘next’ discontinues the current iteration and moves on to the next one without leaving the loop. In other languages, the equivalent statement is usually called “continue”, which means the same thing: wherever you are in the loop body, once the condition is met, jump back to the evaluation of the loop condition.

Example of ‘next’ in R code

m=5
for (k in 1:m){
  if (k %% 2 == 0)  # skip even values of k
    next
  print(k)
}
[1] 1
[1] 3
[1] 5

‘If then else’

An if-else statement is a powerful tool for returning output based on a condition. Let’s think about a scenario where, in the transaction data for a product, we have the number of units sold daily for, say, the last 5 years. We want to dig deeper and check on how many days the number of units sold was between 50 and 70, and if the value for any day is higher than 70, we mark it as an exceptional day. For a single day's quantity, the syntax would look something like this:

# Create variable quantity
quantity <- 100
# Create multiple condition statement
if (quantity < 50) {
  print('Not enough for today')
} else if (quantity >= 50 && quantity <= 70) {
  print('Average day')
} else {
  print('Great day!')
}
#Results
[1] "Great day!"
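The scenario described above involves many days rather than a single quantity. One way to sketch it is with vectorized comparisons and ifelse(), which test every day at once. The vector units_sold is hypothetical data made up for illustration, and treating "between 50 and 70" as inclusive of both ends is an assumption.

```r
# Hypothetical daily sales figures (one value per day)
units_sold <- c(45, 55, 70, 82, 50, 68, 91)

# Count the days with sales between 50 and 70 (assumed inclusive)
average_days <- sum(units_sold >= 50 & units_sold <= 70)

# Label every day in one step; ifelse() is the vectorized form of if-else
day_type <- ifelse(units_sold > 70, "exceptional",
                   ifelse(units_sold >= 50, "average", "low"))

print(average_days)                      # 4
print(which(day_type == "exceptional"))  # 4 7
```

Note that the element-wise operator & is used here because the comparison runs over a whole vector; && is meant for single values, as in a plain if statement.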

The switch() Function and its uses

Sometimes you end up writing a very large nested if-then-else statement, which is hard to debug if anything goes wrong inside it. One effective solution is to use the ‘switch()’ function, which evaluates the selected alternative based on its position or name:

# Assign the function to a name so it can be called (the name is illustrative)
calculate <- function(x, y, op) {
  switch(op,
         add = x + y,
         sub = x - y,
         mul = x * y,
         div = x / y,
         stop("Unknown operation!")
         )
}
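A short usage sketch may help here. It assumes the function is stored under the illustrative name calculate (the name is a choice made for this example) and shows selection by name, the unnamed fallback, and selection by position.

```r
# Illustrative name; the body is the switch() example from above
calculate <- function(x, y, op) {
  switch(op,
         add = x + y,
         sub = x - y,
         mul = x * y,
         div = x / y,
         stop("Unknown operation!"))
}

# Selection by name
print(calculate(6, 3, "add"))  # 9
print(calculate(6, 3, "div"))  # 2

# An unmatched name falls through to the unnamed last argument
msg <- tryCatch(calculate(6, 3, "pow"),
                error = function(e) conditionMessage(e))
print(msg)  # "Unknown operation!"

# Selection by position: a number picks the n-th alternative
picked <- switch(2, "first", "second", "third")
print(picked)  # "second"
```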

When to use R Loops

Loops are handy for any repetitive operation: you just need to specify how many times, or under which conditions, the operation should repeat. You assign initial values to a loop control variable, perform the loop and then, once the loop has finished, you typically do something with the results.

For loops are not as important in R as they are in other languages because R is a functional programming language. This means that it’s possible to wrap up for loops in a function and call that function instead of using the for loop directly. 
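To make the idea of wrapping a loop in a function concrete, here is a minimal sketch based on the even-number counter from the ‘for’ loop section. The function name count_even is hypothetical and introduced only for this illustration.

```r
# Hypothetical helper: the for loop lives inside the function,
# so callers never have to write the loop themselves
count_even <- function(values) {
  count <- 0
  for (val in values) {
    if (val %% 2 == 0) count <- count + 1
  }
  count
}

print(count_even(c(1, 9, 3, 5, 8, 7, 2)))  # 2
print(count_even(1:10))                    # 5
```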

Practitioners often point out two limitations of ‘for’ loops and the other loop types. The first is that they can be slow (which, to some extent, remains true even after recent improvements to R). The second is that they are not very expressive. “A for loop conveys that it’s iterating over something, but doesn’t clearly convey a high-level goal. Instead of using a for loop, it’s better to use a functional. Each functional is tailored for a specific task, so when you recognize the functional you know immediately why it’s being used. Functionals play other roles as well as replacements for for-loops. They are useful for encapsulating common data manipulation tasks like split-apply-combine, for thinking “functionally”, and for working with mathematical functions” [Ref: “Advanced R” by Hadley Wickham]

Alternatives to Loops in R

Vectorization: Vectorization is the process of converting repeated operations on simple numbers (referred to as ‘scalars’) into single operations on vectors or matrices. A vector is the elementary data structure in R and is “a single entity consisting of a collection of things”, according to the R base manual. If you combine vectors (of the same length), you obtain a matrix. You can do this vertically or horizontally, using different R instructions (rbind() and cbind()). Thus in R, a matrix can be seen as a collection of horizontal or vertical vectors. By extension, you can vectorize repeated operations on vectors, so many of the loops above can be made implicit by using vectorization.
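The following sketch compares an explicit loop with its vectorized equivalent, reusing the vector x from the ‘for’ loop example. The variable names squares_loop, squares_vec, by_row and by_col are introduced here for illustration.

```r
x <- c(1, 9, 3, 5, 8, 7, 2)

# Explicit loop: square each element one at a time
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized: a single operation on the whole vector
squares_vec <- x^2
print(all.equal(squares_loop, squares_vec))  # TRUE

# The even-number counter from the 'for' loop section, vectorized
print(sum(x %% 2 == 0))  # 2

# Combining vectors of the same length into a matrix
by_row <- rbind(1:3, 4:6)  # vectors become rows:    2 x 3
by_col <- cbind(1:3, 4:6)  # vectors become columns: 3 x 2
print(dim(by_row))  # 2 3
print(dim(by_col))  # 3 2
```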

The apply() family: A very powerful and rich family of intrinsically vectorized functions in R is the apply() family. “The apply command or rather family of commands pertain to the R base package. It is populated with a number of functions (the [s, l, m, r, t, v]apply) to manipulate slices of data in the form of matrices or arrays in a repetitive way, allowing to cross or traverse the data and avoiding explicit use of loop constructs. The functions act on an input matrix or array and apply a chosen named function with one or several optional arguments.”

apply(): ‘apply()’ applies a function over the rows or columns of a matrix. In the example further below, we first create a matrix of random values and then perform various operations on it using the ‘apply’ functional.

  • lapply(): lapply() is similar to apply(), but it takes a list as input and returns a list as output.
  • sapply(): sapply() is similar to lapply(), but returns a vector instead of a list when the results can be simplified.

Let’s have a look at the ‘apply’ functional in action:

#Let's create a matrix (no seed is set, so the random values differ on every run)
mat_new <- cbind(rnorm(20, 0), rnorm(20, 2), rnorm(20, 5))
mat_new
#First few records from the derived matrix
            [,1]      [,2]     [,3]
[1,] -0.96051550 3.1468613 6.072214
[2,] -1.39166772 2.9056725 5.722543
[3,]  0.88049546 4.2234216 4.839496
[4,]  0.17057773 2.8729929 7.126668
[5,] -0.46655639 1.3653404 4.300621
[6,]  0.84594859 1.9774440 4.281742
#Let's apply apply function to calculate the row-wise means
apply(mat_new, 1, mean)
#Results
[1] 2.752853 2.412182 3.314471 3.390080 1.733135 2.368378 2.761731 2.660597 2.175851
[10] 2.960857 1.782839 1.752489 1.453420 2.243107 2.191123 2.182224 2.444636 2.415256
[19] 3.026896 1.851031
#Let's apply apply function to calculate the column-wise means
apply(mat_new, 2, mean)
#Results
[1] -0.1371763  2.3060624 5.0120875
# Let's find out how many negative numbers each column has got
apply(mat_new,2, function(y) length(y[y<0]))
#Results
[1] 11  1  0
#Let's get the mean of the positive values in the matrix
apply(mat_new, 2, function(y) mean(y[y>0]))
#Results
[1] 0.6629146 2.4402103 5.0120875
# A second example with a small, fixed matrix
data_apply <- matrix(1:20, nrow = 5, ncol = 4)
data_apply
#Result
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
# Now we can use the apply function to find the mean/median of each row as follows
apply(data_apply, 1, mean)
#Result
[1]  8.5 9.5 10.5 11.5 12.5
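The lapply() and sapply() functionals described earlier can be sketched in the same way. The list scores below is hypothetical data introduced for this illustration; the point is only to show the difference in the type of the result.

```r
# Hypothetical input: a named list of numeric vectors
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# lapply() always returns a list, one element per input element
means_list <- lapply(scores, mean)
print(is.list(means_list))  # TRUE
print(means_list$b)         # 15

# sapply() simplifies the result to a named vector where possible
means_vec <- sapply(scores, mean)
print(is.list(means_vec))   # FALSE
print(means_vec)            # a = 2, b = 15, c = 5
```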

tapply(): tapply() splits a vector into groups defined by another variable, usually a factor, and then applies a function to each group:

We will be using the ‘mtcars’ dataset:

library(datasets)
tapply(mtcars$wt, mtcars$cyl, median)
    4     6     8 
2.200 3.215 3.755 

The ‘tapply’ function first groups the cars together based on the number of cylinders they have and then calculates the median weight for each group.
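The grouping step can be made visible by doing it by hand. The sketch below uses split() to form the groups and sapply() to summarize them, which should give the same medians as tapply(); the names by_tapply, groups and by_split are illustrative.

```r
# tapply() does the split and the summary in one call
by_tapply <- tapply(mtcars$wt, mtcars$cyl, median)

# The same computation in two explicit steps
groups <- split(mtcars$wt, mtcars$cyl)  # list with one weight vector per cylinder count
by_split <- sapply(groups, median)      # median of each group

print(names(groups))  # "4" "6" "8"
print(all.equal(as.vector(by_tapply), unname(by_split)))  # TRUE
```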

mapply(): ‘mapply()’ is a multivariate version of sapply(). It applies the specified function to the first elements of each argument, then to the second elements, and so on. For example, the call below adds 1 to 11, 2 to 12, and so on:

x <- 1:10
y <- 11:20
mapply(sum, x, y)
 [1] 12 14 16 18 20 22 24 26 28 30
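Because + is already vectorized, the sum above could also be written as x + y. mapply() becomes more useful when the function takes several arguments that all vary together. The sketch below is illustrative only; the anonymous function and its argument names a and b are made up for the example.

```r
x <- 1:10
y <- 11:20

# For simple arithmetic, the vectorized operator gives the same result
print(all.equal(mapply(sum, x, y), x + y))  # TRUE

# A custom function applied to paired elements: 1*4+1, 2*5+1, 3*6+1
paired <- mapply(function(a, b) a * b + 1, 1:3, 4:6)
print(paired)  # 5 11 19

# When the results differ in length, mapply() returns a list:
# rep(1, 3), rep(2, 2), rep(3, 1)
repeated <- mapply(rep, 1:3, 3:1)
print(repeated)
```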
