R Programming Tutorial

Functions in R

R functions are themselves R objects: each one bundles a block of code (the body) with the arguments it accepts. This consistent format makes functions a reusable framework that can be applied in other situations as well. Anyone can create their own functions by following this format.

Functions are one of the most fundamental building blocks of R: In order to master many of the more advanced techniques in R, one needs a solid foundation in how functions work. The primary focus of this tutorial is to turn your existing, informal knowledge of functions into a rigorous understanding of what functions are and how they work. You’ll experience some interesting tricks and techniques in this tutorial, but most of what you’ll learn will be more important as the building blocks for more advanced techniques.

The Function Constructor

An R function definition has three major parts: the name of the function, the set of arguments, and the body of code. You define your own function by constructing each of these parts and storing the result in an object.

For example, any function can be defined using this generic structure:

my_func <- function() {}

The function keyword builds a function from the argument list and the code block you give it; assigning the result to a name such as my_func lets you call it later.

One of the biggest strengths of R is the flexibility it gives users to add functions. In fact, many of the functions in R are themselves built from other functions. The generic structure of a function follows this format:

my_func <- function(arg1, arg2, ...) {
  # statement(s)
  return(object)
}

Objects in the function are typically local to the function. The object returned can be any data type as well.

For example, here we write our own function to get the R-squared value of a linear regression:

r_square <- function(y, y_hat) { 1 - sum((y - y_hat)^2)/sum((y-mean(y))^2) }

You can simply use these functions to calculate R-squared value(s) between the prediction and the actual y (Dependent Variable) with the R commands in the following listing.

r_square(y, y_hat)

Note on R-Squared value: R-squared can be thought of as the fraction of the variation in y that is explained by the model. For a well-fit linear model, R-squared also equals the square of the correlation between the predicted values and the actual training values.

Similarly, using the same framework, one can create a user-defined function (UDF) for the RMSE (Root Mean Squared Error) of any given machine learning model:

rmse <- function(y, f) { sqrt(mean( (y-f)^2 )) }

These examples show how you can write your own UDFs and use them across different parts of an analysis.
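As a quick check of the two helpers, the sketch below fits a linear model on the built-in mtcars data set and compares r_square() with the value reported by summary(). The data set and the mpg ~ wt formula are illustrative assumptions, not part of the original example.

```r
# Helper functions as defined in the text
r_square <- function(y, y_hat) {
  1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
}
rmse <- function(y, f) {
  sqrt(mean((y - f)^2))
}

# Illustrative data: predict fuel economy from weight
fit   <- lm(mpg ~ wt, data = mtcars)
y     <- mtcars$mpg
y_hat <- fitted(fit)

r2    <- r_square(y, y_hat)
r2_lm <- summary(fit)$r.squared
err   <- rmse(y, y_hat)

# For a least-squares fit with an intercept, evaluated on its training
# data, both comparisons should agree up to floating-point tolerance
isTRUE(all.equal(r2, r2_lm))
isTRUE(all.equal(r2, cor(y, y_hat)^2))
```

Comparing against summary() is a cheap way to confirm that a hand-written metric behaves as intended before using it on models that do not report R-squared themselves.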

To see what a function object contains, print it by typing its name without parentheses:

r_square

#Result

function(y, y_hat) { 1 - sum((y - y_hat)^2)/sum((y-mean(y))^2) }

The part in braces, “{ 1 - sum((y - y_hat)^2)/sum((y-mean(y))^2) }”, is the body of the defined function. When the function is called, R evaluates the body and returns the value of the last evaluated expression, unless return() is called earlier. So when you write a function, make sure its final expression produces the value you want returned.

A couple of important points to remember while working with UDF in R:

  1. There is no limit on the number of arguments a function can take; the only requirement is that they are separated by commas. If the caller does not supply a value for an argument, its default value is used; an argument that has no default and is used by the function raises an error.
  2. UDF (User-defined function) helps you with the flexibility to customize your own solution/metrics.
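The behaviour of defaults described in point 1 can be sketched as follows. The function name raise and its arguments are hypothetical, chosen only for illustration.

```r
# 'power' has a default; 'base' does not (both names are illustrative)
raise <- function(base, power = 2) {
  base^power
}

raise(3)             # uses the default power = 2
raise(3, power = 3)  # overrides the default

# Omitting 'base' fails, because the body uses it and it has no default
result <- tryCatch(raise(), error = function(e) conditionMessage(e))
result
```

Here tryCatch() is used only so the script keeps running; in interactive use the call raise() would simply stop with an error.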

Function components

All R functions have three parts:

  • the body(), the code inside the function.
  • the formals(), the list of arguments which controls how you can call the function.
  • the environment(), the “map” of the location of the function’s variables.

When you print a function in R, it shows you these three important components. If the environment isn’t displayed, it means that the function was created in the global environment.
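A minimal sketch of inspecting the three components directly; the function add is a hypothetical example, not one from the tutorial.

```r
# 'add' is an illustrative function
add <- function(x, y = 1) {
  x + y
}

formals(add)      # the argument list: x (no default) and y = 1
body(add)         # the code inside the function
environment(add)  # the environment in which 'add' was created

names(formals(add))
```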

Lexical Scoping

Scoping is the set of rules that controls how R looks up the value of a symbol. R has two types of scoping: 

  • lexical scoping, implemented automatically at the language level
  • dynamic scoping, used in select functions to save typing during interactive analysis. 

There are four basic principles behind R’s implementation of lexical scoping:

  • name masking
  • functions vs. variables
  • a fresh start
  • dynamic lookup

Example – Name Masking:

f <- function() {
  x <- 5
  y <- 6
  c(x, y)
}
f()
#Result
[1] 5 6
rm(f) # The function is removed

If a name isn’t defined inside a function, R will look one level up.

x <- 9
g <- function() {
  y <- 10
  c(x, y)
}
g()
#Result
[1] 9 10

The same rules will be applied if a function is defined inside another function: look inside the current function, then where that function was defined, and so on, all the way up to the global environment, and then on to other loaded packages. Please refer to the following code:

x <- 1
h <- function() {
  y <- 2
  i <- function() {
    z <- 3
    c(x, y, z)
  }
  i()
}
h()
#Result
[1] 1 2 3

Functions vs. variables

For functions, there is one small change to the rule. If you are using a name in a context where it’s quite obvious that you want a function (e.g., f(9)), R will ignore objects that are not functions while it is searching. In the following example, n takes on a different value depending on whether R is looking for a function or a variable.

n <- function(x) x / 2
o <- function() {
  n <- 10
  n(n)
}
o()
#Result
[1] 5

So, as the above example shows, avoid using the same name for both a function and a variable, as this leads to confusion.

A fresh start

What happens to the values between invocations of a function? What will happen the first time you run this function? What will happen the second time? To understand this better please look at the following code:

k <- function() {
  if (!exists("p")) {
    p <- 9
  } else {
    p <- p + 1
  }
  p
}
k()
#Result
[1] 9

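You might be surprised that it returns the same value, 9, every time (provided no object named p exists in an enclosing environment such as the global workspace, since exists() also searches there). This is because every time a function is called, a new environment is created to host its execution. A function has no way to tell what happened the last time it was run.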
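The fresh-start behaviour can be sketched in a way that does not depend on what is in the workspace. The function counter and the variable n_calls are hypothetical names; inherits = FALSE is an argument of exists() that restricts the lookup to the current environment.

```r
# Illustrative counter; inherits = FALSE limits exists() to the
# function's own, freshly created environment
counter <- function() {
  if (!exists("n_calls", inherits = FALSE)) {
    n_calls <- 1
  } else {
    n_calls <- n_calls + 1
  }
  n_calls
}

first  <- counter()
second <- counter()
c(first, second)  # each call starts from scratch
```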

Dynamic Lookup

Lexical scoping determines where to look for values, not when to look for them. R looks for values when the function is run, not when it is created. This means that the output of a function can differ depending on objects outside its environment.
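A minimal sketch of dynamic lookup, using the hypothetical names scale_by and factor_value: the function is defined once, but its result changes when the outside value changes.

```r
# 'scale_by' reads 'factor_value' from outside its own body
# (both names are illustrative)
scale_by <- function(v) v * factor_value

factor_value <- 2
before <- scale_by(10)

factor_value <- 5
after <- scale_by(10)

c(before, after)
```

Because this kind of dependency is easy to introduce by accident (for example through a typo in a variable name), it is generally safer to pass such values in as arguments.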

Function Arguments

It’s helpful to distinguish between the formal arguments and the actual arguments of a function. The formal arguments are a property of the function, whereas the actual (or calling) arguments can vary each time you call the function.

Calling Functions

When calling a function one can specify arguments by position, by complete name, or by partial name. Arguments are matched first by exact name (perfect matching), then by prefix matching, and finally by position.

Example:

f <- function(abcde, bcde1, bcde2) {
  list(a = abcde, b1 = bcde1, b2 = bcde2)
}
str(f(1, 2, 3))
#Result
List of 3
 $ a : num 1
 $ b1: num 2
 $ b2: num 3
# Can abbreviate long argument names:
str(f(2, 3, a = 1))
#Result
List of 3
 $ a : num 1
 $ b1: num 2
 $ b2: num 3
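The matching order matters when an abbreviation is a prefix of more than one argument name. The sketch below reuses f from the example above; tryCatch() is used only so the failing call does not stop the script.

```r
f <- function(abcde, bcde1, bcde2) {
  list(a = abcde, b1 = bcde1, b2 = bcde2)
}

# "bcde1" is an exact name, so it is matched first; 1 and 3 are then
# matched by position to abcde and bcde2
exact <- f(1, 3, bcde1 = 2)
str(exact)

# "b" is a prefix of both bcde1 and bcde2, so the call is ambiguous
# and R raises an error instead of guessing
ambiguous <- tryCatch(f(1, 3, b = 2), error = function(e) "error")
ambiguous
```

For this reason, partial names are best kept to interactive use; spelling out full argument names in scripts avoids ambiguity.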

Calling a function given a list of arguments:

Suppose you have a list of arguments:

args <- list(1:100, na.rm = TRUE)

How could you then send that list to mean()? You need do.call():

do.call(mean, args)
#Result
[1] 50.5

Default and missing arguments

f <- function(a = 5, b = 10) {
  c(a, b)
}
f()
#Result
[1]  5 10

Default arguments can even be defined in terms of variables created within the function:

k <- function(a = 1, b = d) {
  d <- (a + 1) ^ 3
  c(a, b)
}
k()
#Result
[1] 1 8

You can also find out whether an argument was supplied by using the missing() function:

i <- function(a, b) {
  c(missing(a), missing(b))
}
i()
#Result
## [1] TRUE TRUE
i(a = 5)
#Result
## [1] FALSE  TRUE

Lazy Evaluation:

R function arguments are lazy by default, meaning they’re only evaluated if they’re actually used:

f <- function(x) {
  999
}
f(stop("This is an error!"))
#Result
[1] 999
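If you need an argument to be evaluated even though its value is not otherwise used, base R provides force(). The sketch below contrasts the two behaviours; the function names lazy and eager are hypothetical.

```r
lazy <- function(x) {
  999
}
eager <- function(x) {
  force(x)  # evaluate x now, even though its value is not used
  999
}

# The stop() call is never evaluated, so no error occurs
lazy_result <- lazy(stop("never evaluated"))

# force() evaluates the argument, so the error is raised
eager_result <- tryCatch(eager(stop("evaluated")),
                         error = function(e) "error raised")

lazy_result
eager_result
```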

Special Calls:

R supports two additional syntaxes for calling special types of functions: infix and replacement functions.

Infix functions

Most R functions are “prefix” operators: the name of the function comes before the arguments. You can also create infix functions, where the function name comes between its arguments, like + or -. All user-created infix functions must start and end with %. R comes with the following infix functions predefined: %%, %*%, %/%, %in%, %o%, %x%. (The complete list of built-in infix operators that don’t need % is: :, ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, <<-)

For example, the following function creates a new operator that pastes together strings:

`%+%` <- function(a, b) paste0(a, b)
"great" %+% " functionality"
#Result
## [1] "great functionality"
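Because infix operators are ordinary functions, they can also be called in prefix form by quoting the name with backticks. The sketch below shows this for the %+% operator defined above and for two built-in operators.

```r
`%+%` <- function(a, b) paste0(a, b)

infix_form  <- "great" %+% " functionality"
prefix_form <- `%+%`("great", " functionality")
identical(infix_form, prefix_form)

# Built-in operators are ordinary functions too
`+`(1, 2)

# ...which means they can be passed to other functions
sapply(1:3, `*`, 10)
```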

Replacement Function

Replacement functions act as if they modify their arguments in place, and they have the special name form xxx<-. They typically take two arguments (x and value), although they can take more, and they must return the modified object. For example, the following function allows you to modify the third element of a vector:

`third<-` <- function(x, value) {
  x[3] <- value
  x
}
x <- 1:9
third(x) <- 5L
x
#Result
 [1] 1 2 5 4 5 6 7 8 9

“When R evaluates the assignment third(x) <- 5L, it notices that the left-hand side of the <- is not a simple name, so it looks for a function named third<- to do the replacement.”

This tutorial has taken you on an in-depth tour of user-defined functions in R.
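As noted above, replacement functions can take more than two arguments. A minimal sketch, assuming a hypothetical function modify<-: any extra arguments are placed between the object and value.

```r
# 'modify<-' is an illustrative name; extra arguments go between
# the object and 'value'
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

x <- 1:9
modify(x, 1) <- 10L
x

# The same replacement written out as an ordinary function call
y <- 1:9
y <- `modify<-`(y, 1, 10L)
identical(x, y)
```

The second form makes it explicit that the function returns a modified copy, which is then assigned back to the original name.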
