R Programming Interview Questions

Prepare for your R interview with the top R interview questions curated by our experts. These questions can help you turn an R interview into a job offer for roles such as Business Statistical Analyst or R programmer. The list covers conceptual questions for freshers and experienced candidates, including the difference between dcast() and table(), tidy data in R, and more, giving you an edge in the data analytics market. Prepare well with these R programming interview questions and answers and ace your next interview.


We can use the gather() function in the tidyr package to accomplish this.

Below is the required code.

# Load the tidyr package
library(tidyr)
# Reshape columns 2 to 4 into key and value columns
gather(my_df, "Year", "n", 2:4, convert = TRUE)

gather() function parameters –

  • my_df is the first parameter: the data frame to reshape.
  • “Year” is the second parameter: the name of the new key column, typically given as a character string.
  • “n” is the third parameter: the name of the new value column.
  • 2:4 is the fourth parameter: the names or numeric indexes of the columns to collapse from the input dataset.
  • convert = TRUE is the last parameter: it runs type conversion on the key column, so keys that look like numbers (stored as character strings) become numeric.
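The call above can be tried on a small data frame. This is a minimal sketch: the original my_df is not shown, so the data frame below (a country column followed by three year columns) and the name long_df are assumptions made for illustration.

```r
library(tidyr)

# Hypothetical wide data: one row per country, one column per year
my_df <- data.frame(
  country = c("A", "B"),
  `2011` = c(10, 20),
  `2012` = c(11, 21),
  `2013` = c(12, 22),
  check.names = FALSE
)

# Collapse columns 2 to 4 into a key column "Year" and a value column "n"
long_df <- gather(my_df, "Year", "n", 2:4, convert = TRUE)

print(long_df)
str(long_df$Year)  # convert = TRUE has turned the year labels into integers
```

Each country now appears once per year, so the two-row wide table becomes a six-row long table.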

We can use separate() as follows to split the date field into three separate columns for the year, month and day values.

# Load the tidyr package
library(tidyr)
# Split Col4 into year, month and day columns
separate(my_df, Col4, c("year", "month", "day"), sep = "-")

The separate() function uses its parameters as follows to produce data in the desired format.

  • The first parameter is the data frame, which is my_df.
  • The second parameter is the date column. Any column can be split this way as needed.
  • The third parameter gives the names of the new columns to create.
  • The fourth parameter is the string to split on, that is, the separation criterion. By default, separate() splits on any non-alphanumeric characters.
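The parameters described above can be seen in action in a short sketch. The data frame here is hypothetical (the original my_df is not shown); it assumes Col4 holds dates as "YYYY-MM-DD" strings, and split_df is an illustrative name.

```r
library(tidyr)

# Hypothetical data frame; Col4 holds dates as "YYYY-MM-DD" strings
my_df <- data.frame(
  Col1 = c("x", "y"),
  Col4 = c("2019-01-15", "2020-12-31"),
  stringsAsFactors = FALSE
)

# Split Col4 on "-" into three new character columns
split_df <- separate(my_df, Col4, c("year", "month", "day"), sep = "-")

print(split_df)
```

The new columns are character vectors unless convert = TRUE is also passed.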

The output data will not be the same as the input.

The output will look like the following.


The difference is in the format of Col4, which holds the date value.
The separate() function splits this date column into three parts.
The unite() function combines these three parts back into one column, Col4.
However, the resulting format is slightly different, as specified in the code.

Here we are converting from a non-tidy format to a tidy format and then back to a non-tidy format.
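The round trip can be sketched as follows. The original code is not shown, so the sample data and the "/" separator passed to unite() are assumptions chosen only to show how the rebuilt column can differ from the input.

```r
library(tidyr)

# Hypothetical input with dates stored as "YYYY-MM-DD" strings
my_df <- data.frame(
  Col4 = c("2019-01-15", "2020-12-31"),
  stringsAsFactors = FALSE
)

# Split the date into three columns, then glue them back together
parts <- separate(my_df, Col4, c("year", "month", "day"), sep = "-")
round_trip <- unite(parts, Col4, year, month, day, sep = "/")

print(my_df$Col4)       # dates as they were in the input
print(round_trip$Col4)  # same dates, now joined with "/"
```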

The differences are the following:

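apply(): Applies a function over the rows or columns (margins) of a matrix or array; often used as an alternative to a for() loop

lapply(): Applies a function to every element of a list or vector and returns the result as a list

sapply(): Works like lapply() but simplifies the result to a vector or matrix where possible

tapply(): Applies a function to subsets of a vector defined by a grouping factor, similar to the aggregate() function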
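The four functions can be compared side by side in base R. This is a small sketch on made-up inputs; all object names are illustrative.

```r
# A small matrix and a list to apply functions over
m <- matrix(1:6, nrow = 2)          # 2 rows, 3 columns
scores <- list(a = 1:3, b = 4:6)

row_sums <- apply(m, 1, sum)        # MARGIN = 1: one result per row
col_sums <- apply(m, 2, sum)        # MARGIN = 2: one result per column

as_list   <- lapply(scores, mean)   # always returns a list
as_vector <- sapply(scores, mean)   # simplifies to a named numeric vector

# tapply(): apply a function to groups of a vector defined by a factor
values <- c(10, 20, 30, 40)
groups <- c("x", "y", "x", "y")
group_means <- tapply(values, groups, mean)

print(row_sums)
print(col_sums)
print(as_list)
print(as_vector)
print(group_means)
```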

These are NOT the same. flights_mutate1 will run as expected, whereas flights_mutate2 will throw an error. We cannot use select() first because the derived variable “speed” does not exist yet. It has to be created first using the mutate() function, and then the select() function can be used to extract specific variables from the data frame.
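The ordering issue can be reproduced on a tiny data frame. The original two snippets are not shown, so the data frame df and the names ok and failed below are hypothetical stand-ins.

```r
library(dplyr)

# Hypothetical stand-in for the flights data
df <- data.frame(
  carrier  = c("AA", "UA"),
  distance = c(300, 600),
  air_time = c(60, 90)
)

# mutate() first, then select(): works because "speed" exists by then
ok <- df %>%
  mutate(speed = distance / air_time * 60) %>%
  select(carrier, speed)

# select() first: fails because "speed" has not been created yet
failed <- tryCatch(
  {
    df %>% select(carrier, speed)
    FALSE
  },
  error = function(e) TRUE
)

print(ok)
print(failed)
```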

The n() function returns the number of rows in the current group (or in the whole data frame when it is not grouped), whereas n_distinct() returns the number of distinct values in a vector. For example, if we take the sample “flights” dataset in R, we see the following characteristics:

We first remove the NA values from air_time and distance before using the summarise() function.

The n() function counts the total number of flights (rows) in the dataset. The n_distinct() function counts the number of distinct carriers (airlines) in the dataset, which is 16.
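The same pattern can be sketched on a small hypothetical data frame instead of the full flights dataset; the column values and the names df and summary_df are made up for illustration.

```r
library(dplyr)

# Hypothetical miniature of the flights data
df <- data.frame(
  carrier  = c("AA", "AA", "UA", "DL", "UA"),
  air_time = c(60, NA, 90, 120, 45)
)

summary_df <- df %>%
  filter(!is.na(air_time)) %>%        # drop rows with a missing air_time
  summarise(
    flights  = n(),                   # number of remaining rows
    carriers = n_distinct(carrier)    # number of different carriers
  )

print(summary_df)
```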

Datasets come in many formats, but R works best with one format: tidy data. The tidyr package in R helps reshape data into this format. For example, consider the pollution dataset below:

In tidy data, each variable is saved in its own column, each observation is saved in its own row, and each type of observation is stored in a single table (here, the “pollution” table shown above). With this layout, R automatically keeps observations together as variables are manipulated.

library(tidyr) loads the package in R. If the package is not installed already, install it first with install.packages("tidyr").


The differences are the following:

%>% passes the left-hand side (LHS) as the first argument to the call on the right-hand side (RHS) and returns the result.

%<>% (from the magrittr package) does the same, and then assigns the result back to the LHS object.
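A short sketch makes the difference concrete. It loads magrittr directly, since the assignment pipe is not attached by dplyr alone; the variable names are illustrative.

```r
library(magrittr)

x <- c(9, 16, 25)

# %>% passes x to sqrt() and returns the result; x itself is unchanged
roots <- x %>% sqrt()

# %<>% does the same, then assigns the result back to y
y <- c(9, 16, 25)
y %<>% sqrt()

print(x)
print(roots)
print(y)
```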

The mutate() function in the dplyr package is used to derive new variables (columns) from existing variables; it does not change the number of observations (rows). To collapse existing observations into summary values, use the summarise() function instead. Below is an example:

If we take sample data from the “nycflights13” package and view the first few records, it looks like the following.


Now we use the mutate() function to derive a new variable and the select() function to fetch selected columns from the above data frame.

library(dplyr)
library(nycflights13)
flights <- as.data.frame(flights)
flights_mutate <- flights %>% mutate(speed = distance / air_time * 60) %>% select(carrier, arr_delay, speed)

This gives the desired result below (again, only a few records from the data frame are shown). Here the new derived variable is “speed”, computed with the formula distance / air_time * 60.

The kable() function renders a data frame as a cleanly formatted table. It comes from the knitr package in R. When we execute the above two statements from the R console, the kable() statement produces output that is much more legible. It is commonly used in R Markdown, where it makes documentation clearer.

Below is a snapshot of the differences when executing from the R console.
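The two statements being compared are not shown here, so the following is only a sketch of the idea. It assumes the built-in mtcars dataset, and the names df and tbl are illustrative.

```r
library(knitr)

# First three rows and columns of a built-in dataset
df <- head(mtcars[, 1:3], 3)

# Plain console printing of the data frame
print(df)

# The same data rendered by kable() as a text table
tbl <- kable(df)
print(tbl)
```

Inside an R Markdown document the kable() result is rendered as a formatted table in the output file.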

We need to use the gather() function to reshape the dataset into a tidy format in R so that the desired output can be achieved. Please see below.

The first parameter of gather() is the data frame to reshape. The second parameter is the name of the new key column, which is “year” here since we want to show the number of cases by year and by country. The third parameter is the name of the new value column, which is the count here. The fourth parameter is the names or numeric indexes of the columns to collapse. There could be different ways to achieve this, but the important part is to think about the approach and see how powerful packages such as tidyr can be leveraged to accomplish it.

Both code snippets will yield the same output.

This is because we arrange by country, year, sex and age in both cases.

The 4:6 and child:elderly arguments select the same columns, by column index and by column name respectively. After that, arrange() sorts the reshaped data into the desired order.
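The equivalence can be checked with a small sketch. The original snippets and dataset are not shown, so the data frame below, with child, adult and elderly in columns 4 to 6, and the column names "age" and "cases" are assumptions.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: the age-group columns sit in positions 4 to 6
df <- data.frame(
  country = c("B", "A"),
  year    = c(2000, 2000),
  sex     = c("f", "m"),
  child   = c(1, 4),
  adult   = c(2, 5),
  elderly = c(3, 6),
  stringsAsFactors = FALSE
)

# Select the columns to collapse by position
by_index <- df %>%
  gather("age", "cases", 4:6) %>%
  arrange(country, year, sex, age)

# Select the same columns by name
by_name <- df %>%
  gather("age", "cases", child:elderly) %>%
  arrange(country, year, sex, age)

print(identical(by_index, by_name))
```

Selecting by name is usually the safer choice, since it keeps working if the column order changes.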


R is a programming language that can be used for multiple purposes such as statistical analysis, predictive modeling, data manipulation and data visualization. It holds a high percentage of market share in the analytics industry. R is an open-source, cross-platform language; that is, it can run on several operating systems and on varied hardware. Candidates proficient in the R programming language are generally paid more than Python and SAS programmers.

According to indeed.com, the average salary of an R Programmer is $76,487 per year. Big companies including Facebook, Google and Twitter use the R programming language.

If you are determined to ace your next interview as an R programmer, these R interview questions and answers will fast-track your career. To relieve you of the worry and burden of preparation for your upcoming interviews, we have compiled the above list of interview questions for R programming with answers prepared by industry experts. Being well versed with these commonly asked R language interview questions will be your very first step towards a promising career as an R programmer.

Candidates can opt for various options after learning R programming. A few are listed below:

  • R Programmer
  • Data Scientist
  • Data Architect
  • Data Analyst

Candidates who wish to build a career as an R programmer can learn more about R programming from the best training available.

Crack your R interview with ease and confidence!
