SAS Interview Questions

Want to ace your next SAS interview and land that open job you’ve been seeking? Then here are the best SAS interview questions & answers. Stay focused on these SAS basic interview questions. We have covered various topics like index, different ways to create a table in proc SQL, PAD option and how it is used, etc. It includes questions ranging from simple theoretical concepts to tricky questions. The SAS interview questions & answers here have been handpicked to represent ones that interviewers ask. Prepare with these SAS technical interview questions & answers and pursue your career.

  • 4.5 Rating
  • 30 Question(s)
  • 20 Mins of Read
  • 3325 Reader(s)

Intermediate

data a;

input name $ 1. marks;

cards;

a 100

b 200

c 300

;

run;

procmeansdata = a summean;

run;

Output

Here in this code we have A (Amount) = 100000 and I (interest) = 0.0925. We need to find Amount and updated interest till five years also the interest increases by 10% every year.  

Obs
A
i
Year
1
109250.00
0.10175
1
2
120366.19
0.11193
2
3
133838.17
0.12312
3
4
150315.99
0.13543
4
5
170673.18
0.14897
5

This code is only an illustration to create a table declares variables name, age, and dob with formats and taking values using datalines.

input name$ age dob date9.; this statement basically creating variables and giving format to dob as date9.

Obs
name
Age
Dob
1
am
67
Oct 13, 1981
2
pr
66
Apr 10, 1982

We can clear the libref through one of the below approaches:-

  1. Using Libname Statement:-

   We can use the below syntax to disassociate a specific library

Syntax : LIBNAME libref <CLEAR>

        Example : Libname temp_lib clear

   where temp_lib is the libref that you want to deassign      

   If, we would like to disassociate all the libref that we have declared in a SAS    Session, then use the below-modified version

 LIBNAME _ALL_ <CLEAR>

        Example: Libname _ALL_ clear

  1. Using the Libname Function : -

   We can use the LIBNAME function to deassign a library reference. In case, a warning or an error occurs then a message is written to the SAS log.

   %if (%sysfunc(libname(temp_lib))) %then

  %put %sysfunc(sysmsg());

   where temp_lib is the libref that you want to deassign

Note:- In both the above approaches libref is cleared if and only if there are no open files that are associated from that libref.

  1. Using SAS Explorer Window:-

   On the Explorer window go to libraries. Right click on the library that you want to design and select delete. Once you click on Delete you will receive a Delete confirmation with the text "Are you sure you want to remove the library reference named temp_lib?". Once you click Ok, the same is going to get deleted.

Output of the code

Obs
Pt
visit
Ae
111Headache
212Kill
313Pain
414Knee
521Tick
632Fle

SET and Merge are two very vital statements in SAS. Let us understand by taking 2 tables ‘one’ and ‘two’ and perform SET and Merge Operation on them.

Code -

 DATA one;
  INPUT id v1 v2;
  DATALINES;
  1 10 100
  2 15 150
  3 20 200
  ;
  PROC SORT Data=one;
   BY id;
 RUN;

Output

Obs
id
v1
v2
1
1
10
100
2
2
15
150
3
3
20
200



Code -

DATA two;

  INPUT id v3 v4;

  DATALINES;

  1 1000 10000

  2 1500 15000

  3 2000 20000

  4 800 30000

  ;

 PROC SORT Data=two;

  BY id;

 RUN;

Output

Obs
id
v3
v4
1
1
1000
10000
2
2
1500
15000
3
3
2000
20000
4480030000

The SET statement is the main tool to read observations from a SAS data set. The basics are so obvious you might not even consider them. Just SET and a data set name and there you go. Through SET statement the dataset gets appended while with merge the two datasets get merged on the basis of ID and extra variables are created if needed. Example below.

Code -

DATA TestSet;

  Set one two;

RUN;

Output

Obs
id
v1
v2
v3
v4
1110100..
2215150..
3320200..
41..100010000
52..150015000
63..200020000
74..80030000

Code -          

 DATA TestMerge;

  MERGE one two;

  BY id;

 PROC PRINT DATA= TestMerge; title "TestMerge";

 RUN;

Output

Obs
id
v1
v2
v3
v4
1110100100010000
2215150150015000
3320200200020000
44..80030000

data a;

input name $ 1. age;

cards;

a 12

b 15

c 18

;

run;

procprint; run;

data b;

input name $1. marks;

cards;

a 200

c 150

;

run;

procprint; run;

procsortdata = a;

by name;

run;

procsortdata = b;

by name;

run;

data final;

merge a(in=x) b(in=y);

by name;

if x and y thenoutput;

run;

proc print;

run;

When we submit the DATA step for execution. SAS checks the syntax of the submitted code and compiles them. During the compile phase, SAS created 3 items –

  • Input buffer – is a logic memory area into which SAS reads each record of raw data. This buffer is not created in a case wherein the operations are performed on an already existing dataset.
  • Program data vector (PDV) – a logical memory area wherein sas builds the dataset, one observation at a time. Additionally, SAS also creates 2 automatic variables - _N_ and _ERROR_.

Descriptor information – it includes data set attributes and variable information.

ERRORS = specifies the maximum number of observations for which SAS issues complete error messages.

Note: - If there are more errors then what is specified in the option then a warning message displays which lets the user know that the limit that was set has been reached.

So, ERROR = 1 will only issue an error

Example :

data a

input name $ 1. age;

cards;

a 20

b 23

procpint; run;

1st Error -

Before setting the options ERRORS = 1

Answer – the RUN statement is used to define the step boundary, it tells the compiler that the previous set of SAS statements are to be executed. Additionally, sometimes it also helps to make the code easy to read and interpret in the log.

DATA TEST;

INPUT NAME$ SUBJ1 SUBJ2 SUBJ3 GENDER $1.;

CARDS;

A 95 96 99 M

B 85 90 70 F

C 60 70 80 M

;

_numeric_ :- it is used to filter all the numeric type variables from the dataset.

_character_ :- it is used to filter and select all the character type variables from the dataset.

DATA A;

INPUT NAME$ SUBJ1 SUBJ2 SUBJ3 GENDER $1.;

CARDS;

A 95 96 99 M

B 85 90 70 F

C 60 70 80 M

;

RUN;

DATA NUM (KEEP=_NUMERIC_);

SET A;

RUN;

PROCPRINT;

RUN;

DATA CHAR (KEEP=_CHARACTER_);

SET A;

RUN;

PROCPRINT;

RUN;

Advanced

Suppose we have the above clinical table in the form of SAS Dataset in lib – ‘mylib’. Considering the fact, the following code will give the desired results. 

Data Clinical_Analysis;
Set mylib.clinical;
If '01JAN2018'D <= Data <= '31MAR2019'D ;
array diag(*) $diag1 - diag5;
do i=1 to 5;
if diag(i) in ('D45') then f_diag =1;
end;
run;
proc sort data = Clinical_Analysis; by IDs date; run;
Data diag_pat (keep = IDs YOB Date f_diag first_date diag age);
set Clinical_Analysis;
by Ids;
format first_date date9.;
retain diag first_date age;
if first.Ids then do;
diag=0;
first_date =date;
age =0;
end;
age = Year(first_date) - yob;
if f_diag =1 then diag=1;
if last.Ids;
run;
Proc freq data = diag_pat;
title "All required counts";
tables diag age/missing list;
run;

output: 

Dataset of the code – Table: diag_pat

OBS
IDs
YOB
Date
f_diag
first_date
diag
Age
1
100
2-Mar-86
8-Mar-18
1
2-Mar-18
1
32
4
101
5-Jan-80
18-Apr-18
18-Jan-18
138
9
102
4-Jan-85
15-Feb-19
118-Jan-17
134
14
103
16-Jan-90
17-Mar-18
115-Feb-18
128
18
104
25-Jan-92
18-Apr-18
127-Feb-18
126

Format for creating a simple index -

Data <dataset_name> (INDEX = (<variable-name></UNIQUE></NOMISS>))
Creating a single index in one go
Example : data a(index=(seq /unique /nomiss));
                               Set a;
  Run;
Creating multiple indexes in one go
Example: data a(index =(seq /unique /nomiss
                                                      State /unique
                                                       Year /unique /nomiss)
                              Set a;
 Run;
Format for creating a composite index-
Data <dataset_name> (INDEX = (index-name = <var1 var2 etc></UNIQUE></NOMISS>))
              Creating a composite index –
Example – data a(index=(quarter_month = (quarter month) /nomiss)
                               Set a;
                               Run;

Note: We can also use the proc’s to create the indexes in the newly generated output datasets.

Additionally, we could also use the proc datasets procedure and proc SQL to generate indexes.

Information From Dataset: A

Information From Dataset: B

The output of the UNION Operator

UNION Operator is used to club the specific variable information from both the tables. Apart from updating the information, it also removes the duplicate data and sorts the combined information.

The output of UNIONALL Operator

UNIONALL Operator works by appending the information from dataset B after the information from dataset A. It doesn’t remove the duplicates or sorts the resultant data.

Note: If the variables are existing in both the tables, but the order of the same is different. SAS throws the below error –

ERROR: Column 1 from the first contributor of UNION is not the same type as its counterpart from the second.

ERROR: Column 2 from the first contributor of UNION is not the same type as its counterpart from the second.

There are multiple ways for creating table in proc sql

I. Create Table From Column Definition

We can create a new table without rows by using the CREATE TABLE statement to define the columns and attributes. We can specify a column’s name, type, length, informat, format and label. 

procsql;

createtable employee(

ssn char(9),

dob numformat = DATE9.informat = DATE9.label = "Date Of Birth",

doh numformat = DATE9.informat = DATE9.label = "Date Of Hire",

gender char(1) label = "Gender",

estat char(10) label = "Employement Status");

insertinto employee values ("111111111",500,25000,"M","ACTIVE");

quit;

procsql;

select * from employee;

quit;

II. Create Table from a Query Result

Many times, there are situations wherein we already have a structure of the table/view and we are trying to create a mirror table that is referring to columns of the earlier existing table. In such cases, we can use the same CREATE TABLE Statement and place it before the SELECT Statement. An additional benefit of using this approach of creating the table is that the data of the new table is derived on the basis of existing data from the parent table.

procsql;

createtable employee1 as

select * from employee

quit;

procsql;

select * from employee1;

quit;

III. Create Table Like an Existing Table

To create an empty table that has the same columns and attributes as an existing table, we use the LIKE clause while creating the table. One point to note is that an empty table is created using this method and no data is populated inside the table.

procsql;

createtable employee3 like employee;

quit;

Connecting to teradata using pass through

Procsql;
Connect to Teradata( USER= “ &user_id.” PASSWORD =” &dbpass”
tdpid =rchgrp2 MODE= ‘teardata’);
Create table work. Account_details as(Select
Account_ Number, Enroll_ Date, Sales, City
From Master_ acct_ details
Where Account_ Number is not null and Sales gt 40000
);
Disconnect from Teradata;
Quit;
Connecting to db2 using pass through:
Procsql;
Connect to db2(  DATABASE=CRD_ PROD USER= “ &user_id.”
PASSWORD=” &dbpass”);
Create table work. Account_details as(Select
Account_ Number, Enroll_ Date, Sales, City
From Master_ acct_ details
Where Account_ Number is not null and Sales gt 40000
);
Disconnect from db2;
Quit;

Other way using the library assignments:

Libname AM DB2 DATABASE= CARD_USA Schema=DDKAUTO

USER=”&user_ id.” PASSWORD= “&dbpass”;

Procsql;

Create table work. Account_details as(Select

Account_ Number, Enroll_ Date, Sales, City

From AM.Master_ acct_ details

Where Account_ Number is not null and Sales gt 40000);

Data test;

input id sal;

cards;

1 300

1 200

3 900

2 300

2 400

3 700

2 100

3 800

;

run;

proc sort data =test out=test1;

by id sal;

run;

proc sql;

create table test2(drop=number) as

select *, monotonic() as number from test1

group by id

having number=max(number);

quit;

data test;

input name$ weight;

cards;

a 25

b 30

c 35

d 40

;

run;

proc sql;

select name, weight

from test

union all

select 'Total', sum(weight)

from test;

quit;

The PAD option can be used to read the source data having the variable length records, so to read the variable length of records we can use the PAD option, the PAD option can be used with the infile

statement;

  example

data Test;
infile 'C:\Users\Anshu\Desktop\test.txt' PAD;
input @1name $6. @10 sex $1. @12 marks 3. @14 salary 8.;
run;
proc print data=Test;
run;

Macro is way of automating in sas programs. There are many ways to create macro. 

%macro get_var_dist (data_nm , categ_cut_off);
/* Taking list of variables in the dataset along with type*/
proc contents data=&data_nm. out=_content_;
run;
/* Note - how data_nm is getting resolved. */

/* Fiding number of variables to define loop */
proc sql noprint;
select count(*) into: nvars from _content_;
quit;
/* Note this is another way of defining macro variables */
%put "No of Variables is  " &nvars. ;
/* Note this can be a useful way to detect the value of macro variables.*/
%do i=1 %to &nvars. ;
data _null_;
set _content_;
if _n_ = &i. then do;
       call symput("variable", name);
       call symput("var_type",type);
end;
run;
/* Note -
  1: One more way of defining variables.
  Please note here information is being passed from general SAS dataset
  to SAS macros using call symput
  2: Please see how to run do a loop
*/
%put "Variable is  " &variable. ;
%put "Variable Type is  " &var_type. ;
proc sql noprint;
       select count(distinct &variable) into: distinct_categories
       from &data_nm.;
       quit;
%put "Number of distinct categories is  &distinct_categories." ;
/* Note the syntax of writing if else condition */
/* %do ....  %end; defines the block */
/* Missing command below includes missing value as a category */
/* Also note, if you don't use %eval it can throw errors */
/* if comparison involves floating numbers, you will need to use %sasevalf */
%if %eval(&distinct_categories.) <=  %eval(&categ_cut_off.) %then %do;
Title "Freq Dist for Variable &variable. as number of distinct
category is &distinct_categories.";  
proc freq data=&data_nm. ;
table &variable. / missing ;
run;
%end;
%if %eval(&var_type.) =  1 %then %do;
Title "Univariate Dist for Variable = &variable. ";  
proc univariate data=&data_nm. plot;
var &variable.  ;
run;
%end;

%end;
%mend get_var_dist;
options symbolgen mlogic mprint;
%get_var_dist(data_nm=sashelp.heart, categ_cut_off=25);
%get_var_dist(data_nm=sashelp.asscmgr, categ_cut_off=10);

The SAS processing is in two steps

  1. Compilation phase  
  2. Execution phase
  •  Compilation phase

A. If there is a raw data to be read, there is a creation of input buffer a logical concept for holding the data, the input buffer is not created while reading a dataset.

B. After the input buffer, the pdv is created, it is the area of memory where SAS builds its dataset.

Creating two automatic variables 1. _n_ (takes the value of a number of iterations) and 2. _error_ (takes the value 1 or 0, 1 if errors and 0 if no errors)

C. The SAS then do the syntax checking for the data steps, that includes misspelled keywords, invalid varnames, missing punctuation, and invalid option.

D. Data set variables are created as the input statement is encountered and their length and type are determined.

E. The descriptor portion is now written and this is the final step in the compilation phase, the run statement is the signal for the end of compilation phase. At this point, the data set contain the variables mentioned in the input statement, but the values would only be written in the execution phase.

  •  Execution phase

A. The automatic variables are initialized _n_ takes the value 1, _error_ to 0 and the remaining variables to missing (.)

B. The input pointer now reads the data values and assigns the variable values to the pdv.

C. At the end of the data step the data values from the pdv are written to the dataset and then _n_ takes the value as 2, _error_ 0 and all rest as missing(.).

The use of double trailing is that, if you have datalines in one line then you can use @@;

data test1;
input name $ age marks @@;
datalines;
am 12 95 ra 13 67 pa 14 98 sa 14 87
;
run;
proc print data=test1;
run;

Output 

ObsnameAgeMarks
1Am1295
2Ra1367
3Pa1498
4Sa1487

The use of single @ is in the holding the line, it basically is used in more than one input statement, the pointer basically hold the line after reading the values, it basically holds the value and checks the condition;

data test2;
input @1 gender $1. @;
if gender ne 'F' then delete;
input @3 age @5 marks;
datalines;
F 13 56
M 12 78
F 13 78
M 56 90
;
run;
proc print data=test2;
run;

Output

Obs
Gender
age
Marks
1
F
13
56
2
F
13
78

data Test;

input @1name $7. @9 sex $1. @10 marks;

cards;

Anshukr M123

MohitaS F32

;

run;

proc print data=test;

run;

Answer

Obs
name
sex
marks
1
Anshukr
M
123
2
MohitaS
F
32

The PAD option can be used to read the source data having the variable length records, so to read the variable length of records we can use the PAD option, the PAD option can be used with the infile

statement;

example

data Test;
infile 'C:\Users\Anshu\Desktop\test.txt' PAD;
input @1name $6. @10 sex $1. @12 marks 3. @14 salary 8.;
run;
proc print data=Test;
run;

RETAIN statement causes a variable that is defined during the input statement to be retained across data set iterations, in other words, retain statement retains the variable value at each iteration and keeps on substituting it, the pdv does not get cleared when you use the retain statement, let’s try to understand the basic example explained below –

Table - mylib.clinical

OBSIDsYOBDatediag1daig2daig3daig4daig5
11002-Mar-862-Mar-18D4500D450
21002-Mar-865-Mar-180D4500D45
31002-Mar-868-Mar-180C50
C50D45
41015-Jan-808-Jan-1800D45D450
51015-Jan-802-Feb-18D4500D450
61015-Jan-8027-Feb-180D4500D45
71015-Jan-8024-Mar-18D450D4500
81015-Jan-8018-Apr-18D4500D450

Suppose we have the above clinical table in the form of SAS Dataset in lib – ‘mylib’. Considering the fact the following code will give the desired results.

Data Clinical_Analysis;
Set mylib.clinical;
If '01JAN2018'D <= Date <= '31MAR2019'D ;
array diag(*) $diag1 - diag5;
do i=1 to 5;
if diag(i) in ('D45') then f_diag =1;
end;
run;
proc sort data = Clinical_Analysis; by IDs date; run;
Data diag_pat (keep = IDs YOB Date f_diag first_date diag age);
set Clinical_Analysis;
by Ids;
format first_date date9.;

retain diag first_date age; /*retaining diag first_date and age at every iteration */

if first. Ids then do;
diag=0;
first_date =date;
age =0;
end;
age = Year(first_date) - yob;
if f_diag =1 then diag=1;
if last.Ids;
run;
Proc freq data = diag_pat;
title "All required counts";
tables diag age/missing list;
run;

output: 

Dataset of the code – Table: diag_pat

OBS
IDs
YOB
Date
f_diag
first_date
diag
Age
1
100
2-Mar-86
8-Mar-18
1
2-Mar-18
1
32
4
101
5-Jan-80
18-Apr-18
1
8-Jan-18
1
38

The mean() function is used to calculate the mean of one observation across the variable. It is used to generate a horizontal data summary.

DATA A;

INPUT NAME$ SUBJ1 SUBJ2 SUBJ3;

CARDS;

A 95 96 99

B 85 90 70

C 60 70 80

;

RUN;

DATA B;

SET A;

MEAN = MEAN(SUBJ1,SUBJ2,SUBJ3);

RUN;

PROCPRINT;

RUN;

Proc Means is used to calculate the mean of a variable across all observations.

It is used for generating the vertical data summary (i.e. mode of operation is column-wise).

By default, it calculates N, Mean, Std Dev, Minimum, Maximum statistics.

Description

Want to ace your next SAS interview and land that open job you’ve been seeking? Then here are the best SAS interview questions & answers. Stay focused on these SAS basic interview questions. We have covered various topics like index, different ways to create a table in proc SQL, PAD option and how it is used, etc. It includes questions ranging from simple theoretical concepts to tricky questions. The SAS interview questions & answers here have been handpicked to represent ones that interviewers ask. Prepare with these SAS technical interview questions & answers and pursue your career.
Levels