SAS Interview Questions

Want to ace your next SAS interview and land that open job you’ve been seeking? Then here are the best SAS interview questions & answers. Stay focused on these SAS basic interview questions. We have covered various topics like index, different ways to create a table in proc SQL, PAD option and how it is used, etc. It includes questions ranging from simple theoretical concepts to tricky questions. The SAS interview questions & answers here have been handpicked to represent ones that interviewers ask. Prepare with these SAS technical interview questions & answers and pursue your career.

  • 4.5 Rating
  • 60 Question(s)
  • 30 Mins of Read
  • 3325 Reader(s)

Intermediate

data a;

input name $ 1. marks;

cards;

a 100

b 200

c 300

;

run;

procmeansdata = a summean;

run;

Output

Here in this code we have A (Amount) = 100000 and I (interest) = 0.0925. We need to find Amount and updated interest till five years also the interest increases by 10% every year.  

Obs
A
i
Year
1
109250.00
0.10175
1
2
120366.19
0.11193
2
3
133838.17
0.12312
3
4
150315.99
0.13543
4
5
170673.18
0.14897
5

This code is only an illustration to create a table declares variables name, age, and dob with formats and taking values using datalines.

input name$ age dob date9.; this statement basically creating variables and giving format to dob as date9.

Obs
name
Age
Dob
1
am
67
Oct 13, 1981
2
pr
66
Apr 10, 1982

We can clear the libref through one of the below approaches:-

  1. Using Libname Statement:-

   We can use the below syntax to disassociate a specific library

Syntax : LIBNAME libref <CLEAR>

        Example : Libname temp_lib clear

   where temp_lib is the libref that you want to deassign      

   If, we would like to disassociate all the libref that we have declared in a SAS    Session, then use the below-modified version

 LIBNAME _ALL_ <CLEAR>

        Example: Libname _ALL_ clear

  1. Using the Libname Function : -

   We can use the LIBNAME function to deassign a library reference. In case, a warning or an error occurs then a message is written to the SAS log.

   %if (%sysfunc(libname(temp_lib))) %then

  %put %sysfunc(sysmsg());

   where temp_lib is the libref that you want to deassign

Note:- In both the above approaches libref is cleared if and only if there are no open files that are associated from that libref.

  1. Using SAS Explorer Window:-

   On the Explorer window go to libraries. Right click on the library that you want to design and select delete. Once you click on Delete you will receive a Delete confirmation with the text "Are you sure you want to remove the library reference named temp_lib?". Once you click Ok, the same is going to get deleted.

Output of the code

Obs
Pt
visit
Ae
111Headache
212Kill
313Pain
414Knee
521Tick
632Fle

SET and Merge are two very vital statements in SAS. Let us understand by taking 2 tables ‘one’ and ‘two’ and perform SET and Merge Operation on them.

Code -

 DATA one;
  INPUT id v1 v2;
  DATALINES;
  1 10 100
  2 15 150
  3 20 200
  ;
  PROC SORT Data=one;
   BY id;
 RUN;

Output

Obs
id
v1
v2
1
1
10
100
2
2
15
150
3
3
20
200



Code -

DATA two;

  INPUT id v3 v4;

  DATALINES;

  1 1000 10000

  2 1500 15000

  3 2000 20000

  4 800 30000

  ;

 PROC SORT Data=two;

  BY id;

 RUN;

Output

Obs
id
v3
v4
1
1
1000
10000
2
2
1500
15000
3
3
2000
20000
4480030000

The SET statement is the main tool to read observations from a SAS data set. The basics are so obvious you might not even consider them. Just SET and a data set name and there you go. Through SET statement the dataset gets appended while with merge the two datasets get merged on the basis of ID and extra variables are created if needed. Example below.

Code -

DATA TestSet;

  Set one two;

RUN;

Output

Obs
id
v1
v2
v3
v4
1110100..
2215150..
3320200..
41..100010000
52..150015000
63..200020000
74..80030000

Code -          

 DATA TestMerge;

  MERGE one two;

  BY id;

 PROC PRINT DATA= TestMerge; title "TestMerge";

 RUN;

Output

Obs
id
v1
v2
v3
v4
1110100100010000
2215150150015000
3320200200020000
44..80030000

data a;

input name $ 1. age;

cards;

a 12

b 15

c 18

;

run;

procprint; run;

data b;

input name $1. marks;

cards;

a 200

c 150

;

run;

procprint; run;

procsortdata = a;

by name;

run;

procsortdata = b;

by name;

run;

data final;

merge a(in=x) b(in=y);

by name;

if x and y thenoutput;

run;

proc print;

run;

When we submit the DATA step for execution. SAS checks the syntax of the submitted code and compiles them. During the compile phase, SAS created 3 items –

  • Input buffer – is a logic memory area into which SAS reads each record of raw data. This buffer is not created in a case wherein the operations are performed on an already existing dataset.
  • Program data vector (PDV) – a logical memory area wherein sas builds the dataset, one observation at a time. Additionally, SAS also creates 2 automatic variables - _N_ and _ERROR_.

Descriptor information – it includes data set attributes and variable information.

ERRORS = specifies the maximum number of observations for which SAS issues complete error messages.

Note: - If there are more errors then what is specified in the option then a warning message displays which lets the user know that the limit that was set has been reached.

So, ERROR = 1 will only issue an error

Example :

data a

input name $ 1. age;

cards;

a 20

b 23

procpint; run;

1st Error -

Before setting the options ERRORS = 1

Answer – the RUN statement is used to define the step boundary, it tells the compiler that the previous set of SAS statements are to be executed. Additionally, sometimes it also helps to make the code easy to read and interpret in the log.

DATA TEST;

INPUT NAME$ SUBJ1 SUBJ2 SUBJ3 GENDER $1.;

CARDS;

A 95 96 99 M

B 85 90 70 F

C 60 70 80 M

;

_numeric_ :- it is used to filter all the numeric type variables from the dataset.

_character_ :- it is used to filter and select all the character type variables from the dataset.

DATA A;

INPUT NAME$ SUBJ1 SUBJ2 SUBJ3 GENDER $1.;

CARDS;

A 95 96 99 M

B 85 90 70 F

C 60 70 80 M

;

RUN;

DATA NUM (KEEP=_NUMERIC_);

SET A;

RUN;

PROCPRINT;

RUN;

DATA CHAR (KEEP=_CHARACTER_);

SET A;

RUN;

PROCPRINT;

RUN;

Generally, macro variable references shown so far have been direct macro references that begin with one ampersand: &name. However, indirect reference is also possible where macro variables name that belongs to a series is determined when the macro variable reference resolves. The macro facility provides indirect macro variable referencing, which allows you to use an expression (for example, DRUG&N) to generate a reference to one of a series of macro variables. For example, you could use the value of macro variable N to reference a variable in the series of macro variables named DRUG1 to DRUG10. If N has the value 8, the reference would be to DRUG8. If the value of N is 3, the reference would be to DRUG3.

Although for this example the type of reference you want is DRUG&N, the following example will not produce the results that you expect, which is the value of &N appended to DRUG:

%put &drug&n;  /* incorrect */

This produces a warning message saying that there is no macro variable DRUG because the macro facility has tried to resolve &DRUG and then &N and concatenate those values.

To force the macro processor to rescan a macro variable reference, you use more than one ampersand in the macro variable reference. When the macro processor encounters multiple ampersands, its basic action is to resolve two ampersands to one ampersand. For example, to append the value of &N to DRUG and then reference the appropriate variable name, you use:

%put &&drug&n;  /* correct */

Assuming that &N contains 6 when the macro processor receives this statement, it performs the following steps:

  • resolves && to &
  • passes DRUG as text
  • resolves &N into 6
  • returns to the beginning of the macro variable reference, &DRUG6, starts resolving from the beginning again, and prints the value of DRUG6.

Thus using multiple ampersands (&) for reference to a macro variable is called indirect referencing and few characteristics of indirect referencing are -

  • Ampersands (&), are resolved from left to right.
  • Ampersands (&), are resolved in pair (groups of 2), i.e &&  &
  • Number of ampersands required to resolve an indirect referencing on a macro variable is 2^n-1

Line hold specifiers are basically ‘@’ and ‘@@’ where ‘@@’ is used when we have multiple observations per line of raw data. It is used at the end of the INPUT statement. It tells SAS to hold that line of data and continue to read observations until it either runs out of data or reaches an INPUT statement that does not end with a double trailing @. When we use @ only without specifying n, SAS will hold that line of data until it reaches either the end of the DATA step or an INPUT statement that does not end with a trailing @. 
Example – 

One has a lab testing data file. The ASCII file layout looks like as follows, 

      test1 11 2

      test2 1 2 5

test3 2

Where the first field is the test subject and the rest of the fields are repeated testing results for the subject.

One wants the final file has following format,

test_id count result

 test1 1 11

 test1 2 2

 test2 1 1

 test2 2 2

 test2 3 5

 test3 1 2

The difficult part of this example is that the numbers of tests vary from subjects to subjects. However, if one reads each line with a trailing @, one needs not to know the numbers of tests. The logic is straightforward. First, one reads the first field, and then reads in the rest of the fields and outputs it. Second, one needs to check when a line ends. If it ends, then one begins to read the next line.

 data test1;
 length test_id $8 cnt result 8;
 infile cards eof=eof ;
 input @1 test_id $5. @ ; 
 do while (1);
 cnt+1;
 input result ?? @; 
 if result = '.' then do; 
 cnt=0;
 input @@; 
 leave;
 end;
 else output;
 end;
 return;                           
eof: stop;
cards;
test1 11 2
test2 1 2 5
test3 2
;

“input @1 test_id $5. @ ;”  or “input result ?? @;” hold line position to prevent SAS from going to a new line.

“if result = '.' then do;”   tests if it is the beginning of a new line.

“input @@;” holds the current line position.

It is much simpler if one use double trailing @@ and

a retain statement in this example.

data test1(drop=dummy);

 length test_id dummy $8;

 retain test_id;

 input dummy @@ ;

 if input(dummy, ?? best.) ne . then do;

result = input(dummy, 8.); 

                                                              output; end;

                                                                else test_id=dummy; 

                                                             cards;

test1 11 2 

test2 1 2 5 

test3 2 ;

The Program Data Vector is a logical area of memory that is created during the data step Processing.

- PDV is the brain of SAS.

- The program data vector contains two types of variables.

  • Permanent (data set and computed variables)
  • Temporary (automatic and option defined) Automatic (_N_ and _ERROR_)

- Suppose we have a data set

data a;
input id name$;
cards;
11 A
12 B
;
run;

- Step 1:- Compilation

- Step 1.1:- syntax checking of the code, syntactical errors is compilation errors.

- Step 1.2:- PDV gets created. (Logical concepts where SAS builds data).

- The sas statements are checked and then compiled when we submit the data step that means the sas codes getting converted into machine code.

- SAS identifies the type and length of each new variable and determines whether a variable type conversion is necessary for each subsequent reference to a variable in this phase.

-In the compilation phase, SAS creates three items – 

  • Input buffer 

 -Logical area where sas : 

  • Reads each record of data in the input statement.
  • This is created only when Data steps read raw data else the data is directly read into PDV.
  • PDV (program Data Vector)

-Logical area in memory where sas :

  • Build a Dataset by one observation at a time.
  • The Data values are assigned to the appropriate variables in the program data vector.
  • Two automatic variables named _N_ and _ERROR_ created. 
  • The _N_ Variable is counter variable it counts the number of time the data steps iterate.
  • The occurrences of error are getting recorded in _ERROR_ Variable.
  • The _N_ and _ERROR_ Variable are not written in an output
  • Descriptor Information

-Descriptor part of sas does most vital things as follows :

  • Creates and maintain each dataset.
  • All the variable dataset, variables name type length and all other attributes are recorded into it.

The name of the excel sheet containing the drugs code is “all_other_drugs.xlsx” 

Proc import OUT= drugs_code_ndc(keep=NDC)   /*this line keep NDC code only from the excel sheet and creating a dataset named “drugs_code_ndc”. */

Datafile= "%sysfunc(pathname(saslib))/all_other_drugs.xlsx" /*this line importing the excel file from saslib and the file name is “all_other_drugs.xlsx” */

            DBMS=xlsx REPLACE;                                                                                                   

     SHEET="sheet1";                                                                                                              

     GETNAMES=YES;                                                                                                               

Run; 

proc sql noprint;

select distinct compress("'"||NDC||"'") into : DrugNDC_Excl_Code separated by "," from drugs_code_ndc;

quit;

%put &DrugNDC_Excl_Code;   /* Note this can be be useful way to detect the value of variables.*/

Here we can use like “if drug_code in(&DrugNDC_Excl_Code) then drug =1; this will flag all the patients who have the drugs in the list. 

ods rtf file="%sysfunc(pathname(saslib))/pat_details_data.rtf";

Proc freq data = Pat_details;

Table all Linaclotide  Lubiprostone Plecanatide/missing list;

Run;

Ods rtf close;

This will export the frequency report in .rtf format at the location “saslib” at path mention above. 

Data Allclm_MM;

set mm_diag_data;

if '01JAN2010'D <= date_svc <= '31DEC2018'D ;  

array diag(*) $diag1 – diag12;             

do i=1 to 12;

  if (icd_ver = "9" and diag(i) in :( "2030" )) or (icd_ver = "10" and diag(i) in :( "C900" )) then f_MM_diag = 1;

end;

if f_MM_diag = 1;

run;

We can count this by using FREQ procedure that prints all values of a given categorical variable in the Output window, along with the number and percentage of times each value appears. The FREQ procedure can work with both string (character) and numeric categorical variables. In the case mentioned above, we can code like –

Proc freq data = Pat_details;

Table all Linaclotide Lubiprostone Plecanatide/missing list;

Run;

This code will give the counts of the variable mentioned in the table statement and results will look like the one mentioned below -

Output – 

                                Cumulative Cumulative

all    Frequency     Percent Frequency      Percent 

-----------------------------------------------------------------------

  1       10 100.00         10 100.00  

                                Cumulative Cumulative

Linaclotide  Frequency Percent     Frequency Percent 

-----------------------------------------------------------------------

  1       7 70.00         10 100.00  

                                Cumulative Cumulative

Lubiprostone Frequency     Percent Frequency Percent 

-----------------------------------------------------------------------

  1      2 20.00         10 100.00  

                                Cumulative Cumulative

Plecanatide Frequency     Percent Frequency Percent 

-----------------------------------------------------------------------

  1       1 10.00        10 100.00  

Proc sort data = Pat_details; by pat_id; run;

Data all_drug Linaclotide Lubiprostone Plecanatide;

Set  Pat_details;

by pat_id;

If all = 1 then output all_drug;

If Linaclotide =1 then output Linaclotide;

If  Lubiprostone =1 then output Lubiprostone;

If  Plecanatide = 1 then output Plecanatide;

Run;

The code above will create three different datasets with name all_drug Linaclotide Lubiprostone Plecanatide respectively in work library which has only respective drugs patients.

proc transpose data =  All_first_drug out = t_ All_first_drug (where =(date ne .) rename=(_name_= drug_name col1=date));

by ps_indv_id; run;

data t_ All_first_drug;

set t_ All_first_drug;

x=put(date,monyy7.);

run;

proc sql;

 create table final_data as select x, count(distinct ps_indv_id) from (

 select distinct x, ps_indv_id, count(ps_indv_id) as cnt 

from t_ All_first_drug group by x, ps_indv_id having count(ps_indv_id)>1)

group by x order by x;

quit;

result – 

Obs       x _TEMG003

  1    JAN2018        2   

  2    FEB2018        1   

  3    MAR2018       3

Macro in sas is the way of automation and it is a very powerful and flexible way to do automation.

Macro variables can be created using a number of different techniques and statements. However, the five most common methods are not only the most useful but also among the easiest to master. Since macro variables are used in so many ways within the macro language, learning how they are created can also serve as an excellent introduction to the language itself.

These methods include:

  • %LET statement
  • macro parameters (named and positional)
  • iterative %DO statement
  • using the INTO in PROC SQL

using the CALL SYMPUTX routine 

The substr function extracts a text from the character variable, it could also be used to replace the values, when it is on the left side it assigns the new values, suppose you need to change the first two values from 91 to 00, you can assign by putting it to left side;

data a; x='(91) 9999265789'; substr(x,2,2)='00'; run;

proc print data=a; run;       

output –

Obs        x

1 (00) 9999265789 

SCAN function - The scan function searches for a particular string and puts the value in the target variable, the target variable length using the scan function is 200 chars, here value is of 200 char variable now, the delimiter by default is blank.

It also takes 3 arguments first being string second the nth word and third the delimiter (by default blank), it divides a string into the chunks divided by delimiters;

data a; 

x='Champ are always best people’; 

value=scan(x,5); run;

proc print data=a; 

run;

output – 

Obs      value

1  'Champs are always best people     people

This is very important thing, consider the below example, the data is not consistent and you wanted to pick the last name;

 data x;
input @1 name $5. @7 marks;
 datalines;
 a b c 20 
k     d 15 
a       25 
n a b 20 
k   a 30 ;
 run;
data master; 
set x; lname=scan(name,-1); 
run; 
proc print data=master; run;
/********email id example**********/
data email; 
length id $25; 
input id$; 
datalines;
 a,b@n.com 
b@tt.com a@b@cc.com ;
 run;
data a; 
set email; 
domain_name=scan(id,-1,'@'); run;
proc print data=a; run;

Output – 

Obs   name       marks   lname

1       a b c         20     c

2       k d         15     d

3      a           25   a

4      n a b         20     b

5      k a           30   a

Obs                id domain_name

1              a,b@n.com       n.com

2             b@tt.com         tt.com

3            a@b@cc.com     cc.com

Advanced

Suppose we have the above clinical table in the form of SAS Dataset in lib – ‘mylib’. Considering the fact, the following code will give the desired results. 

Data Clinical_Analysis;
Set mylib.clinical;
If '01JAN2018'D <= Data <= '31MAR2019'D ;
array diag(*) $diag1 - diag5;
do i=1 to 5;
if diag(i) in ('D45') then f_diag =1;
end;
run;
proc sort data = Clinical_Analysis; by IDs date; run;
Data diag_pat (keep = IDs YOB Date f_diag first_date diag age);
set Clinical_Analysis;
by Ids;
format first_date date9.;
retain diag first_date age;
if first.Ids then do;
diag=0;
first_date =date;
age =0;
end;
age = Year(first_date) - yob;
if f_diag =1 then diag=1;
if last.Ids;
run;
Proc freq data = diag_pat;
title "All required counts";
tables diag age/missing list;
run;

output: 

Dataset of the code – Table: diag_pat

OBS
IDs
YOB
Date
f_diag
first_date
diag
Age
1
100
2-Mar-86
8-Mar-18
1
2-Mar-18
1
32
4
101
5-Jan-80
18-Apr-18
18-Jan-18
138
9
102
4-Jan-85
15-Feb-19
118-Jan-17
134
14
103
16-Jan-90
17-Mar-18
115-Feb-18
128
18
104
25-Jan-92
18-Apr-18
127-Feb-18
126

Format for creating a simple index -

Data <dataset_name> (INDEX = (<variable-name></UNIQUE></NOMISS>))
Creating a single index in one go
Example : data a(index=(seq /unique /nomiss));
                               Set a;
  Run;
Creating multiple indexes in one go
Example: data a(index =(seq /unique /nomiss
                                                      State /unique
                                                       Year /unique /nomiss)
                              Set a;
 Run;
Format for creating a composite index-
Data <dataset_name> (INDEX = (index-name = <var1 var2 etc></UNIQUE></NOMISS>))
              Creating a composite index –
Example – data a(index=(quarter_month = (quarter month) /nomiss)
                               Set a;
                               Run;

Note: We can also use the proc’s to create the indexes in the newly generated output datasets.

Additionally, we could also use the proc datasets procedure and proc SQL to generate indexes.

Information From Dataset: A

Information From Dataset: B

The output of the UNION Operator

UNION Operator is used to club the specific variable information from both the tables. Apart from updating the information, it also removes the duplicate data and sorts the combined information.

The output of UNIONALL Operator

UNIONALL Operator works by appending the information from dataset B after the information from dataset A. It doesn’t remove the duplicates or sorts the resultant data.

Note: If the variables are existing in both the tables, but the order of the same is different. SAS throws the below error –

ERROR: Column 1 from the first contributor of UNION is not the same type as its counterpart from the second.

ERROR: Column 2 from the first contributor of UNION is not the same type as its counterpart from the second.

There are multiple ways for creating table in proc sql

I. Create Table From Column Definition

We can create a new table without rows by using the CREATE TABLE statement to define the columns and attributes. We can specify a column’s name, type, length, informat, format and label. 

procsql;

createtable employee(

ssn char(9),

dob numformat = DATE9.informat = DATE9.label = "Date Of Birth",

doh numformat = DATE9.informat = DATE9.label = "Date Of Hire",

gender char(1) label = "Gender",

estat char(10) label = "Employement Status");

insertinto employee values ("111111111",500,25000,"M","ACTIVE");

quit;

procsql;

select * from employee;

quit;

II. Create Table from a Query Result

Many times, there are situations wherein we already have a structure of the table/view and we are trying to create a mirror table that is referring to columns of the earlier existing table. In such cases, we can use the same CREATE TABLE Statement and place it before the SELECT Statement. An additional benefit of using this approach of creating the table is that the data of the new table is derived on the basis of existing data from the parent table.

procsql;

createtable employee1 as

select * from employee

quit;

procsql;

select * from employee1;

quit;

III. Create Table Like an Existing Table

To create an empty table that has the same columns and attributes as an existing table, we use the LIKE clause while creating the table. One point to note is that an empty table is created using this method and no data is populated inside the table.

procsql;

createtable employee3 like employee;

quit;

Connecting to teradata using pass through

Procsql;
Connect to Teradata( USER= “ &user_id.” PASSWORD =” &dbpass”
tdpid =rchgrp2 MODE= ‘teardata’);
Create table work. Account_details as(Select
Account_ Number, Enroll_ Date, Sales, City
From Master_ acct_ details
Where Account_ Number is not null and Sales gt 40000
);
Disconnect from Teradata;
Quit;
Connecting to db2 using pass through:
Procsql;
Connect to db2(  DATABASE=CRD_ PROD USER= “ &user_id.”
PASSWORD=” &dbpass”);
Create table work. Account_details as(Select
Account_ Number, Enroll_ Date, Sales, City
From Master_ acct_ details
Where Account_ Number is not null and Sales gt 40000
);
Disconnect from db2;
Quit;

Other way using the library assignments:

Libname AM DB2 DATABASE= CARD_USA Schema=DDKAUTO

USER=”&user_ id.” PASSWORD= “&dbpass”;

Procsql;

Create table work. Account_details as(Select

Account_ Number, Enroll_ Date, Sales, City

From AM.Master_ acct_ details

Where Account_ Number is not null and Sales gt 40000);

Data test;

input id sal;

cards;

1 300

1 200

3 900

2 300

2 400

3 700

2 100

3 800

;

run;

proc sort data =test out=test1;

by id sal;

run;

proc sql;

create table test2(drop=number) as

select *, monotonic() as number from test1

group by id

having number=max(number);

quit;

data test;

input name$ weight;

cards;

a 25

b 30

c 35

d 40

;

run;

proc sql;

select name, weight

from test

union all

select 'Total', sum(weight)

from test;

quit;

The PAD option can be used to read the source data having the variable length records, so to read the variable length of records we can use the PAD option, the PAD option can be used with the infile

statement;

  example

data Test;
infile 'C:\Users\Anshu\Desktop\test.txt' PAD;
input @1name $6. @10 sex $1. @12 marks 3. @14 salary 8.;
run;
proc print data=Test;
run;

Macro is way of automating in sas programs. There are many ways to create macro. 

%macro get_var_dist (data_nm , categ_cut_off);
/* Taking list of variables in the dataset along with type*/
proc contents data=&data_nm. out=_content_;
run;
/* Note - how data_nm is getting resolved. */

/* Fiding number of variables to define loop */
proc sql noprint;
select count(*) into: nvars from _content_;
quit;
/* Note this is another way of defining macro variables */
%put "No of Variables is  " &nvars. ;
/* Note this can be a useful way to detect the value of macro variables.*/
%do i=1 %to &nvars. ;
data _null_;
set _content_;
if _n_ = &i. then do;
       call symput("variable", name);
       call symput("var_type",type);
end;
run;
/* Note -
  1: One more way of defining variables.
  Please note here information is being passed from general SAS dataset
  to SAS macros using call symput
  2: Please see how to run do a loop
*/
%put "Variable is  " &variable. ;
%put "Variable Type is  " &var_type. ;
proc sql noprint;
       select count(distinct &variable) into: distinct_categories
       from &data_nm.;
       quit;
%put "Number of distinct categories is  &distinct_categories." ;
/* Note the syntax of writing if else condition */
/* %do ....  %end; defines the block */
/* Missing command below includes missing value as a category */
/* Also note, if you don't use %eval it can throw errors */
/* if comparison involves floating numbers, you will need to use %sasevalf */
%if %eval(&distinct_categories.) <=  %eval(&categ_cut_off.) %then %do;
Title "Freq Dist for Variable &variable. as number of distinct
category is &distinct_categories.";  
proc freq data=&data_nm. ;
table &variable. / missing ;
run;
%end;
%if %eval(&var_type.) =  1 %then %do;
Title "Univariate Dist for Variable = &variable. ";  
proc univariate data=&data_nm. plot;
var &variable.  ;
run;
%end;

%end;
%mend get_var_dist;
options symbolgen mlogic mprint;
%get_var_dist(data_nm=sashelp.heart, categ_cut_off=25);
%get_var_dist(data_nm=sashelp.asscmgr, categ_cut_off=10);

The SAS processing is in two steps

  1. Compilation phase  
  2. Execution phase
  •  Compilation phase

A. If there is a raw data to be read, there is a creation of input buffer a logical concept for holding the data, the input buffer is not created while reading a dataset.

B. After the input buffer, the pdv is created, it is the area of memory where SAS builds its dataset.

Creating two automatic variables 1. _n_ (takes the value of a number of iterations) and 2. _error_ (takes the value 1 or 0, 1 if errors and 0 if no errors)

C. The SAS then do the syntax checking for the data steps, that includes misspelled keywords, invalid varnames, missing punctuation, and invalid option.

D. Data set variables are created as the input statement is encountered and their length and type are determined.

E. The descriptor portion is now written and this is the final step in the compilation phase, the run statement is the signal for the end of compilation phase. At this point, the data set contain the variables mentioned in the input statement, but the values would only be written in the execution phase.

  •  Execution phase

A. The automatic variables are initialized _n_ takes the value 1, _error_ to 0 and the remaining variables to missing (.)

B. The input pointer now reads the data values and assigns the variable values to the pdv.

C. At the end of the data step the data values from the pdv are written to the dataset and then _n_ takes the value as 2, _error_ 0 and all rest as missing(.).

The use of double trailing is that, if you have datalines in one line then you can use @@;

data test1;
input name $ age marks @@;
datalines;
am 12 95 ra 13 67 pa 14 98 sa 14 87
;
run;
proc print data=test1;
run;

Output 

ObsnameAgeMarks
1Am1295
2Ra1367
3Pa1498
4Sa1487

The use of single @ is in the holding the line, it basically is used in more than one input statement, the pointer basically hold the line after reading the values, it basically holds the value and checks the condition;

data test2;
input @1 gender $1. @;
if gender ne 'F' then delete;
input @3 age @5 marks;
datalines;
F 13 56
M 12 78
F 13 78
M 56 90
;
run;
proc print data=test2;
run;

Output

Obs
Gender
age
Marks
1
F
13
56
2
F
13
78

data Test;

input @1name $7. @9 sex $1. @10 marks;

cards;

Anshukr M123

MohitaS F32

;

run;

proc print data=test;

run;

Answer

Obs
name
sex
marks
1
Anshukr
M
123
2
MohitaS
F
32

The PAD option can be used to read the source data having the variable length records, so to read the variable length of records we can use the PAD option, the PAD option can be used with the infile

statement;

example

data Test;
infile 'C:\Users\Anshu\Desktop\test.txt' PAD;
input @1name $6. @10 sex $1. @12 marks 3. @14 salary 8.;
run;
proc print data=Test;
run;

RETAIN statement causes a variable that is defined during the input statement to be retained across data set iterations, in other words, retain statement retains the variable value at each iteration and keeps on substituting it, the pdv does not get cleared when you use the retain statement, let’s try to understand the basic example explained below –

Table - mylib.clinical

OBSIDsYOBDatediag1daig2daig3daig4daig5
11002-Mar-862-Mar-18D4500D450
21002-Mar-865-Mar-180D4500D45
31002-Mar-868-Mar-180C50
C50D45
41015-Jan-808-Jan-1800D45D450
51015-Jan-802-Feb-18D4500D450
61015-Jan-8027-Feb-180D4500D45
71015-Jan-8024-Mar-18D450D4500
81015-Jan-8018-Apr-18D4500D450

Suppose we have the above clinical table in the form of SAS Dataset in lib – ‘mylib’. Considering the fact the following code will give the desired results.

Data Clinical_Analysis;
Set mylib.clinical;
If '01JAN2018'D <= Date <= '31MAR2019'D ;
array diag(*) $diag1 - diag5;
do i=1 to 5;
if diag(i) in ('D45') then f_diag =1;
end;
run;
proc sort data = Clinical_Analysis; by IDs date; run;
Data diag_pat (keep = IDs YOB Date f_diag first_date diag age);
set Clinical_Analysis;
by Ids;
format first_date date9.;

retain diag first_date age; /*retaining diag first_date and age at every iteration */

if first. Ids then do;
diag=0;
first_date =date;
age =0;
end;
age = Year(first_date) - yob;
if f_diag =1 then diag=1;
if last.Ids;
run;
Proc freq data = diag_pat;
title "All required counts";
tables diag age/missing list;
run;

output: 

Dataset of the code – Table: diag_pat

OBS
IDs
YOB
Date
f_diag
first_date
diag
Age
1
100
2-Mar-86
8-Mar-18
1
2-Mar-18
1
32
4
101
5-Jan-80
18-Apr-18
1
8-Jan-18
1
38

The mean() function is used to calculate the mean of one observation across the variable. It is used to generate a horizontal data summary.

DATA A;

INPUT NAME$ SUBJ1 SUBJ2 SUBJ3;

CARDS;

A 95 96 99

B 85 90 70

C 60 70 80

;

RUN;

DATA B;

SET A;

MEAN = MEAN(SUBJ1,SUBJ2,SUBJ3);

RUN;

PROCPRINT;

RUN;

Proc Means is used to calculate the mean of a variable across all observations.

It is used for generating the vertical data summary (i.e. mode of operation is column-wise).

By default, it calculates N, Mean, Std Dev, Minimum, Maximum statistics.

In case you want to delete multiple delimiters in one go, you need to put all delimiters in the second argument in any order and use the SAS function compress. 

data clean; 
x='ab,c,*&%$ ~d';
 y=compress(x,'~$%&* ,'); run;
proc print data=clean;
run;

Output – 

Obs  x    y

ab,c,*&%$ ~d    abcd

Numeric Format ag’ is created, in the work library (is a temporary format) with the specified ranges: 

0 to less than 30 coded as ‘young’
30 to less than 61 coded as ‘old’
Format ‘ag’ is called & applied with ‘age’

 Proc format lib=work;
Value ag
0-<30=’young’
30-<61=’old’;
Run;

Proc print data=sasuser.admit;
Format age ag.;

Procsql;

Connect to Teradata( USER= “ &user_id.” PASSWORD =” &dbpass”

tdpid =rchgrp2 MODE= ‘teardata’);

Create table work. Account_details as (Select

Account_ Number, Enroll_ Date, Sales , City 

From Master_ acct_ details

Where Account_ Number is not null and Sales gt 40000);

Disconnect from Teradata;

Quit; 

The options nofmterr mprint mlogic symbolgen;

%macro datecompare;

proc format;

value $bckcolor

'Dates match No action'='green'

 'PLZ CHECK'=red;

run;

%macro a;

%do i=1 %to 5;

data sasuser.a&i;

set sasuser.admit;

run;

proc copy in =sasuser out=work memtype=data;

select a&i;

run;

%end;

%mend a;

%a;

proc sql noprint;

select count(distinct memname) into:cntwork from dictionary.columns where upcase(libname)

='WORK';

quit;

%put &cntwork;

proc sql noprint;

select count(distinct memname) into: cntuser from dictionary.columns where upcase(libname)

='SASUSER' and upcase(memname) in ('A1','A2','A3','A4','A5');

quit;

%put &cntuser;

proc sql;

create table wrk as select libname,memname,crdate from dictionary.tables where upcase(libname)

='WORK' and upcase(memname) in ('A1','A2','A3','A4','A5');

quit;

proc sql;

create table suser as select libname,memname,crdate from dictionary.tables where upcase(libname)

='SASUSER' and  upcase(memname) in ('A1','A2','A3','A4','A5');

quit;

proc sort data=wrk (rename=(crdate=crdate1 libname=libname1));

by memname;

run;

proc sort data=suser;

by memname;

run;

data final;

merge wrk(in=a) suser(in=b);

by memname;

if a and b;

run;

data new;

set final;

if left(trim(crdate))=left(trim(crdate1)) then action='Dates match No action';

else action='PLZ CHECK';

run;

proc sql;

create table new as select memname,libname,crdate,libname1,crdate1,action from new;

quit;

ods listing close;

ods rtf file='\\path \new.rtf';

proc report data=new nowd headline headskip split='*';

Title1 j = center font = arial h = 4 color = blue "Creation Date comparision for";

Title2 j = center font = arial h = 4 color = blue "datasets of library sasuser and work";

columns memname libname crdate libname1 crdate1 action;

define memname/display width=9 spacing=3 center;

define libname/display width=2 spacing=3 center;

define crdate/display width=2 spacing=3 center;

define libname1/display width=3 spacing=3 center;

define crdate1/display width=9 spacing=3 center;

define action/display width=9 spacing=3 center style ={background =$bckcolor.};;

run;

ods rtf close;

ods listing;

%mend datecompare;

%datecompare;

The tranwrd function helps in the replacement of a string in a char variable. We can reverse the text/values in SAS using tranward function. It has many functions like reversing, replacing etc. 

For example –

Converting the multiple occurence of a string;

data x; input name$; datalines; highhigh high cc ; run;

data a; set x; name=tranwrd(name,'high','hi'); run;

proc print data=a; run;

output – 

Obs name

1 hihi

2 hi

3 cc

In SAS, we can convert text into any case like LOWCASE, UPCASE and PROCASE.

LOWCASE – It converts the string into small letters or in lower case.

data lower; 

x='SAS'; 

y=lowcase(x); 

x=lowcase('CHAMP');

 run;

proc print data=lower; run;

output – 

Obs     x   y

1       champ sas

UPCASE – It converts the string into capital letters or in upper case.

data uper; 

x='sAs'; 

y=upcase(x); 

x=upcase('ChaMP');

 run;

proc print data=uper; run;

output – 

Obs     x   y

1       CHAMP SAS

PROCASE – It converts the string into proper letters or in other words we can say the first letter upper/upcase and rest in lowercase. Example-

data procase; 

x='sAs'; 

y=procase(x); 

x=procase('ChaMP');

 run;

proc print data=procase; run;

output – 

Obs     x   y

1       Champ Sas

Let suppose we have a table admit in sasuser library and we have to fetch the student who has age greater than the average age then this can be code like the one below using symget.

%macro ch;                                                                                                                              

proc sql;                                                                                                                               

select avg(age) into:avg_age from sasuser.admit;                                                                                        

quit;                                                                                                                                   

data x;                                                                                                                                 

set sasuser.admit;                                                                                                                      

ageif=symget('avg_age');                                                                                                                

if age gt ageif;                                                                                                                        

run;                                                                                                                                    

%mend ch;                                                                                                                               

%ch;

 The mdy function creates a numeric date from the values of the month day and year.

data date_data; 
input name $ month year day; 
datalines;
 ram 10 1981 13 
Geeta 04 1982 20 ; 
run;
data date_date1; 
set date_data;
 attrib bdy format=date9.; 
bdy=mdy(month,day,year); 
run;
proc print data= date_date1; run;

Output – 

Obs name  month year   day bdy

1     ram 10      1981 13 13OCT1981

2    geeta  4 1982    20 20APR1982

we can use CATX function for it. CATX function with a series, Let’s say you want to concatenate the value of a variable; 

data a; 
sp='|'; 
x1='a';
 x2='b'; 
x3='c'; 
string=catx(sp,of x1-x3); 
run; 
proc print data=a; run;

output –

Obs     sp     x1 x2         x3 string

1           |   a b           c a|b|c

The N function counts the number of non-missing values in a row 

data a; 
x1=2; 
x2=3; 
x3=4;
nvars_nonmiss=n(of x1-x3); 
put nvars_nonmiss=; 
run;
proc print data = a; run;

Output – 

Obs    x1 x2    x3 nvars_nonmiss

        1       2 3     4 3

we can count missing value in sas using NMISS function.

For example – 

data a; x1=2; x2=3; x3=.; 

nvars_miss=nmiss(of x1-x3); 

put nvars_miss=;

 run;

proc print data = a; 

run;

The reverse function just reverse the string, if there are leading blanks they become trailing.

data a; 
x='abcdef'; 
change=reverse(x); 
put change;
 run;

Output – 

fedcba

The INTNX function increments day, year or month specified a date by the parameters given. It is a very helpful function present in sas. Few examples below can show you how - 

 Here it increments the date by 6 weeks, means date after 6 weeks including current week.

data _null_; 
x=intnx('week', '23mar2012'd, 2); 
put x date9.; 
run; 
output - 01APR2012 
The intnx function increments the day, year or month on the specified date, you can use put as well; 
data amit; 
x='13oct1981'd; 
z=put(intnx('month',x,3,'e'),date9.); 
run; 
proc print data=amit; 
run;

output

Obs  x  z 

7956  31JAN1982 

The intnx function increments the day, year or month on the specified date, you can use the same argument for incrementing 1 month from the date; 

data amit; 

x='13oct1981'd; 

z=put(intnx('month',x,1,'same'),date9.); 

put z=; 

run;

Output - z=13NOV1981 

The MONOTONIC function is quite similar to the internal variable _N_ in the DATA Step. We can use it to select the records according to their row number. For example, we choose the SSNs from the 501st line to the 888th line in the SSN dataset.

 proc sql;

 select * from ssn_data where monotonic() between 501 and 800 ;

quit;

you can also the monotonic() to subset the dataset in the where clause to limit the number of records read/outputted to the final dataset…

view source

print?

proc sql;

create table class2 as

select monotonic() as rowno, *

from sashelp.class

where 10 le monotonic() le 20;

quit;

Description

SAS stands for Statistical Analysis Software which is used for Data Analytics. It facilitates reporting, data mining, predictive modeling with the help of powerful visualization. It helps you to use qualitative techniques and processes which allows you to enhance employee productivity and increase business profits.