Accreditation Bodies
Accreditation Bodies
Accreditation Bodies
Supercharge your career with our Multi-Cloud Engineer Bootcamp
KNOW MOREStatistical Analysis System, or SAS, is a programming language for statistical analysis that is helpful in data mining and data handling. SAS gives results in relation to multivariate analysis and predictive analytics. Whether you are a beginner or an intermediate, or an experienced SAS professional, this write-up will aid you in boosting your confidence and knowledge of SAS. Whether you are a beginner or an intermediate, or an expert SAS professional, this guide will aid you in increasing your knowledge and learning of SAS. The questions provided below cover topics like index, different ways to create a table in proc SQL, PAD option, and how it is used. It includes questions ranging from simple theoretical concepts to tricky questions. These questions will provide you step-by-steo explanations for every query which will help you understand the concepts in detail. With SAS interview questions you can be confident of your preparation for the upcoming interview.
Filter By
Clear all
Find the number of patients with daig code (D45) and calculate the age of each patient between 01st Jan 2018 to 31 March 2019 from the table given below.
Table Name – Clinical
OBS | IDs | YOB | Date | diag1 | daig2 | daig3 | daig4 | daig5 |
---|---|---|---|---|---|---|---|---|
1 | 100 | 2-Mar-86 | 2-Mar-18 | D45 | 0 | 0 | D45 | 0 |
2 | 100 | 2-Mar-86 | 5-Mar-18 | 0 | D45 | 0 | 0 | D45 |
3 | 100 | 2-Mar-86 | 8-Mar-18 | 0 | C50 | 0 | C50 | D45 |
4 | 101 | 5-Jan-80 | 8-Jan-18 | 0 | 0 | D45 | D45 | 0 |
5 | 101 | 5-Jan-80 | 2-Feb-18 | D45 | 0 | 0 | D45 | 0 |
6 | 101 | 5-Jan-80 | 27-Feb-18 | 0 | D45 | 0 | 0 | D45 |
7 | 101 | 5-Jan-80 | 24-Mar-18 | D45 | 0 | D45 | 0 | 0 |
8 | 101 | 5-Jan-80 | 18-Apr-18 | D45 | 0 | 0 | D45 | 0 |
9 | 102 | 4-Jan-85 | 18-Jan-17 | 0 | D45 | 0 | 0 | D45 |
10 | 102 | 4-Jan-85 | 25-Jan-18 | D45 | 0 | 0 | D45 | 0 |
11 | 102 | 4-Jan-85 | 1-Feb-18 | 0 | D45 | 0 | 0 | D45 |
12 | 102 | 4-Jan-85 | 8-Jan-18 | 0 | 0 | D45 | D45 | 0 |
13 | 102 | 4-Jan-85 | 15-Feb-19 | 0 | 0 | D45 | D45 | 0 |
14 | 103 | 16-Jan-90 | 15-Feb-18 | 0 | D45 | 0 | 0 | D45 |
15 | 103 | 16-Jan-90 | 25-Feb-18 | D45 | 0 | 0 | D45 | 0 |
16 | 103 | 16-Jan-90 | 7-Mar-18 | 0 | D45 | 0 | 0 | D45 |
17 | 103 | 16-Jan-90 | 17-Mar-18 | 0 | 0 | D45 | D45 | 0 |
18 | 104 | 25-Jan-92 | 27-Feb-18 | 0 | 0 | D45 | D45 | 0 |
19 | 104 | 25-Jan-92 | 24-Mar-18 | 0 | D45 | 0 | 0 | D45 |
20 | 104 | 25-Jan-92 | 18-Apr-18 | D45 | 0 | 0 | D45 | 0 |
Suppose we have the above clinical table in the form of SAS Dataset in lib – ‘mylib’. Considering the fact, the following code will give the desired results.
Data Clinical_Analysis; Set mylib.clinical; If '01JAN2018'D <= Data <= '31MAR2019'D ; array diag(*) $diag1 - diag5; do i=1 to 5; if diag(i) in ('D45') then f_diag =1; end; run; proc sort data = Clinical_Analysis; by IDs date; run; Data diag_pat (keep = IDs YOB Date f_diag first_date diag age); set Clinical_Analysis; by Ids; format first_date date9.; retain diag first_date age; if first.Ids then do; diag=0; first_date =date; age =0; end; age = Year(first_date) - yob; if f_diag =1 then diag=1; if last.Ids; run; Proc freq data = diag_pat; title "All required counts"; tables diag age/missing list; run;
output:
Dataset of the code – Table: diag_pat
OBS | IDs | YOB | Date | f_diag | first_date | diag | Age |
---|---|---|---|---|---|---|---|
1 | 100 | 2-Mar-86 | 8-Mar-18 | 1 | 2-Mar-18 | 1 | 32 |
4 | 101 | 5-Jan-80 | 18-Apr-18 | 1 | 8-Jan-18 | 1 | 38 |
9 | 102 | 4-Jan-85 | 15-Feb-19 | 1 | 18-Jan-17 | 1 | 34 |
14 | 103 | 16-Jan-90 | 17-Mar-18 | 1 | 15-Feb-18 | 1 | 28 |
18 | 104 | 25-Jan-92 | 18-Apr-18 | 1 | 27-Feb-18 | 1 | 26 |
This is a frequently asked question in SAS interview questions for experienced.
Format for creating a simple index -
Data <dataset_name> (INDEX = (<variable-name></UNIQUE></NOMISS>)) Creating a single index in one go Example : data a(index=(seq /unique /nomiss)); Set a; Run; Creating multiple indexes in one go Example: data a(index =(seq /unique /nomiss State /unique Year /unique /nomiss) Set a; Run; Format for creating a composite index- Data <dataset_name> (INDEX = (index-name = <var1 var2 etc></UNIQUE></NOMISS>)) Creating a composite index – Example – data a(index=(quarter_month = (quarter month) /nomiss) Set a; Run;
Note: We can also use the proc’s to create the indexes in the newly generated output datasets.
Additionally, we could also use the proc datasets procedure and proc SQL to generate indexes.
Information From Dataset: A
Information From Dataset: B
The output of the UNION Operator
UNION Operator is used to club the specific variable information from both the tables. Apart from updating the information, it also removes the duplicate data and sorts the combined information.
The output of UNIONALL Operator
UNIONALL Operator works by appending the information from dataset B after the information from dataset A. It doesn’t remove the duplicates or sorts the resultant data.
Note: If the variables are existing in both the tables, but the order of the same is different. SAS throws the below error –
Expect to come across this popular question in SAS logical interview questions.
There are multiple ways to create table in proc SQL:
I. Create Table From Column Definition
We can create a new table without rows by using the CREATE TABLE statement to define the columns and attributes. We can specify a column’s name, type, length, informat, format and label.
procsql; createtable employee( ssn char(9), dob numformat = DATE9.informat = DATE9.label = "Date Of Birth", doh numformat = DATE9.informat = DATE9.label = "Date Of Hire", gender char(1) label = "Gender", estat char(10) label = "Employement Status"); insertinto employee values ("111111111",500,25000,"M","ACTIVE"); quit; procsql; select * from employee; quit;
II. Create Table from a Query Result
Many times, there are situations wherein we already have a structure of the table/view and we are trying to create a mirror table that is referring to columns of the earlier existing table. In such cases, we can use the same CREATE TABLE Statement and place it before the SELECT Statement. An additional benefit of using this approach of creating the table is that the data of the new table is derived on the basis of existing data from the parent table.
procsql; createtable employee1 as select * from employee quit; procsql; select * from employee1; quit;
III. Create Table Like an Existing Table
To create an empty table that has the same columns and attributes as an existing table, we use the LIKE clause while creating the table. One point to note is that an empty table is created using this method and no data is populated inside the table.
procsql;
createtable employee3 like employee;
quit;
A must-know for anyone heading into a SAS interview, this question is frequently asked in tricky SAS interview questions.
Connecting to teradata using pass through
Procsql; Connect to Teradata( USER= “ &user_id.” PASSWORD =” &dbpass” tdpid =rchgrp2 MODE= ‘teardata’); Create table work. Account_details as(Select Account_ Number, Enroll_ Date, Sales, City From Master_ acct_ details Where Account_ Number is not null and Sales gt 40000 ); Disconnect from Teradata; Quit; Connecting to db2 using pass through: Procsql; Connect to db2( DATABASE=CRD_ PROD USER= “ &user_id.” PASSWORD=” &dbpass”); Create table work. Account_details as(Select Account_ Number, Enroll_ Date, Sales, City From Master_ acct_ details Where Account_ Number is not null and Sales gt 40000 ); Disconnect from db2; Quit;
Other way using the library assignments:
Libname AM DB2 DATABASE= CARD_USA Schema=DDKAUTO
USER=”&user_ id.” PASSWORD= “&dbpass”; Procsql;
Create table work. Account_details as(Select Account_ Number, Enroll_ Date, Sales, City From AM.Master_ acct_ details Where Account_ Number is not null and Sales gt 40000);
Calculate the rank according to the marks in ascending order and if any marks value is repeated the rank should be the same for both.
Dataset
Rno | Name | Marks |
---|---|---|
1 | A | 50 |
2 | B | 49 |
3 | C | 51 |
4 | D | 52 |
5 | E | 52 |
Output
Rno | Name | Marks | Rank |
---|---|---|---|
2 | B | 49 | 1 |
1 | A | 50 | 2 |
3 | C | 51 | 3 |
4 | D | 52 | 4 |
5 | E | 52 | 4 |
Data test; input id sal; cards; 1 300 1 200 3 900 2 300 2 400 3 700 2 100 3 800 ; run; proc sort data =test out=test1; by id sal; run; proc sql; create table test2(drop=number) as select *, monotonic() as number from test1 group by id having number=max(number); quit;
Fetch 3 variables: height, weight, maxval(max of height or weight) with format=3.
Dataset
Height | Weight |
---|---|
69 | 112.5 |
70 | 84 |
89.5 | 78 |
65.3 | 97.5 |
57.3 | 88 |
Output
Height | Weight | Maxval |
---|---|---|
69 | 112.5 | 113 |
70 | 84 | 84 |
89.5 | 78 | 90 |
65.3 | 97.5 | 98 |
57.3 | 88 | 88 |
data test; input name$ weight; cards; a 25 b 30 c 35 d 40 ; run; proc sql; select name, weight from test union all select 'Total', sum(weight) from test; quit;
The PAD option can be used to read the source data having the variable length records, so to read the variable length of records we can use the PAD option, the PAD option can be used with the infile
statement;
example
data Test; infile 'C:\Users\Anshu\Desktop\test.txt' PAD; input @1name $6. @10 sex $1. @12 marks 3. @14 salary 8.; run; proc print data=Test; run;
It's no surprise that this one pops up often in SAS Macro interview questions.
Macro is a way of automating in SAS programs. There are many ways to create macro.
%macro get_var_dist (data_nm , categ_cut_off); /* Taking list of variables in the dataset along with type*/ proc contents data=&data_nm. out=_content_; run; /* Note - how data_nm is getting resolved. */ /* Fiding number of variables to define loop */ proc sql noprint; select count(*) into: nvars from _content_; quit; /* Note this is another way of defining macro variables */ %put "No of Variables is " &nvars. ; /* Note this can be a useful way to detect the value of macro variables.*/ %do i=1 %to &nvars. ; data _null_; set _content_; if _n_ = &i. then do; call symput("variable", name); call symput("var_type",type); end; run; /* Note - 1: One more way of defining variables. Please note here information is being passed from general SAS dataset to SAS macros using call symput 2: Please see how to run do a loop */ %put "Variable is " &variable. ; %put "Variable Type is " &var_type. ; proc sql noprint; select count(distinct &variable) into: distinct_categories from &data_nm.; quit; %put "Number of distinct categories is &distinct_categories." ; /* Note the syntax of writing if else condition */ /* %do .... %end; defines the block */ /* Missing command below includes missing value as a category */ /* Also note, if you don't use %eval it can throw errors */ /* if comparison involves floating numbers, you will need to use %sasevalf */ %if %eval(&distinct_categories.) <= %eval(&categ_cut_off.) %then %do; Title "Freq Dist for Variable &variable. as number of distinct category is &distinct_categories."; proc freq data=&data_nm. ; table &variable. / missing ; run; %end; %if %eval(&var_type.) = 1 %then %do; Title "Univariate Dist for Variable = &variable. "; proc univariate data=&data_nm. plot; var &variable. ; run; %end; %end; %mend get_var_dist; options symbolgen mlogic mprint; %get_var_dist(data_nm=sashelp.heart, categ_cut_off=25); %get_var_dist(data_nm=sashelp.asscmgr, categ_cut_off=10);
The SAS processing is in two steps
The use of double trailing is that, if you have datalines in one line then you can use @@;
data test1; input name $ age marks @@; datalines; am 12 95 ra 13 67 pa 14 98 sa 14 87 ; run; proc print data=test1; run;
Output
Obs | name | Age | Marks |
---|---|---|---|
1 | Am | 12 | 95 |
2 | Ra | 13 | 67 |
3 | Pa | 14 | 98 |
4 | Sa | 14 | 87 |
The use of single @ is in the holding the line, it basically is used in more than one input statement, the pointer basically hold the line after reading the values, it basically holds the value and checks the condition;
data test2; input @1 gender $1. @; if gender ne 'F' then delete; input @3 age @5 marks; datalines; F 13 56 M 12 78 F 13 78 M 56 90 ; run; proc print data=test2; run;
Output
Obs | Gender | age | Marks |
---|---|---|---|
1 | F | 13 | 56 |
2 | F | 13 | 78 |
data Test; input @1name $7. @9 sex $1. @10 marks; cards; Anshukr M123 MohitaS F32 ; run; proc print data=test; run;
Answer
Obs | name | sex | marks |
---|---|---|---|
1 | Anshukr | M | 123 |
2 | MohitaS | F | 32 |
The PAD option can be used to read the source data having the variable length records, so to read the variable length of records we can use the PAD option, the PAD option can be used with the infile
statement;
example
data Test; infile 'C:\Users\Anshu\Desktop\test.txt' PAD; input @1name $6. @10 sex $1. @12 marks 3. @14 salary 8.; run; proc print data=Test; run;
A common question in SAS scenario based interview questions, don't miss this one.
RETAIN statement causes a variable that is defined during the input statement to be retained across data set iterations, in other words, retain statement retains the variable value at each iteration and keeps on substituting it, the pdv does not get cleared when you use the retain statement, let’s try to understand the basic example explained below –
Table - mylib.clinical
OBS | IDs | YOB | Date | diag1 | daig2 | daig3 | daig4 | daig5 |
---|---|---|---|---|---|---|---|---|
1 | 100 | 2-Mar-86 | 2-Mar-18 | D45 | 0 | 0 | D45 | 0 |
2 | 100 | 2-Mar-86 | 5-Mar-18 | 0 | D45 | 0 | 0 | D45 |
3 | 100 | 2-Mar-86 | 8-Mar-18 | 0 | C50 | C50 | D45 | |
4 | 101 | 5-Jan-80 | 8-Jan-18 | 0 | 0 | D45 | D45 | 0 |
5 | 101 | 5-Jan-80 | 2-Feb-18 | D45 | 0 | 0 | D45 | 0 |
6 | 101 | 5-Jan-80 | 27-Feb-18 | 0 | D45 | 0 | 0 | D45 |
7 | 101 | 5-Jan-80 | 24-Mar-18 | D45 | 0 | D45 | 0 | 0 |
8 | 101 | 5-Jan-80 | 18-Apr-18 | D45 | 0 | 0 | D45 | 0 |
Suppose we have the above clinical table in the form of SAS Dataset in lib – ‘mylib’. Considering the fact the following code will give the desired results.
Data Clinical_Analysis; Set mylib.clinical; If '01JAN2018'D <= Date <= '31MAR2019'D ; array diag(*) $diag1 - diag5; do i=1 to 5; if diag(i) in ('D45') then f_diag =1; end; run; proc sort data = Clinical_Analysis; by IDs date; run; Data diag_pat (keep = IDs YOB Date f_diag first_date diag age); set Clinical_Analysis; by Ids; format first_date date9.;
retain diag first_date age; /*retaining diag first_date and age at every iteration */
if first. Ids then do; diag=0; first_date =date; age =0; end; age = Year(first_date) - yob; if f_diag =1 then diag=1; if last.Ids; run; Proc freq data = diag_pat; title "All required counts"; tables diag age/missing list; run;
output:
Dataset of the code – Table: diag_pat
OBS | IDs | YOB | Date | f_diag | first_date | diag | Age |
---|---|---|---|---|---|---|---|
1 | 100 | 2-Mar-86 | 8-Mar-18 | 1 | 2-Mar-18 | 1 | 32 |
4 | 101 | 5-Jan-80 | 18-Apr-18 | 1 | 8-Jan-18 | 1 | 38 |
The mean() function is used to calculate the mean of one observation across the variable. It is used to generate a horizontal data summary.
DATA A; INPUT NAME$ SUBJ1 SUBJ2 SUBJ3; CARDS; A 95 96 99 B 85 90 70 C 60 70 80 ; RUN; DATA B; SET A; MEAN = MEAN(SUBJ1,SUBJ2,SUBJ3); RUN; PROCPRINT; RUN;
Proc Means is used to calculate the mean of a variable across all observations.
It is used for generating the vertical data summary (i.e. mode of operation is column-wise).
By default, it calculates N, Mean, Std Dev, Minimum, Maximum statistics.
In case you want to delete multiple delimiters in one go, you need to put all delimiters in the second argument in any order and use the SAS function compress.
data clean; x='ab,c,*&%$ ~d'; y=compress(x,'~$%&* ,'); run; proc print data=clean; run;
Output –
Obs x y 1 ab,c,*&%$ ~d abcd
It's no surprise that this one pops up often in advanced SAS interview questions.
Numeric Format ag’ is created, in the work library (is a temporary format) with the specified ranges:
Proc format lib=work; Value ag 0-<30=’young’ 30-<61=’old’; Run; Proc print data=sasuser.admit; Format age ag.;
Procsql; Connect to Teradata( USER= “ &user_id.” PASSWORD =” &dbpass” tdpid =rchgrp2 MODE= ‘teardata’);
Create table work. Account_details as (Select Account_ Number, Enroll_ Date, Sales , City From Master_ acct_ details Where Account_ Number is not null and Sales gt 40000);
Disconnect from Teradata; Quit;
The options nofmterr mprint mlogic symbolgen;
%macro datecompare; proc format; value $bckcolor 'Dates match No action'='green' 'PLZ CHECK'=red; run; %macro a; %do i=1 %to 5; data sasuser.a&i; set sasuser.admit; run; proc copy in =sasuser out=work memtype=data; select a&i; run; %end; %mend a; %a; proc sql noprint; select count(distinct memname) into:cntwork from dictionary.columns where upcase(libname) ='WORK'; quit; %put &cntwork; proc sql noprint; select count(distinct memname) into: cntuser from dictionary.columns where upcase(libname) ='SASUSER' and upcase(memname) in ('A1','A2','A3','A4','A5'); quit; %put &cntuser; proc sql; create table wrk as select libname,memname,crdate from dictionary.tables where upcase(libname) ='WORK' and upcase(memname) in ('A1','A2','A3','A4','A5'); quit; proc sql; create table suser as select libname,memname,crdate from dictionary.tables where upcase(libname) ='SASUSER' and upcase(memname) in ('A1','A2','A3','A4','A5'); quit; proc sort data=wrk (rename=(crdate=crdate1 libname=libname1)); by memname; run; proc sort data=suser; by memname; run; data final; merge wrk(in=a) suser(in=b); by memname; if a and b; run; data new; set final; if left(trim(crdate))=left(trim(crdate1)) then action='Dates match No action'; else action='PLZ CHECK'; run; proc sql; create table new as select memname,libname,crdate,libname1,crdate1,action from new; quit; ods listing close; ods rtf file='\\path \new.rtf'; proc report data=new nowd headline headskip split='*'; Title1 j = center font = arial h = 4 color = blue "Creation Date comparision for"; Title2 j = center font = arial h = 4 color = blue "datasets of library sasuser and work"; columns memname libname crdate libname1 crdate1 action; define memname/display width=9 spacing=3 center; define libname/display width=2 spacing=3 center; define crdate/display width=2 spacing=3 center; define libname1/display width=3 spacing=3 center; define crdate1/display width=9 spacing=3 center; define action/display width=9 spacing=3 center style ={background =$bckcolor.};; run; ods rtf close; ods listing; %mend datecompare; %datecompare;
The tranwrd function helps in the replacement of a string in a char variable. We can reverse the text/values in SAS using tranward function. It has many functions like reversing, replacing etc.
For example –
Converting the multiple occurence of a string; data x; input name$; datalines; highhigh high cc ; run; data a; set x; name=tranwrd(name,'high','hi'); run; proc print data=a; run;
output –
Obs name 1 hihi 2 hi 3 cc
In SAS, we can convert text into any case like LOWCASE, UPCASE and PROCASE.
LOWCASE – It converts the string into small letters or in lower case.
data lower; x='SAS'; y=lowcase(x); x=lowcase('CHAMP'); run; proc print data=lower; run;
output –
Obs x y 1 champ sas
UPCASE – It converts the string into capital letters or in upper case.
data uper; x='sAs'; y=upcase(x); x=upcase('ChaMP'); run; proc print data=uper; run;
output –
Obs x y 1 CHAMP SAS
PROCASE – It converts the string into proper letters or in other words we can say the first letter upper/upcase and rest in lowercase. Example-
data procase; x='sAs'; y=procase(x); x=procase('ChaMP'); run; proc print data=procase; run;
output –
Obs x y 1 Champ Sas
Let suppose we have a table admit in sasuser library and we have to fetch the student who has age greater than the average age then this can be code like the one below using symget.
%macro ch; proc sql; select avg(age) into:avg_age from sasuser.admit; quit; data x; set sasuser.admit; ageif=symget('avg_age'); if age gt ageif; run; %mend ch; %ch;
The mdy function creates a numeric date from the values of the month day and year.
data date_data; input name $ month year day; datalines; ram 10 1981 13 Geeta 04 1982 20 ; run; data date_date1; set date_data; attrib bdy format=date9.; bdy=mdy(month,day,year); run; proc print data= date_date1; run;
Output –
Obs name month year day bdy 1 ram 10 1981 13 13OCT1981 2 geeta 4 1982 20 20APR1982
we can use CATX function for it. CATX function with a series, Let’s say you want to concatenate the value of a variable;
data a; sp='|'; x1='a'; x2='b'; x3='c'; string=catx(sp,of x1-x3); run; proc print data=a; run;
output –
Obs sp x1 x2 x3 string 1 | a b c a|b|c
A common question in SAS interview questions for experienced, don't miss this one.
The N function counts the number of non-missing values in a row
data a; x1=2; x2=3; x3=4; nvars_nonmiss=n(of x1-x3); put nvars_nonmiss=; run; proc print data = a; run;
Output –
Obs x1 x2 x3 nvars_nonmiss 1 2 3 4 3
One of the most frequently posed SAS logical interview questions, be ready for it.
We can count missing value in sas using NMISS function.
For example –
data a; x1=2; x2=3; x3=.; nvars_miss=nmiss(of x1-x3); put nvars_miss=; run; proc print data = a; run;
The reverse function just reverse the string, if there are leading blanks they become trailing.
data a; x='abcdef'; change=reverse(x); put change; run;
Output –
fedcba
The INTNX function increments day, year or month specified a date by the parameters given. It is a very helpful function present in sas. Few examples below can show you how -
Here it increments the date by 6 weeks, means date after 6 weeks including current week.
data _null_; x=intnx('week', '23mar2012'd, 2); put x date9.; run; output - 01APR2012 The intnx function increments the day, year or month on the specified date, you can use put as well; data amit; x='13oct1981'd; z=put(intnx('month',x,3,'e'),date9.); run; proc print data=amit; run;
output -
Obs x z
1 7956 31JAN1982
The intnx function increments the day, year or month on the specified date, you can use the same argument for incrementing 1 month from the date;
data amit; x='13oct1981'd; z=put(intnx('month',x,1,'same'),date9.); put z=; run; Output - z=13NOV1981
The MONOTONIC function is quite similar to the internal variable _N_ in the DATA Step. We can use it to select the records according to their row number. For example, we choose the SSNs from the 501st line to the 888th line in the SSN dataset.
proc sql;
select * from ssn_data where monotonic() between 501 and 800 ;
quit;
you can also the monotonic() to subset the dataset in the where clause to limit the number of records read/outputted to the final dataset…
proc sql; create table class2 as select monotonic() as rowno, * from sashelp.class where 10 le monotonic() le 20; quit;
Because of Cartesian product - Cartesian product result-set contains the number of rows in the first table, multiplied by the number of rows in the second table.
data a; input name $ 1. marks; cards; a 100 b 200 c 300 ; run; procmeansdata = a summean; run;
Output
One of the most frequently posed SAS interview questions, be ready for it.
In this code, we are giving a data or you can say creating a table using ‘datalines’ but printing the rows which only have status as ‘PT’. We have used loop here to check each row and thus printing the desired results.
Obs | Name | idno | status | Benefits |
---|---|---|---|---|
1 | Richa | 1002 | FT | Eye/Dental |
2 | Simmi | 433 | FT | HMO |
Here in this code we have A (Amount) = 100000 and I (interest) = 0.0925. We need to find Amount and updated interest till five years also the interest increases by 10% every year.
Obs | A | i | Year |
---|---|---|---|
1 | 109250.00 | 0.10175 | 1 |
2 | 120366.19 | 0.11193 | 2 |
3 | 133838.17 | 0.12312 | 3 |
4 | 150315.99 | 0.13543 | 4 |
5 | 170673.18 | 0.14897 | 5 |
This code is only an illustration to create a table declares variables name, age, and dob with formats and taking values using datalines.
input name$ age dob date9.; this statement basically creating variables and giving format to dob as date9.
Obs | name | Age | Dob |
---|---|---|---|
1 | am | 67 | Oct 13, 1981 |
2 | pr | 66 | Apr 10, 1982 |
A staple in SAS logical interview questions, be prepared to answer this one.
Obs | J | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 13 |
This question is a regular feature in tricky SAS interview questions, be ready to tackle it.
We can clear the libref through one of the below approaches:-
We can use the below syntax to disassociate a specific library
Syntax : LIBNAME libref <CLEAR>
Example : Libname temp_lib clear where temp_lib is the libref that you want to deassign If, we would like to disassociate all the libref that we have declared in a SAS Session, then use the below-modified version
LIBNAME _ALL_ <CLEAR> Example: Libname _ALL_ clear
We can use the LIBNAME function to deassign a library reference. In case, a warning or an error occurs then a message is written to the SAS log.
%if (%sysfunc(libname(temp_lib))) %then %put %sysfunc(sysmsg());
where temp_lib is the libref that you want to deassign.
Note:- In both the above approaches libref is cleared if and only if there are no open files that are associated from that libref.
On the Explorer window go to libraries. Right click on the library that you want to design and select delete. Once you click on Delete you will receive a Delete confirmation with the text "Are you sure you want to remove the library reference named temp_lib?". Once you click Ok, the same is going to get deleted.
In this code, we are trying to show the characteristics of ‘do while’ loop. For each iteration, it’s increasing the counter and writing in the output dataset. Please refer the output table below -
Obs | I | X |
---|---|---|
1 | 1 | 1 |
2 | 2 | 2 |
3 | 3 | 3 |
4 | 4 | 4 |
5 | 5 | 5 |
id | val | Na |
---|---|---|
1 | 5 | 478 |
1 | 5 | 300 |
1 | 5 | 523 |
1 | 5 | 326 |
1 | 5 | 547 |
Output of the code
Obs | Pt | visit | Ae |
---|---|---|---|
1 | 1 | 1 | Headache |
2 | 1 | 2 | Kill |
3 | 1 | 3 | Pain |
4 | 1 | 4 | Knee |
5 | 2 | 1 | Tick |
6 | 3 | 2 | Fle |
SET and Merge are two very vital statements in SAS. Let us understand by taking 2 tables ‘one’ and ‘two’ and perform SET and Merge Operation on them.
Code - DATA one; | Output
|
Code - DATA two; INPUT id v3 v4; DATALINES; 1 1000 10000 2 1500 15000 3 2000 20000 4 800 30000 ; PROC SORT Data=two; BY id; RUN; | Output
|
The SET statement is the main tool to read observations from a SAS data set. The basics are so obvious you might not even consider them. Just SET and a data set name and there you go. Through SET statement the dataset gets appended while with merge the two datasets get merged on the basis of ID and extra variables are created if needed. Example below.
Code - DATA TestSet; Set one two; RUN; Output
| Code - DATA TestMerge; MERGE one two; BY id; PROC PRINT DATA= TestMerge; title "TestMerge"; RUN; Output
|
data a; input name $ 1. age; cards; a 12 b 15 c 18 ; run; procprint; run;
data b; input name $1. marks; cards; a 200 c 150 ; run; procprint; run;
procsortdata = a; by name; run; procsortdata = b; by name; run; data final; merge a(in=x) b(in=y); by name; if x and y thenoutput; run; proc print; run;
When we submit the DATA step for execution. SAS checks the syntax of the submitted code and compiles them. During the compile phase, SAS created 3 items:
ERRORS = specifies the maximum number of observations for which SAS issues complete error messages.
Note: - If there are more errors then what is specified in the option then a warning message displays which lets the user know that the limit that was set has been reached.
So, ERROR = 1 will only issue an error
Example :
data a input name $ 1. age; cards; a 20 b 23 procpint; run; 1st Error -
Before setting the options ERRORS = 1
This is a frequently asked question in SAS interview questions.
Answer – the RUN statement is used to define the step boundary, it tells the compiler that the previous set of SAS statements are to be executed. Additionally, sometimes it also helps to make the code easy to read and interpret in the log.
DATA TEST; INPUT NAME$ SUBJ1 SUBJ2 SUBJ3 GENDER $1.; CARDS; A 95 96 99 M B 85 90 70 F C 60 70 80 M ;
DATA A; INPUT NAME$ SUBJ1 SUBJ2 SUBJ3 GENDER $1.; CARDS; A 95 96 99 M B 85 90 70 F C 60 70 80 M ; RUN; DATA NUM (KEEP=_NUMERIC_); SET A; RUN; PROCPRINT; RUN;
DATA CHAR (KEEP=_CHARACTER_); SET A; RUN; PROCPRINT; RUN;
Generally, macro variable references shown so far have been direct macro references that begin with one ampersand: &name. However, indirect reference is also possible where macro variables name that belongs to a series is determined when the macro variable reference resolves. The macro facility provides indirect macro variable referencing, which allows you to use an expression (for example, DRUG&N) to generate a reference to one of a series of macro variables. For example, you could use the value of macro variable N to reference a variable in the series of macro variables named DRUG1 to DRUG10. If N has the value 8, the reference would be to DRUG8. If the value of N is 3, the reference would be to DRUG3.
Although for this example the type of reference you want is DRUG&N, the following example will not produce the results that you expect, which is the value of &N appended to DRUG:
%put &drug&n; /* incorrect */
This produces a warning message saying that there is no macro variable DRUG because the macro facility has tried to resolve &DRUG and then &N and concatenate those values.
To force the macro processor to rescan a macro variable reference, you use more than one ampersand in the macro variable reference. When the macro processor encounters multiple ampersands, its basic action is to resolve two ampersands to one ampersand. For example, to append the value of &N to DRUG and then reference the appropriate variable name, you use:
%put &&drug&n; /* correct */
Assuming that &N contains 6 when the macro processor receives this statement, it performs the following steps:
Thus using multiple ampersands (&) for reference to a macro variable is called indirect referencing and few characteristics of indirect referencing are -
Line hold specifiers are basically ‘@’ and ‘@@’ where ‘@@’ is used when we have multiple observations per line of raw data. It is used at the end of the INPUT statement. It tells SAS to hold that line of data and continue to read observations until it either runs out of data or reaches an INPUT statement that does not end with a double trailing @. When we use @ only without specifying n, SAS will hold that line of data until it reaches either the end of the DATA step or an INPUT statement that does not end with a trailing @.
Example –
One has a lab testing data file. The ASCII file layout looks like as follows,
Where the first field is the test subject and the rest of the fields are repeated testing results for the subject.
One wants the final file has following format,
The difficult part of this example is that the numbers of tests vary from subjects to subjects. However, if one reads each line with a trailing @, one needs not to know the numbers of tests. The logic is straightforward. First, one reads the first field, and then reads in the rest of the fields and outputs it. Second, one needs to check when a line ends. If it ends, then one begins to read the next line.
data test1; length test_id $8 cnt result 8; infile cards eof=eof ; input @1 test_id $5. @ ; do while (1); cnt+1; input result ?? @; if result = '.' then do; cnt=0; input @@; leave; end; else output; end; return; eof: stop; cards; test1 11 2 test2 1 2 5 test3 2 ;
“input @1 test_id $5. @ ;” or “input result ?? @;” hold line position to prevent SAS from going to a new line. “if result = '.' then do;” tests if it is the beginning of a new line. “input @@;” holds the current line position. It is much simpler if one use double trailing @@ and a retain statement in this example. data test1(drop=dummy); length test_id dummy $8; retain test_id; input dummy @@ ; if input(dummy, ?? best.) ne . then do; result = input(dummy, 8.);
output:
end; else test_id=dummy; cards; test1 11 2 test2 1 2 5 test3 2 ;
The Program Data Vector is a logical area of memory that is created during the data step Processing.
- Suppose we have a data set
data a; input id name$; cards; 11 A 12 B ; run;
The name of the excel sheet containing the drugs code is “all_other_drugs.xlsx”
Proc import OUT= drugs_code_ndc(keep=NDC) /*this line keep NDC code only from the excel sheet and creating a dataset named “drugs_code_ndc”. */
Datafile= "%sysfunc(pathname(saslib))/all_other_drugs.xlsx" /*this line importing the excel file from saslib and the file name is “all_other_drugs.xlsx” */
DBMS=xlsx REPLACE; SHEET="sheet1"; GETNAMES=YES; Run; proc sql noprint; select distinct compress("'"||NDC||"'") into : DrugNDC_Excl_Code separated by "," from drugs_code_ndc; quit; %put &DrugNDC_Excl_Code; /* Note this can be be useful way to detect the value of variables.*/
Here we can use like “if drug_code in(&DrugNDC_Excl_Code) then drug =1; this will flag all the patients who have the drugs in the list.
All the patients who have the drugs in the list. How to export data in CSV file through SAS? Dataset Name – Pat_details
Obs | pat_id | first_date_all | all | Linaclotide | Lubiprostone | Plecanatide |
---|---|---|---|---|---|---|
1 | 11121 | 27-Jun-18 | 1 | 1 | 0 | 0 |
2 | 11394 | 16-Feb-18 | 1 | 1 | 0 | 0 |
3 | 13755 | 13-Mar-19 | 1 | 1 | 0 | 0 |
4 | 15461 | 24-Jan-18 | 1 | 0 | 1 | 0 |
5 | 13657 | 23-Mar-19 | 1 | 1 | 0 | 0 |
6 | 14327 | 19-Mar-19 | 1 | 0 | 0 | 1 |
7 | 18999 | 27-Mar-19 | 1 | 1 | 0 | 0 |
8 | 19133 | 14-Feb-18 | 1 | 1 | 0 | 0 |
9 | 16260 | 19-Apr-18 | 1 | 0 | 1 | 0 |
10 | 17194 | 4-Jan-18 | 1 | 1 | 0 | 0 |
Expect to come across this popular question in SAS scenario based interview questions.
This will export all the data in csv format at the location “saslib” at path mention above.
How to generate reports in rtf/word format in sas?Dataset Name – Pat_details
Obs | pat_id | first_date_all | all | Linaclotide | Lubiprostone | Plecanatide |
---|---|---|---|---|---|---|
1 | 11121 | 27-Jun-18 | 1 | 1 | 0 | 0 |
2 | 11394 | 16-Feb-18 | 1 | 1 | 0 | 0 |
3 | 13755 | 13-Mar-19 | 1 | 1 | 0 | 0 |
4 | 15461 | 24-Jan-18 | 1 | 0 | 1 | 0 |
5 | 13657 | 23-Mar-19 | 1 | 1 | 0 | 0 |
6 | 14327 | 19-Mar-19 | 1 | 0 | 0 | 1 |
7 | 18999 | 27-Mar-19 | 1 | 1 | 0 | 0 |
8 | 19133 | 14-Feb-18 | 1 | 1 | 0 | 0 |
9 | 16260 | 19-Apr-18 | 1 | 0 | 1 | 0 |
10 | 17194 | 4-Jan-18 | 1 | 1 | 0 | 0 |
Table all Linaclotide Lubiprostone Plecanatide/missing list;
Run; Ods rtf close;
This will export the frequency report in .rtf format at the location “saslib” at path mention above.
Data Allclm_MM; set mm_diag_data; if '01JAN2010'D <= date_svc <= '31DEC2018'D ; array diag(*) $diag1 – diag12; do i=1 to 12; if (icd_ver = "9" and diag(i) in :( "2030" )) or (icd_ver = "10" and diag(i) in :( "C900" )) then f_MM_diag = 1; end; if f_MM_diag = 1; run;
Write SAS code for counting the number of patients with the following drugs mentioned in the given table –Dataset Name – Pat_details
Obs | pat_id | first_date_all | all | Linaclotide | Lubiprostone | Plecanatide |
---|---|---|---|---|---|---|
1 | 11121 | 27-Jun-18 | 1 | 1 | 0 | 0 |
2 | 11394 | 16-Feb-18 | 1 | 1 | 0 | 0 |
3 | 13755 | 13-Mar-19 | 1 | 1 | 0 | 0 |
4 | 15461 | 24-Jan-18 | 1 | 0 | 1 | 0 |
5 | 13657 | 23-Mar-19 | 1 | 1 | 0 | 0 |
6 | 14327 | 19-Mar-19 | 1 | 0 | 0 | 1 |
7 | 18999 | 27-Mar-19 | 1 | 1 | 0 | 0 |
8 | 19133 | 14-Feb-18 | 1 | 1 | 0 | 0 |
9 | 16260 | 19-Apr-18 | 1 | 0 | 1 | 0 |
10 | 17194 | 4-Jan-18 | 1 | 1 | 0 | 0 |
We can count this by using FREQ procedure that prints all values of a given categorical variable in the Output window, along with the number and percentage of times each value appears. The FREQ procedure can work with both string (character) and numeric categorical variables. In the case mentioned above, we can code like –
Proc freq data = Pat_details;
Table all Linaclotide Lubiprostone Plecanatide/missing list;
Run;
This code will give the counts of the variable mentioned in the table statement and results will look like the one mentioned below -
Output –
Cumulative Cumulative
all Frequency Percent Frequency Percent
-----------------------------------------------------------------------
1 10 100.00 10 100.00
Cumulative Cumulative
Linaclotide Frequency Percent Frequency Percent
-----------------------------------------------------------------------
1 7 70.00 10 100.00
Cumulative Cumulative
Lubiprostone Frequency Percent Frequency Percent
-----------------------------------------------------------------------
1 2 20.00 10 100.00
Cumulative Cumulative
Plecanatide Frequency Percent Frequency Percent
-----------------------------------------------------------------------
1 1 10.00 10 100.00
Write sas code to split the table below on the basis of different drugs. Dataset Name – Pat_details
Obs | pat_id | first_date_all | all | Linaclotide | Lubiprostone | Plecanatide |
---|---|---|---|---|---|---|
1 | 11121 | 27-Jun-18 | 1 | 1 | 0 | 0 |
2 | 11394 | 16-Feb-18 | 1 | 1 | 0 | 0 |
3 | 13755 | 13-Mar-19 | 1 | 1 | 0 | 0 |
4 | 15461 | 24-Jan-18 | 1 | 0 | 1 | 0 |
5 | 13657 | 23-Mar-19 | 1 | 1 | 0 | 0 |
6 | 14327 | 19-Mar-19 | 1 | 0 | 0 | 1 |
7 | 18999 | 27-Mar-19 | 1 | 1 | 0 | 0 |
8 | 19133 | 14-Feb-18 | 1 | 1 | 0 | 0 |
9 | 16260 | 19-Apr-18 | 1 | 0 | 1 | 0 |
10 | 17194 | 4-Jan-18 | 1 | 1 | 0 | 0 |
Proc sort data = Pat_details; by pat_id; run; Data all_drug Linaclotide Lubiprostone Plecanatide; Set Pat_details; by pat_id; If all = 1 then output all_drug; If Linaclotide =1 then output Linaclotide; If Lubiprostone =1 then output Lubiprostone; If Plecanatide = 1 then output Plecanatide; Run;
The code above will create three different datasets with name all_drug Linaclotide Lubiprostone Plecanatide respectively in work library which has only respective drugs patients.
Count the patients who initiated more than one drugs in a month the Input dataset for the same is mentioned below - Input Dataset – All_first_drug
Obs | pat_id | first_date_Linaclotide | first_date_Lubiprostone | first_date_Plecanatide |
---|---|---|---|---|
1 | 18830 | 27-Jan-18 | 30-Jan-18 | |
2 | 17224 | 16-Feb-18 | 20-Feb-18 | |
3 | 11702 | 13-Mar-18 | 18-Mar-18 | |
4 | 19025 | 30-Jan-18 | 24-Jan-18 | |
5 | 19248 | 23-Mar-18 | 13-Mar-18 | |
6 | 14557 | 22-Mar-18 | 19-Mar-18 | |
7 | 16519 | 27-Mar-18 | ||
8 | 17499 | 28-Mar-18 | ||
9 | 18873 | 29-Mar-19 | 19-Apr-18 | |
10 | 11121 | 30-Mar-19 | ||
11 | 11394 | 31-Mar-19 | ||
12 | 13755 | 1-Apr-19 |
proc transpose data = All_first_drug out = t_ All_first_drug (where =(date ne .) rename=(_name_= drug_name col1=date));
by ps_indv_id; run; data t_ All_first_drug; set t_ All_first_drug; x=put(date,monyy7.); run; proc sql; create table final_data as select x, count(distinct ps_indv_id) from ( select distinct x, ps_indv_id, count(ps_indv_id) as cnt from t_ All_first_drug group by x, ps_indv_id having count(ps_indv_id)>1) group by x order by x; quit;
result –
Obs x _TEMG003 1 JAN2018 2 2 FEB2018 1 3 MAR2018 3
Macro in sas is the way of automation and it is a very powerful and flexible way to do automation.
Macro variables can be created using a number of different techniques and statements. However, the five most common methods are not only the most useful but also among the easiest to master. Since macro variables are used in so many ways within the macro language, learning how they are created can also serve as an excellent introduction to the language itself.
These methods include:
using the CALL SYMPUTX routine
The substr function extracts a text from the character variable, it could also be used to replace the values, when it is on the left side it assigns the new values, suppose you need to change the first two values from 91 to 00, you can assign by putting it to left side;
data a; x='(91) 9999265789'; substr(x,2,2)='00'; run; proc print data=a; run;
output –
Obs x 1 (00) 9999265789
A must-know for anyone heading into a SAS interview, this question is frequently asked in SAS interview questions.
SCAN function - The scan function searches for a particular string and puts the value in the target variable, the target variable length using the scan function is 200 chars, here value is of 200 char variable now, the delimiter by default is blank.
It also takes 3 arguments first being string second the nth word and third the delimiter (by default blank), it divides a string into the chunks divided by delimiters;
data a;
x='Champ are always best people’;
value=scan(x,5); run;
proc print data=a;
run;
output –
Obs x value
1 'Champs are always best people people
This is very important thing, consider the below example, the data is not consistent and you wanted to pick the last name;
data x; input @1 name $5. @7 marks; datalines; a b c 20 k d 15 a 25 n a b 20 k a 30 ; run; data master; set x; lname=scan(name,-1); run; proc print data=master; run; /********email id example**********/ data email; length id $25; input id$; datalines; a,b@n.com b@tt.com a@b@cc.com ; run; data a; set email; domain_name=scan(id,-1,'@'); run; proc print data=a; run;
Output –
Obs name marks lname 1 a b c 20 c 2 k d 15 d 3 a 25 a 4 n a b 20 b 5 k a 30 a Obs id domain_name 1 a,b@n.com n.com 2 b@tt.com tt.com 3 a@b@cc.com cc.com
How to export data in HTML file through SAS? Dataset Name – Pat_details
Obs | pat_id | first_date_all | all | Linaclotide | Lubiprostone | Plecanatide |
---|---|---|---|---|---|---|
1 | 11121 | 27-Jun-18 | 1 | 1 | 0 | 0 |
2 | 11394 | 16-Feb-18 | 1 | 1 | 0 | 0 |
3 | 13755 | 13-Mar-19 | 1 | 1 | 0 | 0 |
4 | 15461 | 24-Jan-18 | 1 | 0 | 1 | 0 |
5 | 13657 | 23-Mar-19 | 1 | 1 | 0 | 0 |
6 | 14327 | 19-Mar-19 | 1 | 0 | 0 | 1 |
7 | 18999 | 27-Mar-19 | 1 | 1 | 0 | 0 |
8 | 19133 | 14-Feb-18 | 1 | 1 | 0 | 0 |
9 | 16260 | 19-Apr-18 | 1 | 0 | 1 | 0 |
10 | 17194 | 4-Jan-18 | 1 | 1 | 0 | 0 |
ods html file="%sysfunc(pathname(saslib))/pat_details_data.html";
Proc print data = Pat_details; run;
Ods html close;
This will export all the data in html format at the location “saslib” at path mention above.
Suppose we have the above clinical table in the form of SAS Dataset in lib – ‘mylib’. Considering the fact, the following code will give the desired results.
Data Clinical_Analysis; Set mylib.clinical; If '01JAN2018'D <= Data <= '31MAR2019'D ; array diag(*) $diag1 - diag5; do i=1 to 5; if diag(i) in ('D45') then f_diag =1; end; run; proc sort data = Clinical_Analysis; by IDs date; run; Data diag_pat (keep = IDs YOB Date f_diag first_date diag age); set Clinical_Analysis; by Ids; format first_date date9.; retain diag first_date age; if first.Ids then do; diag=0; first_date =date; age =0; end; age = Year(first_date) - yob; if f_diag =1 then diag=1; if last.Ids; run; Proc freq data = diag_pat; title "All required counts"; tables diag age/missing list; run;
output:
Dataset of the code – Table: diag_pat
OBS | IDs | YOB | Date | f_diag | first_date | diag | Age |
---|---|---|---|---|---|---|---|
1 | 100 | 2-Mar-86 | 8-Mar-18 | 1 | 2-Mar-18 | 1 | 32 |
4 | 101 | 5-Jan-80 | 18-Apr-18 | 1 | 8-Jan-18 | 1 | 38 |
9 | 102 | 4-Jan-85 | 15-Feb-19 | 1 | 18-Jan-17 | 1 | 34 |
14 | 103 | 16-Jan-90 | 17-Mar-18 | 1 | 15-Feb-18 | 1 | 28 |
18 | 104 | 25-Jan-92 | 18-Apr-18 | 1 | 27-Feb-18 | 1 | 26 |
This is a frequently asked question in SAS interview questions for experienced.
Format for creating a simple index -
Data <dataset_name> (INDEX = (<variable-name></UNIQUE></NOMISS>)) Creating a single index in one go Example : data a(index=(seq /unique /nomiss)); Set a; Run; Creating multiple indexes in one go Example: data a(index =(seq /unique /nomiss State /unique Year /unique /nomiss) Set a; Run; Format for creating a composite index- Data <dataset_name> (INDEX = (index-name = <var1 var2 etc></UNIQUE></NOMISS>)) Creating a composite index – Example – data a(index=(quarter_month = (quarter month) /nomiss) Set a; Run;
Note: We can also use the proc’s to create the indexes in the newly generated output datasets.
Additionally, we could also use the proc datasets procedure and proc SQL to generate indexes.
Information From Dataset: A
Information From Dataset: B
The output of the UNION Operator
UNION Operator is used to club the specific variable information from both the tables. Apart from updating the information, it also removes the duplicate data and sorts the combined information.
The output of UNIONALL Operator
UNIONALL Operator works by appending the information from dataset B after the information from dataset A. It doesn’t remove the duplicates or sorts the resultant data.
Note: If the variables are existing in both the tables, but the order of the same is different. SAS throws the below error –
Expect to come across this popular question in SAS logical interview questions.
There are multiple ways to create table in proc SQL:
I. Create Table From Column Definition
We can create a new table without rows by using the CREATE TABLE statement to define the columns and attributes. We can specify a column’s name, type, length, informat, format and label.
procsql; createtable employee( ssn char(9), dob numformat = DATE9.informat = DATE9.label = "Date Of Birth", doh numformat = DATE9.informat = DATE9.label = "Date Of Hire", gender char(1) label = "Gender", estat char(10) label = "Employement Status"); insertinto employee values ("111111111",500,25000,"M","ACTIVE"); quit; procsql; select * from employee; quit;
II. Create Table from a Query Result
Many times, there are situations wherein we already have a structure of the table/view and we are trying to create a mirror table that is referring to columns of the earlier existing table. In such cases, we can use the same CREATE TABLE Statement and place it before the SELECT Statement. An additional benefit of using this approach of creating the table is that the data of the new table is derived on the basis of existing data from the parent table.
procsql; createtable employee1 as select * from employee quit; procsql; select * from employee1; quit;
III. Create Table Like an Existing Table
To create an empty table that has the same columns and attributes as an existing table, we use the LIKE clause while creating the table. One point to note is that an empty table is created using this method and no data is populated inside the table.
procsql;
createtable employee3 like employee;
quit;
A must-know for anyone heading into a SAS interview, this question is frequently asked in tricky SAS interview questions.
Connecting to teradata using pass through
Procsql; Connect to Teradata( USER= “ &user_id.” PASSWORD =” &dbpass” tdpid =rchgrp2 MODE= ‘teardata’); Create table work. Account_details as(Select Account_ Number, Enroll_ Date, Sales, City From Master_ acct_ details Where Account_ Number is not null and Sales gt 40000 ); Disconnect from Teradata; Quit; Connecting to db2 using pass through: Procsql; Connect to db2( DATABASE=CRD_ PROD USER= “ &user_id.” PASSWORD=” &dbpass”); Create table work. Account_details as(Select Account_ Number, Enroll_ Date, Sales, City From Master_ acct_ details Where Account_ Number is not null and Sales gt 40000 ); Disconnect from db2; Quit;
Other way using the library assignments:
Libname AM DB2 DATABASE= CARD_USA Schema=DDKAUTO
USER=”&user_ id.” PASSWORD= “&dbpass”; Procsql;
Create table work. Account_details as(Select Account_ Number, Enroll_ Date, Sales, City From AM.Master_ acct_ details Where Account_ Number is not null and Sales gt 40000);
Data test; input id sal; cards; 1 300 1 200 3 900 2 300 2 400 3 700 2 100 3 800 ; run; proc sort data =test out=test1; by id sal; run; proc sql; create table test2(drop=number) as select *, monotonic() as number from test1 group by id having number=max(number); quit;
data test; input name$ weight; cards; a 25 b 30 c 35 d 40 ; run; proc sql; select name, weight from test union all select 'Total', sum(weight) from test; quit;
The PAD option can be used to read the source data having the variable length records, so to read the variable length of records we can use the PAD option, the PAD option can be used with the infile
statement;
example
data Test; infile 'C:\Users\Anshu\Desktop\test.txt' PAD; input @1name $6. @10 sex $1. @12 marks 3. @14 salary 8.; run; proc print data=Test; run;
It's no surprise that this one pops up often in SAS Macro interview questions.
Macro is a way of automating in SAS programs. There are many ways to create macro.
%macro get_var_dist (data_nm , categ_cut_off); /* Taking list of variables in the dataset along with type*/ proc contents data=&data_nm. out=_content_; run; /* Note - how data_nm is getting resolved. */ /* Fiding number of variables to define loop */ proc sql noprint; select count(*) into: nvars from _content_; quit; /* Note this is another way of defining macro variables */ %put "No of Variables is " &nvars. ; /* Note this can be a useful way to detect the value of macro variables.*/ %do i=1 %to &nvars. ; data _null_; set _content_; if _n_ = &i. then do; call symput("variable", name); call symput("var_type",type); end; run; /* Note - 1: One more way of defining variables. Please note here information is being passed from general SAS dataset to SAS macros using call symput 2: Please see how to run do a loop */ %put "Variable is " &variable. ; %put "Variable Type is " &var_type. ; proc sql noprint; select count(distinct &variable) into: distinct_categories from &data_nm.; quit; %put "Number of distinct categories is &distinct_categories." ; /* Note the syntax of writing if else condition */ /* %do .... %end; defines the block */ /* Missing command below includes missing value as a category */ /* Also note, if you don't use %eval it can throw errors */ /* if comparison involves floating numbers, you will need to use %sasevalf */ %if %eval(&distinct_categories.) <= %eval(&categ_cut_off.) %then %do; Title "Freq Dist for Variable &variable. as number of distinct category is &distinct_categories."; proc freq data=&data_nm. ; table &variable. / missing ; run; %end; %if %eval(&var_type.) = 1 %then %do; Title "Univariate Dist for Variable = &variable. "; proc univariate data=&data_nm. plot; var &variable. ; run; %end; %end; %mend get_var_dist; options symbolgen mlogic mprint; %get_var_dist(data_nm=sashelp.heart, categ_cut_off=25); %get_var_dist(data_nm=sashelp.asscmgr, categ_cut_off=10);
The SAS processing is in two steps
The use of double trailing is that, if you have datalines in one line then you can use @@;
data test1; input name $ age marks @@; datalines; am 12 95 ra 13 67 pa 14 98 sa 14 87 ; run; proc print data=test1; run;
Output
Obs | name | Age | Marks |
---|---|---|---|
1 | Am | 12 | 95 |
2 | Ra | 13 | 67 |
3 | Pa | 14 | 98 |
4 | Sa | 14 | 87 |
The use of single @ is in the holding the line, it basically is used in more than one input statement, the pointer basically hold the line after reading the values, it basically holds the value and checks the condition;
data test2; input @1 gender $1. @; if gender ne 'F' then delete; input @3 age @5 marks; datalines; F 13 56 M 12 78 F 13 78 M 56 90 ; run; proc print data=test2; run;
Output
Obs | Gender | age | Marks |
---|---|---|---|
1 | F | 13 | 56 |
2 | F | 13 | 78 |
data Test; input @1name $7. @9 sex $1. @10 marks; cards; Anshukr M123 MohitaS F32 ; run; proc print data=test; run;
Answer
Obs | name | sex | marks |
---|---|---|---|
1 | Anshukr | M | 123 |
2 | MohitaS | F | 32 |
The PAD option can be used to read the source data having the variable length records, so to read the variable length of records we can use the PAD option, the PAD option can be used with the infile
statement;
example
data Test; infile 'C:\Users\Anshu\Desktop\test.txt' PAD; input @1name $6. @10 sex $1. @12 marks 3. @14 salary 8.; run; proc print data=Test; run;
A common question in SAS scenario based interview questions, don't miss this one.
RETAIN statement causes a variable that is defined during the input statement to be retained across data set iterations, in other words, retain statement retains the variable value at each iteration and keeps on substituting it, the pdv does not get cleared when you use the retain statement, let’s try to understand the basic example explained below –
Table - mylib.clinical
OBS | IDs | YOB | Date | diag1 | daig2 | daig3 | daig4 | daig5 |
---|---|---|---|---|---|---|---|---|
1 | 100 | 2-Mar-86 | 2-Mar-18 | D45 | 0 | 0 | D45 | 0 |
2 | 100 | 2-Mar-86 | 5-Mar-18 | 0 | D45 | 0 | 0 | D45 |
3 | 100 | 2-Mar-86 | 8-Mar-18 | 0 | C50 | C50 | D45 | |
4 | 101 | 5-Jan-80 | 8-Jan-18 | 0 | 0 | D45 | D45 | 0 |
5 | 101 | 5-Jan-80 | 2-Feb-18 | D45 | 0 | 0 | D45 | 0 |
6 | 101 | 5-Jan-80 | 27-Feb-18 | 0 | D45 | 0 | 0 | D45 |
7 | 101 | 5-Jan-80 | 24-Mar-18 | D45 | 0 | D45 | 0 | 0 |
8 | 101 | 5-Jan-80 | 18-Apr-18 | D45 | 0 | 0 | D45 | 0 |
Suppose we have the above clinical table in the form of SAS Dataset in lib – ‘mylib’. Considering the fact the following code will give the desired results.
Data Clinical_Analysis; Set mylib.clinical; If '01JAN2018'D <= Date <= '31MAR2019'D ; array diag(*) $diag1 - diag5; do i=1 to 5; if diag(i) in ('D45') then f_diag =1; end; run; proc sort data = Clinical_Analysis; by IDs date; run; Data diag_pat (keep = IDs YOB Date f_diag first_date diag age); set Clinical_Analysis; by Ids; format first_date date9.;
retain diag first_date age; /*retaining diag first_date and age at every iteration */
if first. Ids then do; diag=0; first_date =date; age =0; end; age = Year(first_date) - yob; if f_diag =1 then diag=1; if last.Ids; run; Proc freq data = diag_pat; title "All required counts"; tables diag age/missing list; run;
output:
Dataset of the code – Table: diag_pat
OBS | IDs | YOB | Date | f_diag | first_date | diag | Age |
---|---|---|---|---|---|---|---|
1 | 100 | 2-Mar-86 | 8-Mar-18 | 1 | 2-Mar-18 | 1 | 32 |
4 | 101 | 5-Jan-80 | 18-Apr-18 | 1 | 8-Jan-18 | 1 | 38 |
The mean() function is used to calculate the mean of one observation across the variable. It is used to generate a horizontal data summary.
DATA A; INPUT NAME$ SUBJ1 SUBJ2 SUBJ3; CARDS; A 95 96 99 B 85 90 70 C 60 70 80 ; RUN; DATA B; SET A; MEAN = MEAN(SUBJ1,SUBJ2,SUBJ3); RUN; PROCPRINT; RUN;
Proc Means is used to calculate the mean of a variable across all observations.
It is used for generating the vertical data summary (i.e. mode of operation is column-wise).
By default, it calculates N, Mean, Std Dev, Minimum, Maximum statistics.
In case you want to delete multiple delimiters in one go, you need to put all delimiters in the second argument in any order and use the SAS function compress.
data clean; x='ab,c,*&%$ ~d'; y=compress(x,'~$%&* ,'); run; proc print data=clean; run;
Output –
Obs x y 1 ab,c,*&%$ ~d abcd
It's no surprise that this one pops up often in advanced SAS interview questions.
Numeric Format ag’ is created, in the work library (is a temporary format) with the specified ranges:
Proc format lib=work; Value ag 0-<30=’young’ 30-<61=’old’; Run; Proc print data=sasuser.admit; Format age ag.;
Procsql; Connect to Teradata( USER= “ &user_id.” PASSWORD =” &dbpass” tdpid =rchgrp2 MODE= ‘teardata’);
Create table work. Account_details as (Select Account_ Number, Enroll_ Date, Sales , City From Master_ acct_ details Where Account_ Number is not null and Sales gt 40000);
Disconnect from Teradata; Quit;
The options nofmterr mprint mlogic symbolgen;
%macro datecompare; proc format; value $bckcolor 'Dates match No action'='green' 'PLZ CHECK'=red; run; %macro a; %do i=1 %to 5; data sasuser.a&i; set sasuser.admit; run; proc copy in =sasuser out=work memtype=data; select a&i; run; %end; %mend a; %a; proc sql noprint; select count(distinct memname) into:cntwork from dictionary.columns where upcase(libname) ='WORK'; quit; %put &cntwork; proc sql noprint; select count(distinct memname) into: cntuser from dictionary.columns where upcase(libname) ='SASUSER' and upcase(memname) in ('A1','A2','A3','A4','A5'); quit; %put &cntuser; proc sql; create table wrk as select libname,memname,crdate from dictionary.tables where upcase(libname) ='WORK' and upcase(memname) in ('A1','A2','A3','A4','A5'); quit; proc sql; create table suser as select libname,memname,crdate from dictionary.tables where upcase(libname) ='SASUSER' and upcase(memname) in ('A1','A2','A3','A4','A5'); quit; proc sort data=wrk (rename=(crdate=crdate1 libname=libname1)); by memname; run; proc sort data=suser; by memname; run; data final; merge wrk(in=a) suser(in=b); by memname; if a and b; run; data new; set final; if left(trim(crdate))=left(trim(crdate1)) then action='Dates match No action'; else action='PLZ CHECK'; run; proc sql; create table new as select memname,libname,crdate,libname1,crdate1,action from new; quit; ods listing close; ods rtf file='\\path \new.rtf'; proc report data=new nowd headline headskip split='*'; Title1 j = center font = arial h = 4 color = blue "Creation Date comparision for"; Title2 j = center font = arial h = 4 color = blue "datasets of library sasuser and work"; columns memname libname crdate libname1 crdate1 action; define memname/display width=9 spacing=3 center; define libname/display width=2 spacing=3 center; define crdate/display width=2 spacing=3 center; define libname1/display width=3 spacing=3 center; define crdate1/display width=9 spacing=3 center; define action/display width=9 spacing=3 center style ={background =$bckcolor.};; run; ods rtf close; ods listing; %mend datecompare; %datecompare;
The tranwrd function helps in the replacement of a string in a char variable. We can reverse the text/values in SAS using tranward function. It has many functions like reversing, replacing etc.
For example –
Converting the multiple occurence of a string; data x; input name$; datalines; highhigh high cc ; run; data a; set x; name=tranwrd(name,'high','hi'); run; proc print data=a; run;
output –
Obs name 1 hihi 2 hi 3 cc
In SAS, we can convert text into any case like LOWCASE, UPCASE and PROCASE.
LOWCASE – It converts the string into small letters or in lower case.
data lower; x='SAS'; y=lowcase(x); x=lowcase('CHAMP'); run; proc print data=lower; run;
output –
Obs x y 1 champ sas
UPCASE – It converts the string into capital letters or in upper case.
data uper; x='sAs'; y=upcase(x); x=upcase('ChaMP'); run; proc print data=uper; run;
output –
Obs x y 1 CHAMP SAS
PROCASE – It converts the string into proper letters or in other words we can say the first letter upper/upcase and rest in lowercase. Example-
data procase; x='sAs'; y=procase(x); x=procase('ChaMP'); run; proc print data=procase; run;
output –
Obs x y 1 Champ Sas
Let suppose we have a table admit in sasuser library and we have to fetch the student who has age greater than the average age then this can be code like the one below using symget.
%macro ch; proc sql; select avg(age) into:avg_age from sasuser.admit; quit; data x; set sasuser.admit; ageif=symget('avg_age'); if age gt ageif; run; %mend ch; %ch;
The mdy function creates a numeric date from the values of the month day and year.
data date_data; input name $ month year day; datalines; ram 10 1981 13 Geeta 04 1982 20 ; run; data date_date1; set date_data; attrib bdy format=date9.; bdy=mdy(month,day,year); run; proc print data= date_date1; run;
Output –
Obs name month year day bdy 1 ram 10 1981 13 13OCT1981 2 geeta 4 1982 20 20APR1982
we can use CATX function for it. CATX function with a series, Let’s say you want to concatenate the value of a variable;
data a; sp='|'; x1='a'; x2='b'; x3='c'; string=catx(sp,of x1-x3); run; proc print data=a; run;
output –
Obs sp x1 x2 x3 string 1 | a b c a|b|c
A common question in SAS interview questions for experienced, don't miss this one.
The N function counts the number of non-missing values in a row
data a; x1=2; x2=3; x3=4; nvars_nonmiss=n(of x1-x3); put nvars_nonmiss=; run; proc print data = a; run;
Output –
Obs x1 x2 x3 nvars_nonmiss 1 2 3 4 3
One of the most frequently posed SAS logical interview questions, be ready for it.
We can count missing value in sas using NMISS function.
For example –
data a; x1=2; x2=3; x3=.; nvars_miss=nmiss(of x1-x3); put nvars_miss=; run; proc print data = a; run;
The reverse function just reverse the string, if there are leading blanks they become trailing.
data a; x='abcdef'; change=reverse(x); put change; run;
Output –
fedcba
The INTNX function increments day, year or month specified a date by the parameters given. It is a very helpful function present in sas. Few examples below can show you how -
Here it increments the date by 6 weeks, means date after 6 weeks including current week.
data _null_; x=intnx('week', '23mar2012'd, 2); put x date9.; run; output - 01APR2012 The intnx function increments the day, year or month on the specified date, you can use put as well; data amit; x='13oct1981'd; z=put(intnx('month',x,3,'e'),date9.); run; proc print data=amit; run;
output -
Obs x z
1 7956 31JAN1982
The intnx function increments the day, year or month on the specified date, you can use the same argument for incrementing 1 month from the date;
data amit; x='13oct1981'd; z=put(intnx('month',x,1,'same'),date9.); put z=; run; Output - z=13NOV1981
The MONOTONIC function is quite similar to the internal variable _N_ in the DATA Step. We can use it to select the records according to their row number. For example, we choose the SSNs from the 501st line to the 888th line in the SSN dataset.
proc sql;
select * from ssn_data where monotonic() between 501 and 800 ;
quit;
you can also the monotonic() to subset the dataset in the where clause to limit the number of records read/outputted to the final dataset…
proc sql; create table class2 as select monotonic() as rowno, * from sashelp.class where 10 le monotonic() le 20; quit;
Because of Cartesian product - Cartesian product result-set contains the number of rows in the first table, multiplied by the number of rows in the second table.
data a; input name $ 1. marks; cards; a 100 b 200 c 300 ; run; procmeansdata = a summean; run;
Output
One of the most frequently posed SAS interview questions, be ready for it.
In this code, we are giving a data or you can say creating a table using ‘datalines’ but printing the rows which only have status as ‘PT’. We have used loop here to check each row and thus printing the desired results.
Obs | Name | idno | status | Benefits |
---|---|---|---|---|
1 | Richa | 1002 | FT | Eye/Dental |
2 | Simmi | 433 | FT | HMO |
Here in this code we have A (Amount) = 100000 and I (interest) = 0.0925. We need to find Amount and updated interest till five years also the interest increases by 10% every year.
Obs | A | i | Year |
---|---|---|---|
1 | 109250.00 | 0.10175 | 1 |
2 | 120366.19 | 0.11193 | 2 |
3 | 133838.17 | 0.12312 | 3 |
4 | 150315.99 | 0.13543 | 4 |
5 | 170673.18 | 0.14897 | 5 |
This code is only an illustration to create a table declares variables name, age, and dob with formats and taking values using datalines.
input name$ age dob date9.; this statement basically creating variables and giving format to dob as date9.
Obs | name | Age | Dob |
---|---|---|---|
1 | am | 67 | Oct 13, 1981 |
2 | pr | 66 | Apr 10, 1982 |
A staple in SAS logical interview questions, be prepared to answer this one.
Obs | J | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 | x12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 13 |
This question is a regular feature in tricky SAS interview questions, be ready to tackle it.
We can clear the libref through one of the below approaches:-
We can use the below syntax to disassociate a specific library
Syntax : LIBNAME libref <CLEAR>
Example : Libname temp_lib clear where temp_lib is the libref that you want to deassign If, we would like to disassociate all the libref that we have declared in a SAS Session, then use the below-modified version
LIBNAME _ALL_ <CLEAR> Example: Libname _ALL_ clear
We can use the LIBNAME function to deassign a library reference. In case, a warning or an error occurs then a message is written to the SAS log.
%if (%sysfunc(libname(temp_lib))) %then %put %sysfunc(sysmsg());
where temp_lib is the libref that you want to deassign.
Note:- In both the above approaches libref is cleared if and only if there are no open files that are associated from that libref.
On the Explorer window go to libraries. Right click on the library that you want to design and select delete. Once you click on Delete you will receive a Delete confirmation with the text "Are you sure you want to remove the library reference named temp_lib?". Once you click Ok, the same is going to get deleted.
In this code, we are trying to show the characteristics of ‘do while’ loop. For each iteration, it’s increasing the counter and writing in the output dataset. Please refer the output table below -
Obs | I | X |
---|---|---|
1 | 1 | 1 |
2 | 2 | 2 |
3 | 3 | 3 |
4 | 4 | 4 |
5 | 5 | 5 |
id | val | Na |
---|---|---|
1 | 5 | 478 |
1 | 5 | 300 |
1 | 5 | 523 |
1 | 5 | 326 |
1 | 5 | 547 |
Output of the code
Obs | Pt | visit | Ae |
---|---|---|---|
1 | 1 | 1 | Headache |
2 | 1 | 2 | Kill |
3 | 1 | 3 | Pain |
4 | 1 | 4 | Knee |
5 | 2 | 1 | Tick |
6 | 3 | 2 | Fle |
SET and Merge are two very vital statements in SAS. Let us understand by taking 2 tables ‘one’ and ‘two’ and perform SET and Merge Operation on them.
Code - DATA one; | Output
|
Code - DATA two; INPUT id v3 v4; DATALINES; 1 1000 10000 2 1500 15000 3 2000 20000 4 800 30000 ; PROC SORT Data=two; BY id; RUN; | Output
|
The SET statement is the main tool to read observations from a SAS data set. The basics are so obvious you might not even consider them. Just SET and a data set name and there you go. Through SET statement the dataset gets appended while with merge the two datasets get merged on the basis of ID and extra variables are created if needed. Example below.
Code - DATA TestSet; Set one two; RUN; Output
| Code - DATA TestMerge; MERGE one two; BY id; PROC PRINT DATA= TestMerge; title "TestMerge"; RUN; Output
|
data a; input name $ 1. age; cards; a 12 b 15 c 18 ; run; procprint; run;
data b; input name $1. marks; cards; a 200 c 150 ; run; procprint; run;
procsortdata = a; by name; run; procsortdata = b; by name; run; data final; merge a(in=x) b(in=y); by name; if x and y thenoutput; run; proc print; run;
When we submit the DATA step for execution. SAS checks the syntax of the submitted code and compiles them. During the compile phase, SAS created 3 items:
ERRORS = specifies the maximum number of observations for which SAS issues complete error messages.
Note: - If there are more errors then what is specified in the option then a warning message displays which lets the user know that the limit that was set has been reached.
So, ERROR = 1 will only issue an error
Example :
data a input name $ 1. age; cards; a 20 b 23 procpint; run; 1st Error -
Before setting the options ERRORS = 1
This is a frequently asked question in SAS interview questions.
Answer – the RUN statement is used to define the step boundary, it tells the compiler that the previous set of SAS statements are to be executed. Additionally, sometimes it also helps to make the code easy to read and interpret in the log.
DATA TEST; INPUT NAME$ SUBJ1 SUBJ2 SUBJ3 GENDER $1.; CARDS; A 95 96 99 M B 85 90 70 F C 60 70 80 M ;
DATA A; INPUT NAME$ SUBJ1 SUBJ2 SUBJ3 GENDER $1.; CARDS; A 95 96 99 M B 85 90 70 F C 60 70 80 M ; RUN; DATA NUM (KEEP=_NUMERIC_); SET A; RUN; PROCPRINT; RUN;
DATA CHAR (KEEP=_CHARACTER_); SET A; RUN; PROCPRINT; RUN;
Generally, macro variable references shown so far have been direct macro references that begin with one ampersand: &name. However, indirect reference is also possible where macro variables name that belongs to a series is determined when the macro variable reference resolves. The macro facility provides indirect macro variable referencing, which allows you to use an expression (for example, DRUG&N) to generate a reference to one of a series of macro variables. For example, you could use the value of macro variable N to reference a variable in the series of macro variables named DRUG1 to DRUG10. If N has the value 8, the reference would be to DRUG8. If the value of N is 3, the reference would be to DRUG3.
Although for this example the type of reference you want is DRUG&N, the following example will not produce the results that you expect, which is the value of &N appended to DRUG:
%put &drug&n; /* incorrect */
This produces a warning message saying that there is no macro variable DRUG because the macro facility has tried to resolve &DRUG and then &N and concatenate those values.
To force the macro processor to rescan a macro variable reference, you use more than one ampersand in the macro variable reference. When the macro processor encounters multiple ampersands, its basic action is to resolve two ampersands to one ampersand. For example, to append the value of &N to DRUG and then reference the appropriate variable name, you use:
%put &&drug&n; /* correct */
Assuming that &N contains 6 when the macro processor receives this statement, it performs the following steps:
Thus using multiple ampersands (&) for reference to a macro variable is called indirect referencing and few characteristics of indirect referencing are -
Line hold specifiers are basically ‘@’ and ‘@@’ where ‘@@’ is used when we have multiple observations per line of raw data. It is used at the end of the INPUT statement. It tells SAS to hold that line of data and continue to read observations until it either runs out of data or reaches an INPUT statement that does not end with a double trailing @. When we use @ only without specifying n, SAS will hold that line of data until it reaches either the end of the DATA step or an INPUT statement that does not end with a trailing @.
Example –
One has a lab testing data file. The ASCII file layout looks like as follows,
Where the first field is the test subject and the rest of the fields are repeated testing results for the subject.
One wants the final file has following format,
The difficult part of this example is that the numbers of tests vary from subjects to subjects. However, if one reads each line with a trailing @, one needs not to know the numbers of tests. The logic is straightforward. First, one reads the first field, and then reads in the rest of the fields and outputs it. Second, one needs to check when a line ends. If it ends, then one begins to read the next line.
data test1; length test_id $8 cnt result 8; infile cards eof=eof ; input @1 test_id $5. @ ; do while (1); cnt+1; input result ?? @; if result = '.' then do; cnt=0; input @@; leave; end; else output; end; return; eof: stop; cards; test1 11 2 test2 1 2 5 test3 2 ;
“input @1 test_id $5. @ ;” or “input result ?? @;” hold line position to prevent SAS from going to a new line. “if result = '.' then do;” tests if it is the beginning of a new line. “input @@;” holds the current line position. It is much simpler if one use double trailing @@ and a retain statement in this example. data test1(drop=dummy); length test_id dummy $8; retain test_id; input dummy @@ ; if input(dummy, ?? best.) ne . then do; result = input(dummy, 8.);
output:
end; else test_id=dummy; cards; test1 11 2 test2 1 2 5 test3 2 ;
The Program Data Vector is a logical area of memory that is created during the data step Processing.
- Suppose we have a data set
data a; input id name$; cards; 11 A 12 B ; run;
The name of the excel sheet containing the drugs code is “all_other_drugs.xlsx”
Proc import OUT= drugs_code_ndc(keep=NDC) /*this line keep NDC code only from the excel sheet and creating a dataset named “drugs_code_ndc”. */
Datafile= "%sysfunc(pathname(saslib))/all_other_drugs.xlsx" /*this line importing the excel file from saslib and the file name is “all_other_drugs.xlsx” */
DBMS=xlsx REPLACE; SHEET="sheet1"; GETNAMES=YES; Run; proc sql noprint; select distinct compress("'"||NDC||"'") into : DrugNDC_Excl_Code separated by "," from drugs_code_ndc; quit; %put &DrugNDC_Excl_Code; /* Note this can be be useful way to detect the value of variables.*/
Here we can use like “if drug_code in(&DrugNDC_Excl_Code) then drug =1; this will flag all the patients who have the drugs in the list.
Expect to come across this popular question in SAS scenario based interview questions.
This will export all the data in csv format at the location “saslib” at path mention above.
Table all Linaclotide Lubiprostone Plecanatide/missing list;
Run; Ods rtf close;
This will export the frequency report in .rtf format at the location “saslib” at path mention above.
Data Allclm_MM; set mm_diag_data; if '01JAN2010'D <= date_svc <= '31DEC2018'D ; array diag(*) $diag1 – diag12; do i=1 to 12; if (icd_ver = "9" and diag(i) in :( "2030" )) or (icd_ver = "10" and diag(i) in :( "C900" )) then f_MM_diag = 1; end; if f_MM_diag = 1; run;
We can count this by using FREQ procedure that prints all values of a given categorical variable in the Output window, along with the number and percentage of times each value appears. The FREQ procedure can work with both string (character) and numeric categorical variables. In the case mentioned above, we can code like –
Proc freq data = Pat_details;
Table all Linaclotide Lubiprostone Plecanatide/missing list;
Run;
This code will give the counts of the variable mentioned in the table statement and results will look like the one mentioned below -
Output –
Cumulative Cumulative
all Frequency Percent Frequency Percent
-----------------------------------------------------------------------
1 10 100.00 10 100.00
Cumulative Cumulative
Linaclotide Frequency Percent Frequency Percent
-----------------------------------------------------------------------
1 7 70.00 10 100.00
Cumulative Cumulative
Lubiprostone Frequency Percent Frequency Percent
-----------------------------------------------------------------------
1 2 20.00 10 100.00
Cumulative Cumulative
Plecanatide Frequency Percent Frequency Percent
-----------------------------------------------------------------------
1 1 10.00 10 100.00
Proc sort data = Pat_details; by pat_id; run; Data all_drug Linaclotide Lubiprostone Plecanatide; Set Pat_details; by pat_id; If all = 1 then output all_drug; If Linaclotide =1 then output Linaclotide; If Lubiprostone =1 then output Lubiprostone; If Plecanatide = 1 then output Plecanatide; Run;
The code above will create three different datasets with name all_drug Linaclotide Lubiprostone Plecanatide respectively in work library which has only respective drugs patients.