Advanced SAS Programming Techniques Flashcards
What is an index?
A file that stores values in ascending order for a variable or variables and includes information about the location of those values in the data file. It helps you access values directly rather than having SAS go down the rows until it finds a value.
Syntax to create an INDEX in a DATA statement (for a simple and composite index)
This creates two simple indexes
For a simple index:
data datasetname (index=(column1 column2/unique));
set datasetname;
run;
For a composite index:
data datasetname (index=(indexname=(column1 column2)));
set datasetname
run;
What option can you use to see information in the SAS log about index creation or index usage?
Default is n
options msglevel = i;
When is an index not used? (3)
With a subsetting IF statement in a DATA step
With particular WHERE expressions
If SAS thinks it is more efficient to read the data sequentially
Syntax to create or delete indexes on existing datasets with PROC DATASETS
NOLIST is optional. It suppresses the printing of the directory of SAS files in the SAS log and as ODS output.
proc datasets library = libref NOLIST;
modify datasetname;
index delete index-name;
index create index-specification;
quit;
Syntax to create or delete indexes on an existing dataset with PROC SQL
proc sql;
create unique index index-name
on table-name(column1, column2);
drop index index-name from table-name;
quit;
What information is printed out by the CONTENTS procedure or the CONTENTS statement in PROC DATASETS? (5)
- general and summary info
- engine/host dependent info
- alphabetic list of variables and attributes
- alphabetic list of integrity constraints
- alphabetic list of indexes and attributes
Syntax for PROC CONTENTS
Syntax for PROC DATASETS with CONTENTS statement (2)
nolist is optional and suppresses the printing of the directory of SAS files in the SAS log and as ODS output.
proc contents data=libref.dataset;
run;
proc datasets library=libref nolist;
contents data=dataset;
quit;
proc datasets nolist;
contents data=libref.dataset;
quit;
How do you list the contents of all files in a SAS library with PROC CONTENTS and with PROC DATASETS?
proc contents data=libref._all_;
run;
proc datasets library=libref nolist;
contents data=_all_;
quit;
What happens to an index if you sort the data in place with the FORCE option in PROC SORT?
The index file is deleted
How do you copy a data set to a new location using PROC DATASETS?
Using PROC COPY?
What happens if the data set contains an index?
A new index file is automatically created for the new data file
proc datasets library = old-libref nolist;
copy out = new-libref;
select datasetname;
quit;
proc copy out = new-libref in = old-libref
[move];
select datasetname;
run;
quit;
How do you rename an indexed dataset so as to preserve the index?
Note the index is automatically renamed as well
proc datasets library=libref nolist;
change old-datasetname = new-datasetname;
quit;
How do you rename variables within an indexed data set so as to preserve the index?
If you rename a variable for which there is a simple index, the index is also renamed. If the variable is used in a composite index, the composite index automatically references the new variable name.
proc datasets library=libref nolist;
modify datasetname;
rename oldvarname1 = newvarname1
oldvarname2 = newvarname2;
quit;
How to combine raw data files vertically using FILENAME?
filename fileref (‘file-path1’ ‘file-path2’);
COMPRESS function
default characters-to-remove is blank spaces
Allows you to remove characters from a string
compress(source-string, characters-to-remove);
today function
Returns the current date from the system clock as a SAS date value
INTCK function
INTCK(interval, start-date, end-date)
Returns the number of interval boundaries of a given kind that lie between two dates, times, or datetime values.
e.g.
monthnum = intck(‘month’, ‘25aug2000’d, ’05sep2000’d)
monthnum = 1
INTNX function
INTNX(interval, start-from, increment)
Gives the date that corresponds to the beginning of the next interval
e.g.
next = intnx(‘month’, ‘25aug2000’d, 1)
next = 01sep2000
Syntax for combining SAS data sets vertically using PROC APPEND
base names the data set to which you want to add obs
proc append base=sas-data-set data= sas-data-set;
run;
What is more efficient to combine two SAS data sets vertically: PROC APPEND or a DATA step? Why?
PROC APPEND
Because it only reads the data in the DATA= data set, not the BASE= data set.
The DATA step SET statement would read both data sets
What option can you use when you’re combining SAS data sets vertically using PROC APPEND and the DATA= dataset has more variables than the BASE= dataset?
Note that this will drop the extra variable
FORCE
proc append base=SAS-dataset data=SAS-dataset force;
run;
Can you append (combine vertically) datasets when a common variable has different lengths in each dataset?
What option can you use to do it?
Note that truncating may occur
No, you will get an error
Use the FORCE option
What happens when you’re appending datasets where a common variable has different data types?
How can you append the datasets?
What happens when you do this?
You’ll get an error.
Use the FORCE option
For the variable with the different data types, you’ll get missing values in the DATA= dataset.
What happens when you’re appending datasets with PROC APPEND where the BASE= dataset has more variables than the DATA= dataset?
You get missing values for the extra variable