Programa P1-3 Flashcards

Data step internals Data Handling Data step processing

1
Q

What is the purpose of the (1) on the ODS HTML statement?

A

The (1) on the ODS HTML statement is an identifier. It ensures that a new, separately named HTML destination is opened and closed without affecting the default HTML output that the SAS Display Manager uses.
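
A minimal sketch of the identifier in use. The file name report1.html is hypothetical, and sashelp.class is the sample table shipped with SAS; the only point is that the (1) instance is opened and closed by name.

```sas
/* Open a second, separately identified HTML destination */
ods html(1) file="report1.html";   /* hypothetical file name */

proc print data=sashelp.class;
run;

/* Close only the (1) instance; the default HTML destination stays open */
ods html(1) close;
```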

2
Q

Describe the PUT Function.

A

The PUT function is used to convert numeric values to character values or character values to other character values.

targetvariable=PUT(variable,format.);
* The PUT function creates a new character variable from the input variable in the first argument, using the format in the second argument.

3
Q

Describe the INPUT Function.

A

The INPUT function reads the value of a variable using a specified informat, as opposed to writing the value of a variable after applying a format.

targetvariable=INPUT(variable,informat.);

* The INPUT function is used to convert character values to numeric values.
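
A short sketch showing PUT and INPUT side by side. The variable names and values are made up for illustration.

```sas
data _null_;
  num_val  = 1234.5;
  char_val = "25DEC2023";

  /* PUT: numeric to character; the result is always a character value */
  as_text = put(num_val, 8.2);

  /* INPUT: character to numeric, read with the DATE9. informat */
  as_date = input(char_val, date9.);

  /* Write both results to the log; AS_DATE is displayed as a date */
  put as_text= as_date=date9.;
run;
```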

4
Q

In the layered structure of the SAS System where is the DATA step situated?

A

The Supervisor layer:
* This core layer is fully portable between systems.
* It defines the environment used by the SAS System
* It contains the most-used core routines, e.g.:
- DATA step processor
- Data set management
- Command parsing

5
Q

In the layered structure of the SAS System where is the PROC step situated?

A

The Applications Layer:
* This is the ‘top’ layer, representing all the procedures and ancillary services.
* The layer is fully environment-independent
* It represents approximately 70% of the total code

6
Q

What is the DATA step processor responsible for?

A

The compilation and execution of a DATA step is controlled by the DATA step processor. It’s responsible for:
* Checking the syntax of DATA step code;
* Compiling DATA step source code;
* Optimising the executable image;
* Executing the resulting machine code;
* Accessing data through the input / output Engine Supervisor.

7
Q

In the layered structure of the SAS System where is the OS + Memory/storage situated?

A

The Host Layer:
* Provides the interface to the host environment and controls the following:
- Using operating system calls to perform resource allocation to each task;
- Memory management;
- Dynamic loading and unloading of programs;
- Efficient storage of SAS data sets;
- Input / Output services;
- Generation of host dependent machine code;
- Full-screen support and error handling services.
* The Host layer represents 10% of the code.

8
Q

In detail, explain the compilation phase.

A
  1. COMPILE:
    * Translation from source code to machine code.
    * The first action of the DATA step is to scan the SAS statements for syntax errors. If any errors are found, messages are written to the log and the DATA step stops.
    * One token at a time is transferred to the DATA step compiler or procedure parser.
    * Upon reaching a step boundary, the DATA step processor stops processing the SAS statements and the completed step is compiled, or parsed in the case of a procedure. The compiled step is then passed for execution.
  2. CREATE:
    * Definition of input and output files, including variable names, their locations and attributes.
    * An input buffer (an area of memory) is created for each raw data file being read, and the header portion of the data set is defined.
    * Creation of the Logical Program Data Vector (LPDV): it is used at execution time to hold the current observation and to determine which variables are to be initialised to missing (ITM) at each DATA step iteration.
    * Optimisation of code and passing of information to the I/O Engine Supervisor, which determines the index to be used.
9
Q

Briefly describe what the DATA Step Processor is.

A

The DATA Step Processor
* The DATA step is a basic building block of any SAS program
- Used to read in data from a file, perform calculations or manipulations on the data and then output the observation to a SAS data set. These actions are repeated for each input data record.

10
Q

In short, explain the compilation phase.

A
  1. COMPILE:
    * Syntax scan
    * Translation of SAS source code to machine language
    * Definition of input and output files
  2. CREATE:
    * Input buffer (if reading any non-SAS data)
    * Program Data Vector (PDV)
    * Data set descriptor / header information
    * Set variable attributes for the output SAS data set
    * Capture variables to be initialised to missing
11
Q

List compile-time only statements

A

⇒ drop, keep, rename
⇒ label
⇒ retain
⇒ length
⇒ format, informat
⇒ attrib
⇒ array
⇒ by
⇒ where

12
Q

Explain the Execution phase

A

This is where the data is read into the LPDV, usually one record at a time, and the SAS statements in the DATA step are executed.

  1. BEGIN:
    * The DATA statement is executed and options on the statement are processed.
    * An automatic variable, _N_, is set up by the SAS supervisor. _N_ increments by 1 each time the DATA statement executes. Thus, the DATA step counts the number of times it loops.
  2. ASSIGN:
    * The Assign stage generates an ‘Initialise to Missing’ (ITM) instruction, which sets the requisite storage areas in the LPDV to missing.
  3. DATA READ:
    * The machine code invokes the I/O Engine Supervisor, which selects the next observation from a data set, or the next raw data record from an external file.
    * A test is made to see if the end-of-file marker has been reached.
    * If it has, the data set is closed and the execution phase is complete.
  4. READ:
    * A raw data record is read into the LPDV via the input buffer. A SAS observation is read directly into the LPDV.
    * Special variables are given their values and control is then passed to the next executable statement following the read statement.
  5. EXECUTE:
    * With values in the LPDV, other program statements are executed.
    * New values are calculated and existing values are manipulated.
  6. WRITE:
    * At the end of the DATA step, or when an OUTPUT statement is reached, the machine code instructs the I/O Engine Supervisor to copy the values of the variables to be kept from the LPDV to the output data set(s).
  7. RETURN:
    * At the end of the DATA step, or when a RETURN statement is reached, the process flow loops back to the top of the DATA step and step 1 begins again.
13
Q

What is the DATA step debugger in SAS?

A

The SAS System provides the DATA step debugger as a means of rooting out logical errors during the execution phase of a DATA step.

14
Q

What is the DATA step debugger in SAS?

A

A means of rooting out logical errors during the execution phase of a DATA step. It allows step-by-step examination of the DATA step execution.

16
Q

In summary, what happens during the Execution phase?

A

The Execution Phase then loops through the DATA Step performing the following operations:
* Increments _N_ by 1 and initially executes the DATA statement, evaluating any data set options;
* Sends an Initialise To Missing instruction which resets all variables which are not retained to missing;
* Checks if there is a record to be read into the LPDV;
* If there is a record, this is then read into the LPDV. If a record is not present and the end-of-file marker is detected, the DATA Step completes execution;
* Further statements are executed sequentially within the DATA step;
* On encountering the RUN statement, or an OUTPUT statement, a record is generated in the output table(s);
* On encountering the RUN statement, or a RETURN statement, the DATA step loops and performs the same instructions again.
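
The loop described above can be watched in the log with a small sketch like the following. The data set name and values are made up; PUT _ALL_ writes every variable in the LPDV, including the automatic variables _N_ and _ERROR_.

```sas
data work.pdv_demo;
  /* Top of each iteration: X, Y and Z have just been reset to missing,
     while TOTAL keeps its value because a sum statement retains it */
  put "Top of iteration: " _all_;

  input x y;
  total + x;     /* sum statement: TOTAL is retained across iterations */
  z = x * y;     /* Z is recalculated on every iteration */
datalines;
1 2
3 4
;
run;
```

With two data records the PUT statement runs three times: the third iteration starts, INPUT then finds the end of the data, and the step stops without writing another observation.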

17
Q

What is the difference between a function and a call routine?

A

CALL routines are similar to functions, and there is often a function and a CALL routine with the same name. For example, there is the CATS function and the CALL CATS routine. The main difference is that a CALL routine cannot be used in an assignment statement: it is invoked with a CALL statement and returns its result by changing one of its arguments.
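
A small sketch contrasting the two forms. The variable names and values are made up for illustration.

```sas
data _null_;
  length full result $ 20;
  first = "Ada";
  last  = "Lovelace";

  /* Function: returns a value, so it can sit in an assignment statement */
  full = cats(first, last);

  /* CALL routine: invoked with CALL, the result is returned in its first argument */
  call cats(result, first, last);

  put full= result=;
run;
```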

18
Q

Define what a function is.

A

A SAS function is a routine that returns a single value resulting from zero or more arguments passed to that function.

19
Q

What is the LOWCASE function?

A

The LOWCASE function provides a simple method of converting the input character argument to lower case.

20
Q

What is the PROPCASE function?

A

The PROPCASE function takes a character argument and capitalises the first character of each word, then lowercases the remaining characters of each word.

21
Q

What is the FIND function?

A

The FIND function can be used to locate a specified set of characters within a string. It returns the position at which the specified substring first occurs.
* Similar to the INDEX function, except that it allows modifiers and a starting position to be specified.

Target_var=FIND(string,substring<,modifiers><,startpos>);

e.g.
find(proddesc, "Light", "i")
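
A brief sketch of the optional arguments. The PRODDESC value here is made up for illustration.

```sas
data _null_;
  proddesc = "Desk lamp with LED light";

  pos_case   = find(proddesc, "Light");        /* case-sensitive: no match, returns 0 */
  pos_nocase = find(proddesc, "Light", "i");   /* "i" modifier ignores case */
  pos_from   = find(proddesc, "l", "i", 10);   /* search starts at position 10 */

  put pos_case= pos_nocase= pos_from=;
run;
```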

22
Q

What is the COMPRESS Function?

A

The COMPRESS function removes specific characters from a string.

Target_var = COMPRESS(str<,chars><,modifiers>);
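
A short sketch of the three arguments in use. The phone number is a made-up value.

```sas
data _null_;
  phone = "(020) 7946-0000";

  no_blanks   = compress(phone);            /* no second argument: blanks are removed */
  no_punct    = compress(phone, "()- ");    /* removes the listed characters */
  digits_only = compress(phone, , "kd");    /* modifiers: k = keep, d = digits */

  put no_blanks= no_punct= digits_only=;
run;
```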

24
Q

What does the COMPBL function do?

A

The COMPBL function searches a string (the first argument) for multiple blanks, and translates them to a single blank.

Target_char_var = COMPBL(character_variable);

25
Q

What is the TRANSLATE function?

A

The TRANSLATE function is used to replace specific characters within a string.

Target_var = TRANSLATE(str, to, from);
e.g.
data _null_;
length first_15 $ 15 last_4 $ 4 showcard $ 19;
actcard="0987/6543/2100/4456";

/* Split the card number into the part to mask and the last four digits */
first_15=substr(actcard,1,15);
last_4=substr(actcard,16,4);

/* Mask the digits in the first part, then rebuild the card number */
first_15=translate(first_15, "**********","0123456789/");
showcard=cat(first_15,last_4);
put showcard=;
run;
26
Q

What is the TRANWRD function?

A

The TRANWRD function removes or replaces all occurrences of a string pattern within a string.

Target_var = TRANWRD(str, from, to);
e.g.
data correction;
length new_email new_postcode $ 50;
set pdata.customer(keep=email postcode);

/* Replace the old mail domain, then blank out the "not central" text */
new_email=tranwrd(email, "hotmail.co.uk", "hotmail.com");
new_postcode=tranwrd(postcode,"not central", "");
run;

proc print data=correction;
var email new_email postcode new_postcode;
where email contains "hotmail.co.uk";
run;
* If the target variable has not previously been assigned a length, the value returned by the TRANWRD function defaults to a length of 200 bytes.

27
Q

Differentiate between the LENGTH, LENGTHN and LENGTHC Functions

A

These functions are used to return the length of character strings, but have subtle differences, which are summarised below:

* LENGTH: trailing blanks excluded; returns 1 if the string is blank
* LENGTHN: trailing blanks excluded; returns 0 if the string is blank
* LENGTHC: trailing blanks included; returns the full storage length of the string, so 1 for a single blank
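
A sketch of the three functions applied to variables declared with a length of 10. The variable names are made up; note that for a variable, LENGTHC reports the declared length.

```sas
data _null_;
  length padded blank $ 10;
  padded = "abc";
  blank  = " ";

  len_padded  = length(padded);    /* 3: trailing blanks ignored */
  lenc_padded = lengthc(padded);   /* 10: trailing blanks counted */
  len_blank   = length(blank);     /* 1 */
  lenn_blank  = lengthn(blank);    /* 0 */
  lenc_blank  = lengthc(blank);    /* 10 */

  put len_padded= lenc_padded= len_blank= lenn_blank= lenc_blank=;
run;
```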

28
Q

Explain the MISSING function.

A

The MISSING function checks for a missing value within its argument and returns a 1 if it detects a missing value, otherwise it returns a 0.

Syntax: MISSING(argument)
e.g. where missing(overtime)=1;

29
Q

How is the LAG function used?

A

The LAG function creates a queue in memory, which stores the value of the variable from previous observations. Multiple LAG functions can be used, each maintaining a separate queue.
Example:
data epi_summary;
set pdata.epidemic_summary;
diff=weektot-lag(weektot);
pct_diff=diff/lag(weektot);
run;

proc print data=epi_summary;
format pct_diff percent8.2;
run;

30
Q

What issues can arise from using the LAG function and how can it be resolved?

A
  • Issue: The LAG function only remembers the value it was given the last time it executed, which is not necessarily the value from the previous observation.
  • Solution: Run the LAG functions at the beginning of the DATA step, ensuring that none of them are within conditional blocks or statements. The LAG function will then recall every previous value.
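
A minimal sketch of the safe pattern, using made-up data: LAG runs unconditionally on every observation and only the use of its result is conditional.

```sas
data work.lag_demo;
  input week weektot;

  /* Executed on every observation, so the queue is always up to date */
  prev_tot = lag(weektot);

  /* The condition is applied to the stored result, not to the LAG call */
  if week > 1 then diff = weektot - prev_tot;
datalines;
1 10
2 15
3 12
;
run;
```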
31
Q

When would you use a LAG function inside a conditional block?

A

When using FIRST. and LAST. BY-group processing. The LAG value is then only set at the beginning or end of the BY group, depending on how it is coded, e.g.

proc sort data=pdata.orders out=orders;
by orddate;
run;

data order_summary (keep=orddate totord ordchange);
set orders;
by orddate;

if first.orddate then totord=0;
totord+ordprice;

if last.orddate then do;
prev_totord=lag(totord);
ordchange=totord-prev_totord;
output;
end;
run;
32
Q

What does PROC FORMAT do?

A

Proc FORMAT is used to create user defined formats in the SAS system

33
Q

Give an example of formatting missing values.

A

For reporting purposes, missing character and numeric values can be handled using formats.

proc format;
value missing_num_fmt .="Unknown";
value $missing_char_fmt " "="Unknown";
run;

proc print data=pdata.results;
format age missing_num_fmt. class $missing_char_fmt.;
run;

  • The MISSING= system option can also be used to display missing numeric values differently.
    options missing='X';

proc print data=pdata.results;
format age missing_num_fmt. class $missing_char_fmt.;
run;

/* Restore the default display of missing numeric values */
options missing='.';

34
Q

How can you format multiple coded values and ranges?

A
  1. Commas separate multiple coded values that are non-consecutive.
  2. A hyphen between two values defines a range.
  3. A combination of commas and hyphens can be used to define a series of values that are collectively covered by a single format label.
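
The three cases can be sketched in one VALUE statement. The format name CODEGRP, the values and the labels are all made up for illustration.

```sas
proc format;
  value codegrp
    1, 3, 5        = "Listed codes"    /* commas: non-consecutive values */
    10 - 19        = "Ten to nineteen" /* hyphen: a range, boundaries included */
    2, 4, 20 - 29  = "Mixed"           /* commas and a range under one label */
    other          = "Unclassified";
run;

data _null_;
  code = 24;
  put code= codegrp.;   /* writes the formatted value to the log */
run;
```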
35
Q

Is it possible to create a format from a data set?

A

To create or modify a format definition using a data set, specify the CNTLIN= option on the PROC FORMAT statement. Note that the data set must contain certain key columns such as FMTNAME, START, END (with numeric formats), LABEL and TYPE, e.g.

data prods (keep=fmtname type start label);
set pdata.products(rename=(prodno=start proddesc=label));
fmtname="prodfmt";
type="C";
run;

proc format cntlin=prods;
run;

proc print data=pdata.orders;
var orddate ordno prodno ordprice;
format orddate date9. prodno $prodfmt.;
run;

36
Q

How can formats be used to perform a table lookup?

A

User-defined Formats are often used to add columns to a data set.

data prods (keep=orddate ordno quantity price prodno proddesc);
set pdata.orders;
proddesc=put(prodno, $prodfmt.);
run;

37
Q

Explain exporting a format definition to a data set.

A
  • PROC FORMAT can output the definition of one or more formats to a data set.
  • This provides an effective way of modifying a format definition.
  • Exporting the definition is done by using the CNTLOUT= option on the Proc FORMAT statement.
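
A hedged sketch of the export. The $SIZEFMT format and the WORK.FMT_DEFS data set are hypothetical names used only so the example is self-contained.

```sas
/* A small character format to export (hypothetical) */
proc format;
  value $sizefmt "S" = "Small"
                 "L" = "Large";
run;

/* Write the definition of the selected format to a data set */
proc format cntlout=work.fmt_defs;
  select $sizefmt;
run;

/* Inspect the key columns of the exported definition */
proc print data=work.fmt_defs;
  var fmtname start end label type;
run;
```

The exported data set can be edited and then loaded back with the CNTLIN= option to modify the format.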
38
Q

What are picture formats?

A
  • Picture formats display numeric values and date values in a predefined template. They are created with the PICTURE statement in PROC FORMAT.
  • This can be particularly useful for including currency symbols and making large numbers more readable, e.g.

data overtime;
set pdata.employee(where=(overtime ne .) keep=empno overtime budget);

otbalnce=budget-overtime;
label otbalnce="Balance"; run;

proc print data=overtime label;
format otbalnce debcred.;
run;
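
The DEBCRED format used above is not defined on this card. One way it might be defined is sketched below; the template and the DR/CR labels are assumptions, not the original definition.

```sas
proc format;
  picture debcred
    low -< 0 = "000,009.99 DR"    /* negative balances: absolute value plus DR */
    0 - high = "000,009.99 CR";   /* zero and positive balances plus CR */
run;
```

In the template, 0 is a digit selector that suppresses leading zeros, while 9 always prints a digit, so small values still show a units digit and two decimal places.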

39
Q

How can you format dates using PICTURE FORMAT?

A

Using the DATATYPE= option, date, time or datetime formats can be created.
proc format;
/* Single quotes stop the macro processor from interpreting the % directives */
picture dmy (default=10) other='%d-%m %Y' (datatype=date);
run;

proc print data=pdata.employee noobs label;
var empno dob date_joined;
format dob date_joined dmy.;
run;

40
Q

How can you store a format in a permanent library?

A
  • The user must specify the LIBRARY= option on the PROC FORMAT statement, i.e.
    proc format library=pdata;
  • To access the permanently stored format, specify the FMTSEARCH= global system option, i.e.
    options fmtsearch=(pdata);
  • The FMTSEARCH= option can list more than one library. The order in which the format libraries are listed determines the order in which they are searched.
41
Q

Define data set options

A

Wherever a data set name is used in a SAS program, data set options can be included. Some are used to override SAS system option values for a particular data set, but many are of use in a data management context.
* Data set options are enclosed in parentheses after the name of the data set that they apply to.

42
Q

What is the order of execution for DROP, KEEP, RENAME and WHERE Data set options?

A

The DROP, KEEP, RENAME and WHERE options are always executed in alphabetical order.
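
A sketch of the consequence for coding, using the sashelp.class sample table: KEEP= refers to the original variable name, while WHERE= is applied after RENAME= and so refers to the new name. The name YEARS is made up for illustration.

```sas
data work.teens;
  set sashelp.class(keep=name age            /* applied first: original name AGE */
                    rename=(age=years)       /* applied second */
                    where=(years >= 13));    /* applied last: new name YEARS */
run;
```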

43
Q

Name some logical operators.

A

BETWEEN-AND
* Selects rows where the value of a variable lies within a range of values. This includes the range boundary values.
CONTAINS (?)
* Selects rows where a specified string of characters exists within a character value. The position is irrelevant, but it is case sensitive. “?” is a synonym for “CONTAINS”.
LIKE
* Selects rows where the values of a character variable match a specified pattern. The pattern is defined using the percent and underscore characters, where % allows any number of characters in that position, whilst the underscore denotes any single character in that position.
IS MISSING or IS NULL
* Selects rows where the value of the specified variable is missing or null.
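
A short sketch of these operators in WHERE statements, using the sashelp.class sample table. The search values are arbitrary.

```sas
proc print data=sashelp.class;
  where age between 12 and 14    /* 12 and 14 are included */
    and name like 'J%'           /* names beginning with J */
    and sex is not missing;
run;

proc print data=sashelp.class;
  where name contains 'an';      /* same as: name ? 'an' (case sensitive) */
run;
```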

44
Q

What is the use of Wildcard Operators?

A

Wildcard operators can be used with data set options such as DROP and KEEP in order to list ranges or groups of variables, e.g.
* Hyphen wildcard: lists a continuous range of variables that have a common prefix and a numeric suffix.
* Double hyphen: lists a continuous range of variables by specifying the name of the variable at the start of the range and the name of the variable at the end of the range. The range is based on the order of the columns.
* Colon wildcard: specifies all variables that begin with a particular prefix.
* The _NUMERIC_ wildcard refers to all numeric variables within a table.
* The _CHARACTER_ wildcard refers to all character variables within a table.
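
A sketch of each list on a made-up table; all data set and variable names here are hypothetical.

```sas
/* One observation with a known column order */
data work.wide;
  length id 8 name $ 10 q1-q4 8 note $ 20 qtotal 8;
  call missing(of _all_);
run;

data work.by_number;                           /* id, q1, q2, q3, q4 */
  set work.wide(keep=id q1-q4);
run;

data work.by_position;                         /* name, q1 to q4, note */
  set work.wide(keep=name--note);
run;

data work.by_prefix;                           /* q1 to q4 and qtotal */
  set work.wide(keep=q:);
run;

data work.numbers_only;                        /* every numeric variable */
  set work.wide(keep=_numeric_);
run;

data work.text_only;                           /* every character variable */
  set work.wide(keep=_character_);
run;
```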

45
Q

Explain bounded DO loops.

A

The number of times the loop will iterate is fixed, even in cases where the start, stop and increment values are determined by variable values.

46
Q

What are nested DO loops?

A

Nesting DO loops within DO loops is very useful when factors need to be varied at different rates relative to one another

47
Q

Discuss conditional termination

A

Bounded and nested loops have a fixed number of iterations defined by their start, stop and by values. Suppose that it is not known how many times the loop should execute, and the requirement is for the loop to continue until a condition is met, or while a condition remains true. This requires the use of the DO WHILE or DO UNTIL structures.

  • DO WHILE: The loop continues while the condition is true.
    The expression is evaluated at the top of the loop, so the code in the loop is not necessarily executed at all.
  • DO UNTIL: The loop executes until the condition becomes true.
    The expression is evaluated at the bottom of the loop, so the code within the loop is always executed at least once.
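
A sketch contrasting a bounded nested loop with the two conditional forms. The variable names and figures are made up.

```sas
/* Bounded and nested: the inner loop runs 4 times for each of 2 years */
data work.quarters;
  do year = 1 to 2;
    do qtr = 1 to 4;
      output;
    end;
  end;
run;

data _null_;
  balance = 1000;

  /* DO WHILE: tested at the top, so the body may never run */
  do while (balance < 1200);
    balance = balance * 1.05;
  end;

  /* DO UNTIL: tested at the bottom, so the body runs at least once
     even though the condition is already true on entry */
  do until (balance >= 1200);
    balance = balance * 1.05;
  end;

  put balance=;
run;
```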
48
Q

Explain an ARRAY

A

An array is a simple way of referring to many variables by a single name.
* Arrays can only be defined in a DATA step, and the ARRAY statement is a compile-time-only statement.
* An array can reference only numeric variables or only character variables, not a mixture of the two.

ARRAY arrayname {n} $ length array-elements (initial-values);

49
Q

What does the OBS= data set option do?

A

The OBS= data set option is used to specify the number of the last observation to be processed.
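
Because OBS= names the last observation rather than a count, combining it with FIRSTOBS= makes the behaviour clear. A sketch using the sashelp.class sample table:

```sas
data work.slice;
  /* Reads observations 5 to 10 inclusive: six rows, not ten */
  set sashelp.class(firstobs=5 obs=10);
run;
```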

50
Q

List uses of an ARRAY

A
  • Create similar variables;
  • Help read certain data structures;
  • Repeat actions for variables;
  • Perform table look-ups.
  • The array will create variables in the LPDV if they do not already exist
51
Q

Define a SELECT group.

A
  • The collection of SELECT, WHEN and OTHERWISE statements is known as a SELECT group, which must be closed using an END statement
  • SELECT groups are an alternative to using IF/THEN/ELSE statements, but are generally considered to be a better choice when there are multiple conditions to evaluate, as the code is easier to read.
  • SELECT groups represent a more defensive way of evaluating conditions, as they force the use of the OTHERWISE clause where there is no match on any of the WHEN statements.
52
Q

Describe SELECT Statements

A
  • SELECT statements are an effective way of evaluating a group of mutually exclusive conditions, which are based on a common expression. Each condition is specified using a WHEN statement, which defines one or more SAS statements that are only executed when the condition is true.
  • SELECT statements can be followed by one or more WHEN statements and can optionally include an OTHERWISE statement to handle the cases not catered for by the WHEN expressions.
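
A minimal sketch of a SELECT group, using the sashelp.class sample table. The BAND variable and its labels are made up.

```sas
data work.class_band;
  set sashelp.class;
  length band $ 8;

  select (sex);                       /* common expression */
    when ("F") band = "Female";
    when ("M") band = "Male";
    otherwise  band = "Unknown";      /* handles any value not matched above */
  end;
run;
```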
53
Q

What is the DIM() function ?

A
  • This function returns the number of elements in an array.
    The syntax is:
    num = dim(arrayname) ;
  • It is often used as the upper bound of a DO loop and avoids having to re-code the loop when the number of elements in the array changes.
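
A short sketch of DIM as a loop bound, using the sashelp.class sample table. The array name is made up; adding another variable to the ARRAY statement needs no change to the loop.

```sas
data work.rounded;
  set sashelp.class;
  array measures {*} height weight;   /* {*}: SAS counts the elements */

  do i = 1 to dim(measures);          /* dim(measures) is 2 here */
    measures{i} = round(measures{i}, 1);
  end;

  drop i;
run;
```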
56
Q

What is the use of processing Arrays with Functions?

A

Arrays provide a convenient way of processing a group of variables. Functions such as SUM, MEAN etc. can be used with arrays, where the arguments for the function take the form:

variable=function(OF variable-list);

  • The variable list can be the name of an array or a list of array elements.
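
A sketch of the OF syntax, using the sashelp.class sample table. The array and result names are made up.

```sas
data work.stats;
  set sashelp.class;
  array m {*} height weight;

  total   = sum(of m{*});               /* every element of the array */
  average = mean(of height weight);     /* an explicit variable list */
run;
```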