Programa P4-5 Flashcards

1
Q

Explain a SET statement

A

The SET statement is used for READING SAS data sets. It is an executable statement and can be placed under program control.

2
Q

List examples of SET Applications

A

Single File Read:
* The SET statement reads a SAS data set observation by observation, e.g.
  set pdata.demog;

Concatenation:
* Concatenating files is achieved by listing the data sets on the SET statement, e.g.
  set pdata.demog1 pdata.demog2;
* This concatenates both data sets. The resulting data set has all observations from PDATA.DEMOG1 at the top, followed by all observations from PDATA.DEMOG2.
Multiple SET statements:
* Combines data sets one-to-one: the output data set contains variables from all input data sets, and for any common variables the value read by a later SET statement overwrites the value from an earlier one.
* The DATA step ends when an end-of-file marker is reached in any of the input data sets, so the output has as many observations as the smallest input.
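A minimal sketch of the three SET patterns, assuming the SASHELP.CLASS sample table (19 rows, shipped with SAS) stands in for the PDATA tables; the WORK table names are illustrative only.

```sas
/* Single file read: one observation per DATA step iteration */
data work.girls;
  set sashelp.class;
  where sex = 'F';
run;

data work.boys;
  set sashelp.class;
  where sex = 'M';
run;

/* Concatenation: all rows of WORK.GIRLS, then all rows of WORK.BOYS */
data work.all;
  set work.girls work.boys;
run;

/* Multiple SET statements: one-to-one reading. The step stops as soon as
   either SET statement tries to read past its last observation */
data work.pairs;
  set work.girls(keep=name rename=(name=girl));
  set work.boys(keep=name rename=(name=boy));
run;
```

WORK.ALL holds the 9 girls followed by the 10 boys; WORK.PAIRS stops after 9 rows because WORK.GIRLS runs out first.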

3
Q

When would you use the NOTSORTED option?

A
  • By using the NOTSORTED option on the BY statement, FIRST. and LAST. can also be used with data that is grouped but not sorted.
  • NOTSORTED forms groups from consecutive observations with the same BY value. The same value may appear again later as a separate group, and groups may appear out of ascending or descending sequence, yet the required information can still be extracted.
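A small sketch with hypothetical in-line data where REGION values are grouped but not sorted, and 'North' appears in two separate runs:

```sas
/* Rows are grouped by REGION but not sorted; 'North' occurs in two runs */
data work.visits;
  input region $ visits;
  datalines;
North 5
North 3
South 4
East 2
North 1
;
run;

data work.region_totals;
  set work.visits;
  by region notsorted;          /* a group = a run of consecutive equal values */
  if first.region then total = 0;
  total + visits;               /* sum statement: TOTAL is retained */
  if last.region then output;   /* one row per consecutive group */
run;
```

This should produce four rows (North 8, South 4, East 2, North 1). Without NOTSORTED the step stops with an error when it meets 'East' after 'South'.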
4
Q

How can you identify the end of a Data Set Using the END= Option?

A

The END= option is used to find out when the last row has been read from a SAS table. The option is included on the SET statement and defines a temporary variable in the Program Data Vector (PDV) which is set to 1 when the last row of the table is read, and 0 otherwise, e.g.

set pdata.demog end=last_observation;
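A minimal sketch of END= in use, assuming SASHELP.CLASS in place of PDATA.DEMOG: the step counts rows and writes a single summary row once the last observation has been read.

```sas
data work.demog_count;
  set sashelp.class end=last_observation;
  count + 1;                        /* running row count (retained) */
  if last_observation then do;      /* 1 only on the final row */
    put 'Rows read: ' count;
    output;                         /* write just this one row */
  end;
run;
```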

5
Q

How do you determine the number of observations using the NOBS= option?

A
  • Sometimes it is necessary to determine the number of observations in a data set. This
    can be achieved by using the NOBS= option on the SET statement.
  • Because the NOBS= variable is assigned when the step is compiled, a STOP; statement placed before the SET statement ensures that no data is read, while the variable still holds the count. If used with a PUT statement, SAS writes the number of observations to the log.
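A sketch of the STOP-before-SET pattern, with SASHELP.CLASS standing in for a course table; N_OBS is an illustrative variable name.

```sas
data _null_;
  put 'Number of observations: ' n_obs;  /* NOBS= value is known at compile time */
  stop;                                  /* end the step before any row is read */
  set sashelp.class nobs=n_obs;
run;
```

For SASHELP.CLASS the log line should read Number of observations: 19.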
6
Q

What’s the use of Sampling from SAS Data Files

A

When analysing a large volume of data, it is often useful to take samples of the data in order to reduce processing time and the associated cost.
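One way to sample, sketched here as a systematic sample (every 5th row) that combines the POINT= and NOBS= options of the SET statement; SASHELP.CLASS and the interval of 5 are illustrative choices.

```sas
/* Systematic sample: read every 5th observation directly with POINT= */
data work.sample;
  do obsnum = 1 to totobs by 5;
    set sashelp.class point=obsnum nobs=totobs;
    output;                 /* write the observation just read */
  end;
  stop;                     /* required with POINT=: end-of-file is never detected */
run;
```

STOP is essential here: with POINT= the step never reaches end-of-file and would otherwise loop indefinitely.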

7
Q

Define the RAND Function and CALL STREAMINIT Routine

A
  • The RAND function returns a random number from the distribution named in its first argument (e.g. 'UNIFORM' or 'NORMAL'); called repeatedly, it produces a stream of random numbers:

target-variable = RAND('distribution' <, parameter(s)>);

  • The CALL STREAMINIT routine is used before the first RAND call to specify a seed value for all subsequent RAND calls. CALL STREAMINIT needs to be used just once per DATA step.

CALL STREAMINIT(seed);
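A minimal sketch of a reproducible random sample, assuming SASHELP.CLASS as input; the seed 12345 and the 30% cut-off are arbitrary illustrative values. The IF _N_ = 1 guard keeps CALL STREAMINIT to a single call per DATA step.

```sas
data work.random_sample;
  if _n_ = 1 then call streaminit(12345);  /* positive seed: reproducible stream */
  set sashelp.class;
  u = rand('UNIFORM');                     /* uniform on (0,1) */
  z = rand('NORMAL', 0, 1);                /* standard normal */
  if u < 0.3;                              /* keep roughly 30% of rows */
run;
```

Because the seed is positive, rerunning the step should select the same rows each time.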

8
Q

TRUE/FALSE
You can generate a stream of random numbers with negative integers

A

FALSE, if a reproducible stream is required.
To generate a reproducible (pseudorandom) stream of random numbers, the seed value must be a positive integer. Any nonpositive seed (or simply not using the CALL STREAMINIT routine) causes SAS to generate a seed from the system clock, so the random numbers generated by the RAND function are not reproducible.
* Using the CALL STREAMINIT routine with a positive integer seed, the same stream of random numbers is generated every time the step runs (pseudorandom).

9
Q

TRUE/FALSE
The power of merging lies in Match-Merging, where rows are matched on a key and merged - i.e. with a BY statement present

A

TRUE

10
Q

List components of Match-Merging

A

Match-Merging
* Requires common BY variables on both data sets;
* Only observations with matching BY variables will be paired;
* BY variables must have the same names and types in both data sets. It is desirable for their lengths to correspond also (though with care it is possible to perform a merge where BY variables have different lengths);
* Data sets must be sorted or indexed by the BY variables;
* FIRST. and LAST. are generated automatically;
* Duplicate BY values can be present in any data set;
* Any number of data sets can be merged;
* The usual data set options apply;
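* Possible selection statements against two data sets with IN= variables called A and B include IF A AND B; (matches only), IF A AND NOT B; (rows only in the first data set) and IF NOT A AND B; (rows only in the second data set).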
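A sketch of a match-merge with IN= variables A and B, using two small hypothetical tables (WORK.EMP and WORK.PAY) already in EMPID order; one DATA step routes the rows into three outputs.

```sas
data work.emp;
  input empid name $;
  datalines;
1 Ann
2 Bob
3 Cy
;
run;

data work.pay;
  input empid salary;
  datalines;
2 30000
3 35000
4 28000
;
run;

/* Both inputs are already sorted by EMPID */
data work.matched work.emp_only work.pay_only;
  merge work.emp(in=a) work.pay(in=b);
  by empid;
  if a and b then output work.matched;            /* EMPID 2 and 3 */
  else if a and not b then output work.emp_only;  /* EMPID 1 */
  else if b and not a then output work.pay_only;  /* EMPID 4 */
run;
```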

11
Q

Explain uniqueness of data in terms of merging data

A

Uniqueness:
* Refers to how often values are repeated within a variable or column.
* 100% uniqueness occurs when every value of a variable is different e.g. user IDs in a data set that contains logon security details.
* Low levels of uniqueness are found in e.g. gender variables or yes/no type status flags.
* In between are variables that contain some element of uniqueness, such as people’s surnames.

12
Q

Explain cardinality of relationships in terms of merging data

A

The uniqueness of the data helps in understanding the cardinality of the relationship between data sets, in terms of the key variables that may be used to merge them.

Cardinality Relationships:
* One to One: EmpID in an Employee data set to EmpID in a Retired Employees data set.
* One to Many (or Many to One): AccNo in an Account Details data set to AccNo in a Transaction data set.
* Many to Many: CustNo in an Orders data set to CustNo in a Deliveries data set.

13
Q

How is the MSGLEVEL= option used?

A

The MSGLEVEL= system option is used to specify the amount of detail that is printed in the SAS log when SAS code is executed.

  • The default value is N, which restricts the output to notes, warnings and error messages.
  • The alternative value is I, which outputs further information specifically relating to merge
    processes, use of indexes and sort procedures
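A sketch of MSGLEVEL=I around a merge in which a non-BY variable (HEIGHT) exists in both inputs, so the log should carry an extra INFO line about the value being overwritten. It assumes SASHELP.CLASS, which is stored in NAME order; WORK.HEIGHTS is illustrative.

```sas
options msglevel=i;   /* add INFO lines to the log */

data work.heights;
  set sashelp.class(keep=name height);
  height = height * 2.54;   /* inches to centimetres */
run;

/* HEIGHT is in both inputs: the value from WORK.HEIGHTS overwrites
   the SASHELP.CLASS value, and MSGLEVEL=I reports this in the log */
data work.class_cm;
  merge sashelp.class work.heights;
  by name;
run;

options msglevel=n;   /* restore the default */
```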
14
Q

True or False ?
Provide the corrected version for any false statements regarding DATA step merging.

a) Common BY variables must have the same name and length, but it is permissible for
them to be of different types;
b) Only observations with matching BY variables will be paired;
c) Data sets must first be sorted or indexed by the BY variables;
d) FIRST. and LAST. are generated automatically;
e) Duplicate BY values can be present in any data set;
f) Match merging allows only two data sets to be merged.

A

a) Common BY variables must have the same name and length, but it is permissible for them to be of different types.

Corrected:
Common BY variables must have the same name and type, but it is permissible for them to have different lengths.

f) Match merging allows only two data sets to be merged.

Corrected:
Match merging allows any number of data sets to be merged.

Statements b) to e) are TRUE.

15
Q

Describe Data Summarisation?

A

Data Summarisation describes the process of collapsing data in order to gain a higher-level view of key factors and to generate statistics such as totals, averages, minimum values and maximum values. Procedures such as PROC MEANS and PROC TRANSPOSE can achieve this.

16
Q

Write the syntax for PROC MEANS

A

Basic Syntax:
PROC MEANS <option(s)> <statistic-keyword(s)>;
BY variable-list;
CLASS variable-list;
VAR variable-list;
FREQ variable;
ID variable-list;
TYPES requested-combinations-of-class-variables;
OUTPUT <OUT=SAS-data-set> <output-statistic-list>;
RUN;

17
Q

What are the CLASS and VAR statements used for in PROC MEANS?

A
  • Adding a CLASS statement to the procedure groups the analysis by one or more classification / categorisation variables. Classification variables can be character or numeric, but tend to have discrete values by which to group the results.
  • The VAR statement identifies one or more numeric analysis variables for which
    the default statistics (N, mean, standard deviation, minimum and maximum) are output.
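A minimal example of CLASS and VAR together, assuming the SASHELP.CLASS sample table; the statistic keywords and MAXDEC= are optional.

```sas
proc means data=sashelp.class n mean min max maxdec=1;
  class sex;               /* one block of statistics per value of SEX */
  var height weight;       /* numeric analysis variables */
run;
```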
18
Q

Why is an output statement added to a PROC MEANS procedure?

A
  • Saving Results
    A common use of the MEANS procedure is to produce an output SAS data set. Adding an OUTPUT statement to the step creates an output SAS data set.
  • It is possible to generate more than one output data set within a single PROC MEANS step by using multiple OUTPUT statements.
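A sketch of two OUTPUT statements in one step, assuming SASHELP.CLASS; the output table and variable names are illustrative. NOPRINT suppresses the printed report so only the data sets are produced.

```sas
proc means data=sashelp.class noprint;
  class sex;
  var height weight;
  /* first output data set: means, named in VAR order */
  output out=work.means mean=mean_height mean_weight;
  /* second output data set: maximum height only */
  output out=work.maxima max(height)=max_height;
run;
```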
19
Q

What are the automatic variables generated in a PROC MEANS procedure and can you explain them?

A
  • The _FREQ_ variable gives the number of observations at each ‘level’.
  • The _STAT_ variable shows the name of the five default statistics (N, MIN, MAX, MEAN, STD) produced for the output
    data set when no statistics are requested on the OUTPUT statement.
  • The _TYPE_ variable identifies the level of summarisation: 0 for the whole data set, and other values for the data set broken down by each class variable and by combinations of the class variables.
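A sketch that makes the automatic variables visible, assuming SASHELP.CLASS. With no statistic keywords on the OUTPUT statement, the default statistics are written as separate rows labelled by _STAT_.

```sas
proc means data=sashelp.class noprint;
  class sex age;
  var height;
  output out=work.stats;       /* default statistics, one row each */
run;

proc print data=work.stats;
  where _stat_ = 'MEAN';       /* show only the mean rows */
  var _type_ _freq_ sex age height;
run;
```

With CLASS SEX AGE, _TYPE_ 0 is the whole table, 1 the AGE breakdown, 2 the SEX breakdown and 3 SEX by AGE.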
20
Q

Why might you use a MISSING option in a PROC MEANS step?

A

By default, observations with a missing value for a classification variable are excluded from the analysis. It is important to be aware of this, as it can sometimes lead to the omission of significant data from the report.
* The MISSING option overrides the default behaviour and treats a missing value as a legitimate value of a class variable.
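A sketch with hypothetical in-line data where one row has a missing REGION (the . in list input reads as a blank character value).

```sas
data work.sales;
  input region $ amount;
  datalines;
North 100
South 200
. 50
North 25
;
run;

/* With MISSING, the blank REGION forms its own class level */
proc means data=work.sales missing sum;
  class region;
  var amount;
run;
```

Without MISSING the report covers North and South only; with it, a separate row for the blank region (amount 50) should appear.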

21
Q

Explain the TYPES statement.

A

TYPES Statement
The default action of Proc MEANS is to generate type values for all combinations of the
specified class variables. By using a TYPES statement with the procedure, the process
can be restricted to specific combinations of the class variables
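A sketch of restricting the combinations, assuming SASHELP.CLASS; () requests the overall summary.

```sas
proc means data=sashelp.class noprint;
  class sex age;
  var height;
  types () sex*age;          /* overall total and SEX by AGE only */
  output out=work.selected mean=mean_height;
run;
```

Only _TYPE_ values 0 and 3 should appear in WORK.SELECTED.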

22
Q

What is the use of the NWAY option in a PROC MEANS step?

A

The NWAY option restricts the output to the highest value of _TYPE_, i.e. the combination of all the class variables, giving the greatest degree of interaction between the class levels.
Rather than subsetting the output data set on _TYPE_, simply use the NWAY option on the Proc MEANS statement.
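A minimal NWAY sketch, assuming SASHELP.CLASS; with two class variables the highest _TYPE_ is 3.

```sas
proc means data=sashelp.class nway noprint;
  class sex age;
  var height;
  output out=work.nway mean=mean_height;   /* only _TYPE_ = 3 rows */
run;
```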

23
Q

What are the key differences between a CLASS statement and a BY statement?

A
  • With a BY statement, the data must already be sorted in the appropriate order, or indexed. This is not necessary with the CLASS statement.
  • With a BY statement _TYPE_ has a value of 0, whereas with a CLASS statement it takes values from 0 up to 2^n - 1, where n is the number of class variables.
  • Missing values are ignored with the CLASS statement unless the MISSING option is specified. With the BY statement, missing values are treated as a valid
    classification.
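A side-by-side sketch, assuming SASHELP.CLASS: BY requires a prior sort (here into a WORK copy), CLASS does not.

```sas
/* BY processing needs the data in BY order first */
proc sort data=sashelp.class out=work.class_by_sex;
  by sex;
run;

proc means data=work.class_by_sex mean;
  by sex;
  var height;
run;

/* CLASS needs no prior sort */
proc means data=sashelp.class mean;
  class sex;
  var height;
run;
```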
24
Q

Write the syntax for PROC TRANSPOSE.

A

Proc TRANSPOSE

Syntax:
PROC TRANSPOSE
<DATA=SAS-data-set>
<OUT=SAS-data-set>
<PREFIX=name>
<NAME=name>;
VAR variable-list;
ID variable;
COPY variable-list;
BY variable-list;
RUN;
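A minimal sketch of the default behaviour, assuming SASHELP.CLASS limited to its first three rows.

```sas
proc transpose data=sashelp.class(obs=3) out=work.wide;
  var height weight;     /* each listed variable becomes one output row */
run;
```

WORK.WIDE should have two rows (one per transposed variable), with _NAME_ holding the variable name and COL1-COL3 the three values.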

25
Q

TRUE/FALSE?

PROC TRANSPOSE processes character variables by default

A

FALSE
Proc TRANSPOSE processes just the numeric variables by default (i.e. when no VAR statement is given).

26
Q

Explain the use of the VAR Statement in a PROC TRANSPOSE step.

A

The VAR statement lists the variables to be transposed; without it, all numeric variables are transposed. In the output, the default variable _NAME_ holds the names of the transposed variables, whilst the variables created from the transposed observations have default names COL1-COLn, where n is the number of observations in the input data set.

27
Q

What does the ID Statement do in a PROC TRANSPOSE step?

A

An ID statement specifies a variable in the original data set whose values are used as the new variable names.
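A sketch using hypothetical test-score data, already in STUDENT order; the values T1 and T2 become variable names.

```sas
data work.scores;
  input student $ test $ score;
  datalines;
Ann T1 80
Ann T2 85
Bob T1 70
Bob T2 75
;
run;

proc transpose data=work.scores out=work.scores_wide(drop=_name_);
  by student;        /* one output row per student */
  id test;           /* values of TEST become variable names T1, T2 */
  var score;
run;
```

WORK.SCORES_WIDE should hold one row per student with variables STUDENT, T1 and T2.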

28
Q

Name the default value for the VALIDVARNAME= system option

A

VALIDVARNAME=V7 is the SAS default; interfaces such as SAS Studio and SAS Enterprise Guide set it to ANY.

Either variable naming rule can be set using an OPTIONS statement:

OPTIONS VALIDVARNAME= V7 | ANY ;

29
Q

What makes the BY statement so useful in PROC TRANSPOSE?

A
  • When Proc TRANSPOSE is run without a BY statement, one observation is output for each
    variable being transposed.
  • When a BY statement is included, the same is true, except that each variable being transposed is output once for each BY group.
30
Q

What do the NAME and PREFIX options do in PROC TRANSPOSE?

A
  • The NAME= option specifies the name of a variable to use instead of _NAME_.
  • The PREFIX= option supplies a prefix for the new variable names (overriding the default COL1, COL2, ...).
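A sketch of both options, assuming SASHELP.CLASS limited to three rows; MEASURE and STUDENT are illustrative names.

```sas
proc transpose data=sashelp.class(obs=3) out=work.renamed
               name=measure prefix=student;
  var height weight;
run;
/* Output variables: MEASURE (instead of _NAME_) and
   STUDENT1-STUDENT3 (instead of COL1-COL3) */
```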
31
Q

Define what macro SAS language is

A

The SAS macro language gives the capability to write dynamic programs whose statements can change depending on entities called macro variables.
- In other words, macros are dynamic code that changes based on circumstances.
- Macro variables work by a form of text substitution and are referenced with an ampersand (&). The end of a macro variable reference can be marked with a dot (.).
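A minimal sketch of text substitution and the dot delimiter, assuming the SASHELP library; LIB and DS are illustrative macro variable names.

```sas
%let lib = sashelp;
%let ds  = class;

title "First rows of &ds";      /* references resolve inside double quotes */

/* &lib..&ds resolves to sashelp.class: the first dot ends the
   reference to LIB, the second is the literal dot in the name */
proc print data=&lib..&ds(obs=5);
run;

title;                          /* clear the title */
```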

32
Q

How is a macro variable processed?

A

During the compilation phase, when the SAS Supervisor sees an ampersand followed by a non-blank character (a macro variable) the macro facility is triggered. In turn, the macro facility determines the value for the macro variable and passes the value back on to the
input stack.

33
Q

Name the two types of macro variables.

A
  • Automatic: defined by the macro processor (e.g. SYSDATE9).
  • User-defined: defined in code by the SAS programmer.
Both types are stored in internal work areas called symbol tables. When a macro variable is referenced, SAS scans its symbol tables for a macro variable of that name and, if found, retrieves its value and places it on the input stack.
34
Q

Name two ways that the values of SAS macro variables can be determined

A

Using SASHELP.VMACRO

  • The contents of the global symbol table can be accessed via the SASHELP.VMACRO view, e.g.

proc print data=sashelp.vmacro;
  where scope="AUTOMATIC";
run;

Using %PUT Statement
* The %PUT statement writes text to the SAS log, e.g.

%put _automatic_;   OR
%put Today is &sysdate9.;

35
Q

How are User-Defined Macro Variables created?

A
  • The simplest method is the %LET statement.
  • It creates a macro variable and, in open code, stores it in the global symbol table with the value given.

%LET macro-variable = <value>;
e.g.

%let table=demog;
proc print data=pdata.&table;
run;

36
Q

What are some rules for Macrovariables?

A

Macro variables:
* Can have a maximum length of 65,534 characters;
* Their length is determined by the value assigned to them;
* Contain only character data, although numeric arithmetic can be achieved;
* Are independent of SAS data set variables;
* Have the same naming conventions as SAS data set variables (up to 32 characters etc.);
* Have some reserved names that cannot be used, such as COPY, OPEN, SAVE and UNTIL.
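A sketch of the "character data, numeric arithmetic" point: %LET stores text, while the %EVAL and %SYSEVALF macro functions evaluate it. The variable names are illustrative.

```sas
%let a = 10;
%let b = 3;
%let text = &a + &b;             /* stored as the text 10 + 3 */
%let sum  = %eval(&a + &b);      /* integer arithmetic: 13 */
%let div  = %sysevalf(&a / &b);  /* floating-point arithmetic */
%put &=text &=sum &=div;
```

%EVAL performs integer arithmetic only, which is why %SYSEVALF is used for the division.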

37
Q

What is the benefit of creating macrovariables in a DATA step?

A
  • The %LET statement is one way, but it limits the value you provide to whatever you can type in your SAS program.
  • The CALL SYMPUTX routine is another way of creating a macro variable and
    giving it a value, this time during DATA step execution:

call symputx(argument1, argument2);

  • argument1 is the name of the macro variable and argument2 its value. This opens up the possibility of passing data set variables as argument2 and therefore creating data-driven macro variables.
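A sketch of a data-driven macro variable, assuming SASHELP.CLASS; TOTWT is an illustrative name. The macro variable can be referenced only after the DATA step has finished.

```sas
data _null_;
  set sashelp.class end=last;
  total_weight + weight;                             /* retained running total */
  if last then call symputx('totwt', total_weight);  /* name, value */
run;

%put Total weight is &totwt;
```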
38
Q

Explain PROC CONTENTS

A
  • Proc CONTENTS reports on the header information of a single SAS data set or data sets in
    a whole library.
  • When used with _ALL_ (e.g. DATA=libref._ALL_), the NODS option lists the library members only, without printing the contents of each data set.
  • Proc CONTENTS also has the functionality to create an output SAS table containing the header information of the table or tables being examined.
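A sketch of the three uses above, assuming the SASHELP library; WORK.CLASS_META is an illustrative name.

```sas
/* Header information for one table */
proc contents data=sashelp.class;
run;

/* Members of a whole library, without each table's details */
proc contents data=sashelp._all_ nods;
run;

/* Save the header information to a SAS table */
proc contents data=sashelp.class out=work.class_meta noprint;
run;
```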
39
Q

Explain PROC COPY

A
  • The most widely used way of copying SAS data sets is Proc COPY.
    This SAS procedure allows copying of some or all of the SAS data sets from one library to another.

PROC COPY IN=libref OUT=libref;
SELECT member-list </ MEMTYPE=mtype>;
EXCLUDE member-list </ MEMTYPE=mtype>;
RUN;
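A minimal sketch, assuming the SASHELP library as the source and WORK as the target.

```sas
/* Copy two data sets from SASHELP into WORK */
proc copy in=sashelp out=work memtype=data;
  select class cars;
run;
```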

40
Q

Explain PROC DATASETS

A

Proc DATASETS can be used for a range of tasks including:
* Changing the name of a SAS data set;
* Copying data sets;
* Modifying data sets;
* Deleting data sets;
* Auditing data sets.
* Always use Proc DATASETS to rename data sets, as it changes the name in the Descriptor Portion of the data set as well. A host command will not do this.
* Proc DATASETS does not read the data;
* Ends with a QUIT; statement, because it is an interactive procedure.
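A sketch of several tasks in one Proc DATASETS step; the WORK tables are hypothetical and created first so the example is self-contained.

```sas
/* Hypothetical tables to work on */
data work.old_name work.scratch;
  set sashelp.class;
run;

proc datasets library=work nolist;
  change old_name = new_name;   /* rename, including the descriptor portion */
  delete scratch;               /* remove a data set */
run;
  modify new_name;
    label height = 'Height (inches)';   /* change attributes, not data values */
run;
quit;                           /* interactive procedure: end with QUIT */
```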

42
Q

Explain PROC COMPARE

A

Proc COMPARE compares the values of variables in two SAS data sets and reports the differences found.
Proc COMPARE can be used for:
* Finding the differences between two SAS data sets;
* Verifying changes made to a Master File;
* Comparing variables within the same data set;
* Calculating the percentage difference between variables.

43
Q

Give the syntax for PROC COMPARE

A

PROC COMPARE
<DATA=Libref.dsname>      /* DATA= is an alias for BASE= */
<COMPARE=Libref.dsname>;
VAR variable-list;        /* base variables */
WITH variable-list;       /* compare variables */
ID variable-list;         /* variables to match observations */
BY variable-list;         /* variables defining BY groups */
RUN;

44
Q

In PROC COMPARE, how can you control the magnitude of difference at which the procedure reports unequal values?

A

By setting the METHOD= option and adding a CRITERION= option, e.g.
method=absolute criterion=5
- With these settings, values are regarded as unequal only if they differ by more than five.
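A sketch, assuming SASHELP.CLASS and a hypothetical modified copy: one weight is changed by 3 and another by 10, so with CRITERION=5 only the second change should be reported as unequal. ID NAME matches rows by NAME, the order SASHELP.CLASS is stored in.

```sas
data work.class_new;
  set sashelp.class;
  if name = 'Alfred' then weight = weight + 3;    /* within the criterion */
  if name = 'Alice'  then weight = weight + 10;   /* exceeds the criterion */
run;

proc compare base=sashelp.class compare=work.class_new
             method=absolute criterion=5;
  id name;
run;
```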

45
Q

Explain the OPTIONS statement

A

SAS System options are global settings that control a wide range of tasks including the following:
* Reading of external files;
* Processing of data sets;
* Formatting of reports;
* Macro processing;
* Error handling;
* Interaction with the operating system.
The OPTIONS statement is used to change one or more system options.

46
Q

What is PROC OPTIONS?

A

SAS system options can be listed in the log by using Proc OPTIONS, e.g.

proc options;
run;

To control which options are listed and how much detail is shown, further options can be added, e.g.

proc options long group=sasfiles;
run;

47
Q

What ways can be used to retrieve system option values?

A
  • System option values can be saved before you change them with an OPTIONS statement, and restored afterwards:
  1. The GETOPTION function retrieves a value, e.g. GETOPTION('option-name', 'DEFAULTVALUE') or GETOPTION('option-name', 'STARTUPVALUE').
  2. Macro variables: once the value is stored in a macro variable, the system option can be changed and later reset to its original value using the macro variable, e.g.

data _null_;
  ps = getoption('PAGESIZE', 'STARTUPVALUE');  /* value SAS started with */
  call symputx('ps', ps);
run;
%put Startup PS = &ps.;
options pagesize=&ps.;   /* restore the startup value */
48
Q

Fill in the blanks with the right procedure;

a) proc…… data= compare=
b) proc…… in= out=
c) proc…… library=
d) proc…… data= out=

A

a) COMPARE
b) COPY
c) DATASETS
d) CONTENTS