SAS Flashcards

1
Q

Code to import data

A
data work.datasetname;
input age weight height;
label age = 'Patient Age';
cards;
24 130 65
30 150 70
;
run;
2
Q

printing data

A

proc print data = work.datasetname;
run;

3
Q

UNIVARIATE procedure does what

A

prints summary statistics for each variable
basic stats (mean/median/mode/standard deviation/range/IQR), quantiles, extreme observations, stem and leaf, tests for location (t-test, sign, signed rank)
USE TO EXPLORE A NEW DATASET

4
Q

proc UNIVARIATE code

A
proc univariate data = setName plot; /* plot is optional */
var weight;
histogram weight; /* optional */
title1 'Age study';
run;
5
Q

what does the CORR procedure do

A

shows simple statistics for each variable (N, mean, standard deviation, sum, minimum, maximum, label)
Correlations between variables

6
Q

proc CORR code

A

proc corr data = work.setName;
var age height;
run;

7
Q

code to create new dataset

A

data work.newSet;
set oldSet;
run;

8
Q

create new dataset and add new variable and fill it (code)

A
data work.cleanedSet;
set oldSet;
bp = .; /* start as missing so unmatched values of x stay missing */
if x = 6 then bp = 1;
else if x in (1, 2, 3, 4, 5) then bp = 0;
run;
9
Q

what does cross-tabulation with the FREQ procedure do?

A

shows two tables: one way freq and cross tabulations of two variables
can see who used what treatment
percentages
frequency tables for variables in analysis

10
Q

code to show frequencies of dataset proc FREQ

A

proc freq data = setName;
tables age age*weight;
run;

11
Q

what to add to FREQ procedure to see how many missing values

A

tables age age*weight/MISSING;
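
To see what the MISSING option changes, it can help to run the same table with and without it. This is a minimal sketch on a made-up dataset (work.demo and its values are hypothetical, not from the course data):

```sas
/* Hypothetical data: the second age is missing (.) */
data work.demo;
  input age weight;
  datalines;
24 130
. 150
30 150
;
run;

/* Default: the missing age is left out of the table and only
   counted in the "Frequency Missing" line under it */
proc freq data = work.demo;
  tables age;
run;

/* With MISSING: the missing value is listed as its own level
   and is included in the percentages */
proc freq data = work.demo;
  tables age / missing;
run;
```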

12
Q

proc TTEST code

A
proc ttest data = setName;
class group; /* groups we want to compare */
var height; /* compare the groups on this variable */
run;
13
Q

what does proc TTEST do to missing values

A

excludes them

14
Q

proc TTEST equality of variance results

A

if the Folded F p-value (Pr > F) is > 0.05, assume equal variances; otherwise treat the variances as unequal
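
The decision rule is easier to remember next to the output it refers to. A minimal sketch, assuming a made-up two-group dataset (work.heights and its values are hypothetical); the closing comment describes which rows to read:

```sas
/* Hypothetical data: group is the two-level class variable */
data work.heights;
  input group $ height;
  datalines;
A 65
A 70
A 68
B 60
B 72
B 66
;
run;

proc ttest data = work.heights;
  class group;
  var height;
run;

/* Reading the output:
   1. "Equality of Variances" table: Folded F row, column Pr > F.
   2. If Pr > F is above 0.05, read the Pooled row of the t-test table;
      otherwise read the Satterthwaite row. */
```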

15
Q

what test to use when unequal variances for proc TTEST

A

Satterthwaite

16
Q

What test to use when equal variances TTEST

A

Pooled

17
Q

proc TTEST if the p-value is below the significance level

A

Reject null and say there is a difference in height between groups

18
Q

proc TTEST F-test null and alternative

A

Null is equal variances, alternative is unequal variances

19
Q

How to use Cochran with proc TTEST

A

proc ttest data = setName COCHRAN;

20
Q

When to use cochran

A

produces an approximate p-value for the unequal-variances t-test

use it when the Folded F test points to unequal variances

21
Q

How to compare our mean height to mean value under the null of 60 (code)

A

proc ttest data = setName H0=60;
var height;
run;

22
Q

how to check normality of data

A

proc univariate data = work.setName plot;
var height;
histogram height;
run;

shows box plot, histogram, etc

23
Q

how to do before and after TTEST

A

proc ttest data = setName;
paired before*after;
run;

null is before-after=0

24
Q

ANOVA code

A
proc anova data = work.setName;
class food; /* 7 different groups ate different food */
model height = food; /* compare heights of people who ate different food */
run;
25
ANOVA F-statistic calculation
variance between groups / variance within groups. Should be near 1 if the null is correct; an F statistic of 6.67 suggests a difference in the group means
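Written out, the ratio is the between-group mean square over the within-group mean square. The symbols k (number of groups) and N (total number of observations) are introduced here for illustration and do not appear on the card:

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
```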
26
When to use Tukey
see which anova means are different. | multiple comparison
27
Tukey code
proc anova data = work.setName;
class food; /* 7 different groups ate different food */
model height = food; /* compare heights of people who ate different food */
means food / tukey;
run;
28
how to interpret Tukey output
comparisons significant at the 0.05 level are marked with ***; those groups are different
29
Why use ANOVA over t-test
faster, and running many t-tests increases the chance that a difference shows up purely by chance (false positive)
30
recode to put heights into 4 groups
data work.heightsgrouped; set work.heights; group=.; (initiate variable and handle missingness) if height >= 10 AND height
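The card above is cut off after the first condition, so the remaining cut points are unknown. As a sketch only, this is how such a four-group recode could look; the cut points 20, 30 and 40 and the sample heights are placeholders, not the original values:

```sas
/* Hypothetical input so the sketch runs on its own */
data work.heights;
  input height;
  datalines;
12
25
.
38
44
;
run;

/* Hypothetical cut points: 20, 30 and 40 are placeholders */
data work.heightsgrouped;
  set work.heights;
  group = .;   /* stays missing when height is missing or below 10 */
  if 10 <= height < 20 then group = 1;
  else if 20 <= height < 30 then group = 2;
  else if 30 <= height < 40 then group = 3;
  else if height >= 40 then group = 4;
run;
```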
31
Write ANOVA code to test the effect of group on age
```
proc anova data = work.setname;
class group;
model age = group;
run;
```
32
what kind of comparisons do we do in anova
one continuous variable (age) to one categorical variable (group)
33
where do we write continuous and categorical in anova
in the model statement, continuous is always on the left and categorical on the right. class will always be categorical
34
anova using proc GLM code
```
proc glm data = work.setname;
class group;
model age = group;
run;
```
same output, but GLM has more options
35
how to sort data in increasing order (Code)
proc sort data = work.setname; by group; run;
36
how to create box plot for data (code)
proc boxplot data = work.setname; plot age*group; run;
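PROC BOXPLOT expects observations from the same group to be stored together, so sorting by the group variable first is the usual precaution. A minimal sketch combining the sort card with the box plot card; the data values are made up:

```sas
/* Hypothetical data: age measured in three groups, deliberately unsorted */
data work.setname;
  input group age;
  datalines;
2 31
1 24
3 45
1 29
2 38
3 41
;
run;

/* Sort so that each group's rows are contiguous */
proc sort data = work.setname;
  by group;
run;

/* One box of age per group */
proc boxplot data = work.setname;
  plot age*group;
run;
```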
37
how to interpret box plot
horizontal line inside the box is the median; plus sign is the mean; bottom and top of the box are the 1st and 3rd quartiles; the box holds the middle 50% of the data
38
how do we get all the attributes of our dataset and their characteristics (code)
proc contents data = work.dataset; run;
39
what does CONTENTS procedure display
alphabetic list of variables and their attributes: variable name, type, length, etc. Length of a variable is important if merging datasets. The number (#) is the variable's position in the dataset.
40
how to correlate things together
the CORR procedure
41
the CORR procedure code
proc corr data = dataset; var carb ener etohn fat sugaraw sugaref; with lexpectbirth; run;
42
what does CORR procedure output
simple statistics of each variable listed, then Pearson correlation coefficients: top number is the correlation and bottom number is the p-value
43
how to create correlation matrix with all the variables (code)
proc corr data = setname; var age height weight blah blah3 blah2; run;
44
why would we do a correlation matrix
to find out which predictors may be informative when predicting the response, and to understand how the predictors are related, because that affects how they jointly model the response variable
45
How to make scatterplot (code)
```
ods graphics on;
proc gplot data = work.set;
plot age*height;
run;
ods graphics off;
```
46
reg procedure code
proc reg data = work.set; model lebirth=ener; run;
47
how to structure the model statement of REG procedure
variable left of the = is the response; variable(s) right of the = are the predictors
48
what is root MSE
provided by REG procedure | estimate of the standard deviation of the errors, i.e. how far Y (response variable) spreads around the fitted line
49
what is R-square
percent of the variation that the model explains | __% of total variation is explained by the model
50
what does it mean when regression intercept 27
When energy is 0, life expectancy after birth is 27
51
what does it mean when slope is 0.014
for every unit increase in energy consumption, life expectancy after birth goes up 0.014
52
what is null hypothesis proc reg data = work.set; model lebirth=ener; run;
energy consumption not related to life expectancy
53
how to transform data to look at squared life expectancy? (code)
data work.nutrition2; set work.nutrition; le2 = lebirth*lebirth; run; proc reg data = work.nutrition2; model le2=ener; run;
54
Why square data
logarithmic trend taken away, so more linear now | regression assumes a linear relationship, so when we look at the graph we want to see linearity
55
what if transformed data and root MSE went up and R square didnt improve
not best transformation | when do transformation hope to improve linearity and our explanation of the variation
56
How to look if multicollinearity may exist (code)
``` ODS graphics on; PROC CORR DATA=work.nutrition nomiss plots=matrix(histogram); VAR carbs ener etohn fat sugraw sugref; RUN; ODS graphics off; ```
57
if put 2 predictor variables in model that are correlated with each other
will get errors and hard to decipher which variable is trying to give you which info
58
how to decide which variable not to use
context of model (study obesity more sense to include BMI than weight), include 1, run model, see fit, try another
59
bad model if can explain a variable through
linear combo
60
if had model of solid predictors DF would say
1 | otherwise says 0 or B
61
code for backward selection
in model line at end add / SELECTION = BACKWARD
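In context, the option sits after a slash at the end of the MODEL statement. A minimal sketch, assuming the work.nutrition dataset and variable names used on other cards in this deck; SLSTAY is written out only to make the stay threshold visible:

```sas
/* Assumes work.nutrition with lebirth, carb, ener, etohn and fat,
   as on the other cards; slstay = 0.10 is the backward default */
proc reg data = work.nutrition;
  model lebirth = carb ener etohn fat / selection = backward slstay = 0.10;
run;
quit;
```

Swapping in selection = forward or selection = stepwise requests the other two methods described on the following cards.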
62
what is backward selection
put all variables in, consider each one and, if it doesn't add any info to the model, remove it. Keeps going backward until it finds the model that gives the most info. Each step removes the variable that's least significant (contributing least).
63
what is forward selection
start with no variables and add the one that adds the most info. Moves on until adding a variable adds no more info.
64
what is stepwise selection
biased. Goes forward and adds the next variable (now 2 in the model), then steps back and asks if the addition of the 2nd makes the 1st less significant. At each step it considers removing or adding a variable, so it can make a different decision at each step.
65
what does selection kick off before the start
variables that are linear combinations of other variables
66
which things are kicked off models using selection first
non-significant p value
67
default significance for backward selection
0.1
68
default significance of stepwise selection
0.15
69
if data miner use ____ selection because want best model
stepwise
70
what complexity model do we want
simpler
71
how to see residuals
run model with ods graphics
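As a sketch of what that looks like for the regression used on earlier cards (assuming work.nutrition with lebirth and ener), the PLOTS=DIAGNOSTICS option asks PROC REG for its panel of residual plots:

```sas
/* Assumes work.nutrition with lebirth and ener, as on earlier cards */
ods graphics on;
proc reg data = work.nutrition plots = diagnostics;
  model lebirth = ener;
run;
quit;
ods graphics off;
```

The panel includes residuals against predicted values, residuals against leverage and a Q-Q plot, which are the plots the next few cards describe.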
72
what should we not see from residuals
trend between residuals and fitted values or residuals and any variable
73
residual vs. predicted values
want random - no patterns
74
standardized residuals
expect 95% between +-2
75
leverage
what'll happen if pull that observation out of model. | points with high leverage = big influence on the line
76
Q-Q plot
quantiles of the residuals vs. quantiles of a normal distribution
77
how do we expect residuals to be distributed
normally
78
how do we transform age to age squared
data agetransform; set age; age2 = age**2; /* double star is power */ run;
79
why do proc means
exploratory analysis | tells how many people in which category
80
how to perform exploratory analysis on age categories and a dichotomous variable
```
proc means data = name;
class age;
var dichotomous;
run;
```
81
``` PROC MEANS DATA=ear; class antibo; var clear1; RUN; ``` what does output mean (N obs, N, mean)
N Obs: how many people were on that antibiotic. N: how many had first ear problem. Mean: proportion that had a recovery in the 14 day period.
82
how to look at model of antibiotic one vs antibiotic 2 as reference (compare from 1 to 2)
PROC LOGISTIC DATA=ear DESCENDING; CLASS antibo; MODEL clear1 = antibo / LACKFIT; run;
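The code on this card leaves the choice of reference level to SAS. One way to make antibiotic 2 the reference explicitly is the REF= option on the CLASS statement, sketched here under the assumption that antibo is coded 1 and 2:

```sas
/* Assumes antibo takes the values 1 and 2; ref = '2' makes
   antibiotic 2 the baseline, so the odds ratio compares 1 vs 2 */
proc logistic data = ear descending;
  class antibo (ref = '2') / param = ref;
  model clear1 = antibo / lackfit;
run;
```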
83
PROC LOGISTIC DATA=ear DESCENDING; CLASS antibo; MODEL clear1 = antibo / LACKFIT; run; how to interpret OR 2.247
OR of 1 vs 2 | Odds of recovery are 2.247 times greater for antibiotic 1 vs antibiotic 2
84
how to check if OR significant
make sure CI doesn't contain 1 | look at p values
85
PROC LOGISTIC DATA=ear DESCENDING; CLASS antibo; MODEL clear1 = antibo / LACKFIT; run; what if don't put class variable
SAS looks at it as continuous variable | ALWAYS USE CLASS STATEMENT IN LOG REGRESSION TO MAKE MORE INTERPRETABLE
86
Odds = (write out standard model example)
e^-1.864 (constant) * (e^1stBvalue)^X1 * (e^2ndBvalue)^X2 etc
87
e^-1.864 (constant) * (e^1stBvalue)^X1 * (e^2ndBvalue)^X2 etc: every unit increase in X multiplies the odds of Y being 1 by
e^b
88
male e^B = 2.454 | interpret
subtract 1 from the value to get the % increase or decrease in odds caused by being male: odds of owning a gun increase by 145%
89
educ b=-0.056 | exp(b) = 0.946
each additional year of education decreases odds by 5.4%
90
10 year age effect on odds
Exp(B)^10: 1.008^10 = 1.083, so odds go up about 8.3%
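The arithmetic on the last three cards can be checked in a DATA step that writes to the log. A minimal sketch; the variable names are made up for illustration and the inputs are the numbers quoted on the cards:

```sas
/* data _null_ runs the calculations without creating a dataset */
data _null_;
  or_male  = 2.454;                 /* e^B for male */
  pct_male = (or_male - 1) * 100;   /* percent change in odds */
  b_educ   = -0.056;                /* B for education */
  or_educ  = exp(b_educ);           /* odds ratio for one more year */
  pct_educ = (or_educ - 1) * 100;   /* negative means a decrease */
  or_age10 = 1.008 ** 10;           /* ten-year age effect */
  put pct_male= or_educ= pct_educ= or_age10=;
run;
```

Rounded, the values in the log reproduce the 145%, 0.946, 5.4% and 1.083 quoted on the cards.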
91
how to format variable (code)
proc format; value hospformat 1 = 'Hospitalized' 2 = 'Not Hospitalized'; run;
92
what do formats do
formats are stored in their own catalog and can then be called in proc freq
93
how to do chi square test of independence (code)
``` proc freq data=h1n1; format Hospitalization hospformat. Age ageformat.; tables Hospitalization * Age / chisq; weight Count; run; ```
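The WEIGHT statement is needed because the data hold one row per table cell with a count, not one row per person. A minimal sketch with made-up counts and unformatted codes (work.h1n1_demo and its numbers are hypothetical):

```sas
/* Hypothetical cell counts: one row per Hospitalization x Age cell */
data work.h1n1_demo;
  input Hospitalization Age Count;
  datalines;
1 1 20
1 2 35
2 1 80
2 2 65
;
run;

/* weight Count tells PROC FREQ that each row stands for Count people */
proc freq data = work.h1n1_demo;
  tables Hospitalization * Age / chisq;
  weight Count;
run;
```

Without the WEIGHT statement, PROC FREQ would treat the four rows as four observations.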
94
what do we put after format
a format name is always followed by a dot; the dot tells SAS it's a format
95
``` proc freq data=h1n1; format Hospitalization hospformat. Age ageformat.; tables Hospitalization * Age / chisq; weight Count; run; ``` * means
Star says hospitalization versus age.
96
/chisq tells sas
specific analysis I want done on frequency table is chi square
97
what does chisq show
frequency table | chi square value
98
rule of thumb for doing chi square
need an expected count of at least 5 in each cell | SAS will warn if cells have expected counts less than 5 - USE FISHER'S exact test instead
99
if chi square p value is below the significance level
reject null hypothesis that age is independent of hospitalization
100
chi square likelihood ratio based on
regression analysis
101
mantel-haenszel chi square
ordinal test of association. Good if have ordinal categories. Looks for association b/w rows and columns assuming there is an order to the columns.
102
Phi coefficient
Usually just for 2x2 tables, for which -1 <= ϕ <= 1
103
Contingency coefficient
C = sqrt(ϕ² / (1 + ϕ²)) = sqrt(χ² / (N + χ²))
104
Cramer’s V
measure of association
105
expected chi square and fishers exact code
ods graphics on; proc freq data=h1n1; format Hospitalization hospformat. Age ageformat.; tables Hospitalization * Age / expected chisq; weight Count; exact fisher pchi; run;
106
when to use exact chi square
Use exact chi square when we don’t have this assumption covered – expected counts >5
107
Fishers exact
don’t have to worry about restrictions chi square has
108
poisson regression
Different style of regression based on Poisson distribution
109
Poisson distribution
common distribution for counts
110
log odds can't span
0
111
format questions to be answered as yes or no (code)
proc format; value qaformat 1='Yes' 2='No' 3='Dunno'; run;
112
code to check agreement between self questionnaire and interview
``` proc freq data=cough; format saq qaformat. int qaformat.; tables saq * int /agree; weight count; run; ```
113
``` proc freq data=cough; format saq qaformat. int qaformat.; tables saq * int /agree; weight count; run; ``` what does two sided test mean
testing whether kappa equals zero
114
``` proc freq data=cough; format saq qaformat. int qaformat.; tables saq * int /agree; weight count; run; ``` what does one sided testing means
null is kappa = 0, alternative is kappa > 0
115
which kappa to report
ONE SIDED, because we want a positive kappa. Don't look at weighted kappa: WANT BASIC KAPPA. Don't report the exact p value.
116
CI for kappa
shouldn't include zero
117
negative kappa value means
no agreement (less agreement than expected by chance)
118
kappa breakdowns
poor (
119
code to produce exact test for kappa
```
proc format;
value physformat 1='Minimal' 2='Moderate' 3='Large' 4='Excessive';
run;
proc freq data=phys;
format phys1 physformat. phys2 physformat.;
tables phys1*phys2 / agree;
weight count;
test agree;
exact agree;
run;
```
120
McNemar's test tests for
Symmetry. When doing agreement, symmetry means the probability that one physician rates it a 1 and the other a 3 is the same as the probability that the first rates it a 3 and the other a 1. If the table is not symmetric, it indicates that one physician tends to say the ectopies are larger than the other physician does (bias). E.g. 27 rated minimal by one physician and only 15 minimal by the other: a bias toward saying things are smaller (want 15 and 27 to be closer to each other).
121
H0 and Ha of McNemar's
null is symmetric, alternative is asymmetric
122
How to interpret McNemar's
if p value
123
Parametric test: paired t-test | Nonparametric
Wilcoxon signed rank
124
How to get nonparametric correlation along with parametric (code)
proc corr data=oc pearson spearman; var before after; run;
125
difference between spearman and pearson
Pearson only looks at linear association. Spearman works off of ranks: it ranks the data and compares the ranks, so it can pick up monotonic association that is not linear.
126
code for matched pairs t-test
```
ODS GRAPHICS ON;
PROC TTEST DATA=work.contraceptives;
PAIRED before * after;
TITLE 'Example of Matched Pairs';
RUN;
ODS GRAPHICS OFF;
```
127
``` ODS GRAPHICS ON; PROC TTEST DATA=work.contraceptives; PAIRED before * after; TITLE 'Example of Matched Pairs'; RUN; ODS GRAPHICS OFF; ``` what does this do
matches each observation's before and after values to determine whether the mean difference is zero | assumes the before-after differences are normally distributed
128
paired t-test H0 and Ha
H0: mean before = mean after | H1: mean before != mean after
129
what if reject null of paired t-test
say observations before not distributed same way as observations after
130
Wilcoxon signed rank used when
nonparametric alternative to the matched pairs t-test
131
wilcoxon signed rank code
PROC UNIVARIATE DATA=oc; VAR diff; RUN; look at Signed rank
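The card assumes the dataset already contains a diff variable. If it only has before and after, the difference has to be computed first. A minimal sketch with made-up values (work.oc and work.oc_diff here are hypothetical stand-ins for the course data):

```sas
/* Hypothetical paired measurements */
data work.oc;
  input before after;
  datalines;
115 117
112 128
107 106
119 122
115 115
;
run;

/* One difference per pair */
data work.oc_diff;
  set work.oc;
  diff = after - before;
run;

/* The "Tests for Location" table holds the Signed Rank row */
proc univariate data = work.oc_diff;
  var diff;
run;
```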
132
when to do signed rank vs t-test
based on normality. Often people do the non-parametric test so they don't have to assume normality. It is harder to show significance with a nonparametric test, so a result that is significant in the nonparametric test will usually be significant in the parametric one too.
133
independent samples parametric: t-test | non-parametric
Mann Whitney U (Wilcoxon Rank Sum)
134
t-test independent samples code
PROC TTEST DATA=pain; CLASS Physiotherapy; VAR pri; RUN;
135
PROC TTEST DATA=pain; CLASS Physiotherapy; VAR pri; RUN; class statement tells SAS
where to get the 2 independent samples: Physiotherapy classifies subjects into 2 groups, and we are trying to see if the pain rating is the same in the 2 groups
136
Mann Whitney U code
nonparam test for indep samples PROC NPAR1WAY DATA=pain WILCOXON; CLASS Physiotherapy; VAR pri; RUN;
137
What to look at when doing Wilcoxon Rank sum (Mann Whitney U)
T-approximation | doing 2-sided test to determine if differences are zero
138
Kruskal Wallis Test code
PROC NPAR1WAY DATA=pain WILCOXON; CLASS analg; VAR pri; RUN;