Midterm 2 Flashcards

(63 cards)

1
Q

What are the consequences of violating the four assumptions (independence, normality of error, homogeneity of variance, linearity/additivity)?

A

P-values will not be meaningful

Parameters may not be accurate

2
Q

What is independence?

A

Knowing the error of one data point or a subset of data points provides no knowledge of the error of any others

3
Q

What are the 3 ways non-independence arises?

A

Heterogeneity in the dataset (ignoring natural subsets related to the response)
Replicate measurements per test subject, which incorrectly inflate the residual degrees of freedom (df_residual)
Nested data (ignoring subsampling or another hierarchy that causes heterogeneity in the data)

4
Q

What are 3 warning signs of non-independence in a study?

A

Too many data points
Indication of any kind of repeated measurements
Any implausible result

5
Q

What is homogeneity of variance?

A

The assumption that the scatter around the model is of equal magnitude throughout the fitted model

6
Q

What is the ideal pattern for homogeneity of variance, and how is it checked?

A

Residuals are spread equally above and below the fitted line, with the same amount of scatter throughout. Check by plotting residuals ~ fitted values
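A rough numeric stand-in for eyeballing the residuals ~ fitted plot is sketched below in Python. It assumes a straight-line model and made-up data. The sketch fits the line by least squares and then compares the residual spread in the upper half of the fitted values with the spread in the lower half. The helper names and the half-split heuristic are illustrative assumptions, not a formal test.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def spread_ratio(x, y):
    """Residual spread in the upper half of fitted values divided by the lower half.

    Near 1: consistent with homogeneous variance.
    Far from 1: a funnel shape, i.e. inhomogeneous variance.
    """
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    pairs = sorted(zip(fitted, resid))          # order residuals by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return statistics.pstdev(high) / statistics.pstdev(low)


x = list(range(1, 11))
sign = [1 if xi % 2 else -1 for xi in x]                      # alternating +/- scatter
even = [2 * xi + s for xi, s in zip(x, sign)]                 # constant scatter
funnel = [2 * xi + 0.5 * xi * s for xi, s in zip(x, sign)]    # scatter grows with x
print(round(spread_ratio(x, even), 2), round(spread_ratio(x, funnel), 2))
```

In a real analysis the residuals ~ fitted plot itself is still the main check. The ratio only summarizes the funnel shape that the plot would show.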

7
Q

What is normality of error?

A

When the residuals are normally distributed

8
Q

Which type of plot is a more precise way to visualize the normality of a distribution?

A

A normal Q-Q plot (qqnorm). If the points fall along a straight line between theoretical quantiles of about -2 and +2, the data are approximately normally distributed

9
Q

What is the Shapiro-Wilk test?

A

It tests for normality, giving a p-value for the null hypothesis that the data are normally distributed

10
Q

How do we fix non-normality and inhomogeneity of variance?

A

Transform the variables, e.g. apply log, sqrt, or exp

11
Q

If the residuals ~ frequency plot (histogram of residuals) has a right-hand tail, how do we fix the problem?

A

Transform the response variable with log, sqrt or 1/y.
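The Python sketch below shows why a log transform helps with a right-hand tail. It uses made-up, strictly positive values, since log requires y > 0. After taking logs, the moment-based skewness moves toward 0. To keep things short, the skewness of the values stands in for the residual histogram. In practice you would refit the model on the transformed response and recheck its residuals.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: > 0 means a right-hand tail, < 0 a left-hand tail."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


# Hypothetical right-skewed, strictly positive response values
y = [1, 2, 2, 3, 3, 4, 5, 8, 13, 30]
log_y = [math.log(v) for v in y]
print(round(skewness(y), 2), round(skewness(log_y), 2))
```

Swapping `math.log` for `math.sqrt` or `1 / v` gives the other transforms listed on the card.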

12
Q

If the residuals ~ frequency plot has a left-hand tail, how do we fix the problem?

A

Transform the response variable with e^y
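The mirror case is sketched the same way in Python. The made-up sample has a left-hand tail (negative skewness); for illustration it is built as the logs of evenly spaced values. Applying e^y makes it symmetric again.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: > 0 means a right-hand tail, < 0 a left-hand tail."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


y = [math.log(k) for k in range(1, 8)]   # hypothetical left-skewed values
exp_y = [math.exp(v) for v in y]         # e^y transform
print(round(skewness(y), 2), round(skewness(exp_y), 2))
```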

13
Q

What is linearity/additivity?

A

A linear relationship between the response and the explanatory variables, with the effects of the explanatory variables adding together

14
Q

How is non-linearity detected?

A

Plot the residuals against each of the explanatory variables. If the relationship is linear, the plot for each variable should show points distributed equally above and below zero across its whole range
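The Python sketch below applies this residual check to made-up data. The true relationship is curved (y = x^2), but a straight line is fitted. The residuals come out positive at both ends and negative in the middle, and that systematic pattern is what signals non-linearity.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


x = list(range(11))
y = [xi ** 2 for xi in x]                  # hypothetical curved relationship
a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)                               # sign of each residual, in order of x
```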

15
Q

What other strategies (besides transformation) can fix non-linearity?

A

Inclusion of variable interactions

Inclusion of higher powers of the explanatory variables
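Adding a higher power can be sketched in Python with a small least-squares helper. `ols_fit` is a hypothetical helper that solves the normal equations, which is fine for tiny teaching examples only. The made-up data come from y = 3 + 2x + 0.5x^2. A straight-line model leaves large residuals, and adding x^2 as an extra column removes them.

```python
def ols_fit(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination with
    partial pivoting. Returns (coefficients, residuals).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return beta, [yi - fi for yi, fi in zip(y, fitted)]


x = [float(v) for v in range(11)]
y = [3 + 2 * v + 0.5 * v ** 2 for v in x]        # hypothetical curved data
_, resid_linear = ols_fit([x], y)                # y ~ x
x_sq = [v ** 2 for v in x]
_, resid_quad = ols_fit([x, x_sq], y)            # y ~ x + x^2
print(max(abs(r) for r in resid_linear), max(abs(r) for r in resid_quad))
```

An interaction term would go in the same way: as another column, e.g. the elementwise product of x1 and x2.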

16
Q

What is model criticism?

A

Testing the key assumptions of general linear models. The residuals should:
Be normally distributed with mean zero
Not vary systematically for different values of the predicted response
Not vary systematically for different values of any of the explanatory variables

17
Q

From the standpoint of scientific understanding, what is considered the best model?

A

The model with the fewest explanatory variables that still yields a small p-value

18
Q

What is considered the best model for accurate prediction?

A

The one with the highest R^2, without regard for the number of variables, while avoiding over-fitting

19
Q

What are the three principles of model choice?

A

Economy of variables
Considerations of marginality
Multiplicity of p-values

20
Q

What is economy of variables?

A

The simpler the better

21
Q

What is multiplicity of p-values?

A

If you calculate enough p-values, some models will be significant just by chance
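The size of the problem can be put in numbers. Suppose every null hypothesis is true and the m tests are independent (an assumption). Then the chance that at least one p-value falls below alpha is 1 - (1 - alpha)^m. A minimal Python sketch:

```python
def chance_of_false_positive(alpha, m):
    """P(at least one p < alpha among m independent tests when every null is true)."""
    return 1 - (1 - alpha) ** m


for m in (1, 5, 20, 100):
    print(m, round(chance_of_false_positive(0.05, m), 3))
```

With 20 tests at alpha = 0.05, the chance of at least one spurious "significant" result is already above one half.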

22
Q

What are considerations of marginality?

A

The simplest terms have priority and inclusion of interaction terms requires the inclusion of their simpler parts
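The rule about interactions needing their simpler parts can be checked mechanically. The Python sketch below assumes R-style term labels, where 'x1' is a main effect and 'x1:x2' is an interaction. The helper name is hypothetical. It lists the lower-order terms that are missing from a proposed model.

```python
from itertools import combinations


def missing_marginal_terms(terms):
    """Return lower-order terms required by the interactions present but absent from the model."""
    present = {frozenset(t.split(":")) for t in terms}
    missing = set()
    for term in present:
        for size in range(1, len(term)):                # every strictly smaller sub-term
            for sub in combinations(sorted(term), size):
                if frozenset(sub) not in present:
                    missing.add(":".join(sub))
    return sorted(missing)


print(missing_marginal_terms(["x1", "x2", "x1:x2"]))   # respects marginality
print(missing_marginal_terms(["x1", "x1:x2"]))         # x2 is missing
```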

23
Q

What is the goal of economy of variables?

A

To identify the minimum adequate model

24
Q

How do we deal with the problem of multiplicity of p-values?

A

Reduce the p-value cutoff
Use more specialized statistical tests
Reduce the number of explanatory variables by combining multiple terms into a single term
Focus, don’t fish
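One standard way to reduce the cutoff is the Bonferroni correction, which the card does not name: divide alpha by the number of tests m. The Python sketch below assumes independent tests for the family-wise error formula.

```python
def bonferroni_cutoff(alpha, m):
    """Per-test p-value cutoff that keeps the family-wise error near alpha."""
    return alpha / m


def familywise_error(cutoff, m):
    """P(at least one false positive among m independent tests when every null is true)."""
    return 1 - (1 - cutoff) ** m


m = 20
cutoff = bonferroni_cutoff(0.05, m)
print(cutoff, round(familywise_error(0.05, m), 3), round(familywise_error(cutoff, m), 3))
```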

25
What is the importance of marginality?
Hierarchies must be respected in model formulae
26
How is the type 4 goal of finding the subset of variables that explains some response achieved?
1. Attempt to build a model using all first-, second- and third-order terms
2. Build a simpler model including only first- and second-order terms
3. Build simpler models by removing terms not deemed significant
4. Build a simpler model including only first-order terms
27
What is adjusted R^2?
One metric for measuring how "economically" a model describes a data set
28
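What is the formula for adjusted R^2?
$R^2_{\mathrm{adj}} = \dfrac{MS_{\mathrm{total}} - MS_{\mathrm{residual}}}{MS_{\mathrm{total}}}$, where $MS_{\mathrm{total}} = SS_{\mathrm{total}}/df_{\mathrm{total}}$ and $MS_{\mathrm{residual}} = SS_{\mathrm{residual}}/df_{\mathrm{residual}}$
Equivalently, $R^2_{\mathrm{adj}} = 1 + (R^2 - 1)\,\dfrac{df_{\mathrm{total}}}{df_{\mathrm{total}} - df_{\mathrm{model}}}$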
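The Python sketch below checks that the two forms agree. It uses df_residual = df_total - df_model, and the sums of squares and degrees of freedom are made up.

```python
def adjusted_r2_ms(ss_total, ss_residual, df_total, df_model):
    """(MS_total - MS_residual) / MS_total."""
    df_residual = df_total - df_model
    ms_total = ss_total / df_total
    ms_residual = ss_residual / df_residual
    return (ms_total - ms_residual) / ms_total


def adjusted_r2_from_r2(r2, df_total, df_model):
    """1 + (R^2 - 1) * df_total / (df_total - df_model)."""
    return 1 + (r2 - 1) * df_total / (df_total - df_model)


# Hypothetical fit: 21 points (df_total = 20), 2 explanatory variables (df_model = 2)
ss_total, ss_residual = 100.0, 20.0
r2 = 1 - ss_residual / ss_total
print(adjusted_r2_ms(ss_total, ss_residual, 20, 2), adjusted_r2_from_r2(r2, 20, 2))
```

Here R^2 = 0.8 and the adjusted value is lower (7/9). Each extra explanatory variable raises df_model, so it must reduce the residual sum of squares enough to pay for itself.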
29
How do we determine how "economically" a model describes a data set?
1. Use adjusted R^2
2. Use prediction intervals (they widen as you add insignificant explanatory variables)
30
What are possible pitfalls with automated methods?
The temptation to let the computer do the thinking and neglect other relevant information
Slightly different automated procedures can give different models
Don't take the overall p-value of the final model too literally
31
What is stepwise regression?
An automated procedure for selecting a subset of variables in a model. Two types: backward stepwise regression and forward stepwise regression
32
What is backward stepwise regression?
Build the full model (e.g. y ~ x1 + x2 + x3) and remove the variable that contributes the least. Build a new model from the remaining variables and again remove the variable that contributes the least
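The card doesn't say how "contributes the least" is measured or when to stop. The Python sketch below assumes adjusted R^2 for both, which is the criterion the forward-stepwise card uses. `ols_fit` and `adj_r2` are hypothetical helpers, and the data are made up, with y depending on x1 and x2 only.

```python
def ols_fit(columns, y):
    """Least squares of y on an intercept plus columns (normal equations, Gauss-Jordan)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return beta, [yi - fi for yi, fi in zip(y, fitted)]


def adj_r2(data, y, names):
    """Adjusted R^2 of y ~ names, using 1 + (R^2 - 1) * df_total / (df_total - df_model)."""
    _, resid = ols_fit([data[v] for v in names], y)
    n = len(y)
    my = sum(y) / n
    r2 = 1 - sum(r * r for r in resid) / sum((v - my) ** 2 for v in y)
    return 1 + (r2 - 1) * (n - 1) / (n - 1 - len(names))


def backward_stepwise(data, y):
    current = sorted(data)                      # start from the full model
    best = adj_r2(data, y, current)
    while len(current) > 1:
        # Dropping the least useful variable leaves the highest adjusted R^2
        score, drop = max((adj_r2(data, y, [v for v in current if v != d]), d)
                          for d in current)
        if score < best:                        # every removal makes the model worse
            break
        current.remove(drop)
        best = score
    return current


x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
x3 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, -0.05, 0.1]
y = [2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
data = {"x1": x1, "x2": x2, "x3": x3}
print(backward_stepwise(data, y))
```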
33
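What is forward stepwise regression?
Fit each one-variable model (y ~ x1, y ~ x2, y ~ x3) and keep the best, e.g. x3. Then try adding each remaining variable (y ~ x3 + x1, y ~ x3 + x2). Stop when an additional variable does not improve adjusted R^2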
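The forward procedure on the card is sketched below in Python. `ols_fit` and `adj_r2` are hypothetical helpers, and the data are made up: y depends on x1 and x2, while x3 is unrelated filler.

```python
def ols_fit(columns, y):
    """Least squares of y on an intercept plus columns (normal equations, Gauss-Jordan)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return beta, [yi - fi for yi, fi in zip(y, fitted)]


def adj_r2(data, y, names):
    """Adjusted R^2 of y ~ names."""
    _, resid = ols_fit([data[v] for v in names], y)
    n = len(y)
    my = sum(y) / n
    r2 = 1 - sum(r * r for r in resid) / sum((v - my) ** 2 for v in y)
    return 1 + (r2 - 1) * (n - 1) / (n - 1 - len(names))


def forward_stepwise(data, y):
    current, remaining = [], sorted(data)
    best = float("-inf")
    while remaining:
        # Try each remaining variable; keep the one giving the highest adjusted R^2
        score, add = max((adj_r2(data, y, current + [v]), v) for v in remaining)
        if score <= best:                       # no improvement in adjusted R^2: stop
            break
        current.append(add)
        remaining.remove(add)
        best = score
    return current


x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
x3 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, -0.05, 0.1]
y = [2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
data = {"x1": x1, "x2": x2, "x3": x3}
print(forward_stepwise(data, y))
```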
34
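What is the Akaike information criterion (AIC)?
$\mathrm{AIC} = n \ln(\mathrm{RSS}/n) + 2k$, where n is the number of data points, RSS the residual sum of squares, and k the number of model parameters. A lower AIC means a better model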
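The formula in Python, applied to two hypothetical least-squares fits of the same 30 data points. Texts differ on whether k also counts the error variance. That choice adds the same constant to every model's AIC, so comparisons on the same data are unaffected.

```python
import math


def aic(rss, n, k):
    """AIC = n*ln(RSS/n) + 2k for a least-squares fit; lower is better."""
    return n * math.log(rss / n) + 2 * k


simple = aic(rss=50.0, n=30, k=2)     # 2 parameters
bigger = aic(rss=48.0, n=30, k=4)     # 4 parameters, only slightly smaller RSS
print(round(simple, 2), round(bigger, 2))
```

The two extra parameters cost 4 AIC units but lower n ln(RSS/n) by only about 1.2, so the simpler model has the lower AIC.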
35
What is the purpose of adding random effects?
Random effects add a new type of variance to estimate: they let us estimate how the individual-to-individual variation compares with the variation due to other effects
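The card doesn't say how these variances are estimated; mixed models are usually fitted with dedicated software. As a stand-in that can be checked by hand, the Python sketch below uses the classical ANOVA (method-of-moments) estimator for a balanced one-way random-effects design. The data are made-up replicate measurements on three individuals.

```python
import statistics


def variance_components(groups):
    """ANOVA estimates for a balanced one-way random-effects design.

    groups: equal-length lists, one list of replicate measurements per individual.
    Returns (between_individual_variance, within_individual_variance).
    """
    k, n = len(groups), len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)
    ms_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (k * (n - 1))
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    between = max(0.0, (ms_between - ms_within) / n)   # negative estimates truncated at 0
    return between, ms_within


print(variance_components([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```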
36
What does nested data allow us to quantify?
Random effects
37
What is population modeling?
The use of properties of individuals to predict future populations
38
What is "r"?
The net reproduction per individual per unit time: $r = f - d$, where f is the birth rate and d is the death rate
39
What is the Malthusian model?
$\Delta P = rP$, so $P_{t+1} = P_t + rP_t$
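Iterating the Malthusian update in Python, with a hypothetical starting population and growth rate:

```python
def malthusian(p0, r, steps):
    """Iterate P(t+1) = P(t) + r*P(t); returns [P(0), P(1), ..., P(steps)]."""
    pops = [p0]
    for _ in range(steps):
        p = pops[-1]
        pops.append(p + r * p)
    return pops


print(malthusian(100, 0.1, 3))    # hypothetical P(0) = 100, r = 0.1
```

Each step multiplies P by (1 + r), so after t steps $P_t = P_0 (1 + r)^t$.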
40
In the Malthusian model, what are the behaviors of r?
r > 0: $\Delta P > 0$, the population grows
r = 0: $\Delta P = 0$, the population stays constant
r < 0: $\Delta P < 0$, the population declines
41
What is the total fertility rate (TFR)?
The average number of children born per woman over her lifetime
42
What is carrying capacity (K)?
The maximum number of individuals that can be sustained in a particular habitat
43
What is logistic growth?
This occurs when population size is limited by carrying capacity
44
What is the logistic model equation?
$\Delta P = r_{max} P (1 - P/K)$, which is a parabola in P, where $r_{max}$ = maximum population growth rate, K = carrying capacity, P = number of individuals, and $\Delta P$ = population growth per unit time
$P_{t+1} = P_t + r_{max} P_t (1 - P_t/K)$
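Iterating the logistic update in Python with hypothetical values r_max = 0.5 and K = 1000. Starting well below K, the population grows quickly at first and then levels off at the carrying capacity.

```python
def logistic(p0, r_max, K, steps):
    """Iterate P(t+1) = P(t) + r_max*P(t)*(1 - P(t)/K); returns the trajectory."""
    pops = [p0]
    for _ in range(steps):
        p = pops[-1]
        pops.append(p + r_max * p * (1 - p / K))
    return pops


traj = logistic(10, 0.5, 1000, 100)
print(round(traj[5], 1), round(traj[-1], 1))
```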
45
What is the behavior of K in a logistic model?
When P > K, $\Delta P < 0$, so the population shrinks back toward K
46
What are control structures?
Manage how many times commands in a program are executed
47
What are loops?
They run a set of commands a specified number of times, or until some condition is true
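Both kinds of loop are shown in the Python sketch below. The `for` loop repeats a fixed number of times. The `while` loop repeats until a condition becomes true: here, until a hypothetical Malthusian population has doubled.

```python
def steps_to_double(p0, r):
    """Count Malthusian steps until the population first reaches 2 * p0 (needs r > 0)."""
    p, steps = p0, 0
    while p < 2 * p0:        # repeat until the condition "p >= 2*p0" becomes true
        p = p + r * p
        steps += 1
    return steps


for t in range(3):           # fixed number of repetitions
    print("step", t)
print(steps_to_double(100, 0.1))
```

With r <= 0 the population never doubles, and the `while` loop would run forever. That is why the docstring states the r > 0 requirement.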
48
What are conditionals?
They run a set of commands only if some condition is true
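A conditional in Python that picks which command runs based on the sign of the Malthusian growth rate r. The function name is hypothetical.

```python
def describe_growth(r):
    """Classify Malthusian growth from the sign of r."""
    if r > 0:
        return "grows"
    elif r == 0:
        return "constant"
    else:
        return "declines"


for r in (0.2, 0.0, -0.1):
    print(r, describe_growth(r))
```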
49
What is the "for" command used for?
To repeat a set of commands a set number of times, e.g. for (i in 1:10) { print(i) }
50
How do we use a loop to add up a list of numbers?
Use a counter variable that holds the running total, and add each number to it inside the loop
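The running-total pattern in Python. The list of numbers is made up.

```python
def loop_sum(numbers):
    """Add up a list with an explicit loop and a running-total variable."""
    total = 0                  # running total starts at zero
    for x in numbers:          # visit each element once
        total = total + x      # add the current element to the running total
    return total


print(loop_sum([3, 1, 4, 1, 5]))
```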
51
How do we loop over a vector?
Vector
52
How do we determine population model parameters?
Fit the model to real data
53
How do we automate the process of optimizing the values of r and K?
Define a fitness function (a measure of mismatch between model and data) to be minimized
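The Python sketch below makes some assumptions explicit. The fitness is the sum of squared differences between the logistic model and the data, and the starting population is taken from the first data point. A simple grid search stands in for a general-purpose optimizer. The "data" are hypothetical, generated from known parameters, so the search should recover them; real data would be noisy.

```python
def logistic(p0, r_max, K, steps):
    """Iterate P(t+1) = P(t) + r_max*P(t)*(1 - P(t)/K); returns the trajectory."""
    pops = [p0]
    for _ in range(steps):
        p = pops[-1]
        pops.append(p + r_max * p * (1 - p / K))
    return pops


def fitness(params, observed):
    """Sum of squared differences between model and data: the quantity to minimize."""
    r_max, K = params
    model = logistic(observed[0], r_max, K, len(observed) - 1)
    return sum((m - o) ** 2 for m, o in zip(model, observed))


def grid_search(observed, r_values, K_values):
    """Return the (r_max, K) pair on the grid with the smallest fitness value."""
    candidates = [(r, K) for r in r_values for K in K_values]
    return min(candidates, key=lambda prm: fitness(prm, observed))


observed = logistic(10, 0.5, 1000, 20)     # hypothetical data from known parameters
best = grid_search(observed, [i / 10 for i in range(1, 10)], [800, 900, 1000, 1100, 1200])
print(best)
```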
54
What are equilibrium points?
Points where the population is unchanged between time steps. They can be found by setting $\Delta P = 0$ in the Malthusian or logistic model; for the logistic model this gives $P^* = 0$ or $P^* = K$
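The Python sketch below checks these equilibria numerically with hypothetical r_max = 0.5 and K = 1000. One step of the model leaves P* = 0 and P* = K unchanged, while any other value moves.

```python
def malthus_step(p, r):
    """One Malthusian step: P + r*P."""
    return p + r * p


def logistic_step(p, r_max, K):
    """One logistic step: P + r_max*P*(1 - P/K)."""
    return p + r_max * p * (1 - p / K)


r_max, K = 0.5, 1000
for p_star in (0, K):
    print(p_star, logistic_step(p_star, r_max, K))   # unchanged: an equilibrium
print(500, logistic_step(500, r_max, K))             # changes: not an equilibrium
```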
55
What determines the stability of equilibrium points?
Perturbation analysis, done algebraically or graphically: substitute $P_t = P^* + p_t$, where $p_t$ is a small perturbation, and check whether $p_t$ grows or shrinks
56
When does equilibrium occur?
When $P_{t+1} = F(P_t) = P_t$. The equilibrium is unstable if $|F'(P^*)| > 1$ and stable if $|F'(P^*)| < 1$
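The slope test is sketched below in Python. A central finite difference stands in for the algebraic derivative, and the parameters are hypothetical: r_max = 0.5 and K = 1000. For the logistic map the exact slopes are 1 + r_max at P* = 0 and 1 - r_max at P* = K.

```python
def logistic_step(p, r_max, K):
    """One logistic step: P + r_max*P*(1 - P/K)."""
    return p + r_max * p * (1 - p / K)


def step(p):
    return logistic_step(p, 0.5, 1000)        # hypothetical r_max = 0.5, K = 1000


def slope_at(f, p, h=1e-3):
    """Central-difference estimate of F'(p)."""
    return (f(p + h) - f(p - h)) / (2 * h)


def classify(f, p_star):
    """Stable if |F'(P*)| < 1, unstable if |F'(P*)| > 1."""
    s = abs(slope_at(f, p_star))
    return "stable" if s < 1 else "unstable" if s > 1 else "borderline"


for p_star in (0, 1000):
    print(p_star, round(slope_at(step, p_star), 3), classify(step, p_star))
```

With r_max = 2.5 instead, the slope at K becomes 1 - 2.5 = -1.5. Its magnitude exceeds 1, so P* = K turns unstable.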
57
What is the stretching factor?
The factor $(1 + r_{max})$ in $p_{t+1} = (1 + r_{max}) p_t$; it tells whether a perturbation shrinks or grows
58
What type of diagram will indicate whether the model converges to an equilibrium point?
A cobweb diagram
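The Python sketch below generates the vertices of a cobweb diagram. From each point on the diagonal, the path goes vertically to the curve P(t+1) = F(P(t)), then horizontally back to the diagonal. The sketch stops at computing the coordinates; passing them to a line-plotting tool is left out. The logistic parameters are hypothetical.

```python
def cobweb_path(f, p0, steps):
    """Vertices of a cobweb diagram for P(t+1) = f(P(t)), starting on the diagonal."""
    points = [(p0, p0)]
    p = p0
    for _ in range(steps):
        nxt = f(p)
        points.append((p, nxt))      # vertical move to the curve
        points.append((nxt, nxt))    # horizontal move back to the diagonal
        p = nxt
    return points


def step(p):
    return p + 0.5 * p * (1 - p / 1000)    # hypothetical logistic: r_max = 0.5, K = 1000


path = cobweb_path(step, 10, 50)
print(path[:3], path[-1])
```

If the staircase closes in on the point where the curve crosses the diagonal, the model is converging to that equilibrium.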
59
In perturbation analysis of the logistic model, what are the conditions when r > 0?
0
60
Perturbation analysis when r
-1
61
What leads to chaotic behavior?
Increasing $r_{max}$ leads first to oscillatory behavior and then to chaos; the bifurcation diagram shows the chaotic behavior
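The Python sketch below iterates the logistic model at a few illustrative values of r_max and prints the last few population sizes. These tails are what a bifurcation diagram plots against r_max. At r_max = 0.5 the tail settles at K. At 2.2 it alternates between two values. At 2.7 it typically shows no repeating pattern.

```python
def logistic_tail(r_max, K=1000.0, p0=10.0, steps=1000, keep=4):
    """Iterate the logistic model and return the last `keep` population sizes."""
    p, history = p0, []
    for _ in range(steps):
        p = p + r_max * p * (1 - p / K)
        history.append(p)
    return history[-keep:]


for r_max in (0.5, 2.2, 2.7):
    print(r_max, [round(v, 1) for v in logistic_tail(r_max)])
```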
62
What is the Ricker model?
$P_{t+1} = P_t e^{r_{max}(1 - P_t/K)}$. It never produces negative populations but is noisier than the logistic model
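The positivity claim can be checked in Python with a hypothetical overshoot, where the population is far above K and r_max is large. A single logistic step goes negative. The Ricker step stays positive, because it multiplies P by an exponential, which is always greater than 0.

```python
import math


def logistic_step(p, r_max, K):
    """One logistic step: P + r_max*P*(1 - P/K); can go negative when P >> K."""
    return p + r_max * p * (1 - p / K)


def ricker_step(p, r_max, K):
    """One Ricker step: P * exp(r_max*(1 - P/K)); stays positive whenever P > 0."""
    return p * math.exp(r_max * (1 - p / K))


print(logistic_step(5000, 2.5, 1000), ricker_step(5000, 2.5, 1000))
```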
63
What are the four assumptions?
Independence
Normality of error
Homogeneity of variance
Linearity/additivity