Midterm 2 Flashcards Preview

Flashcards in Midterm 2 Deck (63):
1

What are the consequences of violating these assumptions?

P-values will not be meaningful
Parameters may not be accurate

2

What is independence?

Knowing the error of one data point, or of a subset of data points, provides no knowledge of the error of any others

3

What are the 3 ways non-independence arises?

Heterogeneity in the dataset (ignoring natural subsets related to the response)
Replicate measurements per test subject (incorrectly inflates the residual df)
Nested data (ignoring subsampling or another hierarchy that causes heterogeneity in the data)

4

What are 3 warning signs of non-independence in a study?

Too many data points
Indication of any kind of repeated measurements
Any implausible result

5

What is homogeneity of variance?

The assumption that the scatter around the model is of equal magnitude throughout the fitted model

6

What is the ideal approach for homogeneity of variance?

Residuals spread equally above and below the fitted line. Plot residuals ~ fitted values

7

What is normality of error?

When the residuals are normally distributed

8

Which type of plot is a more precise way to visualize the normality of a distribution?

QQ norm plot. If the points on the QQ norm plot fall on a straight line between -2 and +2, the data are normally distributed

9

What is the Shapiro-Wilk test?

It tests for normality, giving a specific p-value for the null hypothesis that the data are normally distributed

10

How do we fix non-normality and inhomogeneity of variance?

Transform the variables, e.g. apply log, sqrt, or exp

11

If the residuals ~ frequency plot has a right-hand tail, how do we fix the problem?

Transform the response variable with log, sqrt, or 1/y

12

If the residuals ~ frequency plot has a left-hand tail, how do we fix the problem?

Transform the response variable with e^y

13

What is linearity/additivity?

A linear relationship between the response and the explanatory variables

14

How is non-linearity detected?

Plot the residuals against each of the explanatory variables. If the relationship is linear, the plot for each variable should show an equal distribution of points above and below zero

15

What are other strategies to fix non-linearity?

Inclusion of variable interactions
Inclusion of higher powers of the explanatory variables

16

What is model criticism?

Testing the key assumptions of general linear models. The residuals should:
Be normally distributed with mean zero
Not systematically vary for different values of the predicted response
Not systematically vary for different values of any of the explanatory variables

17

From the standpoint of scientific understanding, what is considered the best model?

The fewest explanatory variables that yield a model with a small p-value

18

For accurate prediction, what is considered the best model?

The highest R^2 without regard for the number of variables, while avoiding over-fitting

19

What are the three principles of model choice?

Economy of variables
Considerations of marginality
Multiplicity of p-values

20

What is economy of variables?

The simpler the better

21

What is multiplicity of p-values?

If you calculate enough p-values, some models will be significant just by chance

22

What are considerations of marginality?

The simplest terms have priority and inclusion of interaction terms requires the inclusion of their simpler parts

23

What is the goal of economy of variables?

To identify the minimum adequate model

24

How do we deal with the problem of multiplicity of p-values?

Reduce the p value cutoff
Use more specialized statistical tests
Reduce the number of explanatory variables by combining multiple terms into a single term
Focus, don't fish

25

What is the importance of marginality?

Hierarchies must be respected in model formulae

26

How is the type 4 goal of finding the subset of variables that explains some response achieved?

1. Attempt to build a model using all first, second and third order terms
2. Build simpler model including only first and second order terms
3. Build simpler models removing terms not deemed significant
4. Build simpler model including only first order terms

27

What is adjusted R^2?

One metric for measuring how "economically" a model describes a data set

28

What is the formula for adjusted R^2?

Adjusted R^2 = (Total MS - Residual MS) / Total MS, where Total MS = Total SS / Total df

Equivalently: Adjusted R^2 = 1 + (R^2 - 1) * (df total / (df total - df model))
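
A quick way to check that the two forms of adjusted R^2 agree is to compute both from the same ANOVA quantities. This is a minimal Python sketch; the sums of squares and degrees of freedom are made-up illustration values, and the function names are hypothetical. It assumes residual df = total df - model df.

```python
def adjusted_r2_from_ms(total_ss, residual_ss, df_total, df_residual):
    """Adjusted R^2 = (Total MS - Residual MS) / Total MS."""
    total_ms = total_ss / df_total
    residual_ms = residual_ss / df_residual
    return (total_ms - residual_ms) / total_ms


def adjusted_r2_from_r2(r2, df_total, df_model):
    """Adjusted R^2 = 1 + (R^2 - 1) * (df total / (df total - df model))."""
    return 1 + (r2 - 1) * (df_total / (df_total - df_model))


# Made-up illustration values (not from any real data set).
total_ss = 100.0
residual_ss = 40.0
df_total = 19                       # n - 1 for n = 20 observations
df_model = 3                        # three explanatory terms
df_residual = df_total - df_model   # assumption: residual df = total df - model df

r2 = 1 - residual_ss / total_ss     # ordinary R^2

adj_a = adjusted_r2_from_ms(total_ss, residual_ss, df_total, df_residual)
adj_b = adjusted_r2_from_r2(r2, df_total, df_model)

print(round(r2, 3), round(adj_a, 3), round(adj_b, 3))
```

Adjusted R^2 comes out lower than ordinary R^2 because each extra model term costs a residual degree of freedom.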

29

How do we determine how "economically" a model describes a data set?

1. Use adjusted R^2
2. Use prediction intervals (these widen as you add insignificant explanatory variables)

30

What are possible pitfalls with automated methods?

Temptation just to let the computer do the thinking and neglect other relevant info

Slightly different automated procedure can give different models

Don't take overall p value of final model too literally

31

What is a stepwise regression?

An automated procedure for selecting a subset of variables in a model

Backwards stepwise regression

Forward stepwise regression

32

What is a backward stepwise regression?

Build the full model and remove the variable that contributes the least

Build a new model and again remove the variable that contributes the least

y ~ x1 + x2 + x3

33

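What is forward stepwise regression?

Start with each single-variable model: y ~ x1, y ~ x2, y ~ x3

Keep the best one (here x3) and try adding each remaining variable: y ~ x3 + x1, y ~ x3 + x2

Stop when an additional variable does not improve adjusted R^2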

34

What is the Akaike information criterion (AIC)?

AIC = n * ln(RSS/n) + 2k

Lower AIC means a better model
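
The AIC formula can be tried on two candidate models to see the penalty for extra parameters at work. This is a minimal Python sketch, assuming n is the number of data points, k is the number of estimated parameters, and RSS is the residual sum of squares; the RSS values are made up for illustration.

```python
import math


def aic(rss, n, k):
    """AIC = n * ln(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * k


# Made-up illustration values: the complex model fits slightly better
# (lower RSS) but uses three more parameters.
n = 20
aic_simple = aic(rss=50.0, n=n, k=2)
aic_complex = aic(rss=48.0, n=n, k=5)

preferred = "simple" if aic_simple < aic_complex else "complex"
print(round(aic_simple, 2), round(aic_complex, 2), preferred)
```

Here the small drop in RSS does not pay for the three extra parameters, so the simpler model has the lower AIC.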

35

What is the purpose of adding random effects?

Random effects add a new type of variance to estimate

Estimate how the individual to individual variation compares to the variation due to other effects

36

What does nested data allow us to quantify

Random effects

37

What is population modeling?

The use of properties of individuals to predict future populations

38

What is "r"?

The net reproduction per individual per unit time

r = birth rate (f) - death rate (d)

39

What is the Malthusian model?

deltaP = r * P

P(t+1) = P(t) + r * P(t)
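
The update rule P(t+1) = P(t) + r * P(t) can be iterated directly. This is a minimal Python sketch; the starting population and r values are arbitrary illustration choices, and the function name is hypothetical.

```python
def malthusian(p0, r, steps):
    """Iterate P(t+1) = P(t) + r * P(t) and return the whole trajectory."""
    pops = [p0]
    for _ in range(steps):
        p = pops[-1]
        pops.append(p + r * p)
    return pops


growing = malthusian(100.0, 0.1, 3)     # r > 0: population grows
constant = malthusian(100.0, 0.0, 3)    # r = 0: population stays constant
shrinking = malthusian(100.0, -0.1, 3)  # r < 0: population shrinks

for label, run in (("r > 0", growing), ("r = 0", constant), ("r < 0", shrinking)):
    print(label, [round(p, 1) for p in run])
```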

40

In the Malthusian model, what are the behaviors of r?

r > 0: deltaP > 0, population grows
r = 0: deltaP = 0, population stays constant
r < 0: deltaP < 0, population shrinks

41

What is the total fertility rate (TFR)?

The average number of children born per woman over her lifetime

42

What is carrying capacity (K)?

The maximum number of individuals that can be sustained in a particular habitat

43

What is logistic growth?

This occurs when population size is limited by carrying capacity

44

What is the logistic model equation?

deltaP = rmax * P * (1 - P/K); plotted against P, this creates a parabola
Where rmax = maximum population growth rate
K = carrying capacity
P = number of individuals
deltaP = population growth per unit time

P(t+1) = P(t) + rmax * P(t) * (1 - P(t)/K)
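
Iterating the logistic update shows how the carrying capacity limits growth from either side. This is a minimal Python sketch; rmax = 0.5, K = 100, and the two starting populations are arbitrary illustration values, and the function name is hypothetical.

```python
def logistic(p0, rmax, k, steps):
    """Iterate P(t+1) = P(t) + rmax * P(t) * (1 - P(t)/K)."""
    pops = [p0]
    for _ in range(steps):
        p = pops[-1]
        pops.append(p + rmax * p * (1 - p / k))
    return pops


K = 100.0
below = logistic(10.0, 0.5, K, 50)    # starts below K: deltaP > 0
above = logistic(150.0, 0.5, K, 50)   # starts above K: deltaP < 0

print([round(p, 1) for p in below[:5]], round(below[-1], 3))
print([round(p, 1) for p in above[:5]], round(above[-1], 3))
```

With this modest rmax both runs settle at K. Larger rmax values can overshoot K instead of approaching it smoothly.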

45

What is the behavior of K in a logistic model?

When P > K, deltaP < 0, so the population declines

46

What are control structures?

Manage how many times commands in a program are executed

47

What are loops?

Run a set of commands a specified number of times or until some condition is true

48

What are conditionals?

Run a set of commands only if some condition is true

49

What is the "for" command used for?

To repeat a set of commands a set number of times

Ex. for (i in 1:10) { print(i) }

50

How do we use a loop to add up a list of numbers?

Use the counter command

Counter

51

How do we loop over a vector?

Vector

52

How do we determine population model parameters?

Fit the model to real data

53

How do we automate the process of optimizing the values of r and K?

Define the fitness function to be minimized
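
One way to picture "define the fitness to be minimized" is to pick the sum of squared errors between observed and predicted populations, then search for the rmax and K that make it smallest. This is a minimal Python sketch under those assumptions: the "observed" series is synthetic (generated from known parameters, not real data), the fitness choice and function names are hypothetical, and a coarse grid search stands in for a proper optimizer.

```python
def logistic_series(p0, rmax, k, steps):
    """Iterate P(t+1) = P(t) + rmax * P(t) * (1 - P(t)/K)."""
    pops = [p0]
    for _ in range(steps):
        p = pops[-1]
        pops.append(p + rmax * p * (1 - p / k))
    return pops


def sse(observed, rmax, k):
    """Fitness to minimize: sum of squared differences, data vs. model."""
    predicted = logistic_series(observed[0], rmax, k, len(observed) - 1)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Synthetic "observations" generated from known parameters (rmax=0.4, K=200).
observed = logistic_series(10.0, 0.4, 200.0, 15)

best = None
for i in range(1, 101):             # candidate rmax: 0.01, 0.02, ..., 1.00
    rmax = i / 100
    for k in range(50, 401, 10):    # candidate K: 50, 60, ..., 400
        err = sse(observed, rmax, float(k))
        if best is None or err < best[0]:
            best = (err, rmax, float(k))

best_err, best_rmax, best_k = best
print(best_rmax, best_k, best_err)
```

Because the data are noise-free and the true parameters lie on the grid, the search recovers them exactly. With real data the minimum error would be above zero.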

54

What are equilibrium points?

Occur where the values are unchanged between time steps

This can be determined by setting deltaP = 0 in both the Malthusian and logistic models, which gives P = 0 (both models) or P = K (logistic model)

55

What determines the stability of equilibrium points?

Perturbation analysis, algebraically or graphically

Substitute P(t) = P* + p(t), where p(t) is a small perturbation away from the equilibrium P*

56

When does equilibrium occur?

P(t+1) = F(P(t)) = P(t)

|F'(P*)| > 1: unstable

|F'(P*)| < 1: stable
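
The standard stability check for a map P(t+1) = F(P(t)) looks at the slope of F at the equilibrium P*: a slope of magnitude below 1 shrinks small perturbations, and above 1 grows them. This is a minimal Python sketch for the logistic map, using a numerical slope; the rmax and K values and the function names are illustrative assumptions.

```python
def make_logistic_map(rmax, k):
    """Return F for the logistic model: F(P) = P + rmax * P * (1 - P/K)."""
    def f(p):
        return p + rmax * p * (1 - p / k)
    return f


def slope(f, p, h=1e-6):
    """Central-difference estimate of F'(p)."""
    return (f(p + h) - f(p - h)) / (2 * h)


def classify(f, p_star):
    """Label an equilibrium by |F'(P*)| (exactly 1 is treated as unstable here)."""
    return "stable" if abs(slope(f, p_star)) < 1 else "unstable"


K = 100.0
gentle = make_logistic_map(0.5, K)   # slope at 0 is 1 + rmax, at K is 1 - rmax
steep = make_logistic_map(2.5, K)

print(classify(gentle, 0.0), classify(gentle, K))
print(classify(steep, 0.0), classify(steep, K))
```

For rmax = 0.5 the equilibrium at 0 is unstable and the one at K is stable. For rmax = 2.5 the slope at K is -1.5, so K is unstable as well.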

57

What is the stretching factor?

The factor (1 + rmax) in p(t+1) = (1 + rmax) * p(t); it tells whether the perturbation shrinks or grows

58

What type of diagram will indicate whether the model converged to an equilibrium point?

A cobweb diagram

59

In perturbation analysis of the logistic model, what are the conditions when r > 0?

0

60

Perturbation analysis when r

-1

61

What leads to chaotic behavior?

Increasing r values leads first to oscillatory and then to chaotic behavior
The bifurcation diagram shows the chaotic behavior

62

What is the Ricker model?

P(t+1) = P(t) * e^(rmax * (1 - P(t)/K))
Never produces negative populations but is noisier than the logistic model
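
The claim that the Ricker model never goes negative can be checked side by side with the logistic update. This is a minimal Python sketch; rmax = 2, K = 100, and a starting population of 250 are arbitrary illustration values chosen so that the logistic step overshoots below zero, and the function names are hypothetical.

```python
import math


def ricker(p0, rmax, k, steps):
    """Iterate P(t+1) = P(t) * e^(rmax * (1 - P(t)/K))."""
    pops = [p0]
    for _ in range(steps):
        p = pops[-1]
        pops.append(p * math.exp(rmax * (1 - p / k)))
    return pops


def logistic_step(p, rmax, k):
    """One logistic update: P + rmax * P * (1 - P/K)."""
    return p + rmax * p * (1 - p / k)


rmax, K, p0 = 2.0, 100.0, 250.0

ricker_run = ricker(p0, rmax, K, 20)
logistic_next = logistic_step(p0, rmax, K)   # 250 - 750: a negative "population"

print(all(p > 0 for p in ricker_run), logistic_next)
```

The exponential factor is always positive, so a positive population stays positive under the Ricker update, while the logistic step from the same start goes negative.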

63

What are the four assumptions?

Independence
Normality of error
Homogeneity of variance
Linearity/additivity