Flashcards in Stats Exam 4 Deck (32):

1

## Chi Square Null

###
Ho: There is no relationship between the two categorical variables.


2

## Chi Square Alternative

### Ha: There is a relationship between the two categorical variables.

3

## Assumptions of The Chi-Square Test for Independence

###
1. The sample should be random.

2. The sample should be large enough: all expected counts must be greater than 1, and at least 80% should exceed 5, to ensure reliable use of the test. In general, the larger the sample, the more accurate and reliable the results. Note: this rule applies only to expected frequencies. It is acceptable for an observed frequency to be 0, provided the expected frequencies meet the criterion.
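The expected-count rule above can be checked mechanically; a minimal Python sketch (the helper name `counts_ok` is my own):

```python
def counts_ok(expected):
    """Check the expected-count rule: all expected counts > 1,
    and at least 80% of them exceed 5."""
    flat = [e for row in expected for e in row]
    return all(e > 1 for e in flat) and sum(e > 5 for e in flat) >= 0.8 * len(flat)

print(counts_ok([[25.0, 25.0], [25.0, 25.0]]))  # True: all counts comfortably large
print(counts_ok([[0.5, 6.0], [6.0, 6.0]]))      # False: one expected count is below 1
```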

4

## Linear correlations

###
- Have two components: direction & size

- Both described by “r” (sample) or “ρ” (rho, population)

r = Pearson’s Correlation Coefficient
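Pearson’s r can be computed directly from its definition; a minimal Python sketch (the helper name `pearson_r` is my own):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient for paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # variation of x
    syy = sum((b - my) ** 2 for b in y)                   # variation of y
    return sxy / sqrt(sxx * syy)

# A perfectly linear, increasing relationship gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```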

5

## Properties of linear correlation coefficient r

###
- Range: -1 ≤ r ≤ 1

- Scale is irrelevant (based on standardized scores)

- Only measures strength of linear associations

- DOES NOT IMPLY CAUSALITY

6

## r^2

###
r^2 = proportion of the variation in y that is explained by x

7

## interpreting r

###

0.5 ≤ |r| ≤ 0.9 : correlation is very strong

r = ±1.00 : correlation is perfect

8

## Is the given r value statistically significant?

###
A weak correlation (small r) can be significant.

A moderate/large correlation can occur by chance alone and be statistically insignificant.

If r is NOT significant…

the best predictor of x is x̄ (the mean of x)

the best predictor of y is ȳ (the mean of y)

9

## Regression line

### A “best fit” line: y = mx + b.
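The least-squares slope and intercept follow directly from the sample means; a minimal Python sketch (the helper name `fit_line` is my own):

```python
def fit_line(x, y):
    """Least-squares estimates of slope m and intercept b for y = mx + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # slope = co-variation of x and y divided by the variation of x
    m = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b = my - m * mx  # the line always passes through (x-bar, y-bar)
    return m, b

# Points lying exactly on y = 2x + 1 recover m = 2, b = 1
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```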

10

## Residuals

### variation not explained by the regression model

11

## Least Squares Property

###
Linear regression produces the smallest possible sum of squared residuals.

Sum of squared residuals = unexplained variation
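The property can be demonstrated numerically: the least-squares line beats any other line on the residual sum of squares. A sketch (helper name `ss_residuals` and the comparison line are my own; the least-squares fit for this data, m = 1.4 and b = 0.5, was computed by hand):

```python
def ss_residuals(x, y, m, b):
    """Sum of squared residuals for the line y = m*x + b."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))

x, y = [1, 2, 3, 4], [2, 3, 5, 6]
best = ss_residuals(x, y, 1.4, 0.5)   # the least-squares line for this data
other = ss_residuals(x, y, 1.5, 0.3)  # an arbitrary competing line
print(best, other)
assert best < other  # the least-squares line has the smaller sum
```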

12

## If no significant correlation exists, the best estimate of Y is

### the MEAN of Y

13

## F statistic

### Mean Square Regression / Mean Square Residual
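The ratio can be computed from the two sums of squares and their degrees of freedom; a sketch (helper name `f_statistic` is my own, assuming k predictors and n observations):

```python
def f_statistic(ss_reg, ss_res, k, n):
    """F = Mean Square Regression / Mean Square Residual."""
    ms_reg = ss_reg / k            # regression df = k
    ms_res = ss_res / (n - k - 1)  # residual df = n - k - 1
    return ms_reg / ms_res

# e.g. SS_regression = 9.8, SS_residual = 0.2, one predictor, n = 4
print(f_statistic(9.8, 0.2, 1, 4))  # ≈ 98
```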

14

## F Test for Regression

###
Tells us if the regression model is statistically significant.

15

## Multiple Regression

###
Bivariate regression can be extended to multivariate data

-When 2 or more independent variables may be related to a dependent variable

Advantages:

- Improved predictive value (r^2)

- Estimates are more precise

16

## r^2 (R^2)

###
Multiple Coefficient of Determination

r^2 still equals the proportion of the variation in the dependent variable that is explained by the predictor variables.

17

## Adjusted Coefficient of Determination

###
Adding predictor variables will increase r^2, even if their contribution is trivial.

The best regression equation may not have the largest r^2.

For multiple regression, use an adjusted r^2.

k = number of predictor variables

The adjusted r^2 increases only if the new variable's contribution is more than what would be expected by chance alone.
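The standard adjusted r^2 formula is 1 − (1 − r^2)(n − 1)/(n − k − 1), where n is the sample size; a sketch (helper name `adjusted_r2` is my own) showing how a useless extra predictor lowers the adjusted value even when r^2 stays the same:

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of determination:
    n = sample size, k = number of predictor variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same r2, one extra (trivial) predictor: the adjusted value drops
print(adjusted_r2(0.80, 20, 2))  # ≈ 0.776
print(adjusted_r2(0.80, 20, 3))  # ≈ 0.7625
```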

18

## What independent variables to include as predictors?

###
1) Common sense & practical considerations

- Bear “Age” may be predictive, but impractical to measure.

2) Consider the standardized coefficients

- Independent variables are converted to Z scores; the “standardized coefficients” indicate relative strength of influence.

3) Evaluate several regression models

- Choose the model with the highest adjusted r^2 and the fewest variables possible

- Avoid multicollinear variables (e.g., head width & ear tip distance)

- Choose the equation with the lowest P value (based on the F statistic in the ANOVA table)

19

## χ2

###
χ2 quantifies the difference between observed and expected frequencies.

Small χ2 → likely due to random variation.

Large χ2 → unlikely to occur by chance.

No negative values & always a one-tailed test.

20

##
If the observed frequencies perfectly match the expected frequencies…

### χ2 would equal 0

21

##
If the observed frequencies are vastly different than the expected frequencies…

### χ2 would be a large value

22

## chi square calculations

### χ2 = Σ[(O − E)^2 / E]
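The formula translates directly into code; a minimal sketch (helper name `chi_square` is my own), summing (O − E)²/E over all cells:

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 60 outcomes expected to be equally likely across 3 categories
print(chi_square([10, 20, 30], [20, 20, 20]))  # 10.0
```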

23

## Chi-Square Distribution

###
Skewed right

χ2 = 0 to ∞

Different curve for every degree of freedom

Degrees of freedom = (rows–1)*(columns–1)

24

## Chi Square Review

###
Evaluates relationships between two Categorical variables (C × C)

Compares expected to observed frequencies

Tests of Independence

Expected = P*n = (row total*column total)/total

Used to test any frequency related hypothesis

E.g.: Car accidents are 5 times more common on weekdays than on weekends.
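The expected-count rule above, E = (row total × column total) / total, can be sketched for a full table (helper name `expected_counts` is my own):

```python
def expected_counts(table):
    """Expected cell counts for a test of independence:
    E = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

obs = [[20, 30], [30, 20]]
print(expected_counts(obs))  # [[25.0, 25.0], [25.0, 25.0]]
```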

25

## Number Needed to Treat to prevent one case

### The number of subjects we would need to treat in order to prevent one case of disease
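NNT is commonly computed as the reciprocal of the absolute risk reduction (risk in the untreated group minus risk in the treated group); a sketch under that standard definition (function name `nnt` is my own):

```python
def nnt(risk_untreated, risk_treated):
    """Number needed to treat = 1 / absolute risk reduction."""
    return 1 / (risk_untreated - risk_treated)

# Treatment cuts risk from 10% to 5%: treat ~20 people to prevent one case
print(nnt(0.10, 0.05))
```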

26

## Risk

###
Probability for a condition/disease

27

## Risk Ratio

### a ratio of two sample risks
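Each sample risk is events divided by sample size, and the risk ratio divides one by the other; a minimal sketch (helper names `risk` and `risk_ratio` are my own):

```python
def risk(events, n):
    """Risk = probability of the condition: events / sample size."""
    return events / n

def risk_ratio(events1, n1, events2, n2):
    """Ratio of two sample risks (group 1 relative to group 2)."""
    return risk(events1, n1) / risk(events2, n2)

# 10 cases in 100 exposed vs 5 cases in 100 unexposed
print(risk_ratio(10, 100, 5, 100))  # 2.0: exposed risk is double
```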

28

## Risk & Risk Ratio Hypotheses

### H0: RR = 1.0 Ha: RR ≠ 1.0

29

## CI for Risk Ratio

###
- If the CI for RR captures 1.0 → fail to reject H0; fail to support Ha

- If the CI for RR does not include 1.0 → reject H0; support Ha

30

## Risk Ratio Caveat

###
Only applies to “natural” data that are not “case-controlled”.

Includes:

- Prospective studies

- Randomized controlled trials (e.g., the Salk vaccine trial)

In these designs, the natural incidence of disease is observed.

Excludes most retrospective studies:

- In case-controlled studies, the experimenter decides how many cases of each condition to include.

- Odds Ratios can be used in these cases.

- Odds compare the incidence of one condition to another (not to the total).

31

## Odds

### A ratio of the incidence of one condition to its complementary condition. Not a probability.
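The contrast with risk, and the odds ratio used for case-controlled data, can be sketched as follows (helper names `odds` and `odds_ratio` are my own; the 2×2-table cell labels a, b, c, d are a common convention, not from the deck):

```python
def odds(events, non_events):
    """Odds = events : complementary events (not a probability)."""
    return events / non_events

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: (a/b) / (c/d) = a*d / (b*c)."""
    return (a * d) / (b * c)

# 20 cases among 100 subjects: odds = 20/80 = 0.25, whereas risk = 20/100 = 0.2
print(odds(20, 80))
print(odds_ratio(20, 80, 10, 90))  # 2.25
```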
