Test #3 Flashcards

(47 cards)

1
Q

What does covariation mean?

A

To covary means that two things go together or are associated (opposite of independence) (we are now looking at bivariate relationships!)
> cases with a certain value on variable A are likely to have a certain value on variable B
> when x tends to happen, does y tend to happen > thinking probabilistically, not guaranteed
ex. income and life expectancy covary, or are related

2
Q

What does independence mean?

A

It is the opposite of covariation > no association between variables > cases with a particular value on one variable do not have a particular value on the other variable (can’t use one as a predictor of another)
ex. number of siblings and life expectancy are independent

3
Q

What is a hypothesis?

A

A tentative statement about a relationship between two or more variables (a type of statement about covariation)
> must be testable (research hypothesis) and stated in an unambiguous manner

*two tailed and one tailed

4
Q

What are 3 tactics to help determine whether something is a causation or just correlation?

A

Replication, statistical controls, and experimental design

5
Q

What are contingency tables (crosstabulation) used for?

A
  • can use for any level of measurement, but primarily for two nominal or ordinal variables
  • values inside table cells provide joint (conditional) frequency distributions

(‘total’ column and rows = marginals/marginal totals)

*IV across top and DV down the left side

6
Q

What are the rules for how many categories to have with a contingency table?

A

DV: do not have more than 7 categories - ensure sample is fairly large
IV: have no more than 3-4 categories

7
Q

Hypothesis testing is the second main application of inferential statistics > what does it measure and what is the use?
in what way do we normally think with hypothesis testing?

A
  • measures the likelihood that a relationship between variables exists in the population
  • gives the probability that an observed relationship in the sample is due to random chance alone (ex. sampling error)
  • the stronger the relationship between variables in the sample, the less likely it is to be due to sampling error > there is statistical evidence that a relationship exists in the population
    *generally we are thinking deductively with hypothesis testing
8
Q

What is statistical significance?

A

The likelihood that a relationship as big as one observed in a sample could be due to sampling error alone
> asking if an observed relationship is just due to chance
> different from “importance”: the relationship doesn’t need to matter; it just means that there is a low probability that the results are due to chance or sampling error

9
Q

What does it mean to test the Ho? (what is the Ho?)

A

Ho (the null hypothesis) says that there is NO relationship between the variables under consideration

Through our testing of statistical significance, we will either reject or fail to reject the null (Ho)
> if we reject Ho, we’re saying there IS a relationship
> if we fail to reject Ho, we’re saying there is not enough evidence of a relationship (often loosely called “accepting” Ho)

10
Q

When testing the level of significance with alpha levels what are the 3 assumptions with the idea of probability

A

Assumptions:
- we used probability sampling methods
- there was no relationship between our IV and DV in the population
- a large number of samples had been taken

11
Q

What are the 3 alpha levels we mention? What do these mean?

A

0.05 (95% confidence)
0.01 (99% confidence)
0.001 (99.9% confidence)

Represents the probability of incorrectly concluding that there is a significant relationship when there is none

Refers to the level of risk that we want to take of being “wrong”
- smaller alpha = less risk of a false positive (type 1 error) and a higher level of significance, but more risk of missing a real relationship (type 2 error)
- usually we select 0.05 or 0.01

12
Q

What is the Chi-Square Test for Independence and when is it appropriate to use?

A

It compares observed frequencies (O) with expected frequencies (E) (what you would’ve observed if there was no relationship)
*testing to see if there is or isn’t an association

It is appropriate for testing relationships between nominal and ordinal variables, assuming that random sampling was used

13
Q

What are the 7 steps to find Chi-Square

A
  1. State the claim and identify the null and alternative hypothesis as Ho and H1
  2. Specify the level of significance, represented as α (alpha)
  3. Choose the appropriate test statistic given the levels of measurement of the variables, the purpose of the test, and any other necessary assumptions (in this case it’s X^2)
  4. Identify the critical value of the test statistic to indicate under what condition the null hypothesis should be rejected or not rejected
  5. Calculate X^2 using the data from the sample
  6. Compare X^2 to the critical value and decide to reject or fail to reject the null hypothesis (IF X^2 IS BIGGER THAN CRITICAL VALUE THEN IT IS SIGNIFICANT AND WE REJECT THE NULL)
  7. Interpret the decision in terms of the original claim
14
Q

Define ‘degrees of freedom’
How do we find this?

A

It is the number of values in a calculated statistic that are free to vary (do not have a fixed value)

dF = (r-1) (c-1)
r = rows in table
c = columns in table
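
A rough sketch of how the df formula feeds the decision rule from the chi-square steps (find the critical value, then compare X^2 to it). The function names are made up for illustration, and the critical values are a small excerpt from a standard chi-square table covering only df 1-4 at alpha 0.05 and 0.01.

```python
# Excerpt from a standard chi-square table, keyed by (df, alpha).
# Only df 1-4 at alpha 0.05 and 0.01 are included in this sketch.
CRITICAL_VALUES = {
    (1, 0.05): 3.841, (1, 0.01): 6.635,
    (2, 0.05): 5.991, (2, 0.01): 9.210,
    (3, 0.05): 7.815, (3, 0.01): 11.345,
    (4, 0.05): 9.488, (4, 0.01): 13.277,
}


def degrees_of_freedom(rows, cols):
    """df = (r - 1)(c - 1) for a table with r rows and c columns."""
    return (rows - 1) * (cols - 1)


def decide(x2, rows, cols, alpha):
    """Reject Ho only when the calculated X^2 is bigger than the critical value."""
    df = degrees_of_freedom(rows, cols)
    critical = CRITICAL_VALUES[(df, alpha)]
    return "reject Ho" if x2 > critical else "fail to reject Ho"


# Hypothetical result: X^2 = 4.0 from a 2x2 table (df = 1)
print(decide(4.0, 2, 2, 0.05))  # 4.0 > 3.841
print(decide(4.0, 2, 2, 0.01))  # 4.0 < 6.635
```

The same X^2 can be significant at 0.05 but not at 0.01, which is why the alpha level is chosen before calculating.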

15
Q

Difference between 1 tailed and 2 tailed hypothesis?

A

1 tailed = directional statement - positive or negative (can only do this with two quantitative variables)

2 tailed = nondirectional - doesn’t specify a direction or form

16
Q

What is a spurious relationship? example?

A

It is an invalid relationship - we don’t know if x or y is the cause

example = study about nurses’ caffeine consumption and suicide risk
- found a positive relationship between consumption and happiness

problem? can’t really tell which is the cause and which is the effect
*determined this just shows correlation between variables

17
Q

What does chance is lumpy mean?

A
  • because of huge sample sizes and long timeframes, you’re bound to find patterns and covariations in experiments
18
Q

In a contingency table, what variable is on the top and on the left side? What are the total column and rows called? how do you find the percentages in the cells?

A

Independent variable (occurs first in time) on the top and dependent variable on the left

Totals are called the marginals

Each inside box is a cell; you percentage down (within each column) and compare across (between columns)

19
Q

What is the 85%/15% split rule?

A

If there is any gap wider than 85-15% between a variable’s results, then there is not enough variation in the data (50/50 split = most variation possible)

20
Q

What are fo and fe in the X^2 equation and how do you find each?

A

fo = the cell frequencies observed in the bivariate table
> this is easy to find, it’s just the data we are given in the table

fe = the cell frequencies that would be expected if the variables were independent
> ex. comparing men and women who would be willing to live common law:
> multiply the total number of men by the proportion of the total population who said “yes”
> do the same for total men x proportion “no”, total women x proportion “yes”, and total women x proportion “no”
> then apply the formula (fo - fe)^2 / fe to each cell and sum up all of these numbers to get X^2
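
A minimal sketch of the fe and X^2 calculation, assuming a table with the DV in rows and the IV in columns. The counts are invented for illustration (they are not from the course), and the function names are hypothetical.

```python
def expected_frequencies(observed):
    """fe for each cell = row total * column total / n.

    This is the same as multiplying a column total (e.g. total men) by the
    proportion of all cases that fall in that row (e.g. proportion "yes").
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum (fo - fe)^2 / fe over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for o_row, e_row in zip(observed, expected)
        for fo, fe in zip(o_row, e_row)
    )


# Invented counts: rows = DV (yes, no), columns = IV (men, women)
observed = [[30, 20],
            [20, 30]]

fe = expected_frequencies(observed)  # every cell is 25.0 here
x2 = chi_square(observed)            # four cells, each contributing 1.0
print(fe, x2)
```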

21
Q

For nominal-level variables, what are the two commonly used measures of association?

A
  • Chi square-based measures: phi or Cramer’s V
  • PRE measure: Lambda
22
Q

In assessing any bivariate association, what are the 3 progressive questions we ask?

A
  1. Does an association exist
  2. If an association exists, how strong is the association (strength)
  3. What is the pattern or direction of the association (direction)
23
Q

When we eyeball a bivariate table, how can we examine if an association exists?

A

look at the column percentages –> do the conditional distributions of Y differ across the various categories of X?
- if conditional distributions change, there is a relationship between the variables

24
Q

What would no association between variables and a perfect association between variables look like on a bivariate table?

A

no association: every column has the same percentages (the conditional distributions are identical), so there is no variation and therefore no association

perfect association: each value of the dependent variable is associated with one, and only one, value of the independent variable (example in photos)
> means you can predict with 100% certainty

25
How can we quickly measure the strength of association in a bivariate table?
Find the maximum difference: the biggest difference in the column percentages for any row of the table (biggest number in a row minus the smallest number in that row, repeated for each row)
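
A small sketch of "percentage down, compare across" and the maximum difference, assuming the IV is in the columns and the DV in the rows. The counts and function names are invented for illustration.

```python
def column_percentages(table):
    """Percentage down: each cell as a percent of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100 * cell / total for cell, total in zip(row, col_totals)]
        for row in table
    ]


def max_difference(table):
    """Compare across: largest (max - min) column percentage in any row."""
    percentages = column_percentages(table)
    return max(max(row) - min(row) for row in percentages)


# Invented counts: rows = DV categories, columns = IV categories
table = [[30, 20],
         [20, 30]]

print(column_percentages(table))  # first row: 60% vs 40%
print(max_difference(table))      # 20 percentage points
```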
26
other name for a bivariate table?
cross-tab
27
what is the fancy name for column percentages (in a cross-tab)
conditional distributions
28
if both variables are _______, we can discuss direction as well as pattern
ordinal
29
What is the range of Phi and Cramer's V?
Range in value from 0 (no association) to 1 (perfect association)
30
What is Phi? when should it be used?
Phi = sqrt( X^2 / n )
n = total number of cases
*used for 2x2 tables - good for nominal variables to tell us about the strength of the relationship
31
What is Cramer's V? when should it be used?
V = sqrt( X^2 / (n * min(r-1, c-1)) )
min(r-1, c-1) = whichever is smaller, rows - 1 or columns - 1
*used for tables larger than 2x2 because it adjusts for the size of the table
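
A minimal sketch of both chi-square-based measures, assuming X^2 has already been calculated. The input values are invented and the function names are hypothetical.

```python
import math


def phi(x2, n):
    """Phi = sqrt(X^2 / n), intended for 2x2 tables."""
    return math.sqrt(x2 / n)


def cramers_v(x2, n, rows, cols):
    """V = sqrt(X^2 / (n * min(r - 1, c - 1))), for tables larger than 2x2."""
    return math.sqrt(x2 / (n * min(rows - 1, cols - 1)))


# Invented values: X^2 = 4.0 with n = 100 cases in a 2x2 table
print(phi(4.0, 100))
# For a 2x2 table min(r-1, c-1) = 1, so V gives the same answer as phi
print(cramers_v(4.0, 100, 2, 2))
```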
32
Phi and Cramer's V determine the _____ of a relationship, they don't identify the ______
strength; pattern
33
Explain what a PRE measure is?
PRE = proportionate reduction of error
- looks at how well we can predict values of the DV based on values of the IV
- compares 2 situations > one in which we don't use the IV for making predictions and one in which we do
*the fewer errors in making predictions after we know the IV, the stronger the relationship is
34
How does lambda compare to phi/V? what level of measurement can the variables be?
lambda is a PRE measure for bivariate tables
- like phi/V, it is used to measure the strength of association and values range from 0-1
- unlike phi/V, lambda has a more direct and meaningful interpretation > it tells us the improvement in predicting Y while taking X into account
- lambda can use two nominal variables or one nominal and one ordinal, because it doesn't give us a direction
35
What is the formula for Lambda?
Lambda = (E1 - E2) / E1
E1 = errors when you don't know the IV and don't use the IV for prediction (look at the right margin - E1 = every answer that is not the mode)
E2 = errors when you do know the IV (look in each column - E2 = every answer that is not the mode in that column, added up across all columns)
errors = the total number of cases that were not the answer you picked (which is always the mode)
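
A rough sketch of the lambda calculation, assuming the DV is in the rows and the IV in the columns. Both tables are invented for illustration, and the function name is hypothetical.

```python
def lambda_pre(table):
    """Lambda = (E1 - E2) / E1 for a table with DV in rows, IV in columns."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # E1: always guess the overall modal DV category; everything else is an error
    e1 = n - max(row_totals)
    # E2: guess the modal DV category within each IV column, then add up errors
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    return (e1 - e2) / e1  # undefined (division by zero) if all cases share one DV category


# Invented table: the mode changes from column to column
print(lambda_pre([[30, 20],
                  [20, 30]]))  # E1 = 50, E2 = 40

# Invented table: the mode is the first row in both columns, so lambda is 0
# even though the column percentages differ (80% vs 60%)
print(lambda_pre([[40, 30],
                  [10, 20]]))  # E1 = 30, E2 = 30
```

The second table shows how a zero can appear even when the conditional distributions are not identical.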
36
What are the 3 limitations of Lambda?
- no direction
- asymmetric - it matters which variable is IV and DV
- can produce false zeroes > if this happens then use a chi-square-based measure
37
Explain the Pairs Concept relating to Gamma and Kendall's tau
- whereas Lambda is based upon guessing exact values (uses modal values), Gamma and Kendall's tau are based on guessing the ordinal arrangement of values
- for any given pair of cases, we guess that their ordinal ranking on one variable will correspond positively or negatively to their ordinal ranking on the other variable
38
What levels of measurement can Gamma use?
Only uses 2 ordinal variables
39
What is a Concordant pair?
aka same or similar - the value of each variable is larger (or each is smaller) for one case than for the other
ex. Mary (case) > 2 (variable), 3 (variable)
John (case) > 1 (variable), 2 (variable)
40
What is a Discordant pair?
aka opposite or dissimilar - the value of one variable for a case is larger than the value for the other case, but the direction is reversed for the second variable
ex. Mary (case) > 2 (variable), 3 (variable)
Steve (case) > 3 (variable), 2 (variable)
41
What is a tied pair?
The value of one variable for a case is the same as the value for the other case for either x variable, y variable, or both variables
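
A short sketch that classifies a pair of cases using the Mary / John / Steve values from the cards above. Each case is written as (x value, y value); the function name is hypothetical.

```python
def classify_pair(case_a, case_b):
    """Label a pair of cases as concordant, discordant, or tied."""
    (xa, ya), (xb, yb) = case_a, case_b
    dx, dy = xa - xb, ya - yb
    if dx == 0 or dy == 0:
        return "tied"            # same value on x, on y, or on both
    if dx * dy > 0:
        return "concordant"      # higher on both or lower on both
    return "discordant"          # higher on one, lower on the other


mary, john, steve = (2, 3), (1, 2), (3, 2)
print(classify_pair(mary, john))   # Mary is higher on both variables
print(classify_pair(mary, steve))  # Mary is lower on x but higher on y
```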
42
What's the limitation of gamma?
It ignores tied pairs so it can inflate the result
43
How do you find the number of unique pairs of respondents?
Total # of unique pairs of respondents = n(n-1) / 2
n = total number of respondents
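
A quick check of the formula against a brute-force listing of pairs. The respondent names are only placeholders.

```python
from itertools import combinations


def unique_pairs(n):
    """Number of unique pairs among n respondents: n(n - 1) / 2."""
    return n * (n - 1) // 2


respondents = ["A", "B", "C", "D"]             # placeholder respondents
listed = list(combinations(respondents, 2))    # every pair, listed once
print(len(listed), unique_pairs(len(respondents)))
```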
44
How do Gamma and Kendall's Tau detect association? (also characteristics of Gamma)? (what does it assume)
They measure the strength and direction of association by counting the number of similar vs. dissimilar pairs
- take on values from -1 to +1 (strength runs from 0 to 1 in absolute value; the sign gives the direction)
- the larger the difference between the number of similar and dissimilar pairs, the stronger the association
Gamma = symmetrical > measures (-1 to 0 to 1), best to use when X and Y have relatively few categories, and assumes the association is linear
45
What is the formula for Gamma?
G = (ns - nd) / (ns + nd)
ns = number of pairs of respondents ranked the same on both variables (similar pairs) > multiply f of each cell by total of all f's in cells BELOW AND TO ITS RIGHT (sum the products)
nd = number of pairs of respondents ranked differently on the two variables (dissimilar pairs) > multiply f of each cell by total of all f's in cells BELOW AND TO ITS LEFT (sum the products)
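
A minimal sketch of the below-right / below-left counting rule, assuming both variables are ordinal and the table's rows and columns are arranged in the same order (low to high going down and going right). The counts and function names are invented for illustration.

```python
def count_pairs(table):
    """Return (ns, nd): similar and dissimilar pairs in an ordered table."""
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # total of all cells below and to the right of cell (i, j)
            below_right = sum(table[k][m]
                              for k in range(i + 1, n_rows)
                              for m in range(j + 1, n_cols))
            # total of all cells below and to the left of cell (i, j)
            below_left = sum(table[k][m]
                             for k in range(i + 1, n_rows)
                             for m in range(j))
            ns += table[i][j] * below_right
            nd += table[i][j] * below_left
    return ns, nd


def gamma(table):
    """G = (ns - nd) / (ns + nd); tied pairs are ignored."""
    ns, nd = count_pairs(table)
    return (ns - nd) / (ns + nd)


# Invented counts for two ordinal variables with two categories each
table = [[30, 20],
         [20, 30]]
print(count_pairs(table))  # ns = 30*30, nd = 20*20
print(gamma(table))        # positive, since ns > nd
```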
46
In Gamma, if more of the pairs are concordant than discordant = _______ association If more of the pairs are discordant than concordant = _______ association
positive; negative
47
What are type 1 and type 2 errors? As you lower your alpha level (ex. 0.05-0.01), which error increases and decreases?
type 1: rejection of a true Ho > you said there is a relationship but there is not (false positive)
type 2: failure to reject (acceptance of) a false Ho > you said no relationship but there is one (false negative)
with a lower alpha level (a stricter test of significance), your type 2 errors will increase while type 1 errors will decrease