Test #3 Flashcards
(47 cards)
What does covariation mean?
To covary means that two things go together or are associated (opposite of independence) (we are now looking at bivariate relationships!)
> cases with a certain value on variable A are likely to have a certain value on variable B
> when x tends to happen, does y tend to happen > thinking probabilistically, not guaranteed
ex. income and life expectancy covary, or are related
What does independence mean?
It is the opposite of covariation > no association between variables > cases with a particular value on one variable do not have a particular value on the other variable (can’t use one as a predictor of another)
ex. number of siblings and life expectancy are independent
What is a hypothesis?
A tentative statement about a relationship between two or more variables (a type of statement about covariation)
> must be testable (research hypothesis) and stated in an unambiguous manner
*two tailed and one tailed
What are 3 tactics to help determine whether a relationship is causal or just a correlation?
Replication, statistical controls, and experimental design
What are contingency tables (crosstabulation) used for?
- can use for any level of measurement, but primarily for two nominal or ordinal variables
- values inside table cells provide joint (conditional) frequency distributions
(‘total’ column and rows = marginals/marginal totals)
*IV across top and DV down the left side
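A minimal sketch of building a contingency table from raw cases, using only the Python standard library. The data (gender as IV, willingness to live common law as DV) and the counts are invented for illustration, and `crosstab` is a hypothetical helper name, not something from the course.

```python
from collections import Counter

# Hypothetical cases as (IV value, DV value) pairs, invented for illustration.
cases = (
    [("men", "yes")] * 60 + [("men", "no")] * 40
    + [("women", "yes")] * 40 + [("women", "no")] * 60
)

def crosstab(pairs):
    """Return cell counts, row marginals (DV), column marginals (IV), and N."""
    cells = Counter(pairs)                        # joint frequencies
    col_totals = Counter(iv for iv, _ in pairs)   # IV marginals
    row_totals = Counter(dv for _, dv in pairs)   # DV marginals
    return cells, row_totals, col_totals, len(pairs)

cells, row_totals, col_totals, n = crosstab(cases)

# Print with the IV across the top and the DV down the left side.
ivs, dvs = ["men", "women"], ["yes", "no"]
print(f"{'':8}" + "".join(f"{iv:>8}" for iv in ivs) + f"{'total':>8}")
for dv in dvs:
    row = "".join(f"{cells[(iv, dv)]:>8}" for iv in ivs)
    print(f"{dv:8}" + row + f"{row_totals[dv]:>8}")
print(f"{'total':8}" + "".join(f"{col_totals[iv]:>8}" for iv in ivs) + f"{n:>8}")
```

The 'total' row and column printed here are the marginals; the four inner numbers are the cells.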
What are the rules for how many categories to have with a contingency table?
DV: do not have more than 7 categories - ensure sample is fairly large
IV: have no more than 3-4 categories
Hypothesis testing is the second main application of inferential statistics > what does it measure and what is the use?
In what way do we normally think with hypothesis testing?
- measures the likelihood that a relationship between variables exists in the population
- gives the probability that an observed relationship in the sample is due to random chance alone (ex. sampling error)
- the stronger the relationship between variables in the sample, the less likely it is to be due to sampling error > there is statistical evidence that a relationship exists in the population
*generally we are thinking deductively with hypothesis testing
What is statistical significance?
The likelihood that a relationship as big as one observed in a sample could be due to sampling error alone
> asking if an observed relationship is just due to chance
> different from “importance”: the relationship doesn’t need to matter, it just means that there is a low prob. that the results are due to chance or sampling error
What does it mean to test the Ho (what is the Ho)?
Ho (the null hypothesis) says that there is NO relationship between the variables under consideration
Through our testing of statistical significance, we will either reject or fail to reject the null (Ho)
> if we reject Ho, we’re saying there IS a relationship
> if we fail to reject Ho (loosely called “accepting” it), we’re saying there is not enough evidence of a relationship
When testing the level of significance with alpha levels, what are the 3 assumptions behind the idea of probability?
Assumptions:
- we used probability sampling methods
- there was no relationship between our IV and DV in the population
- a large number of samples had been taken
What are the 3 alpha levels we mention? What do these mean?
0.05 (95% confidence)
0.01 (99% confidence)
0.001 (99.9% confidence)
Represents the probability of incorrectly concluding that there is a significant relationship when there is none
Refers to the level of risk that we want to take of being “wrong”
- smaller = taking less risk of being wrong (harder to reach, but a result that passes is significant at a stricter level)
- usually we select 0.05 or 0.01
What is the Chi-Square Test for Independence and when is it appropriate to use?
It compares observed frequencies (O) with expected frequencies (E) (what you would’ve observed if there was no relationship)
*testing to see if there is or isn’t an association
It is appropriate for testing relationships between nominal and ordinal variables, assuming that random sampling was used
What are the 7 steps to find Chi-Square?
- State the claim and identify the null and alternative hypothesis as Ho and H1
- Specify the level of significance, represented as α (alpha)
- Choose the appropriate test statistic given the levels of measurement of the variables, the purpose of the test, and any other necessary assumptions (in this case it’s X^2)
- Identify the critical value of the test statistic to indicate under what condition the null hypothesis should be rejected or not rejected
- Calculate X^2 using the data from the sample
- Compare X^2 to the critical value and decide to reject or fail to reject the null hypothesis (IF X^2 IS BIGGER THAN CRITICAL VALUE THEN IT IS SIGNIFICANT AND WE REJECT THE NULL)
- Interpret the decision in terms of the original claim
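The 7 steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the 2x2 counts are invented, and the `CRITICAL` dictionary is a small hand-copied excerpt of a standard chi-square table (df 1 and 2 only), not a full lookup.

```python
# Step 1: Ho = gender and willingness to live common law are independent;
#         H1 = they are related.
# Step 2: level of significance.
alpha = 0.05

# Hypothetical observed table: rows = DV (yes, no), columns = IV (men, women).
observed = [[60, 40],
            [40, 60]]

# Step 3: test statistic is X^2 (two nominal variables, random sample assumed).
# Step 4: critical value for df = (r-1)(c-1), from a standard chi-square table.
CRITICAL = {  # {df: {alpha: critical value}}
    1: {0.05: 3.841, 0.01: 6.635, 0.001: 10.828},
    2: {0.05: 5.991, 0.01: 9.210, 0.001: 13.816},
}
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = CRITICAL[df][alpha]

# Step 5: calculate X^2 = sum of (fo - fe)^2 / fe over all cells,
#         where fe = row total x column total / N.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi_square += (fo - fe) ** 2 / fe

# Step 6: compare X^2 to the critical value.
reject_null = chi_square > critical_value

# Step 7: interpret the decision in terms of the original claim.
decision = "reject Ho" if reject_null else "fail to reject Ho"
print(f"X^2 = {chi_square:.3f}, df = {df}, critical = {critical_value}: {decision}")
```

With these made-up counts X^2 is bigger than the critical value, so the null is rejected at the 0.05 level.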
Define ‘degrees of freedom’
How do we find this?
It is the number of values in a calculated statistic that are free to vary (do not have a fixed value)
df = (r-1)(c-1)
r = rows in table
c = columns in table
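A tiny sketch of the formula; `degrees_of_freedom` is a hypothetical helper name. The 'total' row and column are not counted as rows or columns.

```python
def degrees_of_freedom(rows, columns):
    """df = (r - 1)(c - 1); rows and columns exclude the 'total' marginals."""
    return (rows - 1) * (columns - 1)

print(degrees_of_freedom(2, 2))  # 2x2 table
print(degrees_of_freedom(3, 4))  # 3 rows by 4 columns
```

In a 2x2 table df = 1: once the marginals are fixed, filling in one cell determines the other three, so only one value is free to vary.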
Difference between 1 tailed and 2 tailed hypothesis?
1 tailed = directional statement - positive or negative (can only do this with two quantitative variables)
2 tailed = nondirectional - doesn’t specify a direction or form
What is a spurious relationship? example?
It is an invalid relationship - we don’t know if x or y is the cause
example = study about nurses’ caffeine consumption and suicide risk
- found a positive relationship between consumption and happiness
problem? can’t really tell which is the cause and which is the effect
*determined this just shows correlation between variables
What does chance is lumpy mean?
- because of huge sample sizes and long timeframes, you’re bound to find patterns and covariations in experiments
In a contingency table, what variable is on the top and on the left side? What are the total column and rows called? How do you find the percentages in the cells?
Independent variable (occurs first in time) on the top and dependent variable on the left
Totals are called the marginals
Each inside box is a cell, you percentage down and compare across
What is the 85%/15% split rule?
If there is any split wider than 85%/15% in a variable’s results, then there is not enough variation in the data (50/50 split = most variation possible)
What is fo and fe in the X^2 equation and how do you find each?
fo = the cell frequencies observed in the bivariate table
> this is easy to find, it’s just the data we are given in the table
fe = the cell frequencies that would be expected if the variables were independent
> ex. comparing men and women who would be willing to live common law:
- multiply the total number of men by the proportion of the total population who said “yes”
- do the same for total men x proportion “no”, total women x proportion “yes”, and total women x proportion “no”
- then compute (fo - fe)^2 / fe for each cell and sum up all of these numbers to get X^2
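A sketch of the fo/fe calculation for the men/women example. The counts are invented for illustration; only the method (group total x overall proportion giving that answer) comes from the card.

```python
# Hypothetical survey counts (invented for illustration): these are the fo values.
observed = {
    ("men", "yes"): 60, ("men", "no"): 40,
    ("women", "yes"): 40, ("women", "no"): 60,
}

n = sum(observed.values())
group_totals = {}    # total men, total women
answer_totals = {}   # total yes, total no
for (group, answer), fo in observed.items():
    group_totals[group] = group_totals.get(group, 0) + fo
    answer_totals[answer] = answer_totals.get(answer, 0) + fo

# fe = group total x proportion of the whole sample that gave that answer.
expected = {
    (group, answer): group_totals[group] * answer_totals[answer] / n
    for (group, answer) in observed
}

# X^2 = sum over all cells of (fo - fe)^2 / fe.
chi_square = sum(
    (observed[cell] - expected[cell]) ** 2 / expected[cell] for cell in observed
)

for cell in observed:
    print(cell, "fo =", observed[cell], "fe =", expected[cell])
print("X^2 =", chi_square)
```

Here half the sample said "yes", so every fe is half of that group's total; the further the fo values sit from those fe values, the bigger X^2 gets.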
For nominal-level variables, what are the two commonly used measures of association?
- Chi square-based measures: phi or Cramer’s V
- PRE measure: Lambda
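A hedged sketch of the three measures, using the standard textbook formulas (they are not written on the card, so treat them as an assumption about what the course uses). The function names and the example numbers are hypothetical.

```python
import math

def phi(chi_square, n):
    """Phi = sqrt(X^2 / N); meant for 2x2 tables."""
    return math.sqrt(chi_square / n)

def cramers_v(chi_square, n, rows, columns):
    """Cramer's V = sqrt(X^2 / (N * min(r - 1, c - 1))); for larger tables."""
    return math.sqrt(chi_square / (n * min(rows - 1, columns - 1)))

def lambda_pre(table):
    """Lambda = (E1 - E2) / E1, with the IV in columns and the DV in rows.

    E1 = prediction errors ignoring the IV (N minus the largest row total).
    E2 = prediction errors using the IV (per column, cases outside the largest cell).
    """
    n = sum(sum(row) for row in table)
    e1 = n - max(sum(row) for row in table)
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    # Lambda is undefined when E1 is 0; this sketch returns 0.0 in that case.
    return (e1 - e2) / e1 if e1 else 0.0

# Hypothetical 2x2 table: rows = DV (yes, no), columns = IV (men, women).
table = [[60, 40],
         [40, 60]]
print("phi =", phi(8.0, 200))            # X^2 = 8.0 and N = 200 for this table
print("V =", cramers_v(8.0, 200, 2, 2))
print("lambda =", lambda_pre(table))
```

For a 2x2 table phi and Cramer's V are the same number, since min(r-1, c-1) = 1.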
In assessing any bivariate association, what are the 3 progressive questions we ask?
- Does an association exist
- If an association exists, how strong is the association (strength)
- What is the pattern or direction of the association (direction)
When we eyeball a bivariate table, how can we examine if an association exists?
look at the column percentages –> do the conditional distributions of Y differ across the various categories of X?
- if conditional distributions change, there is a relationship between the variables
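A short sketch of "percentage down, compare across", with invented counts where the two groups have different sizes (which is why percentages, not raw counts, get compared).

```python
# Hypothetical counts: rows = DV (yes, no), columns = IV (men, women).
observed = [[30, 60],   # yes
            [20, 90]]   # no

# Percentage down: divide each cell by its column total.
col_totals = [sum(col) for col in zip(*observed)]
col_pct = [
    [100 * cell / col_totals[j] for j, cell in enumerate(row)]
    for row in observed
]

# Compare across: read each row of percentages from left to right.
for label, row in zip(["yes", "no"], col_pct):
    print(label, [f"{p:.1f}%" for p in row])

# If the column percentages differ across categories of X, the sample shows an association.
association = any(max(row) - min(row) > 0 for row in col_pct)
print("association in sample:", association)
```

A difference in the sample percentages only shows that an association exists in the sample; whether it is statistically significant is a separate question for the chi-square test.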
What would no association between variables and a perfect association between variables look like on a bivariate table?
no association: the column percentages are identical across every category of the IV (e.g. every cell has the same number in it) - no variation and therefore no association
perfect association: each value of the dependent variable is associated with one, and only one, value of the independent variable (example in photos)
> means you can predict with 100% certainty