Lecture 5-8 Flashcards
What is the central limit theorem? (CLT)
As the sample size gets larger, the sampling distribution of the mean (the means of repeated samples plotted on a histogram) approaches a normal distribution. This holds even if the variable itself is skewed or not normally distributed, such as income.
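A minimal Python sketch of this idea (not the lecturer's simulation). The population is a made-up, right-skewed "income" variable, and `sample_means` and `skewness` are illustrative helper names. Skewness near 0 means the histogram of means is close to symmetric, as a normal curve is.

```python
import random
import statistics

def sample_means(population, n, repeats, rng):
    """Means of `repeats` random samples (drawn with replacement), each of size n."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(repeats)]

def skewness(values):
    """Roughly 0 for a symmetric distribution; positive for a long right tail."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(42)
# Hypothetical skewed "income" population with a mean of about 50,000.
population = [rng.expovariate(1 / 50_000) for _ in range(20_000)]

means_small = sample_means(population, n=5, repeats=2_000, rng=rng)
means_large = sample_means(population, n=200, repeats=2_000, rng=rng)

# The population is strongly skewed; the sample means are much less so,
# and the skew shrinks further as the sample size n grows.
print("skew of population:    ", round(skewness(population), 2))
print("skew of means (n=5):   ", round(skewness(means_small), 2))
print("skew of means (n=200): ", round(skewness(means_large), 2))
```

Plotting `means_large` as a histogram would show the bell shape directly.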
What is a Sampling Distribution?
Similar to a frequency distribution.
The sampling distribution charts or graphs the probability of getting each possible value of a statistic, such as the mean.
Relies on repeated samples and on larger sample sizes
What is the population parameter?
A population-level value (e.g. the true population mean), as opposed to a statistic calculated from a sample
What does the law of probability mean? Two types of probability? Examples?
Theoretical probability:
How likely an event is to happen, in theory. E.g. flipping a coin: 1/2 or 50% chance of heads, and the same for tails
Represented as a number from 0 to 1, or as a %
Empirical probability: What happens in actual reality. E.g. flipping a coin: if you get 6 heads in 6 flips, your empirical probability of heads is 100%
What is the Law of Large Numbers in Probability?
As the number of trials increases, the empirical probability converges toward the theoretical probability
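A small Python sketch of the coin-flip example, assuming a fair coin (theoretical probability 0.5). `empirical_heads` is an illustrative name, and the seed is fixed only so the run is repeatable.

```python
import random

def empirical_heads(flips, seed=1):
    """Share of heads observed in `flips` simulated fair coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

# With few flips the empirical probability can be far from 0.5;
# with many flips it settles close to the theoretical value.
for flips in (6, 60, 600, 60_000):
    print(flips, round(empirical_heads(flips), 3))
```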
How does the Law of Large Numbers apply in Sampling?
The larger the sample size (n) is, the more likely the sample statistic is to be close to the actual population value
Review- what is inferential statistics?
??
What is the standard error of the Mean? (shorthand Std. Error in SPSS)
Std. Error is a measurement of how far the sample mean is likely to be from the true population mean.
What are the two factors that determine the Std. Error of the Mean?
1) n / sample size
2) the variation / std. deviation in the sample (e.g. the incomes reported by a sample)
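A short Python sketch of how the two factors combine, using the standard formula Std. Error = std. deviation / sqrt(n). The income figures are made up for illustration, and `standard_error` is an illustrative name.

```python
import math
import statistics

def standard_error(sample):
    """Std. Error of the Mean = sample std. deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

incomes = [32, 45, 51, 38, 60, 47, 55, 41]  # hypothetical incomes, in $1000s

print(round(standard_error(incomes), 2))
# Same values repeated four times: similar spread, but a larger n,
# so the standard error comes out smaller.
print(round(standard_error(incomes * 4), 2))
```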
What does the confidence interval mean?
A range of values, calculated from the sample mean and its standard error, that is likely to contain the true mean of the population
How to calculate the 95% confidence interval of a Mean?
By adding and subtracting 1.96 standard errors (not standard deviations) from the sample mean
95% CI = Sample Mean +/- 1.96 x (Std. Error)
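A Python sketch of the calculation, with a made-up sample of incomes. `ci95` is an illustrative name. The 1.96 multiplier assumes a reasonably large sample; with a sample this small, software would normally use a slightly larger multiplier from the t-distribution.

```python
import math
import statistics

def ci95(sample):
    """Return (lower, upper) of Sample Mean +/- 1.96 * Std. Error."""
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * std_error, mean + 1.96 * std_error

incomes = [32, 45, 51, 38, 60, 47, 55, 41]  # hypothetical incomes, in $1000s
low, high = ci95(incomes)
print(round(low, 1), round(high, 1))
```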
In his Central Limit Theorem simulation, what did the lecturer do?
…
What kind of relationship does the Sample size n have with the Std. Error of the Mean?
Inverse relationship. The bigger the n, the smaller the Std. Error of the Mean
Why does the Std. Error of the Mean Matter?
You can estimate the range the true population mean falls in (confidence intervals) from the sample mean and the Std. Error of the Mean
What does statistical significance actually mean? How is it determined?
Whether a relationship found in the sample can be generalized to the population
By running significance tests
Refresher: What is a research hypothesis? What is the difference between a research and a null hypothesis?
The research hypothesis is a theory or assumption researchers come up with based on prior knowledge and evidence.
A research hypothesis and a null hypothesis make opposite statements and seem to contradict one another: the null says there is no relationship. But the purpose of both is to work towards supporting the research theory
Why do we want to disprove the null hypothesis? And how?
Based on Karl Popper’s philosophy.
We want to disprove or discredit a null hypothesis because DISproving that there is NO relationship between two variables is actually one way to establish that there IS a relationship.
What does falsification mean and why is it important?
Falsification means setting out to disprove a certain hypothesis: the process of repeated attempts to “disprove” or “discredit” it. It’s the only way to “prove” or verify a hypothesis
What does falsification protect us against?
Confirmation bias and one-sided evidence that only “supports” a hypothesis
What are the two types of research theory? what does it affect?
Non-directional.
Directional: theorizes a relationship and the direction of the relationship. The type of
What is a significance level and when is it used?
It is set before doing a significance test. By convention in social science the level is .05, which corresponds to 95% confidence.
What are the two things we are most concerned about in testing a bivariate relationship?
Magnitude: the strength of relationship
Reliability: generalizability of relationship(statistical significance)
How is statistical significance related to the null hypothesis?
p < 0.05: reject the null hypothesis. There is a relationship.
What do Type i & ii errors mean?
Type i: Defining a relationship between two variables as generalizable when it isn’t. Also called a false positive. At the .05 level, the Type I error rate is 5%.
Type ii: Defining there is no relationship when there is a relationship. False negative.
What do Type i & ii errors mean?
Type i: We are saying there is a relationship in the population when there isn’t. False positive. 5% error rate in social sciences.
Type ii: We are saying there is no relationship when there is. False negative.
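A Python simulation sketch of the 5% Type I error rate. It assumes two samples drawn from the SAME population (so any "significant" difference is a false positive) and uses a simple z test with a known standard deviation of 1, which is a simplification chosen for illustration and not a test named in these notes. `false_positive_rate` is an illustrative name.

```python
import math
import random

def false_positive_rate(trials=4_000, n=30, seed=7):
    """Share of trials in which two samples from one population look 'different'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # z test for a difference in means; population std. deviation is 1.
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
        if abs(z) > 1.96:  # corresponds to p < .05, two-tailed
            rejections += 1
    return rejections / trials

rate = false_positive_rate()
print(rate)  # should land near 0.05
```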
What is p value?
Short for probability value: the measure used to judge statistical significance. It is compared to a threshold value to decide whether a relationship between two variables is generalizable to the population or could just be a random occurrence within the sample
What is the p value in social sciences for rejecting null hypothesis? In other disciplines?
In social sciences, a relationship or a difference is significant at 95 percent. This is written in REVERSE in p values, because p relates to the probability of a Type I error occurring: p < .05, or p smaller than 5%.
In biochemical disciplines (e.g. chemo treatment) the cut-off is stricter: 99.5 percent, or p < .005
What are the main controversies and problems of statistical significance tests?
Statistical significance tests:
- do not measure the strength of a relationship
- have a type i error rate
- depend on sample size
What is a chi-square test and when are chi-squares used? What is an example?
A significance test for categorical variables. It compares the observed data with a “no relationship” scenario to determine whether there is a relationship. E.g. Guy used an extremely simplified example: is there a relationship between sex and employment rate? Chi-square is used to distinguish between “no relationship” and “yes relationship”. If p is less than 0.05, we conclude there is a relationship.
What is the degree of freedom?
Analyzed in chi-square
How many “unconstrained” or “free” cells there are in a crosstab
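A tiny Python sketch of the usual crosstab formula, df = (rows - 1) x (columns - 1): once the row and column totals are fixed, only that many cells can still vary freely. `degrees_of_freedom` is an illustrative name.

```python
def degrees_of_freedom(rows, columns):
    """Degrees of freedom for a crosstab with the given number of rows and columns."""
    return (rows - 1) * (columns - 1)

print(degrees_of_freedom(2, 2))  # 1
print(degrees_of_freedom(3, 4))  # 6
```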
Relationship between degree of freedom and chi-square?
The shape of the chi-square distribution depends on the degree of freedom
When can chi-square NOT be applied?
When expected cell counts in the crosstab are less than 5
What is a chi-square distribution?
….
What is the chi-square formula?
chi-square = Sum of [ (observed - expected)^2 / expected ], added up over every cell of the crosstab
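A Python sketch of the formula on a made-up 2x2 crosstab. Expected counts are computed the usual way, as row total x column total / N, which is what the “no relationship” scenario predicts. `chi_square` and the table values are illustrative.

```python
def chi_square(observed):
    """Pearson chi-square for a crosstab given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected count if no relationship
            total += (f_o - f_e) ** 2 / f_e
    return total

# Hypothetical crosstab: rows = employed / not employed, columns = men / women.
table = [[60, 40],
         [40, 60]]
print(chi_square(table))  # 8.0 (every expected count is 50)
```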
What are two ways to state a null or research hypothesis?
Either in words or with a math formula
E.g. the null hypothesis: f(observed) = f(expected), where f is the frequency
What do we focus on in this class in terms of chi-square?
Pearson chi-square
What is the default significance cut-off level? What do we compare to?
.05.
How do you interpret the chi-square test?
State the p-value and compare it to the significance cut-off level. If it’s less than .05, then there is some relationship between the two variables, and we REJECT the null hypothesis.
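A Python sketch of the decision rule. In practice SPSS reports the p-value; here it is computed by hand for the special case of 1 degree of freedom (a 2x2 crosstab), where the chi-square p-value can be obtained from `math.erfc`. The function names are illustrative, and the shortcut does not apply to larger tables.

```python
import math

def p_value_df1(chi_square_value):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square_value / 2))

def interpret(chi_square_value, alpha=0.05):
    """Compare p to the cut-off and state the decision about the null."""
    p = p_value_df1(chi_square_value)
    decision = "reject the null" if p < alpha else "fail to reject the null"
    return p, decision

print(interpret(8.0))  # small p: some relationship between the variables
print(interpret(1.0))  # large p: could be random variation in the sample
```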
Can we interpret directionality and strength of relationship based on chi-square results?
No
What logic is lambda based on?
Error reduction: it calculates whether knowing the value of one variable helps to reduce the errors made in guessing the other. If it does help, then there is a relationship.
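A Python sketch of the error-reduction logic, using the standard formula lambda = (E1 - E2) / E1. It assumes the crosstab has the dependent variable in rows and the independent variable in columns; `lambda_measure` and the counts are illustrative.

```python
def lambda_measure(table):
    """Lambda for a crosstab: rows = dependent variable, columns = independent variable."""
    n = sum(sum(row) for row in table)
    # E1: errors when guessing the most common dependent category for everyone.
    e1 = n - max(sum(row) for row in table)
    # E2: errors when guessing the most common category within each column.
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    return (e1 - e2) / e1

table = [[60, 40],
         [40, 60]]
print(lambda_measure(table))  # 0.2: knowing the column cuts guessing errors by 20%
```

Note that the sketch divides by E1, so it would fail on a table where every case falls in a single row.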
What MofA are for nominal-level variables? What is the limitation of these measures?
Phi, Cramer’s V and Lambda are measures of association for nominal variables. Because nominal variables have arbitrary values (the categories have no order), these MoA can’t infer the directionality of the relationship between the variables.
What MofA are for ordinal-level variables?
Gamma, Tau-b, Tau-c, Somers’ d, Spearman’s rho
Which MofA are covered In Pairs? What does it mean to be “tied on pairs”?
Comparing the two variables
What is a similar pair?
Variable pairs moving in the same relationship direction - either positive or negative
-rule of thumb- cannot be in the same column or row
What does it mean for two values to be “tied on the same independent/ dependent variable”?
both values have the SAME attribute in a variable. i.e. same income or age. They are found in the same row or column
How many cells can a cell form similar pairs with?
There is no limit. Again, a similar pair means that both cells share the relationship with the variables in the same direction. But they don’t have to share the same “strength of relationship”
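A Python sketch of pair counting for an ordinal crosstab, and of Gamma computed from the counts. It assumes “similar pair” means two cases ranked in the same order on both variables, and that rows and columns are ordered from low to high; both are assumptions for illustration. Cells in the same row or column are tied and are skipped, as in the rule of thumb above.

```python
def count_pairs(table):
    """Return (same-direction pairs, opposite-direction pairs) for an ordered crosstab."""
    same = opposite = 0
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):  # only cells in a lower row
                for m in range(cols):
                    if m > j:    # below and to the right: same direction
                        same += table[i][j] * table[k][m]
                    elif m < j:  # below and to the left: opposite direction
                        opposite += table[i][j] * table[k][m]
                    # m == j: same column, tied, so not counted
    return same, opposite

def gamma(table):
    """Gamma = (same - opposite) / (same + opposite)."""
    same, opposite = count_pairs(table)
    return (same - opposite) / (same + opposite)

table = [[20, 10],
         [5, 25]]  # hypothetical counts
print(count_pairs(table))  # (500, 50)
print(round(gamma(table), 3))
```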
When is the MofA Spearman’s rho used?
When continuous ordinal variables are involved, e.g. a scale of 1-10
When are correlations, scatterplots and regressions used?
only with two ratio-level variables
What is a null hypothesis again?
The underlying assumption stating that there is NO FOUND association whatsoever between two variables. Yet the study result that fails to reject a null hypothesis should be treated EQUALLY as important
What are tests and concepts of inferential statistics?
Tests run: correlations, scatterplots, regression testing
Concepts: p value, MofA, Type I & II errors
What is the significance cut-off for Pearson’s correlation values?
p < .05
What are most important take-aways from this course?
- Overarching concepts
- Differences between descriptive and inferential statistics
- types of descriptive statistics
- Types of research questions and methodologies suited for types of analysis
- Logic used in the analysis and tests
What is the line of best fit and when is it NOT appropriate?
The straight line that best summarizes the points on a scatterplot. It is appropriate only when the relationship is linear. When the relationship is “curvilinear” (the most common type of relationship, though), the line of best fit is NOT appropriate
Examples of curvilinear relationships?
Income and happiness.
There is a “slow down curve” after income reaches 80,000
What is a type i error in a correlation? Can it be eliminated?
False positives
No, not really
What is a spurious relationship in stats?
Two variables that seem causally or strongly related, but really are not
When is line of best fit required to be plotted?
Linear relationships in Pearson’s r correlations and scatterplots
Two variables that can be plotted on scatterplot?
Rating of personality and physical appearance
Can MoA and Measures of Significance determine causality?
No. They are descriptive stats. The nature of ordinal and nominal variables offers too little information for causal claims.
Even when MoA determine the strength of a relationship, they cannot determine causality.
Criteria for causal claims? Which one is the hardest to satisfy?
- Cause comes BEFORE the effect (not hard)
- Factual or empirical relationship (not hard; sign. tests and MoA can determine)
- The relationship can NOT be explained by other variables (hardest to satisfy, often IMPOSSIBLE)
What are examples of descriptive stat tests?
measures of association, significance test, central tendencies, standard deviations
What does regression calculation create? What does OLS stand for? What does it do?
Regression such as the Ordinary Least Squares creates a LINEAR EQUATION for the set of data, which then creates the line of best fit
Is it stat significant when p is greater than .05, or .995?
The p must be smaller than .05, which means the confidence level is greater than 95%
What is the connection between a p value and type i error/false positive?
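The p value is the probability of getting a relationship this strong in the sample if there were really none in the population. It is the risk of a type i error (false positive) that you take by rejecting the null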
A false positive of what?
Falsely concluding that there is an association between two variables
How do we run a sigf. test for Pearson’s r / correlations?
Determine the p value of the Pearson’s r, by a process called interpreting the t-distribution
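A Python sketch of the first half of that process: computing Pearson's r and converting it to a t value with the standard formula t = r x sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. The data and function names are made up; the p-value itself is then read from the t-distribution, which SPSS does automatically.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

xs = [1, 2, 3, 4, 5, 6]  # hypothetical scores
ys = [2, 1, 4, 3, 6, 5]
r = pearson_r(xs, ys)
t = t_statistic(r, len(xs))
print(round(r, 3), round(t, 3))
```

The larger the absolute t for a given n, the smaller the p-value.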
How do we interpret p<0.05?
There is a stat. sigf. relationship
Does it automatically mean there is a “strong relationship” between two variables?
No.
What is the OLS linear equation? What does each term stand for?
y = a + bx
x = IV (independent variable)
y = DV (dependent variable)
a = baseline or constant
b = slope / regression coefficient
E.g. y = (0.5)x
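A Python sketch of how a and b can be computed by least squares, using the standard formulas b = sum((x - mean x)(y - mean y)) / sum((x - mean x)^2) and a = mean y - b x mean x. The education and income figures are invented so that the fit is exact; `ols_fit` is an illustrative name.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

years_education = [10, 12, 14, 16, 18]  # hypothetical IV (x)
income = [25, 31, 37, 43, 49]           # hypothetical DV (y), in $1000s

a, b = ols_fit(years_education, income)
print(a, b)  # -5.0 3.0
```

Here b = 3 reads as: each additional year of education goes with 3 more units (here $3,000) of income.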
Can OLS regression be plotted as Pearson correlation as well?
..
Do you also run a sigf. test with regression tests? Why?
Yes. Always. That’s the first step of establishing that a relationship even exists = rejecting the null hypothesis
In OLS, how to interpret the unstandardized B coefficient in relation to the independent variable?
Interpret in units of 1: for each 1-unit increase in the independent variable, the dependent variable changes by B units
Can you do inferential stats without ratio-level variable?
Yes
Can you include a binary variable (male vs female) in OLS?
Yes. The variable is coded so its value is either 0 or 1
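A Python sketch of what a 0/1 variable does in y = a + bx: the constant a comes out as the mean of the group coded 0, and b as the difference between the two group means. The coding (0 = male, 1 = female) and the hours figures are made up; `ols_fit` is an illustrative name.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

female = [0, 0, 0, 1, 1, 1]           # hypothetical coding: 0 = male, 1 = female
hours = [40, 44, 42, 36, 38, 40]      # hypothetical weekly work hours

a, b = ols_fit(female, hours)
print(a, b)  # 42.0 -4.0: men average 42 hours, women 4 hours fewer
```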
What are the significance tests appropriate for each level of variables?
Nominal and ordinal variables: Chi-square
One Ratio and one nominal/ ordinal: ANOVA
Both ratio: Correlation
What are important limitations of significance tests pointed out by critics?
- Five percent of the time, we make a type i error (false positive)
- Does not tell how strong the relationship is
- Testing larger sample sizes is more likely to produce significance