Lecture 5-8 Flashcards

1
Q

What is the central limit theorem (CLT)?

A

As the sample size grows, the sampling distribution of the mean (the histogram you get by plotting the means of many repeated samples) gets closer and closer to a normal distribution. This holds even if the variable itself is not normally distributed or is skewed, such as income.
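
A quick simulation can check this claim. It is only a sketch. It uses an exponential population as a stand-in for a right-skewed variable like income, and the function names are made up for illustration. Skewness near 0 means the sampling distribution has become symmetric, which is how a normal distribution looks.

```python
import random
import statistics


def sample_means(n, reps, seed=1):
    """Draw `reps` samples of size `n` from a right-skewed exponential
    population (true mean = 1) and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]


def skewness(values):
    """Population skewness: about 0 for a symmetric (e.g. normal) shape,
    clearly positive for a right-skewed shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


# The sampling distribution of the mean loses its skew as n grows.
for n in (1, 5, 30, 200):
    means = sample_means(n, reps=2000)
    print(f"n={n:>3}  mean of sample means={statistics.fmean(means):.3f}  "
          f"skewness={skewness(means):.2f}")
```

As n rises, the skewness column should drop toward 0. The mean of the sample means should stay near the population mean of 1.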

2
Q

What is a Sampling Distribution?

A

Similar to a frequency distribution.
The sampling distribution charts or graphs the probability of getting each value of a sample statistic, such as the mean.
It relies on repeated samples and a larger sample size.

3
Q

What is a population parameter?

A

A population-level statistic: a value that describes the whole population, such as the true population mean.

4
Q

What does the law of probability say? What are the two types of probability? Examples?

A

Theoretical probability: how likely an event is to happen in theory, e.g. flipping a coin gives a 1/2 (50%) chance of heads or tails.
Probabilities are expressed from 0 to 1, or as a percentage.
Empirical probability: what actually happens in real trials, e.g. if you get 6 heads in 6 flips, your empirical probability of heads is 100%.

5
Q

What is the Law of Large Numbers in Probability?

A

As the number of trials increases, the empirical probability converges toward the theoretical probability.
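
Here is a minimal simulation of the Law of Large Numbers. It assumes a fair coin simulated with Python's `random` module, and `empirical_heads` is a hypothetical helper name. The gap between the empirical and theoretical probability should shrink as the number of flips grows. Any single small run can still be far off, like 6 heads in 6 flips.

```python
import random


def empirical_heads(flips, seed=42):
    """Empirical probability of heads: share of heads in `flips` simulated
    tosses of a fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


THEORETICAL = 0.5  # theoretical probability of heads for a fair coin

for flips in (6, 100, 10_000, 1_000_000):
    p = empirical_heads(flips)
    print(f"{flips:>9,} flips: empirical = {p:.4f}, "
          f"gap from theoretical = {abs(p - THEORETICAL):.4f}")
```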

6
Q

How does the Law of Large Numbers apply in Sampling?

A

The larger the sample size (n), the more likely the sample statistic (e.g. the sample mean) is to be close to the actual population value.

7
Q

What is a Sampling Distribution?

A

Similar to a frequency distribution.

8
Q

Review: what is inferential statistics?

A

??

9
Q

What is the standard error of the mean? (abbreviated "Std. Error" in SPSS)
What are the two factors that determine the Std. Error of the Mean?

A

Std. Error measures how far the sample mean is likely to be from the true population mean. It is the standard deviation of the sampling distribution of the mean.

1) n, the sample size
2) the variation (standard deviation) of the variable in the sample (e.g. reported income in a sample)
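
Here is the formula behind this card as a sketch. The Std. Error of the mean is the sample standard deviation divided by the square root of n, so it captures both factors. The income figures and helper names are made-up illustrations, not from the lecture.

```python
import math
import statistics


def standard_error(sample):
    """Std. Error of the mean = sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def se_from_summary(sd, n):
    """Same formula when you already know the standard deviation and n."""
    return sd / math.sqrt(n)


# Hypothetical reported incomes for a small sample
incomes = [32_000, 45_000, 51_000, 38_000, 60_000, 47_000, 41_000, 55_000]
print(f"Std. Error of mean income: {standard_error(incomes):,.0f}")

# Factor 1: same spread (SD = 10,000), larger n -> smaller Std. Error
for n in (25, 100, 400):
    print(n, se_from_summary(10_000, n))   # 2000.0, then 1000.0, then 500.0

# Factor 2: double the spread at the same n -> double the Std. Error
print(se_from_summary(20_000, 100))        # 2000.0
```

Because n sits under a square root, you need four times the sample to halve the Std. Error.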

10
Q

What does a confidence interval mean?

A

Based on the sample mean and its standard error, it is a range of values within which the true mean of the population is likely to fall.

11
Q

How do you calculate the 95% confidence interval of a mean?

A

By adding and subtracting 1.96 standard errors from the sample mean.

95% CI = Sample Mean ± 1.96 × Std. Error
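
Here is a short sketch of the 95% CI calculation, using the normal critical value 1.96 as on the card. The mean of 50 and Std. Error of 2 are assumed numbers for the worked example, and the helper names are hypothetical.

```python
import math
import statistics


def ci95_from_summary(mean, se):
    """95% CI = sample mean +/- 1.96 * Std. Error."""
    margin = 1.96 * se
    return mean - margin, mean + margin


def ci95(sample):
    """95% CI straight from raw data (normal critical value 1.96, as on the card;
    for small samples a t critical value would give a slightly wider interval)."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return ci95_from_summary(statistics.fmean(sample), se)


# Worked example with assumed numbers: sample mean = 50, Std. Error = 2
low, high = ci95_from_summary(50.0, 2.0)
print(f"95% CI: {low:.2f} to {high:.2f}")   # 95% CI: 46.08 to 53.92
```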

12
Q

In his Central Limit Theorem simulation, what did the lecturer do?

A

13
Q

What kind of relationship does the Sample size n have with the Std. Error of the Mean?

A

An inverse relationship: the bigger n is, the smaller the Std. Error of the mean.

14
Q

Why does the Std. Error of the Mean matter?

A

It lets you estimate where the true mean of the population lies. Using the formula for the Std. Error of the mean, you calculate confidence intervals: a range that likely contains the true mean.

15
Q

What does statistical significance actually mean? How is it determined?

A

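Whether a relationship found in the sample can be generalized to the population, rather than being a chance result of this particular sample.

It is determined by running significance tests.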

16
Q

Refresher: What is a research hypothesis? What is the difference between a research and a null hypothesis?

A

The research hypothesis is a theory or prediction that researchers come up with based on prior knowledge and evidence.
The research and null hypotheses make opposite statements and contradict one another: the null hypothesis states there is no relationship, and the research hypothesis states there is one. But the purpose of testing both is to work toward supporting the research hypothesis.

17
Q

Why do we want to disprove the null hypothesis? And how?

A

Based on Karl Popper’s philosophy.
We want to disprove or discredit the null hypothesis because DISproving that there is NO relationship between two variables is one way to establish that there IS a relationship.

18
Q

What does falsification mean and why is it important?

A

Falsification means setting out to disprove a hypothesis: the process of repeated attempts to “disprove” or “discredit” it. It’s the only way to “prove” or verify a hypothesis.

19
Q

What does falsification protect us against?

A

Confirmation bias and one-sided evidence that only “supports” a hypothesis

20
Q

What are the two types of research hypothesis? What do they affect?

A

Non-directional: predicts that there is a relationship, but not its direction.

Directional: predicts both the relationship and the direction of the relationship. The type of

21
Q

What is a significance level and when is it used?

A

It is the cut-off chosen before running a significance test. By convention, the level in the social sciences is .05, which corresponds to 95% confidence.

22
Q

What are the two things we are most concerned about in testing a bivariate relationship?

A

Magnitude: the strength of the relationship
Reliability: the generalizability of the relationship (statistical significance)

23
Q

How is statistical significance related to the null hypothesis?

A

p < .05: reject the null hypothesis. There is a statistically significant relationship.
p ≥ .05: fail to reject the null hypothesis.

24
Q

What do Type I & II errors mean?

A

Type I: concluding that a relationship between two variables generalizes to the population when it doesn’t. Also called a false positive. At the .05 significance level, the Type I error rate is 5%.
Type II: concluding that there is no relationship when there is one. Also called a false negative.
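
A simulation shows the 5% Type I error rate directly. This is a sketch under simple assumptions. Every simulated study draws from a population where the null hypothesis is true. A one-sample z-test with a known SD stands in for whatever test a real study would run, and `type_i_rate` is a hypothetical helper. About 1 in 20 of these studies still comes out "significant" at p < .05, and each of those is a false positive.

```python
import math
import random
import statistics


def type_i_rate(trials=5000, n=30, critical_z=1.96, seed=7):
    """Simulate `trials` studies in which the null hypothesis is TRUE
    (population mean 0, SD 1) and return the share that a two-sided z-test
    at the .05 level wrongly calls significant (|z| > 1.96)."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z = sample mean / Std. Error; the SD is known to be 1 by construction
        z = statistics.fmean(sample) / (1.0 / math.sqrt(n))
        if abs(z) > critical_z:
            false_positives += 1
    return false_positives / trials


print(f"Type I error rate: {type_i_rate():.3f}")  # should land near 0.050
```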

25
Q

What do Type I & II errors mean?

A

Type I: we say there is a relationship in the population when there isn’t. False positive. The accepted error rate in the social sciences is 5%.
Type II: we say there is no relationship when there is one. False negative.

26
Q

What is a p-value?

A

It stands for probability value, and it is the measure used to judge statistical significance. A threshold value (e.g. .05) determines whether a relationship between two variables is generalizable to the population or just a random occurrence within the sample.

27
Q

What is the p-value cut-off for rejecting the null hypothesis in the social sciences? In other disciplines?

A

In the social sciences, a relationship or a difference is significant at the 95 percent level. This is written in REVERSE as a p-value, because p represents the probability of a Type I error occurring: p < .05, i.e. p smaller than 5%.
In biomedical disciplines (e.g. chemotherapy treatment), a stricter 99.5 percent level is used, i.e. p < .005.

28
Q

What are the main controversies and problems of statistical significance tests?

A

Statistical significance tests:

  1. do not measure the strength of a relationship
  2. have a Type I error rate
  3. depend on sample size
29
Q

What is a chi-square test and when are chi-squares used? What is an example?

A

A significance test used with categorical variables. It compares the observed data with a “no relationship” scenario to determine whether there is a relationship between the variables. In other words, chi-square is used to distinguish between “no relationship” and “yes, a relationship”. Example: the lecturer used a highly simplified case, whether there is a relationship between sex and employment. If p is less than .05, we conclude there is a relationship.

30
Q

What are degrees of freedom?

A

Used in the chi-square test.

How many “unconstrained” or “free” cells there are in a crosstab: df = (rows − 1) × (columns − 1).

31
Q

What is the relationship between degrees of freedom and chi-square?

A

The shape of the chi-square distribution depends on the degrees of freedom.

32
Q

When can chi-square NOT be applied?

A

When the expected frequencies in the crosstab’s cells are less than 5 (rule of thumb).

33
Q

What is a chi-square distribution?

A

….

34
Q

What is the chi-square formula?

A

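$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$: for each cell, square (observed frequency − expected frequency), divide by the expected frequency, then add up over all cells.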
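
Here is a from-scratch sketch of the Pearson chi-square formula, for learning purposes only (in practice SPSS reports the value). The crosstab counts are hypothetical, and so are the helper names. Each expected frequency is row total × column total / N, and df = (rows − 1) × (columns − 1). The p-value helper only handles df = 1 (a 2×2 table). It relies on the fact that a chi-square with 1 df is a squared standard normal.

```python
import math


def chi_square(observed):
    """Pearson chi-square and df for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n   # expected if no relationship
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


def p_value_df1(chi2):
    """p-value for a chi-square statistic with df = 1 ONLY:
    P(chi-square > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: rows = sex (male, female), columns = (employed, unemployed)
table = [[40, 10],
         [30, 20]]
chi2, df = chi_square(table)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.3f}")
print("reject the null hypothesis" if p < 0.05 else "fail to reject the null hypothesis")
```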

35
Q

What are two ways to state a null or research hypothesis?

A

Either in words or as a math formula.

Null hypothesis: $f_o = f_e$ (observed frequencies equal expected frequencies)

36
Q

What do we focus on in this class in terms of chi-square?

A

Pearson chi-square

37
Q

What is the default significance cut-off level? What do we compare to it?

A

.05. We compare the p-value to it.

38
Q

How do you interpret chi-square tests?

A

State the p-value and compare it to the significance cut-off level. If it’s less than .05, there is a relationship between the two variables, and we REJECT the null hypothesis.

39
Q

Can we interpret the direction and strength of a relationship based on chi-square results?

A

No

40
Q

What logic is lambda based on?

A

Error reduction: it calculates whether knowing one variable helps reduce the errors made in guessing the other. If it does help, then there is a relationship.
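
Here is a sketch of lambda's error-reduction logic. It assumes the dependent variable (DV) is in the rows and the independent variable (IV) is in the columns. That is a common crosstab layout, but it is an assumption here. E1 counts guessing errors when you always guess the DV's overall mode. E2 counts errors when you guess the mode within each IV column. Lambda = (E1 − E2) / E1. The table and the `lambda_pre` name are hypothetical.

```python
def lambda_pre(table):
    """Lambda (proportional reduction in error) for a crosstab given as rows.
    Rows = categories of the dependent variable (DV),
    columns = categories of the independent variable (IV)."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # E1: errors when always guessing the DV's most common category
    e1 = n - max(row_totals)
    # E2: errors when guessing the most common DV category within each IV column
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    if e1 == 0:
        raise ValueError("lambda is undefined when E1 = 0 (no guessing errors)")
    return (e1 - e2) / e1


# Hypothetical counts: rows = DV (yes, no), columns = IV (group A, group B)
table = [[30, 10],
         [10, 30]]
print(lambda_pre(table))   # 0.5 -> knowing the IV cuts guessing errors by 50%
```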

41
Q

What MofA are for nominal-level variables? What is the limitation of these measures?

A

Phi, Cramer’s V and Lambda are measures of association for nominal variables. Because nominal variables have arbitrary values (no inherent order), these MoA can’t show the direction of the relationship between the variables.

42
Q

What MofA are for ordinal-level variables?

A

Gamma, Tau-b, Tau-c, Somers’ d, Spearman’s rho

43
Q

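Which MofA are based on pairs? What does it mean to be “tied” on a pair?

A

They compare pairs of cases on the two variables (e.g. gamma counts similar and dissimilar pairs). A pair is “tied” when both cases have the same value on a variable.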

44
Q

What is a similar pair?

A

A pair of cases ranked in the same direction on both variables: the case that is higher on one variable is also higher on the other (or lower on both).
Rule of thumb: the two cells cannot be in the same column or row.

45
Q

What does it mean for two values to be “tied on the same independent/ dependent variable”?

A

Both cases have the SAME value on that variable, e.g. the same income or age. In a crosstab they are found in the same row or column.

46
Q

How many cells can a cell form similar pairs with?

A

There is no limit. Again, a similar pair means that both cells relate to the variables in the same direction. But they don’t have to share the same “strength of relationship”.
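
Here is a sketch of how gamma uses similar and dissimilar pairs. It assumes both variables' categories are ordered from low to high, top to bottom and left to right. Under that layout, a cell forms similar pairs with every cell below and to the right of it. It forms dissimilar pairs with every cell below and to the left. Cells in the same row or column are ties and are skipped, which matches the rule of thumb for similar pairs. The table and the `gamma` helper are hypothetical.

```python
def gamma(table):
    """Gamma = (similar - dissimilar) / (similar + dissimilar) for an ordinal
    crosstab whose rows and columns are both ordered low -> high."""
    rows, cols = len(table), len(table[0])
    similar = dissimilar = 0
    for i in range(rows):
        for j in range(cols):
            # Only look at rows below, so each pair of cells is counted once
            for k in range(i + 1, rows):
                for m in range(cols):
                    pairs = table[i][j] * table[k][m]
                    if m > j:        # below and to the right: similar pair
                        similar += pairs
                    elif m < j:      # below and to the left: dissimilar pair
                        dissimilar += pairs
                    # m == j: same column, tied, ignored
    return (similar - dissimilar) / (similar + dissimilar)


# Hypothetical 2x2 table: rows = education (low, high), columns = income (low, high)
table = [[20, 10],
         [5, 15]]
print(round(gamma(table), 3))   # 0.714
```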

47
Q

When is the MofA Spearman’s rho used?

A

When continuous ordinal variables are involved, e.g. a scale of 1-10.
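
Spearman's rho is Pearson's correlation computed on ranks instead of raw scores. Here is a minimal sketch. It assumes tied scores get the average of their ranks (the usual convention). The ratings and the helper names are hypothetical.

```python
import math
import statistics


def ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1   # positions i..j, converted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson's r from the deviations of x and y around their means."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho = Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical ratings on a 1-10 scale from five respondents
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(spearman_rho(x, y))   # 0.8
```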

48
Q

When are correlations, scatterplots and regressions used?

A

Only with two ratio-level variables.

49
Q

What is a null hypothesis again?

A

The underlying assumption that there is NO association whatsoever between two variables. Yet a study result that fails to reject the null hypothesis should be treated as EQUALLY important.

50
Q

What are the tests and concepts of inferential statistics?

A

Tests run: correlations, scatterplots, regression

Concepts: p-value, MofA, Type I & II errors

51
Q

What is the significance cut-off for Pearson’s correlation values?

A

p < .05

52
Q

What are the most important take-aways from this course?

A
  • Overarching concepts
  • Differences between descriptive and inferential statistics
  • Types of descriptive statistics
  • Types of research questions and the methodologies suited to each type of analysis
  • Logic used in the analysis and tests
53
Q

What is the line of best fit and when is it NOT appropriate?

A

The straight line that summarizes the relationship between two variables. It is appropriate only when the relationship is linear. When the relationship is “curvilinear” (noted as the most common type of relationship), the line of best fit is NOT appropriate.
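
Here is a quick illustration of why a straight line can miss a curvilinear pattern. It assumes a hypothetical U-shaped relationship, y = x squared. Here y is perfectly predictable from x. Yet Pearson's r, which only measures linear association, comes out as 0, so a line of best fit would be flat and misleading. `pearson_r` is a hypothetical helper name.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson's r: only sensitive to straight-line (linear) association."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]     # perfectly predictable, but U-shaped
print(pearson_r(x, y))      # 0.0
```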

54
Q

Examples of curvilinear relationships?

A

Income and happiness.

There is a “slow down curve” after income reaches 80,000

55
Q

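What is a Type I error in a correlation? Can it be eliminated?

A

A false positive: concluding there is a correlation in the population when there isn’t one.

No, not really. At the .05 level it happens about 5% of the time.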

56
Q

What is a spurious relationship in stats?

A

Two variables that seem causally or strongly related but really are not, often because a third variable affects both.

57
Q

When is a line of best fit required to be plotted?

A

Linear relationships in Pearson’s r correlations and scatterplots

58
Q

What are two variables that can be plotted on a scatterplot?

A

Ratings of personality and of physical appearance

59
Q

Can MoA and measures of significance determine causality?

A

No. They are descriptive stats. Ordinal and nominal variables offer too little information for causal claims.
Even when an MoA determines the strength of a relationship,
it cannot determine causality.

60
Q

What are the criteria for causal claims? Which one is the hardest to satisfy?

A
  1. The cause comes BEFORE the effect (not hard)
  2. There is a factual or empirical relationship (not hard: significance tests and MoA can determine this)
  3. The relationship can NOT be explained by other variables (the hardest; often IMPOSSIBLE)
61
Q

What are examples of descriptive stat tests?

A

measures of association, significance test, central tendencies, standard deviations

62
Q

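What does a regression calculation create? What does OLS stand for? What does it do?

A

OLS stands for Ordinary Least Squares. Regression such as OLS creates a LINEAR EQUATION for the data set, which then gives the line of best fit. OLS chooses the line that minimizes the sum of the squared vertical distances between the data points and the line.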

63
Q

Is it statistically significant when p is greater than .05, or .995?

A

No. p must be smaller than .05, i.e. the confidence level must be greater than 95%.

64
Q

What is the connection between a p-value and a Type I error (false positive)?

A

The p-value tells you the chance of making a Type I error (false positive) if you conclude that there is a relationship.

65
Q

A false positive of what?

A

Of there being an association between two variables in the population when there is none.

66
Q

How do we run a significance test for Pearson’s r / correlations?

A

Determine the p-value of Pearson’s r by interpreting it against the t-distribution.
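
SPSS reports the p-value directly, but the test statistic behind it is simple. This sketch assumes the standard formula t = r × √(n − 2) / √(1 − r²), with df = n − 2. The values of r and n are hypothetical, and `t_for_pearson_r` is a made-up helper name. Turning t into an exact p-value needs the t-distribution (a t table, SPSS, or a stats library). That step is left out so the example needs no extra libraries.

```python
import math


def t_for_pearson_r(r, n):
    """t statistic and degrees of freedom for testing whether Pearson's r
    differs from 0 (null hypothesis: no correlation in the population)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


# Hypothetical result: r = 0.30 from a sample of 50 cases
t, df = t_for_pearson_r(0.30, 50)
print(f"t = {t:.2f}, df = {df}")   # t = 2.18, df = 48
# The two-tailed .05 critical t for df = 48 is about 2.01, so |t| > 2.01 means p < .05.
```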

67
Q

How do we interpret p < .05?

A

There is a statistically significant relationship.

68
Q

Does it automatically mean there is a “strong relationship” between two variables?

A

No.

69
Q

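What is the OLS linear equation? What does each symbol stand for?

A
y = a + bx
x = IV (independent variable)
y = DV (dependent variable)
a = baseline or constant (the value of y when x = 0)
b = slope / regression coefficient
e.g. y = 0.5x (a = 0, b = 0.5)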
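
Here is a from-scratch sketch of how OLS finds a and b. It assumes the textbook formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄, so the fitted line always passes through the two means. The data and the `ols_fit` name are hypothetical. Read the slope b in units of 1: each one-unit increase in x changes the predicted y by b.

```python
import statistics


def ols_fit(x, y):
    """Ordinary Least Squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    # b: covariation of x and y divided by the variation of x
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    # a: chosen so that the line passes through (mean of x, mean of y)
    a = my - b * mx
    return a, b


x = [1, 2, 3, 4, 5]              # hypothetical IV
y = [2.0, 2.5, 3.0, 3.5, 4.0]    # hypothetical DV
a, b = ols_fit(x, y)
print(f"y = {a:.2f} + {b:.2f}x")  # y = 1.50 + 0.50x
```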
70
Q

Can OLS regression be plotted as Pearson correlation as well?

A

..

71
Q

Do you also run a significance test with regressions? Why?

A

Yes, always. It is the first step in establishing that a relationship even exists, i.e. rejecting the null hypothesis.

72
Q

In OLS, how do you interpret the unstandardized B coefficient in relation to the independent variable?

A

In units of 1: B is the change in the dependent variable for each one-unit increase in the independent variable.

73
Q

Can you do inferential stats without a ratio-level variable?

A

Yes

74
Q

Can you include a binary variable (male vs female) in OLS?

A

Yes. The variable is coded as either 0 or 1.
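
With a 0/1 variable, OLS is easy to read. a is the mean of y for the group coded 0. b is the difference between the two group means. Here is a sketch with hypothetical income figures in thousands, assuming male = 0 and female = 1. The `ols_fit` helper is a hypothetical name that implements the textbook least-squares formulas.

```python
import statistics


def ols_fit(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


# Hypothetical incomes (thousands); sex coded 0 = male, 1 = female
sex = [0, 0, 0, 1, 1, 1]
income = [40, 50, 60, 30, 40, 50]
a, b = ols_fit(sex, income)
print(a, b)   # 50.0 -10.0  (male mean = 50; female mean is 10 lower)
```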

75
Q

What significance tests are appropriate for each level of variable?

A

Nominal and ordinal variables: chi-square
One ratio and one nominal/ordinal variable: ANOVA
Both ratio: correlation

76
Q

What are important limitations of significance tests pointed out by critics?

A
  • Five percent of the time, they produce a Type I error (false positive)
  • They do not tell how strong the relationship is
  • Testing larger sample sizes is more likely to produce significance
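
Here is a sketch of the sample-size critique. It assumes a one-sample z-test with a known SD, and an effect that stays fixed at 0.1 standard deviations while only n changes. `p_for_effect` is a hypothetical helper name. The relationship is equally weak in every row. Yet it crosses the p < .05 line once the sample is large enough, which also shows why significance says nothing about strength.

```python
import math


def p_for_effect(effect_sd, n):
    """Two-sided p-value of a one-sample z-test when the sample mean sits
    `effect_sd` standard deviations away from the null value."""
    z = effect_sd * math.sqrt(n)              # z grows with sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area


for n in (25, 100, 400, 1600):
    p = p_for_effect(0.1, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n={n:>5}  p={p:.4f}  {verdict}")
```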