Slides 13, 14: Flashcards

1
Q

What is the research question regarding prices on Amazon and the UCLA bookstore?

A

Are the prices on Amazon different than the prices at the UCLA bookstore?

2
Q

How many UCLA courses were sampled for the price comparison?

A

201 UCLA courses.

3
Q

How many paired data points were found for the price comparison?

A

68 paired data points.

4
Q

What is the formula for calculating the price difference in the dataset?

A

UCLA Bookstore price - Amazon price
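A minimal Python sketch that applies the same subtraction order to every pair. The prices are hypothetical placeholders, not values from the 68 pairs in the dataset. A positive difference means the UCLA Bookstore was more expensive.

```python
from statistics import mean, stdev

# Hypothetical (UCLA Bookstore, Amazon) prices for the same textbooks; illustration only
ucla_prices = [120.50, 89.99, 45.00, 210.25]
amazon_prices = [110.00, 92.49, 38.75, 180.00]

# Always subtract in the same order: UCLA Bookstore price - Amazon price
diffs = [u - a for u, a in zip(ucla_prices, amazon_prices)]

print("differences:", [round(d, 2) for d in diffs])
print("mean difference:", round(mean(diffs), 3))
print("sd of differences:", round(stdev(diffs), 3))
```

After this step, the paired analysis works only with the single column `diffs`.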

5
Q

True or False: Consistency matters when analyzing paired data.

A

True.

6
Q

What is the importance of subtracting using a consistent order in paired data analysis?

A

If the order of subtraction is inconsistent, the differences no longer show which source was higher. Their average, and any test based on it, won't make sense.

7
Q

What does the term 'difference of two unpaired group means' refer to?

A

It refers to comparing the averages of two independent groups directly. Unlike with paired data, you do not first create a new column of differences.

8
Q

What must be checked separately when working with unpaired data?

A

The Central Limit Theorem (CLT) conditions.

9
Q

What is the research question regarding newborns from smoking and non-smoking mothers?

A

Is there convincing evidence that newborns from mothers who smoke have a different average birth weight than newborns from mothers who don’t smoke?

10
Q

How many cases are included in the smoking group and the non-smoking group in the dataset?

A

Smoking group: 50 cases; Non-smoking group: 100 cases.

11
Q

What are the two conditions that must be checked for the CLT when using unpaired data?

A
  • Independence
  • Normality
12
Q

What does the T statistic represent when the population standard deviations are unknown?

A

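It is the test statistic used for inference on the difference of two means. It equals the point estimate minus the null value, divided by the standard error, and is compared against a t-distribution.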

13
Q

What is represented by μn and μs in the context of birth weights?

A
  • μn = population mean birth weight of newborns from nonsmoking mothers
  • μs = population mean birth weight of newborns from smoking mothers
14
Q

What is the significance level used in the hypothesis test for birthweights?

A

α = 0.05

15
Q

What should be used to calculate degrees of freedom when working with more than one sample?

A

The smaller sample size minus 1, i.e., df = min(n1 - 1, n2 - 1).
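In this conservative approach, df is the smaller sample size minus 1. For the birth weight example, with 50 smoking and 100 non-smoking cases, that gives df = 49. Here is a small Python sketch. The helper name `conservative_df` is hypothetical.

```python
def conservative_df(*sample_sizes: int) -> int:
    """Conservative degrees of freedom: the smallest sample size minus 1."""
    return min(sample_sizes) - 1

# Smoking group: 50 cases; non-smoking group: 100 cases
df = conservative_df(50, 100)
print(df)  # 49
```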

16
Q

What is the purpose of calculating the standard error in the analysis?

A

To estimate the variability of the point estimate of the population difference.
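For two independent samples, the usual standard error of x̄1 - x̄2 is SE = sqrt(s1²/n1 + s2²/n2), and the t-score divides the point estimate (minus the null value) by it. The sketch below shows that arithmetic. The summary statistics are hypothetical and chosen only for illustration; they are not the ncbirths values.

```python
import math

def se_diff_means(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error of x̄1 - x̄2 for two independent samples."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def t_score(xbar1: float, xbar2: float, se: float, null_value: float = 0.0) -> float:
    """(point estimate - null value) / standard error."""
    return ((xbar1 - xbar2) - null_value) / se

# Hypothetical summary statistics (illustration only)
se = se_diff_means(s1=1.5, n1=100, s2=1.5, n2=50)
t = t_score(xbar1=7.2, xbar2=6.8, se=se)
print(f"SE = {se:.4f}, t = {t:.3f}")
```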

17
Q

What distribution is used to model the difference of sample means?

A

t-distribution.

18
Q

What is required to conclude the hypothesis test based on the t-score?

A

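Find the tail area beyond the t-score. Use pt() if the t-score is negative and 1 - pt() if it is positive, and double the result for a two-sided test. Then compare this p-value to α.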
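pt(t, df) in R returns the lower-tail area of the t-distribution. Python's standard library has no t-distribution CDF, so the sketch below approximates it by integrating the t density with Simpson's rule. This is a stand-in, not R's own algorithm, and the helper names `t_cdf` and `p_value` are hypothetical. In practice, pt() in R or a statistics library is the tool to use.

```python
import math

def t_density(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: int, steps: int = 10_000) -> float:
    """Approximate P(T <= t), like R's pt(t, df), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_0_to_a = total * h / 3
    # The density is symmetric about 0, so half the area lies below 0
    return 0.5 + area_0_to_a if t >= 0 else 0.5 - area_0_to_a

def p_value(t: float, df: int, two_sided: bool = True) -> float:
    # Lower tail (pt) for a negative t-score, upper tail (1 - pt) for a positive one
    tail = t_cdf(t, df) if t < 0 else 1 - t_cdf(t, df)
    return min(1.0, 2 * tail) if two_sided else tail

print(round(p_value(-2.1, 49), 4))
```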

19
Q

Fill in the blank: The weight variable represents the weights of the _______.

A

newborns.

20
Q

What type of data is represented in the ncbirths dataset?

A

A random sample of mothers and their newborns in North Carolina.

21
Q

What is the first condition for the Central Limit Theorem in unpaired data?

A

The observations are independent, both within and between samples.

22
Q

What is a common method to check for outliers in data?

A

Visual inspection of the data distribution.

23
Q

What is needed to find the area in the tails (the p-value) using R?

A

The t-score and the degrees of freedom (df).

If working with more than one sample, use the smaller sample size minus 1 as the df.

24
Q

What does a p-value indicate about the strength of evidence?

A

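A large p-value means the evidence may be too weak to detect a real difference, even if one exists.

The p-value depends on sample size: with small samples, real differences can go undetected.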

25
What type of error occurs when a real difference is not detected?
A false negative (Type 2 error). This can happen with small sample sizes.
26
What happens to the ability to find a difference if sample sizes are larger?
We tend to have a better shot at finding a difference if one exists.
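To see why, hold the observed difference and standard deviation fixed and let the group size grow. The standard error shrinks, so the t-score grows. The sketch uses hypothetical numbers and assumes equal group sizes and standard deviations only to keep the formula short.

```python
import math

def two_sample_t(diff: float, s: float, n_per_group: int) -> float:
    """t-score for a fixed observed difference, assuming equal SDs and group sizes."""
    se = s * math.sqrt(2 / n_per_group)
    return diff / se

# Quadrupling the group size halves the SE and doubles the t-score
for n in (10, 40, 160):
    print(n, round(two_sample_t(diff=2.0, s=5.0, n_per_group=n), 2))
```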
27
What does the research question 'Is Version B harder than Version A of a particular exam?' address?
It compares the difficulty of two versions of an exam.
28
What is a potential bias in administering exam versions to students?
Giving students at the front Version A and those at the back Version B.
29
What is the significance of independence in the context of the exams?
Independence within and between groups is satisfied due to random assignment.
30
How is the t-score related to the null distribution?
The t-score is located on the null distribution, and the tail area beyond it is the p-value used to assess significance.
31
What is the significance level (Ξ±) used in the example?
0.01
32
What is the conclusion if the p-value is larger than the significance level?
Do not reject the null hypothesis.
33
What is a confidence interval around a mean used for?
To estimate a population mean, such as the average mercury concentration in the population the sample was drawn from.
34
What defines paired observations in statistics?
Each observation in one set has a special correspondence with exactly one observation in another set.
35
What is a research question that involves non-paired samples?
Is there convincing evidence that newborns from mothers who smoke have a different average birth weight than those from mothers who don't smoke?
36
What notation represents a sample statistic or point estimate?
Notation for sample statistic / point estimate
37
What is the purpose of box-and-whisker plots?
To compare distributions between groups.
38
What is a limitation of box-and-whisker plots?
They can hide some variation.
39
What is recommended when plotting the distribution of a single group?
Use a histogram rather than a box plot.
40
What is the purpose of ANOVA?
To compare averages across more than two groups.
41
What is a risk of doing multiple t-tests when comparing more than two groups?
Inflates the chance of making a type 1 (false positive) error.
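One way to see the inflation assumes the tests are independent. Pairwise comparisons that share groups are not, so treat this as an approximation. Under that assumption, the chance of at least one false positive across m tests at level α is 1 - (1 - α)^m.

```python
def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

# With alpha = 0.05, the overall false positive rate climbs quickly with m
for m in (1, 3, 10):
    print(m, round(familywise_error(0.05, m), 3))
```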
42
What is the condition for CLT that must be satisfied?
Sample size must be sufficiently large.
43
What should you check when verifying conditions for each sample?
Confirm that the samples are independent of each other.
44
What is the risk of performing multiple t-tests when comparing more than 2 groups?
It inflates the chance of making a Type 1 (false positive) error. This occurs because each test increases the likelihood of finding a significant result purely by chance.
45
What does ANOVA stand for?
Analysis of Variance. ANOVA is used to determine if there are statistically significant differences between the means of three or more independent groups.
46
What is the main purpose of ANOVA?
To check whether there is evidence that at least one pair of groups is different from each other.
47
What does ANOVA assess in terms of variability?
It assesses the variability of the group means relative to the variability among individual observations within each group.
48
What are the three conditions that must be met to perform an ANOVA?
  • The observations must be independent
  • The data in each group must be nearly normal
  • The groups must have approximately equal variances
49
What is the null hypothesis (H0) in the context of ANOVA?
The average on-base percentage is equal across the three positions.
50
What is the alternative hypothesis (HA) in the context of ANOVA?
The average on-base percentage varies across some (or all) groups.
51
What is a key aspect to check when performing ANOVA on batting performance data?
The variability (measured with standard deviation) should be approximately equal between the groups.
52
What is 'data fishing'?
Inspecting the data before selecting groups for comparison, which can inflate the Type 1 Error rate.
53
What is Mean Square Between Groups (MSG) in ANOVA?
It measures the variability between group means and is calculated as a scaled variance formula for means.
54
How is the degrees of freedom for MSG calculated?
dfG = k - 1, where k is the number of groups.
55
What does Mean Square Error (MSE) represent in ANOVA?
It represents the variability within the groups and serves as a benchmark for expected variability if the null hypothesis is true.
56
What is the formula for calculating the degrees of freedom for MSE?
dfE = n - k, where n is the total number of observations and k is the number of groups.
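For reference, these are the usual one-way ANOVA formulas for these quantities. The notation is an assumption and may differ slightly from the slides: k groups, n total observations, group sizes n_i, group means x̄_i, group standard deviations s_i, and grand mean x̄.

```latex
\begin{aligned}
SSG &= \sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2, & MSG &= \frac{SSG}{df_G} = \frac{SSG}{k-1},\\
SSE &= \sum_{i=1}^{k} (n_i - 1)\,s_i^2, & MSE &= \frac{SSE}{df_E} = \frac{SSE}{n-k},\\
F &= \frac{MSG}{MSE}.
\end{aligned}
```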
57
What is the relationship between MSG and MSE in ANOVA?
ANOVA needs both MSG (between-group variability) and MSE (within-group variability) to determine if there are significant differences.
58
What is within-group variation in ANOVA?
Variations caused by differences within individual groups, as not all values within each group are the same.
59
True or False: ANOVA can be used to compare just two groups.
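True. ANOVA is usually used for three or more groups. With two groups, it reaches the same conclusion as a two-sample t-test that uses a pooled standard deviation (F = t²).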
60
Fill in the blank: ANOVA allows researchers to _______ many groups simultaneously.
consider
61
What is a potential consequence of 'data snooping'?
It inflates the Type 1 error rate. Groups picked after looking at the data tend to look different by chance, so apparent differences may be mistaken for real ones.
62
What type of data set was used in the batting performance example for ANOVA?
Batting records of 429 MLB players who had at least 100 at bats in the 2018 season.
63
What does within-group variation refer to?
Variations caused by differences within individual groups, as not all values within each group are the same.
64
What is the purpose of calculating variability in ANOVA?
To assess variability within groups and between groups.
65
What are MSG and MSE in the context of ANOVA?
  • MSG: measure of the variability across/between the groups
  • MSE: measure of the variability within the groups
66
When is the null hypothesis considered true in ANOVA?
When any differences among the sample means are only due to chance.
67
What does a larger F statistic indicate?
Stronger evidence against the null hypothesis.
68
What is an F-test used for in ANOVA?
To evaluate hypotheses using the F statistic.
69
How is a p-value computed in the context of ANOVA?
From the F statistic, using an F distribution with two parameters: df1 = dfG = k - 1 and df2 = dfE = n - k.
70
What must be checked to validate ANOVA conditions?
  • Independence of observations
  • Nearly normal data
  • Similar variance among groups
71
What is the first condition for ANOVA?
All observations must be independent.
72
What is the second condition for ANOVA regarding data distribution?
The data in each group must be nearly normal, checking for outliers and symmetry.
73
What is the third condition for ANOVA concerning variance?
The variance inside each group must be approximately equal to the other groups.
74
What should be done if ANOVA shows statistically significant results?
Move onto pairwise comparisons (with the Bonferroni correction).
75
What is an example of a research question related to ANOVA?
Is batting performance related to player position (outfielder, infielder, and catcher) in MLB?
76
What does a p-value smaller than the significance level indicate?
Reject the null hypothesis.
77
What type of software is commonly used for ANOVA calculations?
Statistical software.
78
What is the significance of checking for outliers in ANOVA?
To ensure the data distribution does not significantly deviate from normality.
79
What is the relevance of box-and-whisker plots in checking ANOVA conditions?
To visually assess the distribution, symmetry, and variance of the groups.
80
Fill in the blank: The F statistic is computed from the ratio of _______.
MSG to MSE.
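A minimal pure-Python sketch of how MSG, MSE, and F = MSG / MSE come from raw group data. The three groups are hypothetical toy numbers, and the helper `one_way_anova` is illustrative. Turning F into a p-value requires the F distribution with df1 = k - 1 and df2 = n - k, which is left to statistical software here.

```python
from statistics import mean, variance

def one_way_anova(groups):
    """Return (MSG, MSE, F, dfG, dfE) for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group variability: group means vs. the grand mean, weighted by group size
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: spread of observations inside each group
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    df_g, df_e = k - 1, n - k
    msg, mse = ssg / df_g, sse / df_e
    return msg, mse, msg / mse, df_g, df_e

# Hypothetical toy data (illustration only)
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
msg, mse, F, df_g, df_e = one_way_anova(groups)
print(f"MSG = {msg:.2f}, MSE = {mse:.2f}, F = {F:.2f}, df = ({df_g}, {df_e})")
```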
81
True or False: ANOVA assumes that all groups have identical variances.
False. It assumes the variances are approximately equal, not identical.
82
What is the primary goal of ANOVA?
To determine if there are statistically significant differences between the means of three or more independent groups.
83
What does the constant variance condition in ANOVA imply?
The variance in the groups is about equal from one group to the next.
84
What should be considered when checking independence in ANOVA?
Common sense regarding potential reasons why independence may not hold.
85
What was the increase in downloads from January to August?
From about 75 per day to about 95 per day
86
Which car manufacturers have similar IQR and variation?
Ford, Nissan, Toyota, and Volkswagen
87
Which car manufacturers have similar IQR to each other but less than the previous group?
Honda and Mitsubishi
88
What does the y-axis represent in the boxplot showing different forms of dancing?
The number of injuries
89
True or False: Groups plotted in a box plot should always be a categorical ordinal variable.
False
90
What is a common method for arranging groups in a box plot?
Sort them by median value
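A small sketch of that ordering step, using hypothetical injury counts for unnamed styles. The resulting order could then be passed to whatever tool draws the box plot.

```python
from statistics import median

# Hypothetical injury counts per group (illustration only)
injuries = {
    "style A": [3, 5, 4, 6],
    "style B": [1, 2, 2, 3],
    "style C": [2, 4, 3, 3],
}

# Order the groups from lowest to highest median
order = sorted(injuries, key=lambda name: median(injuries[name]))
print(order)
```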
91
What is the alpha level used in the ANOVA test mentioned?
0.05
92
What is the p-value indicating strong evidence against chance in the ANOVA test?
0.03
93
What should be done after finding statistical significance in an ANOVA test?
Conduct pairwise t-tests
94
What statistical problem arises when running multiple tests?
Type 1 Error rate increases
95
What method is used to resolve the issue of Type 1 Error in multiple tests?
Bonferroni Correction
96
What is the modified alpha level using Bonferroni correction for three comparisons?
0.0167
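The corrected level divides α by the number of pairwise comparisons, K = k(k - 1)/2. With k = 3 groups, K = 3 and 0.05 / 3 ≈ 0.0167. A Python sketch (the helper name is hypothetical):

```python
def bonferroni_alpha(alpha: float, k_groups: int) -> float:
    """Per-comparison alpha when all k(k-1)/2 pairs of groups are compared."""
    n_comparisons = k_groups * (k_groups - 1) // 2
    return alpha / n_comparisons

print(round(bonferroni_alpha(0.05, 3), 4))  # 0.0167
```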
97
What are the pooled standard deviation and degrees of freedom from the ANOVA summary table?
s_pooled = 13.61, df = 161
98
What is the main purpose of the ANOVA procedure?
To examine the big picture and identify if differences exist among groups
99
True or False: Identifying specific differences as statistically significant is straightforward after ANOVA.
False
100
What does the ANOVA test indicate about the means in each of the classes?
At least one class performed significantly differently from the others.
101
What are the steps to control the false positive error rate when conducting multiple pairwise comparisons?
Apply the Bonferroni correction to the α level, and use the pooled standard deviation estimate (with its degrees of freedom) in each pairwise t-test.
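Here is a sketch of one such pairwise comparison. It assumes the pooled standard deviation (13.61) and df (161) from the ANOVA summary table card, and the two group means and sizes are hypothetical. The resulting t-score would then be converted to a p-value with df = 161 (for example with pt() in R) and compared with the Bonferroni-corrected level 0.05 / 3.

```python
import math

# From the ANOVA summary table: s_pooled is the square root of MSE
S_POOLED = 13.61
DF_E = 161
ALPHA_STAR = 0.05 / 3  # Bonferroni-corrected level for 3 pairwise comparisons

def pairwise_t(mean1: float, n1: int, mean2: float, n2: int, s_pooled: float = S_POOLED) -> float:
    """t-score for one pair of groups, using the pooled SD in the standard error."""
    se = s_pooled * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se

# Hypothetical group means and sizes (illustration only)
t = pairwise_t(mean1=75.0, n1=58, mean2=70.0, n2=55)
print(f"t = {t:.2f} with df = {DF_E}; compare its p-value to {ALPHA_STAR:.4f}")
```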