W3 Flashcards

(60 cards)

1
Q

Why do we describe data?

A

Datasets are often large and complex, so summaries help give insights and make the data easier to process.

2
Q

What types of summaries can we use to describe data?

A

Numerical and visual summaries.

3
Q

What are frequency distributions?

A

They count how often each value of a variable occurs; the counts are often also shown as percentages.
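As a rough illustration, a frequency table with percentages could be built in R as below. The `responses` vector is made-up example data, not taken from the course material.

```r
# Hypothetical survey answers (illustrative data only)
responses <- c("book", "video", "e-book", "video", "video", "book")

counts <- table(responses)                 # absolute frequency per category
percentages <- prop.table(counts) * 100    # relative frequency in percent

print(counts)
print(round(percentages, 1))
```

`table()` gives the counts and `prop.table()` turns them into proportions, so the percentages always sum to 100.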

4
Q

What numerical and visual summaries suit nominal data?

A

Numerical: frequency table, mode. Visual: bar chart, pie chart.

5
Q

What numerical and visual summaries suit ordinal data?

A

Numerical: frequency table, percentiles, median. Visual: bar chart, pie chart.

6
Q

What numerical and visual summaries suit interval and ratio data?

A

Numerical: average (mean), standard deviation, min, max, percentiles. Visual: histogram, scatter plot, box plot.

7
Q

What are the three types of descriptive statistics?

A

Measures of shape: skewness, kurtosis. Measures of location: mean, mode, median. Measures of variability: range, interquartile range, variance, standard deviation.
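A minimal sketch of how these measures might be computed in base R, using made-up values. Base R has no built-in function for the statistical mode or for skewness, so both are computed by hand here; the skewness formula is one common moment-based definition and is an assumption, since add-on packages may use slightly different ones.

```r
x <- c(2, 4, 4, 5, 7, 9, 12)  # made-up example values

# Measures of location
x_mean   <- mean(x)
x_median <- median(x)
# Base R's mode() returns the storage type, so take the most frequent value instead
x_mode   <- as.numeric(names(which.max(table(x))))

# Measures of variability
x_range <- diff(range(x))   # max minus min
x_iqr   <- IQR(x)           # interquartile range
x_var   <- var(x)           # sample variance
x_sd    <- sd(x)            # sample standard deviation

# Measure of shape: moment-based skewness (assumed definition, see lead-in)
dev    <- x - x_mean
x_skew <- mean(dev^3) / mean(dev^2)^1.5
```

A positive `x_skew` indicates a longer right tail, which fits these values because 12 lies far above the rest.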

8
Q

What is the limitation of descriptive statistics?

A

They only describe the sample and give an impression; they aren’t enough to draw conclusions about the population. For that you need to run statistical tests.

9
Q

Why do we do hypothesis testing?

A

We want to know about a big population but only have a sample, so we use it to test assumptions about the population.

10
Q

How does a good sample help in hypothesis testing?

A

If your sample is representative, you can generalize findings to the full population.

11
Q

What does inferential statistics try to do?

A

It tries to estimate the true value in the population (the parameter) using sample statistics.

12
Q

What is a hypothesis in statistics?

A

It’s an educated guess about a population, stated as H0 (null) and H1 (alternative), and they must be mutually exclusive.

13
Q

What are the steps of hypothesis testing?

A
1. State the hypothesis
2. Define the method & gather data
3. Make a decision using statistics and thresholds
14
Q

What is the P-value used for?

A

It’s used to help decide whether to reject the null hypothesis based on the likelihood of the observed result.

15
Q

What is a P-value?

A

It’s the probability of getting your result (or something more extreme) if the null hypothesis is true.
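To make the definition concrete, here is a small sketch that turns a test statistic into a p-value using the t distribution. The values of `t_stat` and `deg_free` are hypothetical and chosen only for illustration.

```r
t_stat   <- 2.1   # hypothetical observed t statistic
deg_free <- 24    # hypothetical degrees of freedom

# One-sided: probability of a statistic at least this large if H0 is true
p_one_sided <- pt(t_stat, df = deg_free, lower.tail = FALSE)

# Two-sided: probability of a statistic at least this extreme in either tail
p_two_sided <- 2 * pt(abs(t_stat), df = deg_free, lower.tail = FALSE)

print(p_one_sided)
print(p_two_sided)
```

`pt()` returns the area under the t distribution, so the "or something more extreme" part of the definition corresponds to the tail area beyond the observed statistic.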

16
Q

What does a low P-value mean?

A

The result is unlikely under the null hypothesis, so we may reject H0 and consider H1 instead.

17
Q

How do we interpret the P-value in a distribution?

A

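If your result falls in the middle of the distribution expected under H0, it is consistent with no effect (high p-value). If it falls in a tail, it is unlikely under H0 and may be significant (low p-value).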

18
Q

What determines statistical significance?

A

Whether the P-value is smaller than the chosen level of significance (e.g., 0.05).

19
Q

What are the two types of hypothesis tests?

A
1. Two-tailed (two-sided)
2. One-tailed (one-sided)
20
Q

What is a two-tailed test?

A

You test whether there’s any difference, e.g., men and women have different numbers of Instagram followers (without saying who has more).

21
Q

What is a one-tailed test?

A

You test for a specific direction, e.g., women have more Instagram followers than men.

22
Q

What is the main difference between one-sided and two-sided tests?

A

Two-sided tests check for any difference. One-sided tests check for a difference in a specific direction.

23
Q

What are one-sided tests used for?

A

Only for t-tests. The overall F-tests in ANOVA and regression and the chi-squared tests have no direction, so they cannot be run one-sided.

24
Q

What does R do by default for p-values?

A

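By default, R gives the p-value for a two-sided test. For a one-sided test, either indicate the direction directly (e.g., alternative = "greater" or "less" in t.test()) or divide the two-sided p-value by 2, which is only valid when the effect lies in the hypothesized direction.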
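The two routes to a one-sided p-value can be compared in a short sketch. The `scores` vector and the reference value 4 are made-up; `alternative` is the actual argument name in `t.test()`.

```r
# Hypothetical sample (illustrative data only); its mean lies above 4
scores <- c(4.2, 5.1, 3.9, 4.8, 5.3, 4.6, 4.1, 5.0, 4.7, 4.4)

# Default: alternative = "two.sided"
two_sided <- t.test(scores, mu = 4)

# Direction stated directly: H1 says the mean is greater than 4
one_sided <- t.test(scores, mu = 4, alternative = "greater")

print(two_sided$p.value)
print(one_sided$p.value)
```

Because the sample mean here lies in the hypothesized direction, the one-sided p-value equals half the two-sided one. With `alternative = "less"` it would instead be close to 1, which is why halving blindly can mislead.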

25
What test do you use when the dependent variable is metric and you compare one sample with a population value?
One sample t-test.
26
What test do you use for two dependent samples with a metric variable?
Paired t-test.
27
What test do you use for two independent samples with a metric variable?
Independent samples t-test.
28
What test do you use for more than two independent samples with a metric variable?
One-way ANOVA.
29
What test do you use for non-metric dependent variables?
Chi-square test — goodness-of-fit for 1 sample, and test of independence for 2+ samples.
30
What do you use when independent variables are metric?
Correlation (1 variable) or regression (2+ variables).
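As a rough map from the test choices above to R calls, the sketch below uses a small made-up data frame. The column names (`score`, `group`, `hours`, `age`) are hypothetical; the functions are standard base R.

```r
# Made-up data: a metric outcome, a 3-level group and two metric predictors
d <- data.frame(
  score = c(5.1, 4.8, 5.6, 5.0, 6.2, 6.0, 6.5, 5.9, 7.1, 6.8, 7.4, 7.0),
  group = factor(rep(c("a", "b", "c"), each = 4)),
  hours = c(2, 1, 3, 2, 4, 4, 5, 3, 6, 5, 7, 6),
  age   = c(21, 25, 22, 30, 24, 28, 23, 27, 26, 22, 29, 25)
)

# Two independent samples: independent samples t-test (Welch version by default)
two_groups <- droplevels(subset(d, group != "c"))
ind_t <- t.test(score ~ group, data = two_groups)

# More than two independent samples: one-way ANOVA
anova_fit <- aov(score ~ group, data = d)

# One metric independent variable: correlation
corr <- cor.test(d$score, d$hours)

# Two or more metric independent variables: regression
reg <- lm(score ~ hours + age, data = d)
```

The one-sample, paired, and chi-square cases follow the same pattern with `t.test(x, mu = ...)`, `t.test(x, y, paired = TRUE)`, and `chisq.test()`.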
31
What is a chi-square goodness-of-fit test used for?
Comparing one sample's distribution with a known or expected distribution using proportions.
32
What does a chi-square test help determine?
Whether the distribution of responses in categories is statistically different from what we expected.
33
What are the assumptions for the chi-square test?
Each category needs an expected frequency of at least 5, and the categories must be mutually exclusive.
34
What is the null hypothesis (H0) in a chi-square test?
The proportions of cases in the groups match the expected proportions (by default, that they are all equal).
35
What is the alternative hypothesis (H1) in a chi-square test?
The proportions of cases in the groups do not match the expected proportions (by default, they are not all equal).
36
What R code is used to run a chi-square test with equal expected proportions?
chisq.test(table(df$var), p = rep(1/3,3))
37
What does a p-value < 0.05 mean in a chi-square test?
The result is statistically significant: we reject H0 and conclude that the observed distribution differs from the expected one.
38
Can you test with unequal expected proportions in a chi-square test?
Yes! You use chisq.test(table(df$string), p = c(0.35, 0.25, 0.40)) to test based on specific expectations.
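A small worked sketch of both goodness-of-fit variants, using made-up counts in place of `table(df$var)`. The category names and numbers are hypothetical; `chisq.test()` accepts a vector of counts directly.

```r
# Hypothetical observed counts for three categories (n = 100)
observed <- c(a = 30, b = 25, c = 45)

# H0: all three categories are equally likely
equal_fit <- chisq.test(observed, p = rep(1/3, 3))

# H0: the categories follow specific expected proportions (must sum to 1)
custom_fit <- chisq.test(observed, p = c(0.35, 0.25, 0.40))

print(equal_fit$expected)    # expected counts under equal proportions
print(custom_fit$expected)   # expected counts under the custom proportions
print(custom_fit$p.value)
```

The expected counts are simply the sample size times each expected proportion, and the test statistic sums (observed - expected)^2 / expected over the categories.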
39
What does it mean if the observed values differ from expected with low p-value?
It means there's a significant difference between what we expected and what actually happened.
40
What kind of plots help visualize chi-square results in R?
Bar plots of observed vs. expected frequencies.
41
What is the chi-squared test for independence used for?
It’s used when we have two categorical variables, each with 2 or more categories, and want to test whether there is a relationship between them.
42
What are the assumptions for the chi-squared test for independence?
Each cell needs an expected frequency of at least 5, and categories must be mutually exclusive.
43
What is the null hypothesis (H0) in a chi-squared test for independence?
There is NO relationship between the two variables.
44
In the chi-square test for independence, what is the alternative hypothesis (H1)?
There IS a relationship between the two variables.
45
Give a real-life example of the chi-squared test for independence.
Testing if gender (male/female) is related to learning medium preference (e-book, video, book).
46
What does the R code chisq.test(df$Gender, df$StudyMethod) do?
It performs a chi-square test to see if gender and study method are related.
47
How are expected values calculated in chi-square independence?
Automatically based on the assumption of independence.
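What "automatically" means can be shown by hand: under independence, each expected count is (row total × column total) / grand total. The sketch below checks this against `chisq.test()` on a made-up gender by learning-medium table; the counts are hypothetical.

```r
# Hypothetical contingency table: gender by preferred learning medium
tab <- matrix(c(20, 30, 10,
                15, 10, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("male", "female"),
                              Medium = c("e-book", "video", "book")))

# Expected counts by hand: row total * column total / grand total
manual_expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# The same values as computed by chisq.test()
result <- chisq.test(tab)

print(manual_expected)
print(result$expected)
print(result$p.value)
```

Inspecting `result$expected` is also a convenient way to check the assumption that every cell has an expected frequency of at least 5.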
48
What does it mean if the observed values differ a lot from expected values in a chi-square test?
There may be a relationship between the two variables, and we can reject the null hypothesis.
49
What is a 1 sample t-test used for?
Comparing the sample mean with a population mean to see if they are significantly different.
50
What kind of variable is needed for a 1 sample t-test?
A metric variable (ratio, interval, or continuous), assumed to be normally distributed.
51
What are the hypotheses in a 1 sample t-test?
H0: Sample mean = / ≤ / ≥ population mean, H1: Sample mean ≠ / > / < population mean (in matching order, so H0 "≤" pairs with H1 ">")
52
Give an example scenario for a 1 sample t-test.
A researcher tests if a sample's depression scores differ from the population value of 4.0.
53
What R code is used for a 1 sample t-test?
t.test(df$depression_score, mu=4)
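To show what this call computes, the sketch below runs it on made-up scores and reproduces the t statistic by hand as (sample mean - mu) / (sd / sqrt(n)). The data frame contents are hypothetical.

```r
# Hypothetical depression scores (illustrative data only)
df <- data.frame(
  depression_score = c(4.2, 5.1, 3.9, 4.8, 5.3, 4.6, 4.1, 5.0, 4.7, 4.4)
)

result <- t.test(df$depression_score, mu = 4)

# The same t statistic computed by hand
x <- df$depression_score
t_manual <- (mean(x) - 4) / (sd(x) / sqrt(length(x)))

print(result$statistic)   # t value reported by t.test()
print(t_manual)           # should match
print(result$p.value)
```

The degrees of freedom are n - 1, which `t.test()` reports in `result$parameter`.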
54
What does it mean if the hypothesized population mean falls outside the confidence interval around the sample mean?
The difference is statistically significant and you can reject the null hypothesis.
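The confidence interval check can be sketched as follows. Note that the interval reported by `t.test()` is built around the sample mean, so the value compared against it is the hypothesized population mean. The scores and the name `mu0` are made-up for illustration.

```r
# Hypothetical scores (illustrative data only)
scores <- c(4.2, 5.1, 3.9, 4.8, 5.3, 4.6, 4.1, 5.0, 4.7, 4.4)
mu0 <- 4   # hypothesized population mean

result <- t.test(scores, mu = mu0)   # two-sided, 95% confidence by default
ci <- result$conf.int

# Is the hypothesized value outside the 95% confidence interval?
mu0_outside <- mu0 < ci[1] || mu0 > ci[2]

# Is the two-sided p-value below 0.05?
significant <- result$p.value < 0.05

print(ci)
print(mu0_outside)
print(significant)
```

For a two-sided test, a 95% interval that excludes the hypothesized value and a p-value below 0.05 are two views of the same decision, so the two flags always agree.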
55
What if the p-value is less than 0.05?
The result is statistically significant at the 5% level, so you reject the null hypothesis.
56
How can you visualise the data before doing a t-test?
Use mean(), sd(), and boxplot() in R to get an idea of the data's distribution.
57
What is a paired sample t-test used for?
Comparing the mean scores of two sets of observations from the same people (e.g., before and after measurements).
58
What is the hypothesis for a paired sample t-test?
H0: Mean difference = 0, H1: Mean difference ≠ 0
59
What does the null hypothesis (H0) mean in the paired t-test?
The average difference between the two scores is zero — no significant change.
60
Give an example of when to use a paired t-test.
A company measures employee satisfaction in 2022 and again in 2023 using the same survey.
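A minimal sketch of that scenario in R, with made-up satisfaction scores for the same eight employees in both years. The vectors are hypothetical; `paired = TRUE` is the actual `t.test()` argument.

```r
# Hypothetical satisfaction scores; position i is the same employee in both years
satisfaction_2022 <- c(6.1, 5.8, 7.0, 6.5, 5.9, 6.8, 6.2, 7.1)
satisfaction_2023 <- c(6.5, 6.0, 7.4, 6.4, 6.3, 7.2, 6.6, 7.3)

# Paired t-test: H0 says the mean difference is 0
paired <- t.test(satisfaction_2023, satisfaction_2022, paired = TRUE)

# Equivalent view: a one-sample t-test on the differences against 0
diffs <- satisfaction_2023 - satisfaction_2022
on_diffs <- t.test(diffs, mu = 0)

print(paired$estimate)    # mean of the differences
print(paired$p.value)
print(on_diffs$p.value)   # same p-value as the paired test
```

Treating the paired test as a one-sample test on the differences explains why the hypotheses are stated in terms of the mean difference.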