W3 Flashcards by Eline Olmos van Velden

Why do we describe data?

Datasets are often large and complex, so summaries help give insights and make the data easier to process.

What types of summaries can we use to describe data?

Numerical and visual summaries.

What are frequency distributions?

They count how often each value of a variable occurs; the counts are often also shown as percentages.
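A minimal sketch of a frequency distribution in R, assuming a small made-up vector of survey answers (the name `answers` and its values are hypothetical):

```r
# Hypothetical survey answers (made-up data for illustration)
answers <- c("yes", "no", "yes", "maybe", "yes", "no")

counts <- table(answers)                  # absolute frequency of each value
percentages <- prop.table(counts) * 100   # relative frequency in percent

print(counts)
print(round(percentages, 1))
```

`table()` gives the counts and `prop.table()` turns them into shares of the total.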

What numerical and visual summaries suit nominal data?

Numerical: frequency table, mode. Visual: bar chart, pie chart.

What numerical and visual summaries suit ordinal data?

Numerical: frequency table, percentiles, median. Visual: bar chart, pie chart.

What numerical and visual summaries suit interval and ratio data?

Numerical: average, standard deviation, min, max, percentiles. Visual: histogram, scatter plot, box plot.

What are the three types of descriptive statistics?

Measures of shape: skewness, kurtosis. Measures of location: mean, mode, median. Measures of variability: range, interquartile range, variance, standard deviation.
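The three families of descriptive statistics can be sketched in base R as follows. The vector `x` is made-up data, and the skewness line is a hand-written moment-based formula used here as an assumption, since base R ships no skewness function:

```r
x <- c(2, 4, 4, 5, 7, 9, 11)   # hypothetical metric scores

# Measures of location
x_mean   <- mean(x)
x_median <- median(x)
# Base R's mode() returns the storage type, so the most frequent value
# is taken from a frequency table instead
x_mode   <- names(which.max(table(x)))

# Measures of variability
x_range <- diff(range(x))      # max - min
x_iqr   <- IQR(x)
x_var   <- var(x)
x_sd    <- sd(x)

# Measure of shape: moment-based skewness computed by hand (assumed formula)
m <- mean(x)
skew <- mean((x - m)^3) / mean((x - m)^2)^1.5
```

A positive `skew` suggests a longer right tail, a negative one a longer left tail.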

What is the limitation of descriptive statistics?

They only give an impression and aren’t enough for conclusions — you need to run tests for that.

Why do we do hypothesis testing?

We want to know about a big population but only have a sample, so we use it to test assumptions about the population.

How does a good sample help in hypothesis testing?

If your sample is representative, you can generalize findings to the full population.

What does inferential statistics try to do?

It tries to find the true value in the population using sample statistics.

What is a hypothesis in statistics?

It’s an educated guess about a population, stated as H0 (null) and H1 (alternative), and they must be mutually exclusive.

What are the steps of hypothesis testing?

1. State the hypotheses 2. Define the method and gather data 3. Make a decision using statistics and thresholds

What is the P-value used for?

It’s used to help decide whether to reject the null hypothesis based on the likelihood of the observed result.

What is a P-value?

It’s the probability of getting your result (or something more extreme) if the null hypothesis is true.

What does a low P-value mean?

The result is unlikely under the null hypothesis, so we may reject H0 and consider H1 instead.

How do we interpret the P-value in a distribution?

If your test statistic falls in the middle of the distribution expected under H0, the result is consistent with no effect. If it falls in the tail, it might be significant.

What determines statistical significance?

Whether the P-value is smaller than the chosen level of significance (e.g., 0.05).
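As a rough illustration of the decision rule, the sketch below compares a p-value with a significance level of 0.05. The simulated `scores`, the reference value 4 and the name `alpha` are assumptions made for the example:

```r
set.seed(1)
scores <- rnorm(30, mean = 4.5, sd = 1)   # hypothetical sample
alpha  <- 0.05                            # chosen level of significance

result <- t.test(scores, mu = 4)          # two-sided by default

# The same two-sided p-value computed by hand from the t statistic:
# the probability of a value at least this extreme in either tail
p_manual <- 2 * pt(-abs(result$statistic), df = result$parameter)

significant <- result$p.value < alpha     # TRUE means: reject H0
```

The manual calculation makes the "result or something more extreme" part of the p-value definition explicit.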

What are the two types of hypothesis tests?

1. Two-tailed (two-sided) 2. One-tailed (one-sided)

What is a two-tailed test?

You test whether there’s any difference — e.g., men and women have different Instagram followers (without saying who has more).

What is a one-tailed test?

You test for a specific direction — e.g., women have more Instagram followers than men.

What is the main difference between one-sided and two-sided tests?

Two-sided tests check for any difference. One-sided tests check for a difference in a specific direction.

What are one-sided tests used for?

Only for t-tests. They cannot be used for ANOVA, regression, or chi-squared tests.

What does R do by default for p-values?

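R gives the p-value for a two-sided test. For a one-sided test, either divide the p-value by 2 (valid only when the observed effect lies in the hypothesised direction) or request it directly, e.g. with alternative = "less" or alternative = "greater" in t.test().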
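A small sketch of how the one-sided and two-sided p-values relate in `t.test()`. The data in `x` and the reference value 5 are made up; the sample mean lies above 5, so "greater" is the direction that matches the data:

```r
x <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0)   # hypothetical scores

two_sided <- t.test(x, mu = 5)                           # default
greater   <- t.test(x, mu = 5, alternative = "greater")  # H1: mean > 5
less      <- t.test(x, mu = 5, alternative = "less")     # H1: mean < 5

# Effect in the hypothesised direction: one-sided p is half the two-sided p
half_rule_holds <- isTRUE(all.equal(greater$p.value, two_sided$p.value / 2))

# Effect in the opposite direction: halving would be wrong, p is 1 - p/2
opposite_p <- 1 - two_sided$p.value / 2
```

Passing `alternative` directly is the safer habit, because it gives the right p-value whichever side the data fall on.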

What test do you use when the dependent variable is metric and you compare one sample with a population value?

One sample t-test.

What test do you use for two dependent samples with a metric variable?

Paired t-test.

What test do you use for two independent samples with a metric variable?

Independent samples t-test.

What test do you use for more than two independent samples with a metric variable?

One-way ANOVA.
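For the two independent-sample cases above, a hedged sketch with simulated data (the data frame `df` and its columns `score` and `group` are hypothetical). Note that `t.test()` runs the Welch version by default, which does not assume equal variances:

```r
set.seed(42)
df <- data.frame(
  score = c(rnorm(10, 5), rnorm(10, 6), rnorm(10, 7)),   # metric variable
  group = factor(rep(c("A", "B", "C"), each = 10))       # three independent samples
)

# Two independent samples: keep groups A and B only
two_groups <- droplevels(subset(df, group != "C"))
t_result <- t.test(score ~ group, data = two_groups)

# More than two independent samples: one-way ANOVA
anova_fit <- aov(score ~ group, data = df)
anova_table <- summary(anova_fit)[[1]]
print(anova_table)
```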

What test do you use for non-metric dependent variables?

Chi-square test — goodness-of-fit for 1 sample, and test of independence for 2+ samples.

What do you use when independent variables are metric?

Correlation (1 variable) or regression (2+ variables).

What is a chi-square goodness-of-fit test used for?

Comparing one sample's distribution with a known or expected distribution using proportions.

What does a chi-square test help determine?

Whether the distribution of responses in categories is statistically different from what we expected.

What are the assumptions for the chi-square test?

You need an expected frequency of at least 5 in each category, and the categories must be mutually exclusive.

What is the null hypothesis (H0) in a chi-square test?

The proportions of cases in the groups match the expected proportions (by default, that they are equal).

What is the alternative hypothesis (H1) in a chi-square test?

The proportions of cases in the groups do not match the expected proportions (by default, they are not all equal).

What R code is used to run a chi-square test with equal expected proportions?

chisq.test(table(df$var), p = rep(1/3,3))

What does a p-value < 0.05 mean in a chi-square test?

The result is statistically significant: we reject H0 and conclude that the observed proportions differ from the expected ones.

Can you test with unequal expected proportions in a chi-square test?

Yes! You use chisq.test(table(df$string), p = c(0.35, 0.25, 0.40)) to test based on specific expectations.

What does it mean if the observed values differ from expected with low p-value?

It means there's a significant difference between what we expected and what actually happened.

What kind of plots help visualize chi-square results in R?

Bar plots of observed vs. expected frequencies.
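The goodness-of-fit cards above can be tried out with a sketch like the following. The factor `choice` and its counts are made-up data; the unequal proportions are the ones from the card above:

```r
# Hypothetical preferences of 60 respondents (made-up data)
choice <- factor(rep(c("A", "B", "C"), times = c(30, 18, 12)))
observed <- table(choice)

# H0: all three categories are equally likely
test_equal <- chisq.test(observed, p = rep(1/3, 3))

# H0: categories follow specific, unequal proportions
test_custom <- chisq.test(observed, p = c(0.35, 0.25, 0.40))

# Observed vs. expected frequencies side by side
barplot(
  rbind(Observed = as.vector(observed),
        Expected = as.vector(test_equal$expected)),
  beside = TRUE,
  names.arg = names(observed),
  legend.text = TRUE,
  ylab = "Frequency"
)
```

The `$expected` element of the result is also a quick way to check the "at least 5 expected per category" assumption.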

What is the chi-squared test for independence used for?

It’s used when we have two categorical variables (for example, a grouping variable that defines 2 or more samples and a response with 2+ categories) and want to test whether there is a relationship between them.

What are the assumptions for the chi-squared test for independence?

Each cell needs an expected frequency of at least 5, and the categories must be mutually exclusive.

What is the null hypothesis (H0) in a chi-squared test for independence?

There is NO relationship between the two variables.

In the chi-square test for independence, what is the alternative hypothesis (H1)?

There IS a relationship between the two variables.

Give a real-life example of the chi-squared test for independence.

Testing if gender (male/female) is related to learning medium preference (e-book, video, book).

What does the R code chisq.test(df$Gender, df$StudyMethod) do?

It performs a chi-square test to see if gender and study method are related.

How are expected values calculated in chi-square independence?

Automatically based on the assumption of independence.
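To make "automatically" concrete, the sketch below reproduces the expected counts by hand as row total × column total / grand total. The `gender` and `medium` vectors are made-up data:

```r
# Hypothetical data: 50 men and 50 women choosing a learning medium
gender <- rep(c("male", "female"), times = c(50, 50))
medium <- c(
  rep(c("e-book", "video", "book"), times = c(20, 20, 10)),   # men
  rep(c("e-book", "video", "book"), times = c(10, 20, 20))    # women
)

tab <- table(gender, medium)
result <- chisq.test(tab)

# Expected count per cell under independence
manual_expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

print(result$expected)
print(manual_expected)
```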

What does it mean if the observed values differ a lot from expected values in a chi-square test?

There may be a relationship between the two variables; if the p-value is below the significance level, we can reject the null hypothesis.

What is a 1 sample t-test used for?

Comparing the sample mean with a population mean to see if they are significantly different.

What kind of variable is needed for a 1 sample t-test?

A metric variable (ratio, interval, or continuous), assumed to be normally distributed.

What are the hypotheses in a 1 sample t-test?

H0: Sample mean = / ≤ / ≥ population mean, H1: Sample mean ≠ / < / > population mean

Give an example scenario for a 1 sample t-test.

A researcher tests if a sample's depression scores differ from the population value of 4.0.

What R code is used for a 1 sample t-test?

t.test(df$depression_score, mu=4)

What does it mean if the hypothesised population mean falls outside the confidence interval around the sample mean?

The difference is statistically significant and you can reject the null hypothesis.
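A hedged sketch of the link between the confidence interval and the p-value in a two-sided one-sample t-test. The `depression_score` values are made up; the reference value 4 comes from the example card:

```r
depression_score <- c(4.6, 5.1, 3.9, 4.8, 5.4, 4.2, 4.9, 5.0, 4.4, 4.7)  # hypothetical
mu0 <- 4   # population value under H0

result <- t.test(depression_score, mu = mu0)
ci <- result$conf.int                 # 95% interval around the sample mean

# The interval is built around the sample mean, so the check is
# whether the hypothesised value mu0 lies outside it
mu_outside_ci <- mu0 < ci[1] || mu0 > ci[2]
significant   <- result$p.value < 0.05
```

For a two-sided test at the 5% level, `mu_outside_ci` and `significant` always agree.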

What if the p-value is less than 0.05?

It shows significance — you reject the null hypothesis.

How can you visualise the data before doing a t-test?

Use mean(), sd(), and boxplot() in R to get an idea of the data's distribution.

What is a paired sample t-test used for?

Comparing the mean scores of two sets of observations from the same people (e.g., before and after measurements).

What is the hypothesis for a paired sample t-test?

H0: Mean difference = 0, H1: Mean difference ≠ 0

What does the null hypothesis (H0) mean in the paired t-test?

The average difference between the two scores is zero, meaning there is no change on average.

Give an example of when to use a paired t-test.

A company measures employee satisfaction in 2022 and again in 2023 using the same survey.
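The satisfaction example could be run roughly as follows. The two score vectors are made-up data for the same eight employees, in the same order:

```r
# Hypothetical satisfaction scores for the same 8 employees
satisfaction_2022 <- c(6.1, 5.8, 7.0, 6.4, 5.5, 6.9, 6.2, 5.9)
satisfaction_2023 <- c(6.5, 6.0, 7.4, 6.3, 6.1, 7.2, 6.6, 6.4)

# Paired t-test: H0 is that the mean difference equals 0
paired <- t.test(satisfaction_2023, satisfaction_2022, paired = TRUE)

# Equivalent view: a one-sample t-test on the per-person differences
diffs <- satisfaction_2023 - satisfaction_2022
one_sample <- t.test(diffs, mu = 0)
```

Both calls give the same t statistic and p-value, which shows that a paired t-test is a one-sample test on the differences.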

W3 Flashcards

(60 cards)