W3 Flashcards
(60 cards)
Why do we describe data?
Datasets are often large and complex, so summaries help give insights and make the data easier to process.
What types of summaries can we use to describe data?
Numerical and visual summaries.
What are frequency distributions?
They count how often each value of a variable occurs, reported as counts and often also as percentages.
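A minimal Python sketch of a frequency distribution. The responses are made-up data, and the names `responses` and `freq_table` are illustrative only.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration only)
responses = ["yes", "no", "yes", "maybe", "yes", "no"]

counts = Counter(responses)  # how often each value occurs
total = len(responses)

# Frequency table: value -> (count, percentage of all responses)
freq_table = {value: (n, 100 * n / total) for value, n in counts.items()}

for value, (n, pct) in freq_table.items():
    print(f"{value:<6} {n:>3} {pct:6.1f}%")
```

The percentages always sum to 100, which is a quick check that no response was dropped.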
What numerical and visual summaries suit nominal data?
Numerical: frequency table, mode. Visual: bar chart, pie chart.
What numerical and visual summaries suit ordinal data?
Numerical: frequency table, percentiles, median. Visual: bar chart, pie chart.
What numerical and visual summaries suit interval and ratio data?
Numerical: mean, standard deviation, min, max, percentiles. Visual: histogram, scatter plot, box plot.
What are the three types of descriptive statistics?
Measures of location: mean, median, mode. Measures of variability: range, interquartile range, variance, standard deviation. Measures of shape: skewness, kurtosis.
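A short Python sketch of these measures on a small made-up dataset, using only the standard library. The skewness line uses the simple moment-based formula, which is one of several conventions (an assumption here, not necessarily the one used in the course).

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # made-up values for illustration

# Measures of location
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability
value_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation

# Measure of shape: moment-based skewness (positive = longer right tail)
n = len(data)
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
skewness = m3 / m2 ** 1.5

print(mean, median, mode, value_range, iqr, variance, round(std_dev, 3), round(skewness, 3))
```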
What is the limitation of descriptive statistics?
They only give an impression of the sample and aren’t enough to draw conclusions about the population; you need to run statistical tests for that.
Why do we do hypothesis testing?
We want to know about a large population but only have a sample, so we use the sample to test assumptions about the population.
How does a good sample help in hypothesis testing?
If your sample is representative, you can generalize findings to the full population.
What does inferential statistics try to do?
It tries to estimate the true value in the population using sample statistics.
What is a hypothesis in statistics?
It’s an educated guess about a population, stated as a pair of hypotheses, H0 (null) and H1 (alternative), which must be mutually exclusive.
What are the steps of hypothesis testing?
1. State the hypotheses 2. Define the method and gather data 3. Make a decision using a test statistic and a significance threshold
What is the P-value used for?
It’s used to help decide whether to reject the null hypothesis, based on how likely the observed result would be if H0 were true.
What is a P-value?
It’s the probability of getting your result (or something more extreme) if the null hypothesis is true.
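A worked example of this definition as a small Python sketch. The coin-flip setup (10 flips, 9 heads) is hypothetical, and the test is one-sided: it asks how likely 9 or more heads would be if the coin were fair.

```python
from math import comb

n_flips = 10         # hypothetical experiment: 10 coin flips
observed_heads = 9   # hypothetical observed result
p_heads_h0 = 0.5     # H0: the coin is fair

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# P-value: probability of the observed result or something more extreme
# (9 or 10 heads), computed under the assumption that H0 is true.
p_value = sum(
    binom_pmf(k, n_flips, p_heads_h0)
    for k in range(observed_heads, n_flips + 1)
)

print(round(p_value, 4))  # 11/1024, about 0.0107
```

Because this p-value is below 0.05, such a result would count as significant at that level.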
What does a low P-value mean?
The result is unlikely under the null hypothesis, so we may reject H0 and consider H1 instead.
How do we interpret the P-value in a distribution?
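If your result falls near the middle of the distribution expected under H0, it is consistent with H0 and gives no evidence of an effect. If it falls in the tail, it is unlikely under H0 and might be significant.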
What determines statistical significance?
Whether the P-value is smaller than the chosen level of significance (e.g., 0.05).
What are the two types of hypothesis tests?
1. Two-tailed (two-sided) 2. One-tailed (one-sided)
What is a two-tailed test?
You test whether there’s any difference, e.g., men and women have different numbers of Instagram followers (without saying who has more).
What is a one-tailed test?
You test for a difference in a specific direction, e.g., women have more Instagram followers than men.
What is the main difference between one-sided and two-sided tests?
Two-sided tests check for any difference. One-sided tests check for a difference in a specific direction.
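A minimal Python sketch of how the two kinds of test lead to different p-values. It assumes a test statistic that follows a standard normal distribution under H0, and the value `z = 1.8` is made up for illustration.

```python
from statistics import NormalDist

z = 1.8                    # hypothetical observed test statistic (made-up value)
std_normal = NormalDist()  # assumed null distribution: standard normal

# One-sided (H1: the effect is positive): area in the upper tail only
p_one_sided = 1 - std_normal.cdf(z)

# Two-sided (H1: the effect differs from zero): area in both tails
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))

alpha = 0.05
print(round(p_one_sided, 4), p_one_sided < alpha)
print(round(p_two_sided, 4), p_two_sided < alpha)
```

With these numbers the one-sided test is significant at 0.05 and the two-sided test is not. The one-sided p-value is half the two-sided one only because the observed effect lies in the hypothesized direction.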
What are one-sided tests used for?
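Only for t-tests, where the test statistic has a direction. They cannot be used for ANOVA, the overall F-test in regression, or chi-squared tests, because those test statistics do not indicate a direction.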
What does R do by default for p-values?
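R gives the p-value for a two-sided test. For a one-sided test, divide the p-value by 2 (valid only if the observed effect is in the hypothesized direction) or request it directly, e.g., `t.test(..., alternative = "greater")` or `alternative = "less"`.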