NHST & Sampling Flashcards
(39 cards)
Define sampling error.
The difference between a population value of interest and the corresponding sample value. It can affect any property of the data, like the variance or the mean; it occurs bc the sample is only an estimate of the actual population data.
Define sampling distributions
The distribution of a sample statistic (e.g., a mean) when sampled under known sampling conditions from a known population. Effectively, it shows the values the same statistic takes across many repeated samples.
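As a rough illustration of the definition above, a sampling distribution of the mean can be built by simulation. This is only a sketch: the population (the integers 1 to 10), the sample size, the number of trials, and the helper name `sampling_distribution_of_mean` are all assumptions made for the example.

```python
import random
import statistics

def sampling_distribution_of_mean(population, n, trials, seed=0):
    """Draw `trials` random samples of size n (with replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]

population = list(range(1, 11))  # hypothetical population: 1..10, mean 5.5
means = sampling_distribution_of_mean(population, n=25, trials=5000)

print(statistics.fmean(means))  # center of the sampling distribution, close to 5.5
print(statistics.stdev(means))  # spread of the sampling distribution
```

The list `means` is the simulated sampling distribution: one value of the same statistic per repeated sample.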
Define the null hypothesis
Any difference b/w the sample statistic and the population value is due to sampling error. The sample and population both represent the same quantity; there is no real difference.
Define the alternative hypothesis
The difference b/w the sample statistic and the population value reflects a real difference, not just sampling error. The sample and population do not represent the same quantity.
Define NHST
Significance tests are a broad set of quantitative techniques for evaluating the probability of observing the data under the assumption that the null hypothesis is true. They let us decide whether the data are unlikely enough under the null hypothesis to reject it; they do not give the probability that either hypothesis is true.
Define statistical power
The probability of rejecting the null hypothesis when it is false, or correctly rejecting the null hypothesis.
Define a p-value
The probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true (i.e., based on sampling error alone).
Define alpha value
The probability of rejecting the null hypothesis when it is true; called a significance level. Effectively the cutoff percentage for the risk of erroneously rejecting the null.
Explain why the mean is an unbiased statistic and the variance is a biased statistic
- The mean is an unbiased statistic bc the expected value of the sample mean (its average across all possible samples) is equal to the mean of the population. Sample means that are too high and too low balance out on average.
- The variance is a biased statistic bc the expected sample variance, when computed by dividing by N, is smaller than the population variance. It systematically underestimates the population value; dividing by N - 1 instead corrects the bias.
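The bias can be checked exactly on a small case by listing every possible sample. The sketch below assumes a hypothetical population (the faces of a fair die) and samples of size 2 drawn with replacement; `pvariance` divides by N and `variance` divides by N - 1.

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [1, 2, 3, 4, 5, 6]  # hypothetical population (a fair die)
n = 2                            # hypothetical sample size

# Every equally likely sample of size n drawn with replacement.
samples = list(product(population, repeat=n))

avg_sample_mean = fmean(fmean(s) for s in samples)
avg_var_n = fmean(pvariance(s) for s in samples)          # divides by N
avg_var_n_minus_1 = fmean(variance(s) for s in samples)   # divides by N - 1

print(fmean(population), avg_sample_mean)          # the two means agree
print(pvariance(population), avg_var_n)            # dividing by N comes out too small
print(pvariance(population), avg_var_n_minus_1)    # dividing by N - 1 matches
```

Averaged over all samples, the sample mean lands on the population mean, while the divide-by-N variance falls short of the population variance.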
Explain why sampling error occurs
Sampling error occurs bc the sample is only an estimate of the actual population data. The sample could have different properties from the population, or misrepresent it.
Explain what two problems sampling error causes in psychological research
1.) Our sample values might not be equal to the population values.
2.) Because of this uncertainty, it is hard to tell whether an observed difference is real or just sampling error, which makes testing scientific hypotheses difficult.
Explain what the sampling distribution for the t-test is based on
T-test distribution: the sampling distribution is based on drawing random samples from a population with known (assumed) parameters and computing the mean of each sample. The sample means are then compared to the assumed population mean; this tells us how large a difference b/w a sample mean and the population mean we should expect from sampling error alone.
Explain what the sampling distribution for the ANOVA is based on
ANOVA distribution: the sampling distribution is based on the ratio of two estimates of the population variance: the estimate from between groups divided by the estimate from within groups.
Explain what the standard error of the mean is conceptually. What does the formula tell you about the relationship between sample size and sampling error?
Conceptually, the SEM is the standard deviation of the sampling distribution of the mean. The equation for SEM (SEM = σ / √N) tells you that sampling error decreases as sample size increases, bc N is in the denominator.
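A quick numeric check of that relationship, using the standard formula SEM = σ / √N. The population standard deviation of 10 and the sample sizes are hypothetical values chosen for the example.

```python
import math

def standard_error_of_mean(sigma, n):
    """SEM = sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (4, 25, 100, 400):
    print(n, standard_error_of_mean(sigma, n))
# 4 5.0
# 25 2.0
# 100 1.0
# 400 0.5
```

Note that quadrupling the sample size only halves the SEM, since N enters through its square root.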
Explain the basic logic of NHST.
If we make certain assumptions about the population (e.g., mu = 3) and the sampling process (e.g., random sampling, N = 25), we can determine:
a.) the expected sample mean.
b.) the expected difference between an observed sample mean and the population mean due to sampling error.
This means that we are evaluating a mean difference relative to how much we would expect means to differ on average (the SEM); that ratio is the Z-test statistic.
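The logic can be sketched as a one-sample Z-test. The values mu = 3 and N = 25 come from the card; the population standard deviation (1.0) and the observed sample mean (3.5) are hypothetical numbers added only to make the example computable.

```python
import math
from statistics import NormalDist

mu = 3.0           # population mean assumed under the null (from the card)
n = 25             # sample size (from the card)
sigma = 1.0        # hypothetical population standard deviation
sample_mean = 3.5  # hypothetical observed sample mean

sem = sigma / math.sqrt(n)    # how much sample means differ on average
z = (sample_mean - mu) / sem  # observed mean difference relative to the SEM

# Two-tailed probability of a difference at least this large under the null.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(sem, z, p_two_tailed)
```

With these assumed numbers the observed mean sits 2.5 SEMs above mu, which gives a two-tailed probability a little above .01.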
Explain when to reject/fail to reject the null.
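Reject null = probability of observing the difference under the null is less than alpha (by convention, p < .05, i.e., 5%)
Fail to reject null = probability of observing the difference under the null is greater than alpha (p > .05)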
Explain what the role of the p-value is in NHST
Serves as the probability value that determines whether or not a result is considered statistically significant by convention. You compare it against the chosen alpha level: if p is below alpha, the result falls beyond the critical value of the sampling distribution and the null is rejected.
Explain two issues to consider when you use a sample to draw conclusions about a population
1. The sample is only an estimate of the population, and therefore does not necessarily have the same properties as the population. Capturing the population value is our actual goal, so this presents some conceptual problems.
2. Depending on how we select our sample, our results can be due to biases in the sample, to having an unusual sample, or to chance alone. This is why results need to be replicable in scientific studies.
Explain the difference between the directional and non-directional hypotheses
Non-directional - Ha: μMean1 ≠ μMean2
“There is a difference b/w the groups, but we do not predict which mean will be greater.”
Directional - Ha: μMean1 > μMean2
“There is a difference b/w the groups, and we predict in advance which mean will be greater (here, Mean1 is greater than Mean2).”
Explain use of one or two-tailed tests when direction is or is not specified
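The cutoff (critical value) for the test statistic depends on whether the hypothesis is directional. If it is non-directional, we use a two-tailed test, which splits alpha across both tails. If it is directional, we use a one-tailed test, which puts all of alpha in the predicted tail.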
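To see how the cutoff changes with the number of tails, here is a small sketch that computes Z critical values at alpha = .05 (the conventional level, assumed here). For a t-test the cutoffs would come from the t distribution instead and would also depend on the degrees of freedom.

```python
from statistics import NormalDist

alpha = 0.05  # conventional significance level (assumed)

# One-tailed: all of alpha sits in the predicted tail.
z_one_tailed = NormalDist().inv_cdf(1 - alpha)

# Two-tailed: alpha is split evenly, alpha / 2 in each tail.
z_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_one_tailed, 3))  # 1.645
print(round(z_two_tailed, 3))  # 1.96
```

The one-tailed cutoff is lower, so a difference in the predicted direction is easier to call significant, but a difference in the opposite direction cannot be.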
Explain the conceptual definition of within group and between group variances (MSwithin and MSbetween)
a.) Within group variance/MSwithin: each of the sample variances (three, in a three-group design) is an estimate of the population variance. We compute each variance using N - 1 and average them to estimate the population variance. We are estimating the population variance separately within each sample or condition, so differences b/w the group means do not affect it.
b.) Between group variance/MSbetween: we use the sample means in each condition to create a sampling distribution representing only those samples. We can then calculate the variance of these means to estimate the variance of the sampling distribution of the means. Multiplying that variance by the per-group sample size gives a second estimate of the population variance.
Explain what is the F-ratio and why do we “want” to get a high value for this ratio?
F-ratio: the ratio of two estimates of the population variance: the estimate from between groups (MSbetween) divided by the estimate from within groups (MSwithin).
We want to get a high value bc it suggests that we are sampling from groups that have different means: the group means vary more than sampling error alone would produce. A value well above 1 implies that the population means are not all equal.
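A minimal worked example of the two variance estimates and their ratio, assuming three hypothetical conditions with equal group sizes (the scores are made up for illustration). With equal group sizes, MSwithin is the average of the group variances and MSbetween is the per-group sample size times the variance of the group means.

```python
from statistics import fmean, variance

# Hypothetical scores for three conditions, equal group sizes.
groups = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10],
]
n = len(groups[0])  # per-group sample size

# Within groups: average the sample variances (each computed with N - 1).
ms_within = fmean(variance(g) for g in groups)

# Between groups: variance of the group means, scaled up by n.
group_means = [fmean(g) for g in groups]
ms_between = n * variance(group_means)

f_ratio = ms_between / ms_within

print(ms_within)   # 1.0
print(ms_between)  # 12.0
print(f_ratio)     # 12.0
```

Here the group means (5, 7, 9) are spread far apart relative to the variability inside each group, so the ratio comes out well above 1.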
If you get an F-ratio of 1, what can you conclude without even looking at an F table?
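If you get an F-ratio of 1, you can conclude that the between-groups and within-groups estimates of the population variance are the same. In that case, the differences b/w the group means are no larger than expected from sampling error, so you fail to reject the null; there is no evidence that the means differ across groups.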
What is statistical power and what is its relationship to Type 2 error?
1.) Statistical power is defined as the probability of correctly rejecting the null hypothesis when it is false.
2.) If there is more statistical power (e.g. larger sample size and larger effect), then we are more likely to reject the null when it is actually false, and so less likely to make a Type 2 error. The two are complements: power = 1 - β, where β = probability of failing to reject H0 when it is false (the Type 2 error rate).
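The relationship power = 1 - β can be sketched for a one-tailed Z-test. The function name `power_one_tailed_z`, the true effect of 0.5, the standard deviation of 1.0, and the sample sizes are all hypothetical choices for the example; the formula used is the standard normal-theory power calculation.

```python
import math
from statistics import NormalDist

def power_one_tailed_z(effect, sigma, n, alpha=0.05):
    """Power of a one-tailed Z-test when the true mean exceeds
    the null mean by `effect`."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # cutoff under the null
    shift = effect / (sigma / math.sqrt(n))   # true effect in SEM units
    return 1 - NormalDist().cdf(z_crit - shift)

for n in (10, 25, 100):
    power = power_one_tailed_z(effect=0.5, sigma=1.0, n=n)
    beta = 1 - power  # Type 2 error rate
    print(n, round(power, 3), round(beta, 3))
```

As the sample size grows, power rises and β falls by the same amount. When the true effect is zero, the function returns alpha, since every rejection is then a Type 1 error.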