Ch. 13 Flashcards
statistics
Descriptive summary values (e.g., means, correlation coefficients) computed for one or more variables measured in a sample.
In general, however, the researcher’s goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from.
parameters
Corresponding values in the population.
sampling error
The random variability in a statistic from sample to sample.
(Note that the term error here refers to random variability and does not imply that anyone has made a mistake. No one “commits a sampling error.”)
One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population.
A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population.
But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error.
Any statistical relationship in a sample can be interpreted in two ways:
There is a relationship in the population, and the relationship in the sample reflects this.
There is no relationship in the population, and the relationship in the sample reflects only sampling error.
The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.
Null hypothesis testing
A formal approach to deciding between two interpretations of a statistical relationship in a sample.
One interpretation is called the null hypothesis (often symbolized H0 and read as “H-naught”).
This is the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error.
Informally, the null hypothesis is that the sample relationship “occurred by chance.”
The other interpretation is called the alternative hypothesis (often symbolized as H1).
This is the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.
Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:
Assume for the moment that the null hypothesis is true. There is no relationship between the variables in the population.
Determine how likely the sample relationship would be if the null hypothesis were true.
If the sample relationship would be extremely unlikely, then reject the null hypothesis in favor of the alternative hypothesis.
If it would not be extremely unlikely, then retain the null hypothesis.
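A minimal simulation sketch of this logic (the group sizes, effect size, and random seed here are illustrative assumptions, not from the chapter):
```python
import numpy as np

rng = np.random.default_rng(0)

# Observed result: the difference between two group means in one sample.
group_a = rng.normal(loc=0.5, scale=1.0, size=25)
group_b = rng.normal(loc=0.0, scale=1.0, size=25)
observed_diff = group_a.mean() - group_b.mean()

# Step 1: assume the null hypothesis is true (no difference in the population).
# Step 2: see how likely a difference at least this large would be, here by
# drawing many pairs of samples from a single population with no difference.
null_diffs = np.array([
    rng.normal(0, 1, 25).mean() - rng.normal(0, 1, 25).mean()
    for _ in range(10_000)
])
p_value = np.mean(np.abs(null_diffs) >= abs(observed_diff))

# Steps 3-4: reject the null hypothesis if the sample result would be
# extremely unlikely under it; otherwise retain it.
print(f"observed difference = {observed_diff:.2f}, p = {p_value:.4f}")
print("reject H0" if p_value <= .05 else "retain H0")
```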
p value
A crucial step in null hypothesis testing is finding the probability of the sample result or a more extreme result if the null hypothesis were true.
This probability is called the p value.
A low p value means that the sample result (or one more extreme) would be unlikely if the null hypothesis were true, and it leads to the rejection of the null hypothesis.
A p value that is not low means that the sample result (or one more extreme) would be likely if the null hypothesis were true, and it leads to the retention of the null hypothesis.
α (alpha)
The criterion for how low the p value must be before the sample result is considered unlikely enough to reject the null hypothesis (usually set to .05).
If there is a 5% chance or less of a result at least as extreme as the sample result if the null hypothesis were true, then the null hypothesis is rejected.
When this happens, the result is said to be statistically significant.
If there is greater than a 5% chance of a result as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained.
This does not necessarily mean that the researcher accepts the null hypothesis as true—only that there is not currently enough evidence to reject it.
Researchers often use the expression “fail to reject the null hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept the null hypothesis.”
Role of Sample Size and Relationship Strength
The stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true.
That is, the lower the p value.
Of course, sometimes the result can be weak and the sample large, or the result can be strong and the sample small. In these cases, the two considerations trade off against each other so that a weak result can be statistically significant if the sample is large enough and a strong relationship can be statistically significant even if the sample is small.
As a general pattern, weak relationships based on medium or small samples are never statistically significant, and strong relationships based on medium or larger samples are always statistically significant.
If you keep this lesson in mind, you will often know whether a result is statistically significant based on the descriptive statistics alone.
It is extremely useful to be able to develop this kind of intuitive judgment.
One reason is that it allows you to develop expectations about how your formal null hypothesis tests are going to come out, which in turn allows you to detect problems in your analyses.
A second reason is that the ability to make this kind of intuitive judgment is an indication that you understand the basic logic of this approach in addition to being able to do the computations.
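A quick way to see the trade-off is to hold the relationship strength fixed and vary the sample size. This sketch uses SciPy's independent-samples t-test (covered later in these cards) with made-up numbers:
```python
import numpy as np
from scipy import stats

# Same weak relationship (true mean difference of 0.2 standard deviations)
# tested with a small sample and a large sample.
for n in (20, 500):
    rng = np.random.default_rng(42)
    group_a = rng.normal(0.2, 1.0, n)
    group_b = rng.normal(0.0, 1.0, n)
    t, p = stats.ttest_ind(group_a, group_b)
    print(f"n per group = {n:3d}: t = {t:6.2f}, p = {p:.4f}")
```
With the relationship strength held constant, the larger sample will typically produce the lower p value.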
Statistical Significance Versus Practical Significance
A statistically significant result is not necessarily a strong one. Even a very weak result can be statistically significant if it is based on a large enough sample.
But the word significant can cause people to interpret these differences as strong and important.
This is why it is important to distinguish between the statistical significance of a result and the practical significance of that result.
Practical significance
Refers to the importance or usefulness of the result in some real-world context.
Many sex differences are statistically significant—and may even be interesting for purely scientific reasons—but they are not practically significant.
t-test
A test that involves looking at the difference between two means.
one-sample t-test
Used to compare a sample mean (M) with a hypothetical population mean (μ0) that provides some interesting standard of comparison.
The null hypothesis is that the mean for the population (µ) is equal to the hypothetical population mean: μ = μ0.
The alternative hypothesis is that the mean for the population is different from the hypothetical population mean: μ ≠ μ0.
To decide between these two hypotheses, we need to find the probability of obtaining the sample mean (or one more extreme) if the null hypothesis were true.
But finding this p value requires first computing a test statistic called t. (A test statistic is a statistic that is computed only to help find the p value.)
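For reference, the one-sample t statistic is standardly computed as t = (M − μ0) / (SD / √N), where SD is the sample standard deviation and N is the sample size.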
The reason the t statistic (or any test statistic) is useful is that we know how it is distributed when the null hypothesis is true.
In practice, though, we do not have to deal directly with the distribution of t scores.
If we were to enter our sample data and hypothetical mean of interest into one of the online statistical tools in Chapter 12 or into a program like SPSS, the output would include both the t score and the p value.
At this point, the rest of the procedure is simple.
If p is equal to or less than .05, we reject the null hypothesis and conclude that the population mean differs from the hypothetical mean of interest.
If p is greater than .05, we retain the null hypothesis and conclude that there is not enough evidence to say that the population mean differs from the hypothetical mean of interest.
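A minimal sketch of this procedure in Python, using SciPy's one-sample t-test (the scores and comparison value are hypothetical):
```python
import numpy as np
from scipy import stats

scores = np.array([24, 18, 22, 27, 19, 25, 21, 23, 20, 26])  # hypothetical sample
mu_0 = 20  # hypothetical population mean of interest

t, p = stats.ttest_1samp(scores, popmean=mu_0)
print(f"t({len(scores) - 1}) = {t:.2f}, p = {p:.4f}")
print("reject H0" if p <= .05 else "retain H0")
```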
critical values
The absolute value that a test statistic (e.g., t or F) must exceed for the result to be considered statistically significant.
Two-tailed critical values: each should be interpreted as a pair of values, one positive and one negative.
The idea is that any t score below the lower critical value is in the lowest 2.5% of the distribution, while any t score above the upper critical value is in the highest 2.5% of the distribution.
Therefore any t score beyond the critical value in either direction is in the most extreme 5% of t scores when the null hypothesis is true and has a p value less than .05.
Thus if the t score we compute is beyond the critical value in either direction, then we reject the null hypothesis.
If the t score we compute is between the upper and lower critical values, then we retain the null hypothesis.
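If we did need the critical values themselves, they can be read off the t distribution directly; a sketch with an illustrative df of 9:
```python
from scipy import stats

df = 9  # illustrative degrees of freedom
lower = stats.t.ppf(.025, df)  # cuts off the lowest 2.5%
upper = stats.t.ppf(.975, df)  # cuts off the highest 2.5%
print(f"two-tailed critical values for df = {df}: {lower:.3f}, {upper:.3f}")
# Any computed t beyond either value is in the most extreme 5%, so p < .05.
```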
two-tailed test
Where we reject the null hypothesis if the test statistic for the sample is extreme in either direction (+/-).
This test makes sense when we believe that the sample mean might differ from the hypothetical population mean but we do not have good reason to expect the difference to go in a particular direction.
one-tailed test
Where we reject the null hypothesis only if the t score for the sample is extreme in one direction that we specify before collecting the data.
This test makes sense when we have good reason to expect the sample mean will differ from the hypothetical population mean in a particular direction.
Each one-tailed critical value can again be interpreted as a pair of values: one positive and one negative.
A t score below the lower critical value is in the lowest 5% of the distribution, and a t score above the upper critical value is in the highest 5% of the distribution.
However, for a one-tailed test, we must decide before collecting data whether we expect the sample mean to be lower than the hypothetical population mean, in which case we would use only the lower critical value, or we expect the sample mean to be greater than the hypothetical population mean, in which case we would use only the upper critical value.
Notice that we still reject the null hypothesis when the t score for our sample is in the most extreme 5% of the t scores we would expect if the null hypothesis were true—so α remains at .05.
We have simply redefined extreme to refer only to one tail of the distribution.
The advantage of the one-tailed test is that critical values are less extreme.
If the sample mean differs from the hypothetical population mean in the expected direction, then we have a better chance of rejecting the null hypothesis.
The disadvantage is that if the sample mean differs from the hypothetical population mean in the unexpected direction, then there is no chance at all of rejecting the null hypothesis.
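Recent versions of SciPy expose this choice directly through the alternative parameter; a sketch reusing the hypothetical scores from above:
```python
import numpy as np
from scipy import stats

scores = np.array([24, 18, 22, 27, 19, 25, 21, 23, 20, 26])
mu_0 = 20

# Two-tailed: extreme in either direction counts against the null hypothesis.
_, p_two = stats.ttest_1samp(scores, mu_0, alternative="two-sided")
# One-tailed: we decided BEFORE collecting data to expect a mean above mu_0.
_, p_one = stats.ttest_1samp(scores, mu_0, alternative="greater")

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```
When the sample mean lands in the expected direction, the one-tailed p is half the two-tailed p, which is exactly why the one-tailed critical values are less extreme.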
Dependent-Samples t-Test
Used to compare two means for the same sample tested at two different times or under two different conditions (sometimes called the paired-samples t-test).
This comparison is appropriate for pretest-posttest designs or within-subjects experiments.
The null hypothesis is that the means at the two times or under the two conditions are the same in the population.
The alternative hypothesis is that they are not the same.
This test can also be one-tailed if the researcher has good reason to expect the difference goes in a particular direction.
It helps to think of the dependent-samples t-test as a special case of the one-sample t-test.
The first step in the dependent-samples t-test
is to reduce the two scores for each participant to a single difference score by taking the difference between them.
Difference score: a single score created for each pair of scores (e.g., pre- and post-test) by calculating the difference between them.
At this point, the dependent-samples t-test becomes a one-sample t-test on the difference scores.
The hypothetical population mean (µ0) of interest is 0 because this is what the mean difference score would be if there were no difference on average between the two times or two conditions.
We can now think of the null hypothesis as being that the mean difference score in the population is 0 (µ = 0) and the alternative hypothesis as being that the mean difference score in the population is not 0 (µ ≠ 0).
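This equivalence is easy to verify: a paired test on the two sets of scores gives the same result as a one-sample test on the difference scores against 0 (the data here are hypothetical):
```python
import numpy as np
from scipy import stats

pretest = np.array([10, 12, 9, 14, 11])    # hypothetical scores, same participants
posttest = np.array([13, 14, 10, 17, 12])

t_rel, p_rel = stats.ttest_rel(posttest, pretest)          # dependent-samples test
t_one, p_one = stats.ttest_1samp(posttest - pretest, 0.0)  # one-sample test on differences

print(f"ttest_rel:   t = {t_rel:.3f}, p = {p_rel:.4f}")
print(f"ttest_1samp: t = {t_one:.3f}, p = {p_one:.4f}")  # identical
```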
Independent-Samples t-Test
Used to compare the means of two separate samples (M1 and M2).
The two samples might have been tested under different conditions in a between-subjects experiment, or they could be pre-existing groups in a cross-sectional design (e.g., women and men, extraverts and introverts).
The null hypothesis is that the means of the two populations are the same: µ1 = µ2.
The alternative hypothesis is that they are not the same: µ1 ≠ µ2.
Again, the test can be one-tailed if the researcher has good reason to expect the difference goes in a particular direction.
The t statistic here is a bit more complicated because it must take into account two sample means, two standard deviations, and two sample sizes.
The formula includes the squared standard deviations (the variances), which appear inside the square root symbol.
Also, lowercase n1 and n2 refer to the sample sizes in the two groups or conditions (as opposed to capital N, which generally refers to the total sample size).
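One common form of the statistic, and the one this description matches, is t = (M1 − M2) / √(SD1²/n1 + SD2²/n2).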
The only additional thing to know here is that there are N − 2 degrees of freedom for the independent-samples t-test.
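A minimal sketch with SciPy's independent-samples t-test (the group scores are hypothetical; the default test assumes equal population variances, which is what gives N − 2 degrees of freedom):
```python
import numpy as np
from scipy import stats

group_1 = np.array([14, 17, 13, 18, 16, 15, 19, 12])  # hypothetical scores
group_2 = np.array([11, 13, 10, 15, 12, 14, 11, 13])

t, p = stats.ttest_ind(group_1, group_2)
df = len(group_1) + len(group_2) - 2  # N - 2 degrees of freedom
print(f"t({df}) = {t:.2f}, p = {p:.4f}")
```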
The Analysis of Variance
When there are more than two groups or condition means to be compared, the most common null hypothesis test is the analysis of variance (ANOVA).
ANOVA: A statistical test used when there are more than two groups or condition means to be compared.
One-Way ANOVA
Used for between-subjects designs with a single independent variable.
The one-way ANOVA is used to compare the means of more than two samples (M1, M2…MG) in a between-subjects design.
The null hypothesis is that all the means are equal in the population: µ1 = µ2 = … = µG.
The alternative hypothesis is that not all the means in the population are equal.
The test statistic for the ANOVA is called F. It is a ratio of two estimates of the population variance based on the sample data.
One estimate of the population variance is called the mean squares between groups (MSB)
The other is called the mean squares within groups (MSW).
The F statistic is the ratio of the MSB to the MSW.
Again, the reason that F is useful is that we know how it is distributed when the null hypothesis is true.
The precise shape of the distribution depends on both the number of groups and the sample size, and there are degrees of freedom values associated with each of these.
The between-groups degrees of freedom is the number of groups minus one: dfB = (G − 1).
The within-groups degrees of freedom is the total sample size minus the number of groups: dfW = N − G.
Again, knowing the distribution of F when the null hypothesis is true allows us to find the p value.
If p is equal to or less than .05, then we reject the null hypothesis and conclude that there are differences among the group means in the population.
If p is greater than .05, then we retain the null hypothesis and conclude that there is not enough evidence to say that there are differences.
In the unlikely event that we were to compute F by hand, we could use a table of critical values.
The idea is that any F ratio greater than the critical value has a p value of less than .05.
Thus if the F ratio we compute is beyond the critical value, then we reject the null hypothesis.
If the F ratio we compute is less than the critical value, then we retain the null hypothesis.
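A sketch with hypothetical data for three groups, computing F both with SciPy and by hand as the ratio MSB / MSW:
```python
import numpy as np
from scipy import stats

groups = [
    np.array([12, 14, 11, 13, 15]),  # hypothetical scores, group 1
    np.array([16, 18, 17, 15, 19]),  # group 2
    np.array([13, 12, 14, 16, 13]),  # group 3
]

f, p = stats.f_oneway(*groups)  # one-way ANOVA

# The same F as a ratio of the two variance estimates.
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
G, N = len(groups), len(all_scores)
ms_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / (G - 1)
ms_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - G)

print(f"F({G - 1}, {N - G}) = {f:.3f}, p = {p:.4f}")
print(f"MSB / MSW = {ms_between / ms_within:.3f}")  # matches F
```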
mean squares between groups (MSB)
An estimate of the population variance based on the differences among the sample means.
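In symbols, a standard formulation consistent with this definition is MSB = [Σ n_g (M_g − M_grand)²] / (G − 1), where M_g and n_g are the mean and size of group g and M_grand is the mean of all scores combined.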