Chapter 13: Deriving Standardized Scores Flashcards

(118 cards)

1
Q

This is a bell-shaped, mathematically defined, smooth curve in which the mean, median, and mode lie at the exact center. It tapers as it proceeds away from the center of the distribution and, theoretically, approaches the horizontal axis without ever touching it. It serves as the basis for the interpretation of an individual's norm-referenced test scores, allowing comparisons to be made between different clients' scores on the same test, or between the same client's scores on different tests.

A

The Normal Curve

2
Q

This describes how the scores spread out from the middle of the distribution.

i. 34.13% of all scores fall between the mean (0 SD) and one standard deviation above (or below) the mean.
ii. 13.59% of all scores fall between one and two standard deviations (whether above or below the mean).
iii. 2.14% of all scores fall between two and three standard deviations (whether above or below the mean).
iv. Only 0.13% of all scores fall beyond three standard deviations (whether above or below the mean).

Other mathematical constants of the normal curve:

i. Half (50%) of the scores always fall below the mean, and half always fall above the mean.
ii. About 68% (68.26%) of all scores fall within one standard deviation of the mean (±1 SD).
iii. About 95% (95.44%) of all scores fall within two standard deviations of the mean (±2 SD).
iv. About 99% (99.74%) of all scores fall within three standard deviations of the mean (±3 SD).

The normal curve has the wonderful property of equalizing all sorts of standard score scales (e.g., T scores, deviation IQs, z-scores). That is, a deviation IQ of 130, a T score of 70, and a z-score of +2.00 will always mean the score falls at the 98th percentile. This interchangeability is one of the valuable characteristics of the normal curve.

A

Standard Deviation

3
Q

These are raw scores that are meaningless without some context for interpretation. As long as a distribution of scores does not violate an assumption of the normal curve (e.g., skewness), raw scores can be linearly transformed into a standardized score and interpreted with greater meaning.

A

Standardized Scores Based on the Normal Curve

4
Q

This involves converting a (normal) raw score distribution with a given mean and standard deviation into a standardized score distribution conforming to the normal curve characteristics, with a standardized mean and standard deviation. Making a raw data set "fit" the normal curve allows researchers and test interpreters to communicate a lot about a client's test score and performance.

A

Linear Transformation

5
Q

This has a mean of zero and a standard deviation of one, and is computed from the raw score distribution using the following formula:

z = (X − M) / SD

where X is the participant's raw score, M is the sample mean, and SD is the sample standard deviation. A z-score of −1.00 means 1.00 standard deviation below the mean (the minus sign indicates "below the mean"), while a z-score of +1.00 indicates 1.00 standard deviation above the mean.

A

The z-score
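The linear transformation above can be sketched in a few lines of Python (an illustrative snippet; the sample numbers are made up):

```python
def z_score(x, mean, sd):
    """Linear transformation of a raw score: z = (X - M) / SD."""
    return (x - mean) / sd

# A raw score of 60 in a distribution with M = 50 and SD = 10
# falls exactly one standard deviation above the mean.
print(z_score(60, 50, 10))   # -> 1.0
print(z_score(35, 50, 10))   # -> -1.5 (below the mean)
```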

6
Q

This provides context for meaningful interpretation by declaring that the mean of a distribution of scores is 50 and the standard deviation is 10 (T = 10z + 50). T scores greater than 50 are above the mean, and T scores less than 50 are below the mean. The interpretation is similar to that of z-scores, but the index of comparison has shifted: T scores have M = 50, SD = 10.

A

A T Score
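The shift of index can be expressed directly (a minimal sketch; T = 10z + 50 follows from M = 50, SD = 10):

```python
def t_score(z):
    """Convert a z-score to a T score (M = 50, SD = 10): T = 10z + 50."""
    return 10 * z + 50

print(t_score(2.0))   # -> 70.0 (two SDs above the mean)
print(t_score(-1.0))  # -> 40.0 (one SD below the mean)
```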

7
Q

These are frequently referred to in the education and counseling fields simply as "standard scores," and are commonly used to describe scores on intelligence, achievement, and perceptual skills tests. A deviation IQ score provides context for meaningful interpretation by declaring that the mean of a distribution of scores is 100 and the standard deviation is 15.
i. Nearly all tests use a standard deviation of 15.
ii. Standard scores greater than 100 are above the mean, and standard scores less than 100 are below the mean.
The formula for converting a raw score into an SS (standard score) is SS = 15(z) + 100.

A

Deviation IQ and Standard Scores (M = 100; SD = 15)

8
Q

These are the types of standard scores used in counseling, education, and research: a. z-scores; b. T scores; c. deviation IQ scores; d. normal curve equivalents (NCEs), which are standard scores with a mean of 50 and a standard deviation of 21.06.

A

a. z-scores b. T scores c. Deviation IQ scores d. Normal curve equivalents (NCEs) are standard scores with a mean of 50 and a standard deviation of 21.06.

9
Q

This is short for “standard nine,” a system that divides the normal curve into nine equidistant segments, using the formula 2z + 5 and rounding to the nearest whole number. i. Stanines 2 through 8 represent a ½ standard deviation range with the 5th stanine straddling the mean (i.e., ±¼ SD [z-scores of –0.25 to +0.24]). ii. Stanines are frequently used in large-scale achievement testing programs, but they should be interpreted with caution.

A

A Stanine
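The 2z + 5 rule can be sketched as follows (an illustration; the clipping to the 1-9 range reflects that stanines 1 and 9 absorb all extreme scores):

```python
def stanine(z):
    """Stanine = 2z + 5, rounded to the nearest whole number and
    clipped to the 1-9 range of the 'standard nine' scale."""
    return max(1, min(9, round(2 * z + 5)))

print(stanine(0.0))    # -> 5 (the 5th stanine straddles the mean)
print(stanine(-3.0))   # -> 1 (extreme scores collapse into stanine 1)
```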

10
Q

These have a mean of 10 and a standard deviation of 3, and are frequently used in intelligence, achievement, or perceptual skills measures to report on subtest or subscale scores. i. The scaled scores for these several subtests can be summed and converted into a standard score (M = 100; SD = 15). ii. One example is the Wechsler Scales (e.g., Wechsler Adult Intelligence Scales)

A

Scaled Scores

11
Q

These have a mean of 500 and standard deviation of 100. i. The Scholastic Assessment Test (SAT) is an example.

A

CEEB (College Entrance Examination Board) Scores

12
Q

This indicates the percentage of observations that fall below a given score on a measure plus one-half of the observations falling at the given score. i. Percentile ranks are easy to understand when one visualizes a line of 100 individuals with characteristics similar to the reference group under study (e.g., age, grade, sex), with the first individual standing in line possessing the least amount of the construct under study and the 100th person standing in line possessing the greatest amount of the construct under study.

A

Percentile Ranks

13
Q

In this, if a participant scored at the 37th percentile rank (i.e., P37), her performance exceeded 37 percent of all those comprising the reference group. We compute percentile ranks from a raw score distribution using the following formula:

PR = 100 × (cf + 0.5f) / n

where PR is the percentile rank, n is the sample size, cf is the cumulative frequency of scores falling below the value being determined, and f is the frequency of occurrence of the value being determined.
ii. Percentile ranks should not be confused with percentage scores. A percentage is the percent correct out of the total.

A

Percentile Ranks
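The definition (percent of observations below a score, plus half of those at the score) can be sketched in Python; the score list here is invented for illustration:

```python
def percentile_rank(scores, value):
    """PR = 100 * (cf + 0.5 * f) / n, where cf counts the scores falling
    below the value and f is the frequency of the value itself."""
    n = len(scores)
    cf = sum(1 for s in scores if s < value)
    f = sum(1 for s in scores if s == value)
    return 100 * (cf + 0.5 * f) / n

scores = [2, 4, 4, 5, 7, 7, 8, 9, 9, 10]
print(percentile_rank(scores, 7))  # -> 50.0 (four scores below, two at 7)
```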

14
Q

These indicate an individual's score as referenced to a comparison group of individuals with like characteristics (thus the term norm-referenced). A percentage score, by contrast, indicates only the percentage of responses that met some criterion of correctness, without reference to any group of individuals or scores (thus the term criterion-referenced).

A

Percentile Ranks

15
Q

These are not in equal intervals so they cannot be subjected to mathematical operations such as addition, subtraction, or multiplication. 1. These must be converted to standard scores, subjected to mathematical operations, and then converted back into percentiles. iv. These are relatively easy to understand and to explain to those who have little or no test sophistication. v. Reporting a quartile is a common way of dividing a percentile rank distribution into four portions. 1. Q1 covers P0–P24; Q2, P25–P49; Q3, P50–P74; and Q4, P75–P99

A

Percentile Ranks

16
Q

This can help convey performance information to individuals who are unskilled in test score interpretation. Interpretive ranges are verbal performance descriptors, such as Average, High Average, and Very Superior, which help clients understand their performance in simple terms.

A

An Interpretive Range

17
Q

In this, Counselors can use conversion tables to convert different types of scores to make test interpretation more consistent. Conversion of these diverse types of scores (e.g., T scores, z-scores, percentile ranks, and deviation IQs) to a single scale makes comparisons simple and meaningful.

A

Converting One Standardized Score into Another
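A conversion table of this kind can be generated from the scale constants given in this chapter (a sketch; the dictionary keys are illustrative labels):

```python
def convert(z):
    """Map one z-score onto the standard-score scales from this chapter:
    T (M=50, SD=10), deviation IQ / SS (M=100, SD=15),
    CEEB (M=500, SD=100), and NCE (M=50, SD=21.06)."""
    return {
        "T":    50 + 10 * z,
        "SS":   100 + 15 * z,
        "CEEB": 500 + 100 * z,
        "NCE":  50 + 21.06 * z,
    }

# A z-score of +2.00 always lands at T = 70, SS = 130, CEEB = 700:
# the interchangeability property of the normal curve.
print(convert(2.0))
```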

18
Q

In this, it is sometimes of interest to determine the difference or distance between certain types of standardized scores, as well as the area between certain scores under the normal curve (such as figuring out what percent of the population may fall between two standard scores). b. To do this, standardized scores are converted into z-scores and compared using a table of values for areas under the normal curve. Using the information in the table, the two scores can be subtracted or manipulated depending upon what the researcher is trying to find. i. If the z-score is positive, the value from the table is added to .50 to indicate the area above the mean.

A

Finding the Distance or Area Between Given Scores
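Instead of a printed table of areas under the normal curve, the same lookup can be computed with the normal cumulative distribution function (a sketch using the standard error-function identity):

```python
import math

def normal_cdf(z):
    """Proportion of the normal curve falling below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(z1, z2):
    """Percent of the population falling between two z-scores."""
    return 100 * (normal_cdf(max(z1, z2)) - normal_cdf(min(z1, z2)))

# About 68% of scores fall within one standard deviation of the mean:
print(round(area_between(-1.0, 1.0), 2))  # -> 68.27
```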

19
Q

Researchers frequently analyze theories or models, develop questions or hypotheses about how variables will behave as predicted by these theories or models, and design experiments to collect data that will answer the questions/hypotheses, thus confirming or rejecting the expected results. This process is based on logical, analytical questioning procedures.

The statistics used in a study are generally inferential statistics, because a researcher makes a judgment about a population parameter based on sampling data. If a sample of participants has been randomly or otherwise faithfully obtained, the assumption is that the results represent or extend to the population of interest.

When conducting a study for empirical purposes, three steps are required:
i. Translate the research question/hypothesis into a statistical (null) hypothesis.
ii. Design, conduct, and analyze the results (data) of the study using an inferential statistic.
iii. Determine whether to reject or retain (never accept!) the null hypothesis.

A

Process: Statistical Hypothesis Testing

20
Q

The first step in statistical hypothesis testing is to convert the research question or research hypothesis into a statistical hypothesis, more commonly known as the null hypothesis and the alternative hypothesis. Because inferential statistics are about a population, the hypotheses are stated in terms of population parameters.

A

Steps to Identifying the Null and Alternative Hypotheses

21
Q

This is the hypothesis that is rejected or retained with inferential statistics and is often the opposite of what the researcher believes to be true (i.e., no difference exists).

A

The Null Hypothesis (H0)

22
Q

This is generally the research hypothesis and is a statement of what occurs if the null hypothesis is rejected. It is typically written in statistical notation (e.g., H0: μ1 = μ2 versus H1: μ1 ≠ μ2).

i. In hypothesis testing, the null hypothesis is always tested.
ii. The null hypothesis (H0) indicates that there is no difference between the groups. If the null hypothesis is retained, then no statistical group differences were found.
iii. If the null hypothesis is rejected, then the alternative hypothesis (H1) holds true (i.e., statistical differences do exist between the groups).

A

The Alternative Hypothesis (H1)

23
Q

iv. The researcher never “accepts” that the null hypothesis (H0) is true. This is because the null hypothesis is a probability-based statement, and there are at least two additional reasons the experimenter must consider to explain the result:
1. A true difference may have existed, and the sample result did not faithfully express this true population result.
2. There may have been bias in the experimental procedures that led to a conclusion of no difference when a difference did, in fact, exist.
a. Retaining or rejecting the null hypothesis is directly related to the amount of error the researcher chooses to allow in the study.
b. Whether H0 is retained or rejected, researchers reach one of two decision outcomes: The researcher either makes a correct decision or an incorrect decision.
c. There are two types of error to consider in hypothesis testing: type I error and type II error.

A

The Alternative Hypothesis (H1)

24
Q

This type of error occurs when the null hypothesis is rejected but should have been retained.

i. The researcher determined that statistically significant differences existed between the groups when the difference actually did not exist.
ii. The amount of type I error is identified as α (alpha).
iii. The researcher has some control over alpha, although sample size also influences alpha.
iv. The researcher always establishes the amount of type I error allowed at the outset of the study.

A

Type I Error

25
This type of error occurs when the null hypothesis is retained but should have been rejected.
i. The researcher determined that statistically significant differences did not exist between the groups when the difference actually did exist.
ii. The amount of type II error is identified as β (beta).
iii. This type is inversely related to statistical power (1 − β): the likelihood of rejecting the null hypothesis given that the null hypothesis should be rejected.
1. Statistical power is the likelihood of finding statistically significant differences, given that these differences actually exist (i.e., a correct decision).
2. There are mathematical formulae to compute statistical power and type II error, known as a power analysis.
3. Studies that lack sufficient power are more likely to make a type II error. The best way to avoid this problem is to have a sufficient sample size; usually, a sample size of 15 or more for each group suffices.
Type II Error
26
b. Reasons why type I and type II errors may occur: i. Participants in the sample may not be representative of the population. ii. The sampling procedure could be biased. c. Consequences of type I and type II errors: i. Type I error can lead to ineffective treatments being implemented because researchers falsely believe the treatments to be effective. ii. Type II errors can lead to effective treatments not being implemented.
Type II Error
27
In this, a. The alpha level is the level of significance and is set at the beginning of the study. i. The researcher identifies an alpha level, signified by α, as the amount of type I error the researcher is willing to allow in the study. ii. Typical alpha levels are around .05 to .01. Alpha levels can be larger or smaller. Alpha levels ranging from .10 down to .001 are not uncommon. b. Researchers review the literature and choose a level of significance based on previous research. c. Higher alpha levels, such as .05 or .10, allow for a greater chance of type I error (5 percent or 10 percent, respectively) and are therefore considered more liberal tests of statistical significance (α = .10 is frequently used to indicate a “trend” result). d. Lower alpha levels, such as .01 or .001, allow for a reduced chance of type I error (1 percent or .1 percent, respectively), and are therefore considered more conservative tests of statistical significance. However, this also decreases the power.
Establishing a Level of Significance
28
With this, once a level of significance is established, a critical value can be determined and a test statistic can be calculated. b. Scores that are beyond the stated level of significance and critical value are considered to be statistically significant. i. The critical value will always match the level of significance, but the critical value changes based on the type of test statistic used, the number of groups being compared, and the number in the sample size.
Selecting, Calculating, and Evaluating the Test Statistic
29
This occurs when the researcher predicts the two groups will differ but does not predict which group will be higher; statistical significance can be found on either side of the normal curve. i. The shaded areas are referred to as the region of unlikely values because, at the given level of probability, it is unlikely that differences this great would occur by chance. ii. The unshaded areas are referred to as the region of likely values because, at the given level of probability, it is likely that differences of this size would occur by chance.
A Nondirectional Test
30
1. In this, if a researcher sets the alpha level at .05, indicating a 5 percent chance of type I error, scores that fall in the middle 95 percent of the normal curve are not statistically significant; scores beyond that point, designated by α, are statistically significant. 2. If the researcher has established a 5 percent type I error rate, we can be 95 percent confident in the results.
A Nondirectional Test
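The two-tailed decision rule can be sketched numerically; this snippet (with an invented observed z) counts both tails when computing the p-value:

```python
import math

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_tailed_decision(z_obs, alpha=0.05):
    """Nondirectional test: the p-value counts both tails of the curve,
    so H0 is rejected when 2 * P(Z > |z_obs|) < alpha."""
    p = 2 * (1 - normal_cdf(abs(z_obs)))
    return ("reject H0" if p < alpha else "retain H0"), p

print(two_tailed_decision(2.5)[0])  # beyond +/-1.96 -> 'reject H0'
print(two_tailed_decision(1.5)[0])  # inside the middle 95% -> 'retain H0'
```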
31
In this, the researcher hypothesizes the direction of the difference (i.e., that one group will score higher, or lower, than the other). i. For directional hypotheses, the null hypothesis is written the same, but the alternative hypothesis identifies the direction. 1. An alternative hypothesis might read, "Participants in the treatment group will perform significantly higher on the outcome measure than the mean of the standardization population," or, in notation: H0: μ1 = μ2; H1: μ1 < μ2 (or H1: μ1 > μ2). ii. The level of significance is placed on one side, rather than divided between both sides, of the normal curve. 1. The critical value is lower on a one-tailed z-test because all 5 percent of the type I error rate is placed on one side of the normal curve, not divided between both sides.
A Directional Test
32
This is the likelihood of finding statistically significant differences given that statistically significant differences actually do exist. i. Power is the likelihood of rejecting the null hypothesis when it actually should be rejected. ii. Power is inversely related to type II error: the more power in a study, the less chance of declaring a nonsignificant difference when a significant difference actually exists. iii. Statistically, power is expressed as 1 − β, with type II error expressed as β. iv. The power of a study depends on several factors, including sample size, alpha level, effect size, error, and the appropriate use of directional and parametric tests.
Statistical Power
33
In this, the most effective method of increasing power in a study is to increase sample size. i. A large sample size increases the likelihood of finding statistically significant differences. Increasing sample size, however, will only substantially increase power when the original sample size is small. c. The alpha level chosen at the outset of the study has an impact on power. i. When the alpha is set at the .10 level of significance, as opposed to .05, the critical value is lowered and the likelihood of finding a statistically significant difference increases. ii. As the likelihood of making a type I error increases, the likelihood of making a type II error decreases, so there is an inverse relationship between them. d. Increasing effect size also increases power. i. The greater the magnitude of the difference between groups, the fewer participants who are needed to identify statistical significance.
Statistical Power
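The sample-size factor can be illustrated for a one-tailed z-test (a rough sketch; the critical value 1.645 assumes α = .05, and the effect size d = 0.5 is invented for the example):

```python
import math

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def power_z(d, n, z_crit=1.645):
    """Power (1 - beta) of a one-tailed z-test: the probability that the
    observed z clears the critical value when the true standardized
    effect size is d. z_crit = 1.645 corresponds to alpha = .05."""
    return 1 - normal_cdf(z_crit - d * math.sqrt(n))

# For a fixed effect size, power climbs with sample size:
for n in (10, 30, 60):
    print(n, round(power_z(0.5, n), 2))
```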
34
In this, effect size increases as the various levels of an independent variable work to create substantial differences in the dependent variable; thus, the more effective a treatment is, the greater the resulting effect size. Power is also influenced by error: the less error measured in a study, the more power. Researchers minimize potential sources of systematic and random error by controlling extraneous variables, using dependent variables that produce reliable scores, and designing studies to account for threats to internal validity. When all other things are equal, directional (one-tailed) tests are more powerful than nondirectional (two-tailed) tests. Likewise, parametric tests (e.g., t-test, ANOVA) are more powerful than nonparametric tests.
Statistical Power
35
In this, some researchers argue that there is too much emphasis on statistical significance and not enough on practical significance. They argue the need to report practical significance along with statistical significance, because knowing where not to look for answers can be just as important as knowing where to look. Having sufficient power in a design can be very important to the manner in which results are reported and ultimately published. i. Power is usually deemed sufficient at .80; that is, an 80 percent chance of finding statistically significant differences when they actually do exist and a 20 percent chance of type II error. ii. Post-hoc power analyses identify the amount of statistical power after hypothesis testing was conducted; many statistical software programs, such as SPSS and G*Power, offer this service. To calculate power (1 − β), the phi statistic (ϕ) must first be calculated; once phi is calculated, power can be determined using charts of power curves.
Evaluating Practical Significance
36
These are not based on a normal distribution and often have less stringent criteria, but they may also reduce statistical power (i.e., produce results that are less likely to be statistically significant, inflating the chance of a type II error).
Nonparametric Statistics
37
These are used in hypothesis testing when the groups that are being studied are normally distributed on a measure that uses interval or ratio data.
Parametric Statistics
38
Univariate parametric statistics focus on the measurement of one dependent variable.
Univariate parametric
39
In this, the four most popular univariate tests (i.e., one dependent variable) are:
i. z-test
ii. t-test
iii. Analysis of Variance (ANOVA)
iv. Analysis of Covariance (ANCOVA)

Common characteristics among univariate tests:
i. Each test is computed from a fraction.
ii. The numerator represents a computation of mean differences, such as comparing two groups by subtracting one mean from another; it is an expression of differences between groups, often referred to as between-groups differences.
iii. The denominator looks at error, or differences that exist within each group, often referred to as within-group differences; it is an error term, computed by taking into account the standard deviation or variance and the sample size.

A test statistic is thus a score representing average group differences divided by a score representing error:

Test statistic = between-groups differences / within-group error
Selecting, Calculating, and Evaluating for Statistical Significance
40
In other words, the numerator value expresses differences between groups, while the denominator expresses the error value. The greater the differences between groups relative to the level of error, the more confident the researcher can be that the difference is not due to chance.
Selecting, Calculating, and Evaluating for Statistical Significance
41
This is used to compare a sample mean to a previously known population mean. i. The population mean serves as the "control group" or group of comparison. ii. The z-test identifies whether there is a statistically significant difference between the means of the sample and the population.

Calculating the z-test: the population mean is subtracted from the sample mean, and the result is divided by the standard error of the mean:

z = (M − μ) / σM

where M (the sample X̄) is the sample mean, μ is the population mean, and σM is the standard error of the mean.
Z-Test: Comparing a Sample Mean to a Population Mean
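The z-test computation can be sketched end to end; the sample values below are invented for illustration:

```python
import math

def z_test(sample_mean, pop_mean, pop_sd, n):
    """z = (M - mu) / sigma_M, where the standard error of the mean
    is sigma_M = sigma / sqrt(n)."""
    se = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / se

# A sample of 25 with mean 106, drawn from a population with
# mu = 100 and sigma = 15: se = 15/5 = 3, so z = 6/3 = 2.0.
print(z_test(106, 100, 15, 25))  # -> 2.0
```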
42
To calculate the standard error of the mean, the standard deviation of the population is divided by the square root of the sample size:

σM = σ / √n

where σ is the population standard deviation and n is the number of participants in the sample. A larger sample size produces a smaller standard error of the mean; thus, larger sample sizes lead to smaller amounts of sampling error.
The standard error of the mean represents the standard deviation of the sampling distribution of the mean.
43
This is to determine whether the observed value (zobs) is statistically significant at the α = .05 level of significance: the observed score must be greater than the critical value, zcrit. 1. The critical value is based on the probability of making a type I error. 2. The critical value changes based on the level of significance established for the study (alpha level) and the type of test: directional or nondirectional. ii. When a test statistic is computed, it is compared to a critical value, either by comparing the selected alpha level to the actual probability of making a type I error (known as a p-value) or by comparing the observed score to the critical value. 1. Statistical significance exists when the null hypothesis is rejected, which is evidenced by zobs > zcrit or by p < α. iii. Directional tests have smaller critical values because all of the type I error is on one side of the normal curve; therefore one is more likely to find statistical significance if it actually exists. 1. This relates to statistical power: the likelihood of rejecting the null hypothesis given that the null hypothesis should be rejected.
Evaluating the z-test for statistical significance
44
Unlike the z-test, which compares a sample mean to the population mean, Student's t-test compares two sample means, and the F-test (also known as ANOVA, or Analysis of Variance) compares two or more sample means.
Comparing Two or More Sample Means: Student’s T-Test and the F-Test
45
A specific set of conditions must be met for a statistical test to be deemed appropriate and meaningful; these conditions are known as model assumptions. The first is independence. 1. Independence refers to randomness in selection and assignment. a. The selection of one participant should not depend on the selection of another participant. b. Individuals in a population should have an equal chance of being selected and assigned to a group at random. c. Each participant should be observed (measured) only once. d. "Observations within or between groups are not paired, dependent, correlated, or associated in any way." e. There are specific tests for paired observations, known as repeated measures, which can be used with paired data.
Comparing Two or More Sample Means: Student’s T-Test and the F-Test Model Assumptions
46
This is concerned with the distributions of the groups being compared in a study. a. Parametric tests assume that each group is normally distributed. b. In a normal sample distribution from a population, means and variances over repeated samples will be uncorrelated and, thus, statistically independent. c. In reality, the consequences of violating the normality assumption are rather minimal, especially when conducting a study with a balanced design (i.e., conducting research with sample sizes being equal in all groups).
Normality
47
This assumption about variances is concerned with the estimation of within-group differences. a. When conducting an ANOVA or t-test, the researcher is focused on establishing whether statistically significant differences exist between the groups; thus, a mean for each group is calculated. b. Because most participants do not score exactly at the mean (which is only an estimate of the group's average performance), error results. i. The average amount of error in the group is known as the standard deviation.
Homogeneity
48
ii. The variance (i.e., the standard deviation squared) is the estimated amount of error under the normal curve. 1. In univariate parametric statistics, the variances should be approximately equal to each other. iii. When parametric tests are used, a single term representing the pooled variance (i.e., the combined variances of the group) is used. iv. Equal sample sizes reduce the problem of unequal variances.
Homogeneity
49
This is when groups with larger sample sizes also have larger variances and groups with smaller sample sizes have smaller variances, the likelihood of making a type I error (alpha) is actually lower.
Conservative statistical test
50
This is when groups with larger sample sizes have smaller variances and groups with smaller sample sizes have larger variances, the likelihood of making a type I error (alpha) is higher.
Liberal statistical test
51
t-test and ANOVA are robust to heterogeneous designs when sample sizes are equal. i. The Levene test and Brown-Forsythe test, available on statistical software, are tests for homogeneity of variance but are still influenced by model assumptions and sample size.
Homogeneity of Variance
52
Degrees of freedom (df) i. Sample size and group comparisons can differ from study to study, and so can the distributions. Thus, probability values under the normal curve can change. ii. Degrees of freedom are calculated to provide an estimate of the normal curve given the number of groups and sample size of each group.
Degrees of Freedom (df)
53
In calculating this (sometimes called a t-test for independent samples), a statistical test is run to determine whether a statistically significant difference exists between two independent sample means. i. Independent in this context means that the two means are not derived from the same sample of participants. b. The sample mean of one group is subtracted from the sample mean of the second group, and the result is divided by the pooled standard error:

t = (M1 − M2) / s(M1−M2)

where M1 is the sample mean for the first group, M2 is the sample mean for the second group, and s(M1−M2) is the pooled standard error, which represents the combined variance terms from each of the two groups.
Calculating the Independent t-test
54
c. To calculate the pooled standard error, the variance and sample size of each group need to be known:

s(M1−M2) = √{ [ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ] × (1/n1 + 1/n2) }

d. To answer the question, "Is there a statistically significant difference between the two groups? Can it be stated with 95 percent confidence (α = .05) that the difference in these scores is outside the realm of chance?" complete the following steps:
i. Decide if the test is directional or nondirectional and set the alpha level.
ii. Compute the variance from the standard deviation.
iii. Compute the degrees of freedom: df1 = n1 − 1; df2 = n2 − 1; df = df1 + df2.
iv. Compute the pooled error variance.
v. Conduct the t-test: t = (M1 − M2) / s(M1−M2).
Calculating the Independent t-test
55
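As a worked sketch of steps i–v above, the independent t-test can be computed by hand and cross-checked against scipy. The scores below are invented for illustration, not taken from the text.

```python
# Sketch: independent t-test by hand, with invented example scores.
import math
from scipy import stats

group1 = [10, 12, 14, 16, 18]
group2 = [9, 11, 12, 13, 15]

n1, n2 = len(group1), len(group2)
m1 = sum(group1) / n1
m2 = sum(group2) / n2
var1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
var2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)

# Pooled standard error: combine the two variance terms weighted by
# degrees of freedom, then scale by the sample sizes (step iv above).
pooled_se = math.sqrt(
    ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2) * (1 / n1 + 1 / n2)
)
t_obs = (m1 - m2) / pooled_se
df = (n1 - 1) + (n2 - 1)  # step iii: df = df1 + df2

# Cross-check against scipy's implementation (equal variances assumed).
t_scipy, p = stats.ttest_ind(group1, group2)
```

The hand computation and `scipy.stats.ttest_ind` agree; the observed t would then be compared against the critical value for df = n1 + n2 − 2.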
vi. Calculate the critical value using a critical value t-test chart 1. In the event the degrees of freedom are not represented in the table, use the lower degrees of freedom between the two values for a more conservative estimate. e. Evaluating the Independent t-test for Statistical Significance: i. If you want to determine if an observed score is statistically significant at the .05 level of significance, the observed score must be greater than the critical value, tcrit. 1. tcrit is typically located in a chart under the α = .05 column 2. tobs > tcrit means there is a statistically significant difference between groups 6. Calculating the Dependent T-Test a. A dependent t-test (sometimes called a t-test for correlated means) compares two means, but in this case the means are compared from the same sample of participants across time, such as with a pretest and posttest administered to the same group. i. Whereas in an independent t-test two sets of independent means are compared between two randomized groups, in a dependent t-test, two sets of correlated means are compared for a set of participants.
Calculating the Independent t-test
56
ii. The two sets of means in a dependent t-test may result from a study in which individuals initially take a pretest, then experience a treatment/intervention, and finally take a posttest. iii. A second common scenario involves a comparison between pairs of individuals who have been matched due to commonality in one or more characteristics. 1. A dependent t-test is conducted to determine significant differences between participants who received the intervention (i.e., treatment) versus participants who did not receive the intervention (i.e., control). 2. The advantage of this design is the presence of a control group. b. To conduct a dependent t-test, the correlation coefficient (r) must be calculated between the pretest and posttest scores or the scores of the matched pairs: t = (X̄1 − X̄2) / s(X̄1−X̄2), where s(X̄1−X̄2) = √[ s1²/n1 + s2²/n2 − 2r(s1/√n1)(s2/√n2) ] i. For the dependent t-test, the sample size (n), standard deviation (s), the variance for each group (s²), and the correlation coefficient (r) between the two groups are needed. c. To answer the questions, “Is there a statistically significant difference between two scores over time? Can it be stated with 95 percent confidence (α = .05) that the difference in these scores is outside the realm of chance?” complete the following steps: i. Decide if the test is directional or nondirectional and set the alpha level: H0: μ1 = μ2; H1: μ1 ≠/>/< μ2 (depending upon directionality) ii. Compute the variance from the standard deviation. iii. Compute the degrees of freedom (subtract 1 from the total number of pairs: df = n − 1). iv. Compute the pooled error variance. v. Conduct the t-test. vi. Calculate the critical value using a table. d. Evaluating the Dependent t-test for Statistical Significance i. To determine whether the observed score (tobs) is statistically significant at the α = .05 level of significance, the observed score must be greater than the critical value tcrit. ii. 
When reporting the statistical significance mathematically, it is written t(df) = observed value, followed by the p value (e.g., t(24) = 2.15, p < .05)
Calculating the Dependent t-test
57
This is a. A one-way ANOVA answers the question “Is there a difference among group means?” and is used in designs with a single independent variable (with two or more groups or levels) and single dependent variable (using an interval or ratio scale). b. An ANOVA is conducted when two or more group means (J) are being compared.
The F-Test: One-Way Analysis of Variance (ANOVA)
58
i. The relationship between a t-test and an F-test when J = 2 is F = t² (or t = √F). ii. When J > 2 (i.e., there are three or more means being compared), statistical significance can be ascertained by conducting one statistical test, ANOVA, or alternatively, by repeated t-tests (not recommended).
The F-Test: One-Way Analysis of Variance (ANOVA)
59
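The F = t² identity for two groups can be verified numerically. The scores below are invented for illustration:

```python
# Quick numerical check of the identity F = t^2 when J = 2,
# using invented scores for two groups.
from scipy import stats

a = [5, 7, 9, 11]
b = [6, 8, 10, 14]

t, _ = stats.ttest_ind(a, b)   # independent t-test on the two groups
f, _ = stats.f_oneway(a, b)    # one-way ANOVA on the same two groups
# f equals t squared, within floating-point error
```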
This is Calculating this a. Nearly all statisticians use computerized statistical packages, such as SPSS, to compute the F-test statistic, but seeing the actual logical formula steps may be instructive. b. ANOVA is calculated by dividing the mean differences squared by the error variance. Thus, the numerator will be comprised of mean differences, and the denominator will be comprised of error variance. c. Follow these steps: i. State the null hypothesis and alpha level: H0: μ1 = μ2 = μ3 = μ4; H1: μ1 ≠ μ2 ≠ μ3 ≠ μ4 (not all group means are equal)
Calculating the F-test
60
1. The numerator of the F-test uses squared values, so the observed value in an F-test is always positive. Thus, the null hypothesis in an F-test is never directional. 2. If any of the groups are not statistically significantly different (either higher or lower) from one another, then the null hypothesis is accepted. 3. If a statistically significant difference does exist between or among any groups, then the null hypothesis is rejected.
Calculating the F-test
61
Compute the grand mean. 1. To evaluate whether statistically significant differences among groups are evident, group means (X̄j) (i.e., the mean of each group) are compared to a grand mean (X̄.) (i.e., the mean of the entire sample). 2. The grand mean can be computed with the following formula: X̄. = Σ(nj X̄j) / Σnj 3. The numerator is the sum of the sample size for each group multiplied by the mean of each group (X̄j), and the denominator is the sum of the sample sizes from each group (n.).
Calculating the F-test
62
iii. Calculate the Sum of Squares Between (SSB) 1. The Sum of Squares Between (SSB) is the squared sum of the differences between each group mean and the grand mean. 2. By comparing each group mean to the mean of the entire sample, the amount of variation between the groups can be assessed. 3. Calculate SSB as follows: SSB = Σ nj(X̄j − X̄.)², where nj is the number of participants in a group and (X̄j − X̄.) is the magnitude of the difference between a group mean and the grand mean; it is sometimes referred to as a treatment effect.
Calculating the F-test
63
iv. Calculate the Sum of Squares Within (SSW) 1. The Sum of Squares Within (SSW) is the sum of the squared differences between each observation (i.e., raw score) and the group mean. 2. By comparing each observation to the group mean, the amount of variation within each group can be assessed. 3. Statistical software, such as SPSS, is almost always used to compute this. 4. Both SSB and SSW account for group size when computed. 5. Degrees of freedom must be computed for SSB and SSW to establish the ratio of the F-test. This ratio is referred to as mean squares.
Calculating the F-test
64
v. Compute mean square between (MSB) 1. To compute MSB, the degrees of freedom between (dfB or [j – 1]) must be computed—that is, the number of groups minus 1 or (j – 1): MSB = SSB / (j − 1)
Calculating the F-test
65
vi. Compute mean square within (MSW) 1. To compute MSW, the degrees of freedom within (dfW or [n. – j]) must be computed; that is, the total sample size minus the number of groups or (n. – j): MSW = SSW / (n. − j)
Calculating the F-test
66
vii. Compute the F-ratio: F = MSB / MSW = [SSB / (j − 1)] / [SSW / (n. − j)] d. Evaluating the F-test for Statistical Significance: i. To determine whether or not the observed score is statistically significant at the .05 level of significance (the level decided on prior to the study), the observed score must be greater than the critical value, Fcrit. 1. The critical values for F values are typically located in a table. ii. At this point, we only know that a difference exists, not specifically which groups display differences. 1. To determine this, we must conduct post hoc analyses.
Calculating the F-test
67
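The full chain (grand mean, SSB, SSW, mean squares, F-ratio) can be sketched by hand and checked against scipy. The three groups below are invented for illustration:

```python
# Sketch: one-way ANOVA sums of squares computed step by step for
# three invented groups, cross-checked against scipy.stats.f_oneway.
from scipy import stats

groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
j = len(groups)
n_total = sum(len(g) for g in groups)

# Grand mean: mean of the entire sample (weighted mean of group means).
grand_mean = sum(sum(g) for g in groups) / n_total

# SSB: squared distances of each group mean from the grand mean.
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# SSW: squared distances of each raw score from its own group mean.
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

msb = ssb / (j - 1)         # mean square between
msw = ssw / (n_total - j)   # mean square within
f_obs = msb / msw

f_scipy, p = stats.f_oneway(*groups)
```

For these numbers the hand-computed F matches `f_oneway` exactly; a significant F would then be followed by post hoc analysis.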
This is a. When statistically significant differences are found in an ANOVA, what is really known is that statistically significant differences exist among the groups. b. Exactly what groups are different or where the statistically significant differences lie are not explained in ANOVA results. i. Post hoc analyses are the statistical tests conducted to indicate exactly where statistically significant differences exist. ii. Post hoc analyses are only conducted when ANOVA results indicate statistical significance. c. Tukey, Newman-Keuls, and Scheffé are some of the most commonly used post hoc procedures.
Post Hoc Analysis
68
```
i. The most common method for post hoc analysis is the Tukey method of multiple comparisons (TMC).
ii. Post hoc comparisons are common in statistical packages, such as SPSS and SAS.
iii. TMC tests all possible pairwise comparisons:
C = j(j − 1) / 2
C is the number of pairwise comparisons
j is the number of groups
For example, with four groups:
group 1 to group 2 (X̄1 → X̄2)
group 1 to group 3 (X̄1 → X̄3)
group 1 to group 4 (X̄1 → X̄4)
group 2 to group 3 (X̄2 → X̄3)
group 2 to group 4 (X̄2 → X̄4)
group 3 to group 4 (X̄3 → X̄4)
```
Post Hoc Analysis
69
iv. To determine whether statistically significant differences are evident in each pairwise comparison, the absolute value of the mean difference is divided by the standard error term: q = |X̄j − X̄k| / √[ (MSW/2)(1/nj + 1/nk) ] X̄j and X̄k are the group means being compared; MSW is the mean square within; nj and nk are the sample sizes in each comparison group. 1. Inflated type I error occurs when conducting pairwise comparisons. This can be addressed by conducting a Bonferroni adjustment. In a Bonferroni adjustment, the q statistic is evaluated at α/c, where c is the number of pairwise comparisons. a. Use a table to identify critical q values for the identified alpha levels. b. If one pair of groups is statistically significant, then compare the next pair of groups with the next greatest difference until you have either compared all of the pairs of groups or until one pair of groups is not statistically significant. d. The important factors to remember when evaluating research from a statistical package or when reading research are to (a) address the inflated type I error through the Bonferroni adjustment and (b) evaluate statistical significance at the appropriate alpha level.
Post Hoc Analysis
70
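A minimal sketch of the post hoc machinery described above: the number of pairwise comparisons, a Bonferroni-adjusted alpha, and the q statistic for one pair. The group means, MSW, and sample sizes passed in at the bottom are invented for illustration:

```python
# Sketch: pairwise-comparison count, Bonferroni adjustment, and the
# q statistic from the formula above. All input values are invented.
import math

j = 4                     # number of groups
c = j * (j - 1) // 2      # C = j(j - 1)/2 pairwise comparisons
alpha = 0.05
alpha_adj = alpha / c     # Bonferroni adjustment: evaluate q at alpha/c

def q_statistic(mean_j, mean_k, msw, n_j, n_k):
    """q for one pairwise comparison: |mean difference| / standard error."""
    se = math.sqrt((msw / 2) * (1 / n_j + 1 / n_k))
    return abs(mean_j - mean_k) / se

q = q_statistic(10, 7, 4, 5, 5)  # invented means, MSW, and group sizes
```

The observed q would then be compared to the tabled critical q at the adjusted alpha.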
10. In this, a. Statistical significance refers to the probability that the rejection of the null hypothesis occurred outside the realm of chance (alpha [α] level). b. Practical significance refers to the meaningfulness of the differences by specifying the magnitude of the differences between the means or the strength of the association between the independent variable(s) and the dependent variable. i. Larger sample sizes increase the likelihood of finding statistical significance but do not always lead to practical relevance. ii. Practical significance is important because it addresses the magnitude of a treatment effect without the complication of sample size, thereby providing more meaningful information that practitioners and researchers can use. iii. The reporting of practical significance is very important when reporting results and mandatory in many social science journals.
Evaluating Practical Significance
71
c. Cohen’s d and Cohen’s f measure effect size in standard deviation units and are used to provide measures of effect size to determine practical significance. i. Currently, statistical packages such as SPSS do not compute these, but they can easily be done by hand. d. Cohen’s d i. Cohen’s d is used to determine the effect size for the differences between two groups, such as in a t-test or pairwise comparisons (e.g., Tukey post hoc), and it is expressed in standard deviation units. ii. The following categories are used to interpret d: 1. Small effect, d = .2 2. Medium effect, d = .5 3. Large effect, d = .8 iii. The following formula is used to calculate Cohen’s d: d = (X̄1 − X̄2) / spooled, where spooled = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / ((n1 − 1) + (n2 − 1)) ] e. Cohen’s f i. Cohen’s f also expresses effect size in standard deviation units, but it does so for two or more groups. ii. When conducting an ANOVA, Cohen’s f can be computed to determine the practical significance in the differences among the groups. iii. Cohen’s f will identify the magnitude of the differences among the groups, but it will not explain differences between specific groups. 1. To identify differences between specific groups, a Tukey post hoc analysis followed by Cohen’s d for each pairwise comparison would be necessary. iv. The following categories are used to interpret f: 1. Small effect, f = .10 2. Medium effect, f = .25 3. Large effect, f = .40 v. The following formula is used to calculate Cohen’s f (in the two-group case): f = ½ d
Evaluating Practical Significance
72
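The Cohen's d formula above translates directly into a small function; the summary statistics passed in are invented for illustration:

```python
# Sketch: Cohen's d from two groups' summary statistics, and Cohen's f
# via f = d/2 (two-group case only). All input values are invented.
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """d = (mean1 - mean2) / pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                       / ((n1 - 1) + (n2 - 1)))
    return (m1 - m2) / pooled

d = cohens_d(m1=105, s1=15, n1=30, m2=100, s2=15, n2=30)
f = d / 2   # two-group case
```

Here a 5-point mean difference against a pooled SD of 15 yields d of about .33, between the small and medium benchmarks.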
vi. Practical significance is not always measured in standard deviation units and may be expressed in variance units. 1. When conducting parametric statistics in which the focus of the study is on group differences, it is best to express effect size in standard deviation units. 2. As a rule of thumb, Cohen’s d and Cohen’s f may be more informative for ANOVA.
Evaluating Practical Significance
73
Many statistical packages provide measures of strength of association, especially η² (eta-squared), so this measure is widely used. i. The following categories are used to interpret the strength-of-association statistics covered in this section (η², ω²): 1. Small effect = .01 2. Medium effect = .059 3. Large effect = .138 a. Eta-squared may be calculated directly from the computations used in the F-test.
Evaluating Practical Significance
74
```
g. When strength of association is used in ANOVA, omega squared (ω²) is also a common measure.
i. Computation of ω² uses terms from the ANOVA computation:
ω² = [SSB − (j − 1)(MSW)] / (SSTOT + MSW)
SSTOT is the sum of SSB + SSW
j is the number of groups
ii. The rationale for using omega-squared, as opposed to eta-squared, is that eta-squared is criticized for overestimating practical significance in ANOVA.
```
Evaluating Practical Significance
75
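Eta-squared and omega-squared can be computed side by side from the ANOVA sums of squares. The SSB, SSW, group count, and sample size below are invented for demonstration:

```python
# Sketch: eta-squared vs. omega-squared from ANOVA sums of squares.
# The SSB/SSW values and design (3 groups, n = 9) are invented.
ssb, ssw = 38.0, 6.0
j, n_total = 3, 9
ss_tot = ssb + ssw            # SSTOT = SSB + SSW
msw = ssw / (n_total - j)     # mean square within

eta_sq = ssb / ss_tot
omega_sq = (ssb - (j - 1) * msw) / (ss_tot + msw)
# omega-squared comes out smaller, correcting eta-squared's
# tendency to overestimate practical significance
```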
When doing this, a. A strong experimental design that is not investigated completely or communicated effectively loses credibility. b. The results section should therefore provide adequate information related to statistical and practical significance. c. A well-written results section should include the following: i. The type of analysis conducted and the level of significance ii. A statement related to meeting model assumptions iii. A report on the statistical test iv. Follow-up procedures, if appropriate v. A statement related to practical significance
Interpreting and Writing Results
76
d. When reporting the results of a statistical test, the reader will encounter a general format: i. Statistical test (e.g., t, F) (degrees of freedom) = observed value, probability of type I error (p) = probability value. 1. F(3, 16) = 11.49, p < .001 2. Signifies that an ANOVA with four groups (4 – 1 = 3 degrees of freedom) and a sample size of 20 (20 – 4 = 16 degrees of freedom) has an observed value of 11.49, and there is less than a one-tenth of 1 percent probability of making a type I error.
Interpreting and Writing Results
77
ii. A common practice when providing results is to refer readers to a table that reports means and standard deviations of each group. iii. ANOVA results may be reported in a table format as well. iv. It is good practice to report model assumptions to identify that the statistical test was appropriate to the data.
Interpreting and Writing Results
78
12. In this a. An ANOVA does not need to be limited to one independent variable. i. A factorial ANOVA is conducted when two or more independent variables are examined across a single dependent variable. b. When an ANOVA is conducted across two independent variables, three F-tests are calculated. i. There is an F-test for each independent variable, called a main effect, and an F-test for an interaction effect; that is, the two independent variables may interact. c. When the data is graphed and similar patterns are noted across each independent variable, then there is no statistically significant interaction. i. When there is no statistically significant interaction, then the main effects of each independent variable can be interpreted in a manner similar to a one-way ANOVA. d. When a statistically significant interaction does exist, the researcher needs to graph the interaction and examine each level of an independent variable across the other independent variable.
Factorial ANOVA
79
13.In this, a. Randomized block factorial ANOVA is used when “blocking variables” have been introduced into a design. i. A blocking variable is one that is suspected of being extraneous or confounding and is therefore incorporated into the ANOVA design. ii. The designated blocking variable allows an additional control for variance in the resulting main effects and interaction effects that would otherwise be unaccounted for but is now accounted for through the introduction of a new independent (albeit extraneous) variable. b. A repeated measures ANOVA is used in studies using a within-subjects design, in which participants are sequentially exposed to different levels of the independent variable (treatment). i. Changes among the scores for the repeated administrations of the single dependent variable for each level become the foci of the analysis. c. A mixed (split-plot) ANOVA is appropriate for studies that use a mixed design—that is, a design using one or more independent variables in a between-groups design and one or more independent variables in a within-subjects design.
Other Forms of ANOVA Designs
80
14. In this, a. Analysis of covariance (ANCOVA) is a statistical analysis that combines ANOVA and regression. b. It is often used to nullify the effects of a confounding variable by statistically removing the variability in the dependent variable caused by the confounding variable.
Analysis of Covariance (ANCOVA)
81
c. The computations for ANCOVA are complex but many statistical packages, including SPSS, can compute ANCOVA. d. Generally, an ANCOVA is used if the results of the ANOVA appear to be biased. i. Studies using ANCOVA do not necessarily remove all of the bias. ii. The best method of reducing bias is randomization, such as in a true experimental design. iii. ANCOVA may not be appropriate or helpful in quasi-experimental studies.
Analysis of Covariance (ANCOVA)
82
This chapter focused on univariate inferential statistics, exploring the four most popular univariate tests used in counseling research: the z-test, t-tests, analysis of variance (ANOVA), and analysis of covariance (ANCOVA). Univariate statistics examine the measurement of one dependent variable and are frequently used in counseling research. Common characteristics of univariate tests were outlined, and between-groups and within-groups designs were distinguished. Error value and differences among or between groups were described and exemplified using mathematical notation. Each test described in this chapter included an overview, practical application, formula, step-by-step instruction on how to calculate the test, a breakdown of related variables and terminology, and an evaluation for statistical significance with an associated description of how to do so. Critical values, observed values, error, and power were common themes that emerged among the tests.
Chapter 16 Narrative
83
Additionally, this chapter reviewed the model assumptions necessary to deem a statistical test meaningful and appropriate. These model assumptions included independence, normality, and homogeneity of variance. Degrees of freedom were also defined according to each test. A discussion of practical and statistical significance was also included, with a breakdown of Cohen’s d and Cohen’s f in relation to the practical significance of research findings. Formulas and definitions were provided for each. Lastly, a brief portion of the chapter discussed what should be included in the results section of a research report and how to interpret and write results in a way that clearly communicates the researcher’s practical and statistical findings.
Chapter 16 Narrative
84
1. In this a. Most people have an intuitive understanding of a correlation between two variables. i. For example, when asked about the correlation between IQ and GPA, people usually say something like “The higher the IQ, the higher the GPA of the students.”
Correlation Between Two Variables
85
b. This is a measure of relationship between two variables. i. This can be graphically represented in a scatterplot, a diagram with the predictor variable on the horizontal axis and criterion variable on the vertical axis. ii. Scatterplots can show a linear relationship between the measures on X and Y. 1. Each dot on the scatterplot represents one person’s score on the X and Y variables. 2. The dots group along a straight line with a positive slope (i.e., the direction of the dots is from the lower left to the upper right, or “uphill”). a. In statistical parlance, there is a positive linear relationship (positive correlation) between the two variables X and Y (literacy and average life expectancy, respectively). i. High literacy (X) tends to be associated (paired) with high life expectancy (Y) and low literacy (X) tends to be associated with low life expectancy (Y).
Bivariate correlation
86
3. Dots can also group along a straight line with a negative slope (i.e., the direction of the dots is from the upper left to the lower right, or “downhill”). a. In statistical parlance, this is referred to as a negative linear relationship (negative correlation) between the two variables X and Y (Depression and Vitality, respectively). i. High scores on X (Depression) tend to associate with low scores on Y (Vitality), and low scores on X tend to associate with high scores on Y. 4. There is no correlation between two variables when neither a positive nor a negative linear relationship is present.
Bivariate correlation
87
2. In this, a. Nature and Interpretation of the Pearson r i. Although a scatterplot of the relationship between two variables, X and Y, is useful, more accuracy is needed to determine the presence (or absence) of a linear relationship between X and Y, its direction (positive, negative), and strength. ii. For variables that are interval or ratio in nature, such information is provided by the Pearson product-moment correlation coefficient (Pearson r). iii. Pearson r summarizes the relationship between two variables as a single number.
The Pearson Product-Moment Correlation Coefficient
88
1. In this, r can take on values from –1.00 to +1.00 a. A positive r indicates a positive linear relationship (i.e., directly related; as scores on X get higher, so do scores on Y) b. A negative r indicates a negative linear relationship between X and Y (i.e., inversely related; as scores on X get higher, scores on Y decrease). c. The decimal indicates the magnitude or strength of the relationship. i. The closer the absolute value of r is to 1.0, the stronger the correlation (on a scatterplot this means that the closer the dots are to the straight line, called a line of regression, the stronger the relationship). ii. When r = 0, there is no linear relationship between X and Y; that is, the scatterplot usually looks circular or without a linear pattern. iii. The extreme positive value r = 1.0 indicates that there is a perfect positive correlation between X and Y, and all dots in the scatterplot fall exactly on a straight line with a positive slope. iv. The extreme negative value r = –1.0 indicates that there is a perfect negative correlation between X and Y, and all dots in the scatterplot fall on a straight line with a negative slope.
The Pearson Product-Moment Correlation Coefficient
89
d. A rule of thumb for interpreting the size of the Pearson r is based on its absolute value (sign ignored) as follows: i. .90 to 1.00 = very high correlation ii. .70 to .90 = high correlation iii. .50 to .70 = moderate correlation iv. .30 to .50 = low correlation v. .00 to .30 = very low (if any) correlation
The Pearson Product-Moment Correlation Coefficient
90
b. This is Calculation of this... i. Given the X and Y scores for any sample of size n, the covariance of X and Y (denoted sXY) is the average cross-product of the deviation scores: sXY = Σ(X − X̄)(Y − Ȳ) / (n − 1) (n − 1) represents the degrees of freedom; n represents the sample size; X̄ and Ȳ indicate the means of the X and Y distributions.
The Pearson Product-Moment Correlation Coefficient
91
1. The covariance of two variables (X and Y) expresses the relationship between them. a. If there is a positive linear relationship between X and Y, sXY is positive. b. If there is a negative linear relationship between X and Y, sXY is negative. ii. The Pearson r is obtained by dividing the covariance by the product of the standard deviations of X and Y: rXY = sXY / (sX sY) 1. After replacing sXY in the equation, we have the familiar formula for Pearson r: rXY = Σ(X − X̄)(Y − Ȳ) / [(n − 1) sX sY] 2. The Pearson r is, in fact, the sum of cross-products of the standard scores for X and Y (zX and zY) divided by the degrees of freedom (n – 1).
The Pearson Product-Moment Correlation Coefficient
92
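The covariance-to-r chain above can be sketched directly; the X and Y scores below are invented for illustration, with `scipy.stats.pearsonr` as a cross-check:

```python
# Sketch: Pearson r built from the covariance and standard deviations,
# using invented scores, checked against scipy.stats.pearsonr.
import math
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Covariance: average cross-product of deviation scores.
s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))

# Pearson r: covariance divided by the product of the SDs.
r = s_xy / (s_x * s_y)

r_scipy, p = stats.pearsonr(x, y)
```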
3. This is Testing this... a. It is important to use the Pearson r as an inferential statistic to determine whether a linear relationship between X and Y exists in the entire population to which the sample belongs. b. With ρXY denoting the correlation between X and Y in the population, the task is then to test the null hypothesis H0: ρXY = 0 versus the alternative Ha: ρXY ≠ 0. i. When ρXY = 0 (and only then), the sampling distribution of the correlation coefficient r, calculated for a sample of size n, is symmetrical and follows the t distribution with n – 2 degrees of freedom. ii. The test statistic for testing H0: ρXY = 0 is t = r√(n − 2) / √(1 − r²) The t critical value is found in a statistical table
Pearson r for Statistical Significance
93
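The test statistic is a one-liner; the r and n values below are invented for illustration:

```python
# Sketch: the t statistic for testing H0: rho = 0, given r and n.
# The r = .45, n = 30 inputs are invented example values.
import math

def t_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t_obs = t_for_r(r=0.45, n=30)
df = 30 - 2
# compare |t_obs| to the tabled critical t with df = n - 2
```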
iii. If the absolute value of the test statistic does not exceed the critical value, there is insufficient evidence to reject H0: ρXY = 0. iv. Statistical significance is frequently a function of sample size because, all other things being equal, larger sample sizes tend to be more representative of the population. v. With H0: ρXY = 0 being the null hypothesis that the correlation coefficient is zero for the population, SPSS provides the actual probability of type I error (to falsely reject H0), referred to as the p value. 1. If α is the desired level of significance (e.g., α = .05) and p < α, then there is sufficient evidence to reject H0 and thus to conclude that there is a linear relationship between the two variables in the population.
Pearson r for Statistical Significance
94
4. This is Using SPSS to compute this... a. The correlation coefficients (for each pair of variables) are summarized in a correlation matrix obtained through the use of SPSS. i. The correlation matrix in the SPSS printout provides the correlation coefficients, their p values, and the sample size.
Using SPSS to Compute the Pearson r
95
5. These are Factors Affecting this...
```
a. It is important to keep in mind that the Pearson r is an index of the linear relationship between two variables.
i. Therefore, r = 0.00 indicates that there is no linear relationship between the two variables, but there still might be some kind of (nonlinear) relationship between them.
b. In some cases, r = 0.00 because there is a curvilinear relationship between two variables (e.g., age and physical strength).
c. Other times, r = 0.00 when calculated over a restricted range of variable values, although there is a linear relationship between the two variables over a larger range of their measures.
d. Still, there may not be a linear relationship between two variables for a sample of persons (r = 0.00), but there might be a linear relationship between the variables for some subgroups of persons from the total sample.
```
Factors Affecting the Pearson r
96
i. For example, there may be a positive linear relationship between X and Y for one subgroup (e.g., females) and, conversely, a negative linear relationship between X and Y for another subgroup (e.g., males), even though there is no linear relationship (r = 0.00) for the total sample. ii. In this case, correlation coefficients by separate subgroups are more useful than r = 0.00 for the entire sample.
Factors Affecting the Pearson r
97
6. Linear Transformations and the Pearson r a. Linear transformations on variables X and/or Y do not affect the size of the Pearson r. b. Linear transformations commonly occur when a client’s raw score on a norm-referenced psychological or educational test is transformed into a standard score, such as a deviation IQ score (M = 100; SD = 15), T score (M = 50; SD = 10), or z-score (M = 0; SD = 1). i. The correlation coefficient does not change when the values of X and Y are transformed into standard (z-) scores (μ = 0; σ = 1) or other scales such as the T score (μ = 50; σ = 10) and normal curve equivalent (NCE) scale (μ = 50; σ = 21).
Factors Affecting the Pearson
98
7. Reliability and the Pearson r a. The reliability of the observed measures for X and Y influences the size of their correlation coefficient, r. b. The reliability of measures indicates the degree to which these measures are free of error. i. If it is assumed that the observed score for a person (X) is a sum of the person’s true score (T) and a random error of measurement (E)—that is, X = T + E—the reliability of X, denoted ρXX, can be viewed as the proportion of the variance in X scores that is not error variance. ii. In other words, ρXX is a coefficient from 0.0 to 1.0, which indicates what proportion of the observed variance is true score variance: ρXX = σT² / σX².
Factors Affecting the Pearson
99
iii. A reliability coefficient of .85 indicates that 15 percent of the variance in the observed scores is due to measurement error or, equivalently, 85 percent of the observed score variance is true score variance.
Factors Affecting the Pearson
100
c. The lower the reliability for X and/or Y, the lower the Pearson r gets compared to its “true” size. i. An estimate of the Pearson r that would be found if X and Y were perfectly reliable (i.e., no measurement error) is provided with the correction for attenuation formula: rTXTY = rXY / √(ρXX ρYY) rXY is the Pearson r calculated with the observed values of X and Y; ρXX is the reliability for X; ρYY is the reliability for Y ii. The reliability of test (or questionnaire) data is usually estimated with the Cronbach’s alpha coefficient for internal consistency reliability. 1. Cronbach’s alpha is an accurate measure of reliability only when the components of the instrument (e.g., items) are tau-equivalent (i.e., they measure the same trait and their true scores have equal variances in the population of respondents). iii. Since there is always some measurement error in the observed data, the correlation corrected for attenuation is a theoretical “ceiling” toward which the observed correlation coefficient can move with improved reliability of the measures for the two variables.
Factors Affecting the Pearson
101
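The correction for attenuation is simple to compute; the observed r and reliabilities below are invented example values:

```python
# Sketch: correction for attenuation, estimating the correlation
# between true scores from an observed r and two reliability
# coefficients. The inputs (.40, .80, .80) are invented examples.
import math

def corrected_r(r_xy, rel_x, rel_y):
    """r_TxTy = r_XY / sqrt(rho_XX * rho_YY)."""
    return r_xy / math.sqrt(rel_x * rel_y)

r_true = corrected_r(r_xy=0.40, rel_x=0.80, rel_y=0.80)
# with both reliabilities at .80, an observed r of .40 corrects to .50
```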
8. This is a. The Pearson r is a measure of the linear relationship between two variables, but it is also used to determine the degree to which the individual differences in one variable can be associated with the individual differences in another variable. b. The square of the correlation coefficient, referred to as coefficient of determination (r 2), indicates what proportion of the variance in one of the variables is associated with the variance in the other variable. i. For instance, by squaring the correlation of .45 between “emotional reactivity” and “fusion with others,” we obtain r 2 ≈ .20. This tells us that about 20 percent of people’s differences in fusion with others are associated with their differences in emotional reactivity (or vice versa). ii. Graphical illustration of the coefficient of determination (r 2) is provided with Venn diagrams, where each circle represents the variance of a variable. 1. Let the focus in interpreting r 2 be on the degree to which the variance in Y is associated with the variance in X (i.e., Y is a dependent criterion variable and X an independent predictor variable). a. The larger the overlap between two circles, the higher the proportion of the variance in Y associated with the variance in X.
Coefficient of Determination, r²
102
9. Other Types of Correlation Coefficients a. The Pearson r is used to determine the relationship between two variables derived from interval or ratio scales. b. Many other coefficients have been developed to analyze the relationships between variables from various combinations of scaling methods (e.g., nominal, ordinal, interval, ratio).
Other Types of Correlation Coefficients
103
c. There are a number of coefficients called “Pearson family coefficients” because they are computationally equivalent to r. i. The Spearman rho, the phi coefficient, and the point-biserial correlation coefficient d. The Spearman rho (ρ) is used when the data on both variables are rank-ordered (ordinal scale). It is frequently used in behavioral and animal research when researchers want to minimize the effects of outliers (extreme scores) on the variance of a distribution. i. The Spearman rho formula is: ρ = 1 − (6ΣD²) / (N(N² − 1)), where D = the difference between the ranks for a given individual’s scores and N = the number of paired ranks.
Other Types of Correlation Coefficients
104
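The rank-difference formula above can be sketched as a small Python function. The function name and the example ranks are hypothetical, and no tied ranks are assumed:

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rho from two lists of ranks (no ties): 1 - 6*sum(D^2) / (N(N^2 - 1))."""
    n = len(rank_x)
    d_sq = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * d_sq) / (n * (n ** 2 - 1))

# Five examinees ranked on two measures (hypothetical ranks)
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]))  # 0.8
```

Identical rankings yield ρ = +1, and perfectly reversed rankings yield ρ = −1, matching the bounds of the Pearson r.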
ii. When two of the same test score occur (a tie), add the two ranks together and divide by 2, assigning each tied score the average rank. iii. When assigning ranks, always make sure both sets of scores are ranked in the same direction (i.e., high to low, or low to high). If one set of scores is ranked in a different direction, the direction (e.g., positive, negative) of the correlation will be altered. iv. Outliers contribute a huge amount of variance to the Pearson r correlation computations, whereas their rank-ordered (ρ) equivalents contribute less variance. 1. In comparison to the Pearson r, the computation of ρ is rather simple. e. The phi (Φ) coefficient is used when one variable is a true dichotomy (i.e., male-female, correct-incorrect) and the other variable is either a true dichotomy or an artificial dichotomy (i.e., favorable-unfavorable, successful-unsuccessful). i. The simplest formula for computing phi is Φ = √(χ²/N).
Other Types of Correlation Coefficients
105
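Phi can also be computed straight from the four cell counts of a 2×2 table; the cell-count formula used below is algebraically equivalent to √(χ²/N) up to sign. The table values are hypothetical:

```python
import math

def phi_from_counts(a, b, c, d):
    """Phi coefficient for a 2x2 table [[a, b], [c, d]] of frequencies."""
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den

# Hypothetical pass/fail (rows) by group membership (columns) counts
print(phi_from_counts(30, 10, 10, 30))  # 0.5
```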
f. The point-biserial correlation coefficient (rpb) is used when one variable is a true dichotomy and the other is continuous. It is frequently used when achievement, ability, or intelligence test items are correlated with subtest or total scores to determine how highly related various items are to the interpreted scales. g. Other coefficients are not part of the Pearson family: the biserial, tetrachoric, and eta coefficients. These are rarely seen in test manuals. h. The biserial correlation coefficient (rbi) expresses the relationship between two variables when one variable is an artificial dichotomy and the other a continuous variable. i. When attempting to relate success (i.e., successful/unsuccessful) at some given task to achievement on some continuous scale, rbi may be useful. i. The tetrachoric coefficient (rt) is used to compute relationships when both variables are scaled on an artificial dichotomy (i.e., successful/unsuccessful achievement versus favorable/unfavorable opinion). j. The eta coefficient (η) is used to assess nonlinear relationships (e.g., curvilinear distributions).
Other Types of Correlation Coefficients
106
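Because the point-biserial coefficient is computationally equivalent to r, it can be obtained by applying a plain Pearson formula to a 0/1 item variable and a continuous total score. The item and total scores below are hypothetical:

```python
import math

def pearson_r(x, y):
    """Plain Pearson r; applied to a dichotomous (0/1) variable and a
    continuous variable, this is the point-biserial correlation r_pb."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical item scores (1 = correct, 0 = incorrect) and total test scores
item = [1, 1, 1, 0, 0, 0]
total = [9, 8, 7, 4, 3, 2]
print(round(pearson_r(item, total), 3))  # 0.951
```

A high r_pb like this one would indicate the item discriminates well between high and low scorers on the total scale.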
10. This is a. The coefficient of determination (r²) indicates what proportion of the variance in Y is associated with the variance in X, but this does not necessarily mean that individual differences in Y are caused by individual differences in X. i. An easier way to say this is “correlation does not necessarily mean causation.” 1. A high (positive or negative) correlation between X and Y indicates that scores on Y can be accurately predicted from scores on X, but this does not imply that changes in X cause changes in Y.
Correlation and Causation
107
11. This is a. Correlation, Prediction, and Causation i. When there is a correlation between two variables, the linear relationship between them can be used to predict values on one of the variables (Y) from values on the other variable (X). ii. If the goal is to predict scores on Y from scores on X, then Y is referred to as the criterion variable and X is referred to as the predictor variable. 1. Which of the two variables is Y (criterion) and which one is X (predictor) depends on the research question and the methodological soundness of the prediction model. iii. The emphasis in predictive research is on practical applications, not on causal explanation and conceptual understanding of relationships between variables.
Simple Linear Regression
108
This is i. Suppose that a sample of scores on X and Y is available and the goal is to predict Y scores from X scores in future samples in which only X scores are available. ii. The regression line (i.e., line of best fit) for such a prediction is illustrated with a scatterplot of dots (●) that reveals a high positive linear relationship between the X and Y scores (r = .729).
The Regression Line
109
iii. The straight line fitting the scatterplot is used to predict scores on Y (task involvement) from scores on X (motivation). Specifically, for any value of X, the predicted Y value (denoted Ŷ) is located on the straight line (○). 1. The difference between the actual Y score and the predicted score (Ŷ) is called the prediction error: e = Y − Ŷ. 2. The Y values for dots (●) above the prediction line are associated with positive errors (e > 0), whereas those below the prediction line are associated with negative errors (e < 0). 3. The total error is defined as the sum of squared errors: SSE = Σe² = Σ(Y − Ŷ)².
The Regression Line
110
a. Among all possible prediction lines, the one that produces the smallest total error (sum of squared errors) is the line of best fit, or regression line. b. Prediction with a regression line is referred to as simple linear regression because only one predictor (X) is used to predict Y (or, to “regress Y on X”). iv. The general analytic form of the regression line is given by the equation Ŷ = bX + a, where b is the slope and a is the intercept of the line (the intersection point of the prediction line and the vertical axis, Y). 1. The slope and the intercept for the line of best fit are calculated as b = rXY(sY/sX) and a = Ȳ − bX̄, where rXY = correlation between X and Y, sX = standard deviation of the X scores, sY = standard deviation of the Y scores, X̄ = mean of the X scores, and Ȳ = mean of the Y scores.
The Regression Line
111
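The slope and intercept formulas can be sketched directly from summary statistics. All numbers below are hypothetical, chosen only to make the arithmetic transparent:

```python
def regression_line(r_xy, s_x, s_y, mean_x, mean_y):
    """Least-squares slope and intercept: b = r*(sY/sX), a = Ybar - b*Xbar."""
    b = r_xy * (s_y / s_x)
    a = mean_y - b * mean_x
    return b, a

# Hypothetical summary statistics: r = .50, sX = 2, sY = 4, Xbar = 10, Ybar = 20
b, a = regression_line(0.5, 2.0, 4.0, 10.0, 20.0)
print(b, a)  # 1.0 10.0, so the prediction equation is Y_hat = 1.0*X + 10.0
```

When sX = sY the slope reduces to the correlation coefficient itself (b = r), matching the note on the next card.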
2. Positive b indicates positive direction, and negative b indicates negative direction of the regression line. 3. The regression slope equals the correlation coefficient (b = r) when X and Y have equal variances and, thus, equal standard deviations (sX = sY).
The Regression Line
112
12. In this a. The positive sign of the slope indicates that when the X scores increase, the predicted scores associated with them also increase. i. Thus, a positive change in X pairs with a positive change in Ŷ. ii. If we use ΔX for a change in X (delta, Δ, means “change”) and ΔŶ for the associated change in the predicted score (Ŷ), then b = ΔŶ/ΔX. 1. When the slope is negative, an increase in X (ΔX > 0) is associated with a decrease in the predicted Y score (ΔŶ < 0).
Interpretation of the Slope
113
a. The slope in the regression equation indicates the change in the predicted Y score associated with a unit change in the predictor variable, X. b. At any point in the future when one is collecting data about a client using the predictor (X) variable, to find the predicted score (Ŷ ) given X, one need simply recall the regression equation and replace X with its given value.
Interpretation of the Slope
114
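The “unit change in X” reading of the slope is easy to verify numerically. The slope and intercept below are assumed values for illustration (they are not taken from the chapter's SPSS example):

```python
# Hypothetical regression equation Y_hat = b*X + a
b, a = 0.665, 2.5

def predict(x):
    """Predicted score for a given value of the predictor X."""
    return b * x + a

# A one-unit increase in X changes the predicted score by exactly b
delta = predict(8) - predict(7)
print(round(delta, 3))  # 0.665
```

This is exactly the recall-and-substitute use described in the card: given any future client's X score, plug it into the stored equation to obtain Ŷ.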
13. This is a. Linear regression with two or more predictors is referred to as multiple regression. b. The Model Summary table in the following figure provides: i. The coefficient of correlation between motivation (X) and task involvement (Y), rXY = .729. ii. The squared correlation coefficient, r² = .532, which indicates that 53.2 percent of the individual differences in task involvement are accounted for by differences in motivation. iii. The squared correlation coefficient, adjusted for the population, r²adj = .506. iv. The standard error of estimate, sY.X = 1.322. c. The Coefficients table provides the information needed to test the regression coefficient for statistical significance and to obtain the regression equation Ŷ = bX + a. i. The regression coefficient (b = 0.665) is statistically significant because the p value (.000) associated with its statistic (t = 4.524) is less than the .05 level of significance (p < α). ii. In other words, there is sufficient evidence to reject the null hypothesis that the regression coefficient is equal to zero in the study population. iii. Notice that the higher the correlation between X and Y, the more accurate the predictions resulting from the linear regression equation.
SPSS Output for Linear Regression
115
14. In this, a. Two variables (X and Y) may correlate because they are both affected by a third variable (Z). i. For example, math achievement and reading achievement are related, but without causal inference. A third factor, perhaps intelligence, may be hypothesized to “cause” both math and reading achievement to vary. b. A mediating variable is one that is related to a first variable and subsequently influences a second variable. i. When the effect of Z is removed (partialled out) from both X and Y, the resulting correlation is referred to as the partial correlation between X and Y, controlling for Z, and denoted rXY.Z. c. When the correlation between X and Y exists solely because it is affected by a common cause, Z, it is referred to as a spurious correlation. i. If the correlation between X and Y is spurious, their partial correlation is zero or very close to it. 1. For example, it is known in behavioral research that there is a very high positive correlation between shoe size and mental ability for children at ages from, say, 3 to 16 years. This, however, is a spurious correlation because it vanishes when controlling for age. d. Another situation for the potential use of partial correlations is when the correlation between two variables is mediated by other variables.
Partial Correlation
116
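The first-order partial correlation has a standard closed form in the three pairwise correlations (the formula itself is not printed in the card, but it is the conventional one). The sketch below uses hypothetical values chosen so that Z fully explains the X–Y correlation, mimicking the spurious shoe-size example:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r_XY.Z:
    (r_XY - r_XZ*r_YZ) / sqrt((1 - r_XZ^2)(1 - r_YZ^2))."""
    num = r_xy - r_xz * r_yz
    den = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return num / den

# Hypothetical spurious case: r_XY = .72 arises entirely from r_XZ = .90 and r_YZ = .80
print(abs(round(partial_corr(0.72, 0.90, 0.80), 6)))  # essentially 0
```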
15. In this, a. There are situations in which the interest is in the correlation between two variables while removing the effect of other variables from only one of the two. b. In the context of an example involving three variables (X1 = motivation, X2 = self-reliance, and X3 = task involvement), for instance, assume the correlation between motivation (X1) and task involvement (X3) is r13 = .729. i. The coefficient of determination, r² = (.729)² = .5314, shows that 53.14 percent of the variance in task involvement is accounted for by the variance in motivation. ii. But because motivation (X1) is also correlated with self-reliance (X2) (r12 = .35), it might be interesting to know what proportion of the variance in task involvement (X3) is accounted for by motivation (X1) over and above the proportion of variance accounted for by self-reliance (X2). 1. With the X1, X2, and X3 notations, such a correlation is referred to as the semipartial (part) correlation between X3 and X1, partialing out X2 from X1 but not from X3. The notation for this semipartial correlation is r3(1.2). 2. This semipartial (part) correlation r3(1.2) is depicted by part A: the overlap between the residualized X1 and the intact X3, after removing X2 from X1 but not from X3. Thus, the squared semipartial correlation (.3612) indicates what proportion part A is of the total (intact) circle X3.
Semi-partial Correlation
117
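The semipartial correlation also has a standard closed form. In the sketch below, r13 = .729 and r12 = .35 come from the card, but r23 = .47 is an assumed value used only for illustration (the card does not report the self-reliance/task-involvement correlation):

```python
import math

def semipartial_corr(r_31, r_32, r_12):
    """Semipartial (part) correlation r_3(1.2): X2 is removed from X1 but not from X3:
    (r31 - r32*r12) / sqrt(1 - r12^2)."""
    return (r_31 - r_32 * r_12) / math.sqrt(1 - r_12 ** 2)

# r13 = .729, r12 = .35 (from the card); r23 = .47 is a hypothetical value
sp = semipartial_corr(0.729, 0.47, 0.35)
print(round(sp, 3))  # about .60; squared, about .36 of X3's variance
```

With these assumed inputs, the squared semipartial lands near the .3612 figure the card cites for part A, but the exact match depends on the unreported r23.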
This chapter introduced the concepts of linear correlations (relationships) between two variables and simple linear regression. The nature and strength of variables’ relationships determine how correlations are graphically represented. Scatterplots are frequently used to represent correlations, indicating positive or negative correlations between two variables. Positive correlations occur when both variables increase or decrease together; conversely, negative correlations occur when the variables move inversely. Graphically, these are depicted by dots that group along a straight line, increasing from the lower left to the upper right (positive slope) or decreasing from the upper left to the lower right (negative slope). When no correlation exists between two variables, there is no linear relationship, and scatterplots have no distinguishable direction.
Chapter 17 Narrative
118
This chapter further explored numerous correlation coefficients, including Pearson r, eta, Spearman rank order, contingency, point biserial, biserial, phi, and tetrachoric. Pearson r and the other correlation coefficients were defined according to their properties and applications, influential factors, relationship to statistical significance, and mathematical formulas. Magnitude,
Chapter 17 Narrative