Tell when, where & how often things happen & looks at "why" & "how" to produce observations, notes & descriptions of behavior & motivation. (Quality of relationship, actions, situations or other phenomena) Methods: Interviews, Focus Groups, Reviews & Observations
Conducted to obtain numerical data on variables. Produces hard numbers that can be turned into statistics. There are 2 types: Non-Experimental (descriptive) - Conducted to collect data on variables [correlational & archival research, case studies & surveys] Experimental - Conducted to test hypotheses about the relationship between variables or the effects of 1+ IVs on 1+ DVs.
Steps for Planning & Conducting Experimental Research
1. Devel. idea into a testable hypothesis (about the rel. btwn. variables) 2. Choose an approp. research design 3. Select a sample (ID target pop., determine how to select from pop. & select sample) 4. Conduct the study (Collect & record data for later analysis) 5. Analyze the obtained data (analyze with approp. descriptive & inferential stats techniques) 6. Report the results
Any characteristic, behavior, event or other phenomenon that is capable of varying or existing in at least 2 different states, conditions or levels. Ex: Gender - Male/Female (when characteristics restricted to a single state/condition it's a constant. Ex: Gender - Male only)
Independent (experimental) Variable (IV)
Variable believed to have an effect on the dependent variable. It is manipulated in a research study to determine its effects on the DV & must have at least 2 levels (2 points of comparison, since it is only possible to determine the effects of an intervention when there is a comparison point). aka "Treatment" or "Intervention" & symbolized by the letter X. Tip: "What is the effect of (IV) on (DV)?"
Dependent Variable (DV)
Observed & measured in a study & believed to be affected by the IV (depends on the IV). Can be considered the outcome of treatment; the experimenter does not manipulate it, only observes & measures it. Symbolized by the letter Y.
Method of defining & measuring variables related to behavior as it occurs. The record being a detailed written description or an audio &/or video recording.
Method of defining & measuring variables related to organizing the data into categories that can be used to summarize & interpret information in a narrative record.
Used to ID the cognitions underlying problem solving & decision making. Involves recording a subject's verbalizations when instructed to "Think Aloud" while solving complex cognitive problems.
Behavioral sampling that involves dividing a period of time into discrete intervals & recording whether the behavior occurs in each interval. Useful when the target behavior has no clear beginning or end.
Behavioral sampling that involves recording each occurrence of a behavior during a pre-defined/pre-selected event. Useful for behaviors that occur infrequently or leave a permanent record (test or measure).
Method of assigning subjects to treatment groups using a random method; aka "Randomization." Key component of True Experimental Research since it enables an experimenter to conclude that any observed effect on the DV was actually caused by the IV rather than error.
Involves conducting a study to test hypotheses about the relationship between the IV's & DV's. There are 2 types: 1. True Experimental Research (Random assignment) 2. Quasi-Experimental Research
True Experimental Research
Allows for greater control over the experimental situation; its "Hallmark" is random assignment to different groups (different levels of the IV). This allows the experimenter to be more certain that subjects in different groups are initially similar & that any observed differences between the groups on the DV were caused by the IV. Ex: Tx Group vs No Tx Group
Involves testing hypotheses about the relationship between the IV & DV, yet has less experimental control since the experimenter cannot control the assignment of subjects to treatment groups & must use pre-existing groups or a single treatment group. Ex: kids that attend 2 schools; use kids from school 1 as the experimental group & kids from school 2 as the control group.
Enables the investigator to generalize their findings from the sample to the population. Researchers usually do not have access to the entire population of interest & must draw a sample from that population. So that any observed relationship between variables in the sample can be generalized to the population, the people in the sample must be as representative of the population as possible in terms of relevant characteristics such as age, gender & severity of symptoms.
Entails selecting units or groups (clusters) of ppl from the population (schools, hospitals, clinics) & either including all individuals in the units or randomly selecting individuals from each unit (multistage). Useful when not possible to ID/obtain access to the entire population of interest.
Simple Random Sampling
Every member of the population has an equal chance of being selected for inclusion in the sample. (Reduces the probability of bias, especially when it's a large sample.)
Stratified Random Sampling
When the population varies in terms of specific characteristic "strata" relevant to the research hypothesis, this will ensure that each stratum is represented in the sample by dividing the population into the appropriate strata & randomly selecting subjects from each stratum. (Ex: Gender, age, ED level, SES, racial/ethnic/cultural background)
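A minimal sketch of stratified random sampling, assuming a hypothetical population of (id, education level) pairs. The function name `stratified_sample` and its parameters are illustrative only; it simply applies simple random sampling within each stratum.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Divide the population into strata, then randomly select from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        # Simple random sampling inside the stratum: every member is equally likely.
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical population of 90 people, 30 at each education level.
levels = ("HS", "BA", "MA")
population = [(i, levels[i % 3]) for i in range(90)]
sample = stratified_sample(population, lambda p: p[1], n_per_stratum=5)
```

Each stratum is guaranteed 5 members in the sample, which simple random sampling of 15 people from the whole population would not guarantee.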
Maximizing Variability Due to IV(s)
In conducting an experimental research study, an experimenter wants a design that will: -Maximize variability in the DV that is due to the IV -Control variability due to extraneous variables (systematic error) -Minimize variability due to random error
A predictable Error
Extraneous (Confounding) Variables
A source of systematic error. An extraneous variable is irrelevant to the purpose of the research study but confounds its results because it has a systematic effect on (correlates with) the DV.
Variability in the DV that is due to the IV is maximized when groups are made as different as possible with respect to that variable.
Error that is unpredictable (random). Variability due to random error is minimized by ensuring that random fluctuations in subjects, conditions & measuring instruments are eliminated or equalized among all treatment groups. (Sampling error)
Randomization (Random Assignment of Subjects to Tx Groups)
Random assignment of subjects to different levels of the IV is considered the most powerful method of control because it helps ensure that groups are initially equivalent with regard to all known & unknown extraneous variables.
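Random assignment can be sketched in a few lines. This is an illustration under assumptions: the subject IDs are made up, and `randomly_assign` is a hypothetical helper, not a standard function.

```python
import random

def randomly_assign(subjects, n_groups=2, seed=0):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Slicing with a step deals subjects like cards, so group sizes differ by at most 1.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# 20 hypothetical subject IDs split into a treatment group & a control group.
treatment, control = randomly_assign(range(1, 21))
```

Because chance alone decides group membership, known & unknown extraneous variables are expected to balance out across groups, more reliably so as the sample gets larger.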
Holding the Extraneous Variable Constant
Eliminate the effects of an extraneous variable by selecting subjects who are homogeneous with respect to that variable (Shortcoming is that it limits the generalizability of the research results)
Matching Subjects on the Extraneous Variable
Useful for controlling extraneous variables when the number of subjects is too small to guarantee that random assignment will equalize the group in terms of an extraneous variable.
Blocking (Building the Extraneous Variable into the Study)
Subjects are not individually matched but are blocked (grouped) in terms of their status on the extraneous variable & subjects within each block are randomly assigned to one of the treatment groups.
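A hedged sketch of blocking: subjects are grouped by their status on the extraneous variable & then randomly assigned within each block. The subjects, the "severity" variable & the helper name `blocked_assignment` are all hypothetical.

```python
import random

def blocked_assignment(subjects, block_of, groups=("Tx", "Control"), seed=0):
    """Randomly assign subjects to groups separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within the block
        for i, s in enumerate(members):
            assignment[s] = groups[i % len(groups)]
    return assignment

# Hypothetical subjects: (id, symptom severity); severity is the extraneous variable.
subjects = [(i, "mild" if i < 6 else "severe") for i in range(12)]
assignment = blocked_assignment(subjects, block_of=lambda s: s[1])
```

Each severity block contributes equally to both groups, so severity cannot differ systematically between Tx & Control.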
Statistical Control of the Extraneous Variable
When an investigator has information on each subject's status (score) on an extraneous variable, the ANCOVA (analysis of covariance) or another statistical technique can be used to statistically remove the effects of that variable.
The degree to which a research study can allow the experimenter to conclude that observed variations in the DV were caused by variations in the IV rather than by other factors.
Generic Extraneous Variables that can be a Threat to Internal Validity (Campbell & Stanley)
1. Maturation 2. History 3. Testing 4. Instrumentation 5. Statistical Regression 6. Selection 7. Attrition (Mortality) 8. Interactions with Selection
Refers to physical or psychological changes that occur within subjects during the course of the study as the result of the passage of time (e.g., increasing fatigue, decreasing motivation) & that have a systematic effect on the subjects' status on the DV. (Threat to Internal Validity)
Refers to an external event that is irrelevant to the research hypothesis but that occurs during the course of the study & affects subjects' status on the DV. (Ex: Comes from "out there" & occurs at around the same time the IV is administered.) (Threat to Internal Validity)
When a subject's exposure to a test changes performance on a subsequent test. (Ex: when a pre-test affects subjects' scores on the post-test) (Threat to Internal Validity)
Changes in accuracy/sensitivity of measuring devices or procedures during the course of the study can confound results. (Ex: Improved rater accuracy over course of study) (Threat to Internal Validity)
The tendency for very high & low scores to "regress" (move) toward the mean on retesting of the same group. (Ex: for examinees who obtained extremely high or extremely low scores on a measure to obtain scores closer to the mean when retested.) (Threat to Internal Validity)
A problem when subjects in different treatment groups are not similar in terms of important characteristics at the onset of the study & therefore would differ at the end of the study even if no treatment had been applied. A threat when participants are not randomly assigned to groups. (Ex: Is an assignment problem) (Threat to Internal Validity)
When subjects who drop out of the study differ in some important way from subjects who remain in the study for the duration . (Threat to Internal Validity)
Interactions with Selection
Selection can interact with history & threaten a study's internal validity if one group of subjects is exposed to an external condition that does not affect subjects in other groups. (Threat to Internal Validity)
The degree to which a study's results can be generalized to other people, settings, conditions, etc. (A study's external validity is always limited by its internal validity.)
4 Threats to External Validity (Campbell & Stanley)
1. Interaction Between Testing & Treatment 2. Interaction Between Selection & Treatment 3. Reactivity (Reactive Arrangements) 4. Multiple Treatment Interference (Order Effect, Carryover Effects)
Interaction Between Testing & Treatment
Administration of a pre-test can "sensitize" subjects to the purpose of the research study & alter their reaction to the IV; when contaminated like this, the results cannot be generalized to people who have not been pre-tested. (Threat to External Validity)
Interaction Between Selection & Treatment
Occurs when people in the sample differ from people in the population in terms of some characteristic that makes them respond differently to the IV. (Threat to External Validity)
Reactivity (Reactive Arrangements)
Occurs when research participants act differently because they know their behavior is being observed. (Threat to External & Internal Validity)
Cues in the experimental situation that inform subjects of how they are expected to behave during the course of the study. (Threat to External & Internal Validity)
Multiple Treatment Interference (Order Effect, Carryover Effects)
Occurs when more than one level of the IV is administered to each subject. (Threat to External Validity)
A research design used to control carryover (order) effects. Involves administering the different levels of the IV to different subjects or groups of subjects in a different order. (The Latin square design is a type of counterbalanced design.)
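One simple way to build a Latin square is to rotate the list of conditions, sketched below. This cyclic construction is an assumption chosen for brevity; it balances position (each condition appears once in each ordinal position) but not which condition immediately precedes which.

```python
def latin_square(conditions):
    """Cyclic Latin square: row r is the condition list rotated r places."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

# Three hypothetical levels of the IV; each row is the order given to one group.
orders = latin_square(["A", "B", "C"])
# orders == [['A', 'B', 'C'], ['B', 'C', 'A'], ['C', 'A', 'B']]
```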
Between-Groups (Between-Subjects) Design
The effects of an IV are assessed by administering each level of the IV to a different group of subjects & comparing the performance or status of the groups on the DV.
A study that includes two or more "factors" (IV's). An advantage is that it provides more thorough information about the relationships among variables by allowing an investigator to analyze the main effect of each IV & the interaction between the IV's.
The effect of 1 IV on the DV, disregarding the effects of all other IV's.
The effect of 2 or more IV's considered together & occurs when the impact of an IV differs at different levels of another IV. (the main effects must be interpreted in light of the interaction)
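Main effects & the interaction can be read off the cell means of a 2 x 2 factorial design. The sketch below uses made-up cell means for two hypothetical IVs (drug & therapy); it shows the descriptive arithmetic only, not a significance test.

```python
from statistics import mean

# Hypothetical cell means on the DV for a 2 x 2 design: (drug level, therapy level).
cell_means = {
    ("drug", "therapy"): 20, ("drug", "no therapy"): 12,
    ("placebo", "therapy"): 10, ("placebo", "no therapy"): 10,
}

def marginal_mean(cells, factor_index, level):
    """Average the cells at one level of a factor, disregarding the other factor."""
    return mean(v for k, v in cells.items() if k[factor_index] == level)

# Main effects: difference between the marginal means of each IV.
drug_effect = marginal_mean(cell_means, 0, "drug") - marginal_mean(cell_means, 0, "placebo")
therapy_effect = marginal_mean(cell_means, 1, "therapy") - marginal_mean(cell_means, 1, "no therapy")

# Interaction: does the effect of therapy differ across levels of drug?
therapy_effect_with_drug = cell_means[("drug", "therapy")] - cell_means[("drug", "no therapy")]
therapy_effect_with_placebo = cell_means[("placebo", "therapy")] - cell_means[("placebo", "no therapy")]
interaction = therapy_effect_with_drug - therapy_effect_with_placebo
```

Here therapy shows a main effect of 4 points, but it helps only when combined with the drug (8 points vs. 0), which is why the main effect has to be interpreted in light of the interaction.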
Within-Subjects Design (Repeated Measures)
The effects of an IV are analyzed by comparing the performance or status of the same group of subjects on the DV after receiving each level of the IV (or combo of IV's) at different times. A comparison is made within subjects rather than between groups.
Single-Group Time-Series Design (Within-Subjects Design)
The DV is measured several times before & after the IV is applied. Subjects act as their own no-tx controls. Internal Validity can be threatened by History (external event occurring at same time as IV)
A disadvantage of the time-series & other within-subjects designs is that the analysis of the data can be confounded by autocorrelation, which occurs when a subject's performance on the post-test is likely to correlate with their performance on the pre-test. This can inflate the value of the inferential statistic (e.g., the t or F), resulting in an increased probability of a Type I error.
Combines Between-Groups & Within-Subjects designs so that comparisons can be made both between groups & within subjects.
This type of design is common & involves measuring the DV over time or across trials.
They all include at least 2 phases: 1. At least 1 Baseline Phase(A) - No Tx 2. 1 Treatment (B) Phase They involve measuring the DV at regular intervals during each phase of study. (This controls for Maturational Effects; threats to Internal Validity)
Includes: 1. A single Baseline (A) Phase (No Tx) 2. A single Treatment (B) Phase.
Reversal (Withdrawal) Designs (ABA, ABAB, etc.)
Extends the AB design by including a minimum of: 1. 2 Baseline (A) Phases (No Tx) 2. 1 Treatment (B) Phase. The Tx is withdrawn ("reversed") during the 2nd & subsequent baseline phases. If the subject's performance on the DV follows the predicted pattern (i.e., if it changes in the expected direction after the Tx is applied & withdrawn), a researcher can conclude that changes in the DV are due to the IV rather than to Hx.
Multiple Baseline Design
When it is not ethical to withdraw an effective Tx, an investigator can use this design, which involves sequentially applying the IV (Tx) to 2 or more "baselines" (i.e., either to 2 or more behaviors, settings, or subjects). An advantage is that, once the IV is applied to a baseline, it is not withdrawn during the study.
Used to describe & summarize the data collected on a variable or the relationship between variables.
Used to determine if obtained sample values can be generalized to the population from which the sample was drawn.
A variable that can theoretically take on an infinite number of values on the measurement scale. (Ex: Time)
A variable that can assume only a finite number of values. (Ex: DSM-IV Dx)
Scales of Measurement
A method of categorizing the various ways to measure variables. There are 4 scales of measurement that differ in terms of mathematical "sophistication." From least to most sophisticated they are (NOIR): Nominal - ("Frequency Data"; the frequency of observations in each nominal category.) Ordinal Interval Ratio (Ordinal, Interval & Ratio scales yield scale values or scores)
When characteristics are measured with this scale there is no inherent order to the scale categories & can not say that one person has more or less of the characteristic being measured than another person. A variable is measured on this type of scale when it is divided into categories & the frequency (number) of indiv. in each category will be compared. Least mathematically complex.(Ex. of Nominal Variables are gender, religion, political affiliation, place of birth, eye color & DSM Dx)
It not only divides observations into categories (Rank Order; 1st, 2nd, 3rd) but also provides information on the order of those categories. Limitation is that they do not determine how much difference there is between scores. The zero point is not an absolute zero. 3rd most mathematically complex.(Ex: Ranks & Likert-scale scores)
This scale has the property of order & equal intervals between successive points on the measuring scale. Mathematical operations of addition & subtraction can be performed & conclude that a score of 100 is midway between the scores of 90 & 110. The zero point is not an absolute zero; 2nd most mathematically complex.(Ex: Temperature on Celsius & Fahrenheit, scores on standardized Ed & Psych tests)
This scale has the properties of order & equal intervals & an absolute zero point. The absolute zero makes it possible not only to add & subtract scores but also to multiply & divide them & to conclude that a person who receives a score of 150 has 3 times as much of the characteristic being measured as a person who receives a score of 50. Most mathematically complex. (Ex: Temperature on Kelvin Scale, # of calories consumed, # of correct items on a test & reaction time in seconds)
A set of data that represents an ordinal, interval or ratio scale & that can be described in terms of its shape. The values of the variable are recorded on the horizontal axis (abscissa), while the frequencies are recorded on the vertical axis (ordinate).
Normal Curve (Normal Distribution)
A symmetrical & bell-shaped distribution defined by a specific mathematical formula. When scores on a variable are normally distributed, it is possible to conclude that a specific proportion of observations fall within certain areas of that distribution that are defined by the standard deviation: In a normal distribution about 68% of observations fall between the scores that are plus & minus 1 standard deviation from the mean, about 95% between the scores that are plus & minus 2 standard deviations from the mean, & about 99.7% between the scores that are plus & minus 3 standard deviations from the mean. A "peaked" distribution = Leptokurtic. A flatter distribution = Platykurtic.
Asymmetrical distributions in which the majority of scores are located on one side of the distribution. Over 50% of the scores fall on one side of the distribution.
Positively Skewed Distributions
Most scores are concentrated in the low (Negative) side but a few scores are in the high (Positive) side (tail) of the distribution. The Mean (average) is greater than the Median (middle).
Negatively Skewed Distributions
Most scores are concentrated in the high (Positive) side but a few scores are in the low (Negative) side (tail) of the distribution. The Mean (average) is less than the Median (middle), which is less than the Mode (most common #).
Measures of Central Tendency
Descriptive statistics that summarize a distribution of data by providing a "typical" score. The mode, median & mean are the most commonly used measures of central tendency. Nominal = Mode; Ordinal = Mode & Median; Interval = Mode, Median, or Mean; Ratio = Mode, Median, or Mean
(Mo) The most frequently occurring score or category in a distribution or set of data. A distribution with 2 modes is called bimodal.
(Md) The middle score in an ordered set of data from low to high.
Mean (Arithmetic Mean)
The average; can be calculated only when the variable has been measured using a(n) Interval or ratio scale. (M)
Simplest measure of variability, which is calculated by subtracting the lowest score in the distribution from the highest score.
Variance (Mean Squared)
More thorough measure of variability since its calculation includes all the scores in the distribution. The variance provides a measure of the average amount of variability in a distribution by indicating the degree to which the scores are dispersed around the distribution's mean. (V)
A measure of dispersion (variability) of scores around the mean of the distribution. Calculated by dividing the sum of the squared deviation scores by N (or N-1) & taking the square root of the result. The standard deviation is equal to the square root of the variance. The larger the SD the greater the dispersion of scores around the distribution mean. (SD)
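The range, variance & standard deviation can be worked through on a small set of made-up scores. This sketch spells out the arithmetic by hand so that each step of the definitions is visible; the variable names are illustrative.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
n = len(scores)
m = sum(scores) / n                  # mean of the distribution

score_range = max(scores) - min(scores)          # highest score minus lowest score
sum_sq_dev = sum((x - m) ** 2 for x in scores)   # sum of the squared deviation scores

variance_pop = sum_sq_dev / n            # divide by N (population formula)
variance_sample = sum_sq_dev / (n - 1)   # divide by N-1 (estimate from a sample)
sd_pop = math.sqrt(variance_pop)         # SD is the square root of the variance
```

For these scores the mean is 5, the range is 7, the sum of squared deviations is 32, the population variance is 4 & the population SD is 2.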
Areas Under the Normal Curve
When the shape of a distribution of scores approaches normal, certain conclusions can be drawn about the number of cases that fall within limits that are defined by the standard deviation. When a distribution is normal, 68.27% of the scores fall between the scores that are + & - 1 standard deviation from the mean; 95.45% of the scores fall between the scores that are + & - 2 standard deviations from the mean; & 99.73% of the scores fall between the scores that are + & - 3 standard deviations from the mean.
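The areas can be checked against the standard normal curve with Python's standard library (`statistics.NormalDist`, Python 3.8+). Tabled values in textbooks can differ from these in the last digit because of rounding.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, SD 1

# Proportion of scores within +/- k standard deviations of the mean.
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within +/- {k} SD: {area:.2%}")
```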
Effects of Mathematical Operations on Measures of Central Tendency & Variability
When a constant is added to or subtracted from each score in a distribution, the measures of central tendency change but the measures of variability stay the same. In contrast, when each score is multiplied or divided by a constant, the measures of central tendency & variability all change.
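A quick numerical check of this rule on made-up scores, using the standard library's `mean` & `pstdev` (population SD):

```python
from statistics import mean, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical scores: mean 5, SD 2
shifted = [x + 10 for x in scores]    # add a constant to each score
scaled = [x * 3 for x in scores]      # multiply each score by a constant

original = (mean(scores), pstdev(scores))
after_adding = (mean(shifted), pstdev(shifted))
after_multiplying = (mean(scaled), pstdev(scaled))
```

Adding 10 moves the mean from 5 to 15 & leaves the SD at 2; multiplying by 3 changes both (mean 15, SD 6).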
Population Parameters & Sample Statistics
An investigator uses a sample statistic to estimate a population parameter. (Know all the symbols for both for exam)
Sampling Distribution of the mean
The distribution of sample means that would be obtained if an infinite number of equal-size samples were randomly selected from the population & the mean for each sample calculated. (An inferential statistical test enables an investigator to determine the probability of obtaining a sample with a particular value by comparing the obtained sample value to an appropriate sampling distribution.)
Central Limit Theorem
A sampling distribution is not actually constructed by obtaining a large number of samples; a theoretical sampling distribution is derived from probability theory that predicts what the sampling distribution of the mean would look like. 3 Predictions: 1. The sampling distribution of the mean is normally shaped (will approach a normal shape as sample size increases, regardless of the shape of the pop. dist. of scores) 2. Its mean equals the population mean 3. Its standard deviation, the "Standard Error of the Mean," is equal to the population standard deviation divided by the square root of the sample size (N).
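The predictions can be illustrated by simulation, even though the theorem itself does not require one. This sketch assumes a skewed (exponential) population with mean 1 & SD 1, & uses 5,000 samples to stand in for the "infinite" number; the seed & sizes are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)
N = 25             # size of each sample
n_samples = 5000   # finite stand-in for an infinite number of samples

# Draw each sample from the skewed population & record its mean.
sample_means = [mean(rng.expovariate(1.0) for _ in range(N)) for _ in range(n_samples)]

mean_of_means = mean(sample_means)   # prediction 2: close to the population mean (1.0)
sd_of_means = pstdev(sample_means)   # prediction 3: close to 1.0 / sqrt(25) = 0.2
```

A histogram of `sample_means` would also look far more bell-shaped than the population the scores came from (prediction 1).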
Standard Error of the Mean
Used in inferential statistics to determine how likely it is to obtain a particular sample mean given the population mean, the population standard deviation, the sample size & level of significance.
The sampling distribution is normally shaped, its mean is equal to the population mean, & its standard deviation (the standard error of the mean) is equal to the population standard deviation divided by the square root of the sample size.
This formula indicates that, as the population standard deviation increases and/or the sample size decreases, the standard error increases in magnitude. As sample size increases, the standard error decreases.
(Memorize this formula: SEM = σ/√N)
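A small worked example of the formula, with a made-up population SD of 15. The function name `standard_error` is illustrative.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by the square root of N."""
    return sigma / math.sqrt(n)

sem_small_sample = standard_error(15, 25)    # 15 / 5  = 3.0
sem_large_sample = standard_error(15, 100)   # 15 / 10 = 1.5
```

Quadrupling the sample size cuts the standard error in half, because N enters the formula under a square root.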
Logic of Hypothesis Testing
Testing a research hypothesis about the effects of an IV on a DV involves the following steps: 1. Translate the verbal research hypothesis about the relationship btwn IV & DV into 2 competing statistical hypotheses: Null & Alternative Hypothesis 2. Conduct the study & analyze the obtained data w/an inferential stat. test. 3. Decide, based on the results of the stats. test, whether to retain or reject the null hypothesis.
Null Hypothesis (Ho)
Stated in a way that implies that the IV does NOT have an effect on the DV & that any observed effect is the result of sampling error. It is always the null that is rejected or retained first.
Alternative Hypothesis (H1)
States the opposite of the Null Hypothesis & is expressed in a way that implies that the IV DOES have an effect on the DV. Most closely resembles the verbal research hypothesis & is always the 2nd decision to reject or retain. Can have 2 forms: 1. Directional Alternative Hypothesis - predicts whether the population value will be greater or less than the population value specified in the null hypothesis (one-tailed test). 2. Non-directional Alternative Hypothesis - states that the population value is not equal to the value stated in the null hypothesis (two-tailed test - alpha is divided by 2 & placed in each tail of the distribution).
Analyzing the Data & Making a Decision
The results of an inferential statistical test (t-test or ANOVA) indicate whether the obtained sample value falls within the region of likely values or unlikely values in the sampling distribution.
Rejection Region (region of unlikely values)
When the results indicate that the obtained sample value falls in the region of unlikely values (rejection region), the null hypothesis is rejected & the alternative hypothesis is retained. This region of a sampling distribution contains those sample values (Means) that are unlikely to be obtained as the result of sampling error. The size of the region is defined by Alpha (α).
Retention Region (region of likely values)
When the results indicate that the sample value falls in the region of likely values (retention region), the null hypothesis is retained & the alternative hypothesis is rejected. This region of a sampling distribution contains those sample values (Means) that are likely to be obtained as the result of sampling error. The size of the region is equal to 1-α.
Alpha (Level of Significance)
Alpha (α) is commonly set at .05 or .01 set by the experimenter prior to collecting or analyzing data. When alpha is .01, this means that 1% of the sampling distribution is the region of unlikely values, while the remaining 99% is in the region of likely values. If alpha is .05 then 5% of sampling dist. represents the rejection region & the remaining 95% represents the retention region. The rejection region always lies in one or both tails of the sampling dist; in the portion of the sampling dist. that contains the values LEAST likely to occur as a result of sampling error only.
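How alpha carves out the rejection region can be sketched for the simplest case, a z-test, where the sampling distribution is the standard normal curve (an assumption made here for illustration; t & F tests use their own sampling distributions). `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """z value that marks the start of the rejection region in the upper tail."""
    # A two-tailed test splits alpha between the two tails of the distribution.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

z_05_two = critical_z(0.05)                    # about 1.96 (reject if |z| exceeds it)
z_05_one = critical_z(0.05, two_tailed=False)  # about 1.645
z_01_two = critical_z(0.01)                    # about 2.576
```

A smaller alpha pushes the critical value farther into the tail, which shrinks the rejection region.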
When the results of a study are statistically significant, this means that the sample stat. is in the rejection region of the sampling distribution & that the investigator has rejected the null hypothesis.
When a researcher makes the decision to retain or reject the null hypothesis, there is no way to know with certainty if the decision is correct or in error. There are 2 types of decision errors: Type I Type II
Type I Error
Made when a true null hypothesis is rejected. This occurs when a researcher concludes that an IV has had an effect on the DV, but the observed effect was actually due to sampling error. The probability of making a Type I error is equal to alpha (level of significance). Ex: When alpha is set at .05, there is a 5% chance of rejecting the null hypothesis when it is actually true.
Type II Error
Made when a false null hypothesis is retained. This occurs when the researcher decides that an IV has no effect on the DV when it actually does. A Type II error is more likely when the IV was not administered in sufficient intensity or for a long enough period of time, when the sample size was too small, or when alpha is too small. The probability of making a Type II error is equal to Beta (β), which is usually unknown.
Power refers to the probability of rejecting a false null hypothesis (1-β). Power cannot be directly controlled but can be increased by including a large sample, maximizing the effects of the IV, increasing the size of alpha, reducing error, using a one-tailed test (when the direction of the effect is correctly predicted), & analyzing the data with a t-test, ANOVA, or other parametric statistical test.
One kind of correct decision is to retain a true null hypothesis; the researcher correctly concludes that any observed effect of the IV on the DV is actually due to sampling error. The other correct decision is to reject a false null hypothesis; the researcher correctly decides that the IV had an effect on the DV. When a stats. test enables a researcher to make this kind of correct decision, the test is said to have power.
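The influence of sample size, effect size & alpha on power can be sketched for one simple case: a one-tailed, single-sample z-test with a known population SD. That test, the effect size measure (true mean difference in SD units) & the function name `z_test_power` are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Probability of rejecting a false null in a one-tailed, single-sample z-test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)  # start of the rejection region under the null
    # Under the alternative, the test statistic is centered on effect_size * sqrt(n).
    return 1 - z.cdf(z_crit - effect_size * math.sqrt(n))

baseline = z_test_power(0.5, 25)
larger_sample = z_test_power(0.5, 50)
larger_alpha = z_test_power(0.5, 25, alpha=0.10)
larger_effect = z_test_power(0.8, 25)
```

Each of the three changes raises power above the baseline, & β for any of these cases is 1 minus the returned value.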
Inferential statistical tests that are used when the data to be analyzed represent an interval or ratio scale & when certain assumptions about the population distribution(s) have been met: i.e., the sample was randomly selected, scores on the variable of interest are normally distributed, & there is homoscedasticity (population variances are equal). An advantage of the parametric tests is that they are more "powerful" than the nonparametric tests.
Inferential statistical tests used when the data to be analyzed represent either an ordinal or nominal scale or when the assumptions for a parametric test have not been met. They still assume that the sample has been randomly selected from the population & that observations are independent. A major limitation is that they are less powerful & therefore make it more difficult to reject a false null hypothesis. Include the chi-square tests, the Mann-Whitney U, & the Wilcoxon matched-pairs test.
Inferential (non-parametric) stats. tests used to analyze the frequency of observations in each category (level) of a nominal variable (scale).
Single-Sample Chi-Square Test (single-variable)
Approp. for descriptive studies that include a single nominal variable.
Inferential (non-parametric) stats. tests used when data to be analyzed represent a nominal (frequency data) scale. Used when the study includes one variable & the data to be analyzed are the number of observations in each category of that variable. aka "goodness-of-fit test" Stats: X^2 df: (C-1), where C = # of "columns" (levels of the variable) Ex: Analyze data obtained in a study designed to compare the # of private colleges located in the Northeast, Midwest, South & West in the US; df for this study (4-1)=3
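The single-sample chi-square statistic compares observed frequencies with the frequencies expected under the null hypothesis: the sum of (observed - expected)² / expected across categories. The counts below are made up, & equal expected frequencies are an assumption of this example.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Chi-square statistic & df (C - 1) for a single nominal variable."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df

# Hypothetical counts of private colleges in the Northeast, Midwest, South & West.
observed = [30, 20, 25, 25]
expected = [25, 25, 25, 25]   # null hypothesis: colleges are spread evenly
chi2_gof, df_gof = chi_square_goodness_of_fit(observed, expected)
```

Here the statistic is 2.0 with df = 3; it would then be compared with the critical value for the chosen alpha.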
Multiple-Sample Chi-Square Test (multiple-variable)
Approp. for studies that include 2+ variables & the data to be analyzed represents a nominal scale.
Inferential (non-parametric) stats. tests used when data to be analyzed represent a nominal (frequency data) scale. Used when it includes two or more variables & the data to be analyzed are the number of observations in each category. Stats: X^2 df: (C-1)(R-1), where C = # of "columns" (levels of the variable) & R = # of "rows" Ex: Analyze data obtained in a study to determine if 4 major cities in the US differ in terms of the frequency of 5 different crimes (homicide, assault, rape, robbery, larceny); df for this study are equal to (4-1)(5-1) = (3)(4) = 12
Inferential parametric stats. test used to evaluate hypotheses about the difference between (compare) two means on an interval/ratio scale. A study including more than 2 means (more than 2 levels of the IV) would require conducting multiple t-tests, which would increase the likelihood of making a Type I error. 3 forms: 1. Single Sample T-Test 2. Independent Samples T-Test 3. Correlated Samples T-Test
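For two nominal variables, the expected frequency of each cell is (row total x column total) / N. The sketch below uses a made-up 2 x 2 table; `chi_square_independence` is an illustrative name.

```python
def chi_square_independence(table):
    """Chi-square statistic & df (C-1)(R-1) for a rows-by-columns frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi2, df

table = [[10, 20],
         [30, 40]]   # hypothetical observed frequencies
chi2_ind, df_ind = chi_square_independence(table)
```

A table with 5 rows (crimes) & 4 columns (cities) would give df = (4-1)(5-1) = 12 from the same function.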
Single Sample T-Test
Single-sample t-test is used to compare a single obtained sample mean to a known or hypothesized population mean. (sample vs. population) Parametric Test; Interval/Ratio Data. Use: 1 IV; Single group 1 DV; Interval or Ratio data Stats: t df: (N-1), where N = # of subjects
Independent (unrelated) Samples T-Test
The t-test for independent samples is used to compare the means of two independent samples (groups). Parametric Test; Interval/Ratio Data. Use: 1 IV; 2 independent groups 1 DV; Interval or Ratio data Stats: t df: (N-2), where N = total # of subjects
Correlated (related) Samples T-Test
The t-test for correlated samples is used to compare 2 sample means when subjects in the 2 groups are related in some way (e.g., because they were matched on an extraneous variable or because a single-group pretest/post-test design was used). Parametric Test; Interval/Ratio Data. Use: 1 IV; 2 correlated groups 1 DV; Interval or Ratio data Stats: t df: (N-1), where N = # of pairs of scores
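To make the df rules concrete, here is a small sketch of the independent-samples and correlated-samples t statistics. It assumes the pooled-variance form for independent groups and the difference-score form for related groups; the function names and scores are hypothetical.

```python
import math
import statistics

def independent_t(group1, group2):
    """Pooled-variance t for two independent groups; df = N - 2."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2

def correlated_t(first, second):
    """t for paired scores based on difference scores; df = # of pairs - 1."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

t_ind, df_ind = independent_t([4, 5, 6], [1, 2, 3])
t_cor, df_cor = correlated_t([1, 2, 3, 4], [2, 4, 5, 8])
print(t_ind, df_ind, t_cor, df_cor)
```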
Approp. for studies assessing the impact of a single IV on a single DV that is measured on an interval or ratio scale.
A parametric statistical test used to compare the means of 2 or more groups when a study includes 1 IV & 1 DV that is measured on an interval or ratio scale. It is preferable to multiple t-tests when a study involves more than two groups because it helps control the experimentwise error rate. The one-way ANOVA yields an F-ratio that indicates if any group means are significantly different. One-way ANOVA divides the total sum of squares (SST) into a "between groups sum of squares" (SSB) & a "within group sum of squares" (SSW): SST=SSB+SSW The sums of squares are converted to mean squares (variances) by dividing each sum of squares by the approp. degrees of freedom (df are used not only to ID the critical value but also to calculate the F-ratio): MST=SST/df MSB=SSB/df MSW=SSW/df Use: 1 IV; 2+ independent groups 1 DV; Interval or Ratio data Stats: F df: (C-1), (N-C), where C = # of levels of IV & N = # of subjects
The F-ratio represents a measure of treatment effects plus error divided by a measure of error only. MSW (mean square within; measure of variability that reflects error) & MSB (mean square between; measure of variability that reflects Tx effects & error) are used to calculate the F-ratio: F=MSB/MSW=(Tx+error)/error When the treatment has had an effect, the F-ratio is larger than 1.0.
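The partitioning SST = SSB + SSW and the F-ratio can be worked through numerically. The sketch below uses three small hypothetical groups; the function name and data are illustrative only.

```python
def one_way_anova(groups):
    """Return sums of squares, df, mean squares, and F for a one-way ANOVA."""
    scores = [x for g in groups for x in g]
    n, c = len(scores), len(groups)
    grand_mean = sum(scores) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum((x - grand_mean) ** 2 for x in scores)
    df_between, df_within = c - 1, n - c  # (C - 1), (N - C)
    msb, msw = ssb / df_between, ssw / df_within
    return {"SST": sst, "SSB": ssb, "SSW": ssw,
            "df": (df_between, df_within), "F": msb / msw}

# Hypothetical scores for 3 levels of the IV
anova = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(anova)
```

With these scores SSB = 26 and SSW = 6, so MSB = 13, MSW = 1, and F = 13 with df = (2, 6).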
Approp. for studies assessing the impact of 2+ IV's on a single DV that is measured on an interval or ratio scale.
The type of ANOVA used when a study includes two or more IV's (i.e., when the study has used a factorial design). Also referred to as a two-way ANOVA, three-way ANOVA, etc., with the words "two" and "three" referring to the number of IV's. Variability between groups is "partitioned" even further so the F-ratios are obtained for the main effect of each IV & for the interactions. Parametric Test; Interval/Ratio Data. Use: 2+ IVs; independent groups 1 DV; Interval or Ratio data Stats: F
Randomized Block Factorial ANOVA:
A version of the ANOVA that is appropriate when blocking has been used as a method for controlling an extraneous variable. Allows an investigator to statistically analyze the main and interaction effects of the extraneous variable (which is being treated as an additional IV). Parametric Test; Interval/Ratio Data.
ANCOVA (Analysis of Covariance)
A version of the ANOVA used to increase the efficiency of the analysis by statistically removing variability in the DV that is due to an extraneous variable. When using the ANCOVA, each person's score on the DV is adjusted on the basis of his or her score on the extraneous variable. Parametric Test; Interval/Ratio Data.
A type of analysis of variance used to assess linear and nonlinear trends when the independent variable is quantitative. Parametric Test; Interval/Ratio Data.
MANOVA (Multivariate Analysis of Variance)
A form of the ANOVA used when a study includes one or more IVs and two or more DVs, each of which is measured on an interval or ratio scale. Use of the MANOVA helps reduce the experimentwise error rate and increases power by analyzing the effects of the IV(s) on all DVs simultaneously.
Parametric & non-parametric tests both yield a test statistic that the researcher compares to the critical value, which is the cutoff point that divides the sampling distribution into the regions of likely & unlikely values. The critical value for a particular research study is determined by 2 factors: 1. Alpha (determines what proportion of the sampling distribution represents the rejection region) 2. Degrees of Freedom (df) (determines the distribution's exact shape)
Mann-Whitney U Test
Non-parametric Alt. to t-test for indep. samples; Ordinal Data.
Used when a study includes 2 independent groups to compare & the data on the DV are reported in terms of ranks.
Use: 1 IV; 2 independent groups 1 DV; Rank-ordered data
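A minimal sketch of how the U statistic can be obtained by pairwise comparison, assuming the common convention that ties count as half and that the smaller of the two U values is reported. The scores are hypothetical.

```python
def mann_whitney_u(group1, group2):
    """Return U, the smaller of U1 and U2, counting ties as 0.5."""
    u1 = 0.0
    for a in group1:
        for b in group2:
            if a > b:
                u1 += 1.0
            elif a == b:
                u1 += 0.5
    u2 = len(group1) * len(group2) - u1  # U1 + U2 = n1 * n2
    return min(u1, u2)

u_value = mann_whitney_u([1, 3, 5], [2, 4, 6])
print(u_value)
```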
Wilcoxon Matched-Pairs Test
Non-parametric Alt. to t-test for correlated samples; Ordinal Data.
Used when a study includes 2 correlated (related/matched) groups to compare & the difference between the DV scores of subjects who have been matched in pairs are converted to ranks.
Use: 1 IV; 2 correlated groups 1 DV; Rank-ordered data
Non-parametric Alt. to one-way ANOVA; Ordinal Data. Used when a study includes 2+ independent groups & the data to be analyzed are ranks.
Use: 1 IV; 2+ independent groups 1 DV; Rank-ordered data
The non-parametric alternative to the one-way ANOVA. It can be used to compare 2+ indep. grps & is useful when 1+ of the assumptions for the one-way ANOVA has been violated.
Degrees of Freedom
Degrees of Freedom (df) - (determines the distribution's exact shape) To calculate for: t-test, the sampling distribution is based on sample size & the degrees of freedom are derived from the total number of subjects (e.g., N-1 for a single-sample t-test). Chi-square test, a diff. kind of sampling distribution is used based on the number of categories (levels) of the variable, & the degrees of freedom are derived from the total number of categories (C-1).
The analysis of variance is used to compare 2 or more means. An advantage is that it simultaneously makes all comparisons of group means while holding the probability of making a Type I error at the level of significance set by the investigator (helps control the experimentwise error rate). There are several versions:
1. One-way ANOVA
2. Factorial ANOVA
3. Other ANOVA's when study needs a more complex research design.
- Split-plot ANOVA/Mixed ANOVA - used when the researcher has used a "mixed" design.
When the results of a study indicate that a Tx's effects are statistically significant, the researcher may want to evaluate the practical or clinical significance of the results by calculating the effect size. Cohen's d & eta square (η^2) can both be used as measures of effect size. Effect sizes are also used in meta-analysis since they allow the results of diff. studies to be converted to a common metric so the results can be compared & an average effect across studies can be determined.
A measure of the difference between 2 groups (an experimental & control group). It is calculated by subtracting the mean of 1 group from the mean of the other group & dividing the result by a pooled SD for the 2 groups. The result indicates the difference between the groups in terms of SD units. Cohen (1988) provided the following criteria for evaluating the size of d: Small effect size = 0.2 Medium effect size = 0.5 Large effect size = 0.8
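A short sketch of the d calculation described above, assuming the pooled SD is computed from the two sample variances weighted by their df. The group scores are hypothetical.

```python
import math
import statistics

def cohens_d(experimental, control):
    """Mean difference divided by the pooled SD of the two groups."""
    n1, n2 = len(experimental), len(control)
    v1 = statistics.variance(experimental)
    v2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(experimental) - statistics.mean(control)) / pooled_sd

d = cohens_d([12, 14, 16], [11, 13, 15])
print(d)
```

Here the means differ by 1 point and the pooled SD is 2, so d = 0.5 (a medium effect by Cohen's criteria).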
r square (r^2) & eta square (η^2)
These indices indicate the percent of variance in the outcome variable that is accounted for by variance in treatment.
The correlation coefficient is a numerical index of the relationship (degree of association) between two or more variables. The magnitude of the coefficient indicates the strength of the relationship; its sign indicates the direction (positive or negative). Value ranges from -1.0 to +1.0 & the closer the coefficient is to these values the stronger the relationship.
(Pearson product moment correlation coefficient) The Pearson r is used when data on both variables represent a continuous scale, most commonly interval or ratio data. Value ranges from -1.0 to +1.0 & the closer the coefficient is to these values the stronger the relationship. Use of Pearson's r is based on 3 assumptions: 1. There must be a linear relationship between the variables. 2. There must be an unrestricted range of scores on both variables. 3. There must be homoscedasticity, or the same range of Y scores at every value of X.
The point biserial is used when one variable is a true dichotomy and the other is continuous.
Used to determine the relationship btwn a variable that is artificially dichotomized & one that is continuous.
Used to determine the relationship btwn 2 continuous variables when the relationship is known to be non-linear.
Represents the degree of association between 2 different variables. A correlation coefficient can be squared to obtain a measure of shared variability (Coefficient of Determination). For example, if the correlation between X and Y is .50, this means that 25% of variability in Y is shared with (or is accounted for by) variability in X.
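As an illustration, Pearson r and its square can be computed directly from deviation scores. This is a sketch with hypothetical X and Y scores; the function name is not from any particular library.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
r_squared = r ** 2  # coefficient of determination
print(r, r_squared)
```

For these scores r = .80, so about 64% of the variability in Y is shared with variability in X.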
A statistical technique used to predict a score on a criterion based on the person's obtained score on a predictor. Involves the identification of a regression line ("line of best fit") and the use of the equation for that line, the regression equation. Makes it possible to use a predictor (X) score to predict or estimate a criterion (Y) score. An assumption is that the relationship between X & Y can be described by a straight line. The position of the regression line in a scattergram is identified using the least squares criterion, which locates the regression line so that error in prediction is minimized.
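The least squares criterion leads to a simple closed-form slope and intercept for one predictor. The sketch below uses made-up predictor and criterion scores and hypothetical function names.

```python
def fit_regression_line(x, y):
    """Return (slope, intercept) of the least squares line Y' = slope * X + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x_score):
    """Use the regression equation to estimate a criterion score."""
    return slope * x_score + intercept

slope, intercept = fit_regression_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = predict(slope, intercept, 5)
print(slope, intercept, predicted)
```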
Cleary's regression model aka model of test bias; if a test has the same regression line for members of both grps, the test is not biased even if it has different means for the grps.
The multivariate technique used for predicting a score on a continuous criterion based on performance on 2 or more continuous and/or discrete predictors. The analysis yields a multiple correlation coefficient (R), which indicates the degree of association between the criterion & a linear combination of the predictors, & a multiple regression equation. Ideally, predictors included in a multiple regression equation will have low correlations with each other and high correlations with the criterion. (High correlations between predictors are referred to as "multicollinearity.")
Refers to validating a correlation coefficient (e.g., a criterion-related validity coefficient) on a new sample. Because the same chance factors operating in the original sample are not operating in the subsequent sample, the correlation coefficient tends to "shrink" on cross-validation.
In terms of the multiple correlation coefficient (R), shrinkage is greatest when the original sample is small and the number of predictors is large. When a multiple regression equation is cross-validated, the multiple correlation coefficient tends to shrink.
Discriminant Function Analysis
The multivariate technique used when there are two or more continuous predictors and one discrete (nominal) criterion. Referred to as multiple discriminant function analysis when the criterion has more than two categories. Assumes a linear relationship between the variables; when the relationship is non-linear, logistic regression is used instead.
A causal modeling technique used to verify a pre-defined causal model or theory. Involves translating the theory into a path diagram, collecting data on the variables of interest (the observed variables), and calculating and interpreting path coefficients. (Multivariate Technique)
A causal (structural equation) modeling technique used to verify a predefined causal model or theory. More complex than path analysis; it allows two-way (non-recursive) paths and takes into account observed variables, the latent traits they are believed to measure, and the effects of measurement error. (Multivariate Technique)
Are used to determine the degree of association between 2 or more variables & to make predictions about status on 1 or more criteria based on status on one or more predictors.
A picture of the relationship between 2 variables. The X (predictor) variable is placed on the horizontal axis, while the Y (criterion) variable is located on the vertical axis. Each data point in a scattergram corresponds to the 2 scores obtained by a single person. When the data points are widely scattered, the variables have a weak relationship; when they cluster tightly around a line, the relationship is strong.
Degree of Association
A correlation coefficient can be interpreted directly: the closer the coefficient is to either -1.0 or +1.0, the stronger the association between variables; the closer it is to 0, the weaker the association.
Correlation coefficients can be tested to determine if they are statistically significant by comparing the obtained coefficient to the appropriate critical value. The magnitude of the critical value is determined by the level of significance (alpha) and the sample size. The smaller the sample, the larger the correlation coefficient must be to be statistically significant.
A type of multiple regression that involves adding or subtracting predictors one at a time & recalculating the multiple correlation coefficient to determine the effects of having more or fewer predictors (the decision to add or subtract a predictor is based on the change in R-squared).
The results indicate the fewest number of predictors needed to obtain maximally accurate predictions.
An extension of multiple regression that is used when 2 or more continuous predictors are to be used to predict status on 2 or more continuous criteria.
In contrast to other forms of sampling (which involve selecting individuals from the population), cluster sampling entails selecting units or groups (clusters) of individuals from the population (e.g., schools, hospitals, clinics).
Experimentwise Error Rate
Refers to the probability of making at least one Type I error across all of the statistical comparisons in a study. As the number of statistical comparisons in a study increases, the experimentwise error rate also increases.
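A quick way to see the increase is the standard approximation 1 - (1 - alpha)^c, which assumes the c comparisons are independent (an assumption added here, not stated in these notes).

```python
def experimentwise_error_rate(alpha, comparisons):
    """Probability of at least one Type I error, assuming independent comparisons."""
    return 1 - (1 - alpha) ** comparisons

rate_one = experimentwise_error_rate(0.05, 1)
rate_three = experimentwise_error_rate(0.05, 3)
rate_ten = experimentwise_error_rate(0.05, 10)
print(rate_one, rate_three, rate_ten)
```

At alpha = .05, three comparisons give a rate of about .14 and ten comparisons about .40.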
If a teacher adds 10 points to each score in a distribution of scores, this will?
Affect the distribution's mean (average), so the new mean is the orig. mean plus the constant (10).
Adding a constant does not affect the variability of scores & does not change the standard deviation (SD), variance (ave. of the squared diff from the mean), or range.
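This can be checked with a handful of hypothetical scores using Python's `statistics` module:

```python
import statistics

scores = [70, 80, 90, 100]         # hypothetical original scores
shifted = [s + 10 for s in scores]  # teacher adds 10 points to each score

mean_change = statistics.mean(shifted) - statistics.mean(scores)
sd_before, sd_after = statistics.pstdev(scores), statistics.pstdev(shifted)
range_before = max(scores) - min(scores)
range_after = max(shifted) - min(shifted)
print(mean_change, sd_before, sd_after, range_before, range_after)
```

The mean rises by exactly 10 while the SD and range are unchanged.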
Used to determine the relationship btwn 2 normally distributed continuous variables that have been artificially dichotomized.
Ex: To assess the degree of association btwn Tx outcomes & Sx severity, when both variables were originally measured on a continuous scale but were then dichotomized so that the outcome is categorized as either successful or unsuccessful & Sx severity is categorized as mild or severe.
Interrupted Time Series Design
(Cook & Campbell) The researcher is measuring the effects of Tx over time for a single-grp of participants.
It is a single-grp design in which the DV is measured at regular intervals before & after the Tx is applied.
Influences the nature of the relationship btwn an IV & DV.
Ex: Stress inoculation is most effective for ppl w/ mild to moderate anxiety, while pharmacotherapy & stress inoculation is most effective for ppl w/severe anxiety.
In this situation level of anxiety affects (moderates) the relationship btwn type of therapy & therapy outcome
Intervening variable is responsible for the relationship btwn an IV & DV.
Ex: Stress inoculation is most effective for ppl w/ mild to moderate anxiety, while pharmacotherapy & stress inoculation is most effective for ppl w/severe anxiety.
Level of anxiety affects the relationship btwn therapy & anxiety but it does not cause the relationship.
Reduces or conceals the relationship btwn 2 variables.
In a normal distribution of scores that has a mean of 120 and a standard deviation of 12, a raw score of 138 is equivalent to a z score of:
1.5 Correct - To identify the correct answer to this question, you need to know that z scores indicate how far a raw score is below or above the mean in standard deviation units. In this distribution, the mean is 120 and the standard deviation is 12. Therefore, a raw score of 138 is (138-120)/12 = 1.5 standard deviations above the mean and is equivalent to a z score of 1.5.
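The same arithmetic as a short check, using the values from the question (the function name is illustrative):

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score falls above (+) or below (-) the mean."""
    return (raw - mean) / sd

z = z_score(138, 120, 12)
print(z)
```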