Statistics Flashcards

1
Q

Descriptive Statistics

A

Organizes, summarizes, and communicates a group of numerical observations

2
Q

Inferential Statistics

A

Uses sample data to make general estimates about the larger population

3
Q

Sample

A

Set of observations drawn from the population of interest

4
Q

Population

A

includes all possible observations about which we’d like to know something

5
Q

Variable

A

Any observation of a physical, attitudinal, or behavioural characteristic that can take on different values

6
Q

Discrete observation

A

Can take on only specific values (e.g., whole numbers); no other values can exist between these numbers; e.g., the number of times one woke up early in a week

7
Q

Continuous observation

A

can take on a full range of values (numbers out to several decimal places); infinite number of potential values exist; A person might complete a task in 12.839 seconds, etc.

8
Q

Nominal Variable

A

variable used in observations that have categories, or names, as their values; 1 for female and 2 for male

9
Q

Ordinal Variable

A

A variable used for observations that have rankings as their values; team sports, which team placed first, second, third

10
Q

Interval Variables

A

Used for observations that have numbers as their values; the distance (or interval) between pairs of consecutive numbers is assumed to be equal; e.g., temperature, because the interval from one degree to the next is always the same. Some interval variables are also discrete (they cannot be anything but whole numbers), such as some personality and attitude measures

11
Q

Ratio Variables

A

Variables that meet the criteria for interval variables but also have meaningful zero points; reaction time; time has a meaningful zero

12
Q

Scale Variable

A

Variable that meets the criteria for an interval variable or a ratio variable

13
Q

Level

A

Discrete value or condition that a variable can take on

14
Q

Independent Variable

A

has at least two levels that we either manipulate or observe to determine its effects on the dependent variable; does gender predict one’s attitude about politics; gender with two levels, male and female

15
Q

Dependent Variable

A

Outcome variable that we hypothesise to be related to, or caused by, changes in the independent variable

16
Q

Confounding Variable

A

Any variable that systematically varies with the independent variable so that we cannot logically determine which variable is at work; also called a confound; e.g., starting a diet drug AND an exercise programme at the same time

17
Q

Reliable measure

A

One that is consistent, your weight now will be the same as your weight an hour from now, your scale is reliable

18
Q

Valid measure

A

One that measures what it was intended to measure; your scale may match your weight when you measure it at the doctor’s office

19
Q

Hypothesis Testing

A

process of drawing conclusions about whether a particular relation between variables is supported by evidence

20
Q

Operational Definition

A

Specifies the operations or procedures used to measure or manipulate a variable

21
Q

Correlation

A

An association between two or more variables

22
Q

Random assignment

A

Every participant in the study has an equal chance of being assigned to any of the groups or experimental conditions in a study

23
Q

Experiment

A

A study in which participants are randomly assigned to a condition or level of one or more independent variables

24
Q

Between-Groups Research Design

A

Participants experience one, and only one, level of the independent variable

25
Within-Groups Research Design
The different levels of the independent variable are experienced by all participants in the study, also called a Repeated measures design
26
Outlier
an extreme score that is either very high or very low in comparison with the rest of the scores in the sample
27
Outlier Analysis
Studies that examine observations that do not fit the overall pattern of the data, in an effort to understand the factors that influence the dependent variable
28
Raw Score
Data point that has not yet been transformed or analyzed
29
Frequency Distribution
Describes the pattern of a set of numbers by displaying a count or proportion for each possible value of a variable
30
Frequency Table
Visual description of data that shows how often each value occurred, that is, how many scores were at each value
31
Grouped Frequency Table
Visual depiction of data that reports the frequencies within a given interval rather than the frequencies for a specific value
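A minimal Python sketch of the two kinds of table, using made-up scores; the function names and the interval width of 5 are illustrative assumptions, not part of the original cards.

```python
from collections import Counter

# Hypothetical raw scores (invented for illustration).
scores = [3, 5, 5, 7, 8, 8, 8, 10, 12, 14]

def frequency_table(values):
    """Frequency table: how many scores occurred at each value."""
    return dict(sorted(Counter(values).items()))

def grouped_frequency_table(values, width):
    """Grouped frequency table for whole-number scores:
    counts per interval of `width` consecutive values."""
    counts = Counter((v // width) * width for v in values)
    return {(start, start + width - 1): n for start, n in sorted(counts.items())}

print(frequency_table(scores))
print(grouped_frequency_table(scores, 5))
```

The grouped version trades detail for readability, which tends to help when scores span a wide range.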
32
Normal Distribution
A specific frequency distribution that is a bell-shaped, symmetric, unimodal curve
33
Skewed distribution
Distributions in which one of the tails of the distribution is pulled away from the centre; lopsided, off-centre, or nonsymmetric
34
Positively skewed data
The distribution's tail extends to the right, in a positive direction
35
Floor effect
Situation in which a constraint prevents a variable from taking values below a certain point
36
Negatively skewed data
Have a distribution with a tail that extends to the left, in a negative direction
37
Ceiling effect
situation in which a constraint prevents a variable from taking on values above a given number
38
Hint to tell whether the data is positively or negatively skewed
The tail tells the tale; negative scores are to the left, when the long thin tail of a distribution is to the left of the distribution centre, it is negatively skewed. When the long thin tail of a distribution is to the right of the distribution centre, it is positively skewed.
39
Ways to present raw data
Frequency Tables, Grouped Frequency tables, Histograms, and Frequency Polygons
40
Ways to mislead with graphs
False Face Validity Lie | Biased Scale Lie | Sneaky Sample Lie | Interpolation Lie | Extrapolation Lie | Inaccurate Values Lie
41
Types of Graphs
Scatterplot | Line Graph | Time Series Plot | Bar Graph | Pictorial Graphs | Pie Charts
42
Central tendency
Refers to the descriptive statistic that best represents the centre of a data set, the particular value that all the other data seem to be gathering around; it is what we mean when we refer to the typical score; can be measured through the mean, median, and mode
43
Mean
Arithmetic average of a group of scores
44
Statistic
A number based on a sample taken from a population
45
Parameter
number based on the whole population
46
Median
The middle score of all the scores in a sample when the scores are arranged in ascending order; if there is no single middle score, the median is the mean of the two middle scores
47
Mode
The most common score of all the scores in the sample; used (1) when one particular score dominates a distribution (2) when the distribution is bimodal or multimodal (3) when the data are nominal
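A short sketch of the three measures of central tendency using Python's standard `statistics` module; the scores are made up for illustration.

```python
import statistics

# Hypothetical sample of scores, already in ascending order.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)        # arithmetic average: 30 / 6
median = statistics.median(scores)    # no single middle score, so mean of 3 and 5
modes = statistics.multimode(scores)  # list of the most common score(s)

print(mean, median, modes)
```

`multimode` returns a list, so a bimodal or multimodal sample simply yields more than one entry.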
48
Unimodal distribution
has one mode, or most common score
49
Bimodal distribution
has two modes, or most common scores
50
Multimodal distribution
has more than two modes, or most common scores
51
Standard deviation
The square root of the average of the squared deviations from the mean; the typical amount that each score varies, or deviates, from the mean
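The definition can be followed step by step in a few lines of Python. This sketch divides by N, as the card's wording ("average of the squared deviations") suggests; the function name and the scores are illustrative assumptions.

```python
import math

def standard_deviation(scores):
    """Square root of the average squared deviation from the mean (divides by N)."""
    mean = sum(scores) / len(scores)
    squared_deviations = [(x - mean) ** 2 for x in scores]
    variance = sum(squared_deviations) / len(scores)
    return math.sqrt(variance)

# Hypothetical scores: mean 5, squared deviations sum to 32, variance 4.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(scores))
```

The intermediate `variance` is the second item in the measures of variability card.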
52
Measures of variability
Range | Variance | Standard Deviation | Interquartile Range
53
Independent measures t-test
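Nonparametric equivalent: Mann-Whitney Test
54
Repeated measures t-test
Nonparametric equivalent: Wilcoxon Signed Rank
55
Independent measures Anova
Nonparametric equivalent: Kruskal Wallis Test
56
Repeated measures Anova
Nonparametric equivalent: Friedman Test
57
Pearson r
Nonparametric equivalent: Spearman Rho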
58
Random sample
One in which every member of the population has an equal chance of being selected into the study
59
Convenience Sample
One that uses participants who are readily available
60
Generalizability
Refers to researchers' ability to apply findings from one sample or in one context to other samples or contexts; also known as external validity
61
Replication
refers to the duplication of scientific results, ideally in a different context or with a sample that has different characteristics
62
Volunteer sample
special kind of convenience sample in which participants actively choose to participate in a study; also called a self-selected sample
63
Control Group
A level of the independent variable that does not receive the treatment of interest in a study; designed to match an experimental group in all ways but the experimental manipulation itself
64
Experimental Group
Level of the independent variable that receives the treatment or intervention of interest in an experiment
65
Null hypothesis
a statement that postulates that there is no difference between populations or that the difference is in a direction opposite from that anticipated by the researcher
66
Research hypothesis
Statement that postulates that there is a difference between populations or sometimes, more specifically, that there is a difference in a certain direction, positive or negative; also called an alternative hypothesis
67
Making a Decision About Our Hypothesis
We decide to reject the null hypothesis (we conclude there is a difference) | We decide to fail to reject the null hypothesis (we do not have evidence of a difference)
68
Rules of Formal Hypothesis Testing
The null hypothesis is that there is no difference between groups; usually, our hypotheses explore the possibility of a mean difference. We either reject or fail to reject the null hypothesis; there are no other options. We never use the word accept in reference to formal hypothesis testing.
69
Type I Error
Occurs when we reject the null hypothesis but the null hypothesis is correct; false positive; rejecting the null hypothesis falsely; detrimental consequences because people often take action based on a mistaken finding
70
Type II Error
Occurs when we fail to reject the null hypothesis but the null hypothesis is false; false negative; results in a failure to take action because a research intervention is not supported or a given diagnosis is not received
71
Standardization
Converts individual scores to standard scores for which we know the percentiles if the data were normally distributed
72
z Score
The number of standard deviations a particular score is from the mean; can be computed if we know the mean and the standard deviation of a population
73
z scores into percentiles
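Approximate percentage of scores in each one-standard-deviation band of the normal curve, from z = -3 to z = +3: 2-14-34-34-14-2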
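A brief sketch that computes a z score and recovers the 2-14-34-34-14-2 pattern from the standard normal curve. It uses `statistics.NormalDist` from the Python standard library (3.8+); the example score, mean, and standard deviation are made up.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations a score x lies from the population mean."""
    return (x - mu) / sigma

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Percentage of the curve between consecutive whole z scores from -3 to +3.
edges = [-3, -2, -1, 0, 1, 2, 3]
bands = [
    round(100 * (standard_normal.cdf(hi) - standard_normal.cdf(lo)))
    for lo, hi in zip(edges, edges[1:])
]

z = z_score(115, mu=100, sigma=15)            # hypothetical score
percentile = 100 * standard_normal.cdf(z)     # share of scores at or below z

print(bands)
print(z, round(percentile))
```

Summing the bands to the left of a z score gives the same percentile by hand: 2 + 14 + 34 + 34 = 84 for z = 1.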
74
Central Limit Theorem
Refers to how a distribution of sample means is a more normal distribution than a distribution of scores, even when the population distribution is not normal; with repeated sampling, the distribution of means approximates a normal curve even when the original population is not normally distributed; a distribution of means is less variable than a distribution of individual scores; each sample should comprise a minimum of thirty scores
75
Distribution of means
Distribution composed of many means that are calculated from all possible samples of a given size, all taken from the same population
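A small simulation can illustrate the central limit theorem. This is only a sketch: it draws 2,000 random samples rather than all possible samples, and the skewed population, seed, and sample counts are arbitrary assumptions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical, strongly skewed population of individual scores.
population = [random.expovariate(1.0) for _ in range(20000)]

sample_size = 30
# Approximate distribution of means: the mean of each of many same-sized samples.
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(2000)
]

sd_scores = statistics.pstdev(population)
sd_means = statistics.pstdev(sample_means)
expected_se = sd_scores / math.sqrt(sample_size)  # standard error

print(f"SD of individual scores: {sd_scores:.3f}")
print(f"SD of sample means:      {sd_means:.3f}")
print(f"sigma / sqrt(N):         {expected_se:.3f}")
```

The spread of the means should come out close to the population standard deviation divided by the square root of the sample size, which is why a distribution of means is less variable than a distribution of scores.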
76
Ways to describe the same scores within a normal distribution
Raw Scores | z Scores | Percentile Rankings
77
Assumptions
The characteristics that we ideally require the population from which we are sampling to have so that we can make accurate inferences
78
Parametric Tests
Inferential statistical analyses based on a set of assumptions about the population
79
Nonparametric Tests
Inferential statistical analyses that are not based on a set of assumptions about the population
80
Assumptions for Conducting Analyses
1. The dependent variable is assessed using a scale measure, that is, there is an equal distance between the numbers; if the variable is nominal or ordinal, we cannot make this assumption 2. The participants are randomly selected 3. The distribution of the population of interest is approximately normal
81
Steps of Hypothesis Testing
1. Identify the populations, comparison distribution, and assumptions 2. State the null and research hypotheses 3. Determine the characteristics of the comparison distribution 4. Determine critical values or cutoffs 5. Calculate the test statistic 6. Decide whether to reject or fail to reject the null hypothesis
82
Statistically significant finding
A finding is statistically significant if the data differ from what we would expect by chance if there were, in fact, no actual difference; statistical significance does not necessarily mean the finding is important or meaningful
83
Robust hypothesis test
one that produces fairly accurate results even when the data suggest that the population might not meet some of the assumptions
84
Critical value
Test statistic value beyond which we reject the null hypothesis, also known as a cutoff
85
Critical region
refers to the area in the tails of the comparison distribution in which we reject the null hypothesis if our test statistic falls there.
86
p level/alpha
The probability used to determine the critical values, or cutoffs, in hypothesis testing
87
Two-tailed test
Hypothesis test in which the research hypothesis does not indicate a direction of the mean difference or change in the dependent variable, but merely indicates that there will be a mean difference
88
Point estimate
Summary statistic from a sample that is just one number used as an estimate of the population parameter
89
Interval Estimate
Based on a sample statistic and provides a range of plausible values for the population parameter; used when reporting polls
90
Confidence Interval
Interval estimate, based on the sample statistic, that would include the population mean a certain percentage of the time if we sampled from the same population repeatedly; centred around the sample mean. The 95% confidence interval is most commonly used: the middle 95% of the distribution falls between the two tails. The confidence level is 95%; the confidence interval is the range between the two values that surround the sample mean.
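A minimal sketch of a confidence interval around a sample mean, assuming a z distribution with a known population standard deviation. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Interval centred on the sample mean, assuming sigma is known."""
    # z value that leaves (1 - level) / 2 in each tail, about 1.96 for 95%.
    z_critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_critical * sigma / math.sqrt(n)  # z times the standard error
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(sample_mean=100, sigma=15, n=25)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

Raising `level` widens the interval; raising `n` narrows it.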
91
Note on sample size and statistic
As sample size increases, there is a corresponding increase in the test statistic during hypothesis testing; a larger sample size should influence our level of confidence, but it shouldn't increase our confidence that the story is important
92
Effect Size
Indicates the size of a difference and is unaffected by sample size; tells us how much two populations DO NOT overlap; the less overlap, the bigger the effect size
93
How to decrease amount of overlap between two distributions
If means are farther apart | If the variation within each population is smaller
94
Effect Size and Standard Deviation
When two population distributions decrease their spread, the overlap of the distributions is less and the effect size is bigger
95
Cohen's d
Developed by Jacob Cohen; a measure of effect size that assesses the difference between two means in terms of standard deviation, not standard error; similar to a z statistic
96
Cohen's Conventions for Effect Sizes
Effect size: convention, overlap | Small: 0.2, 85% | Medium: 0.5, 67% | Large: 0.8, 53%
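A sketch contrasting Cohen's d with a z statistic for the same mean difference. The function names and numbers are illustrative assumptions; the point is that d divides by the standard deviation while z divides by the standard error.

```python
import math

def cohens_d(sample_mean, population_mean, sigma):
    """Mean difference in standard deviation units (ignores sample size)."""
    return (sample_mean - population_mean) / sigma

def z_statistic(sample_mean, population_mean, sigma, n):
    """Mean difference in standard error units (grows with sample size)."""
    return (sample_mean - population_mean) / (sigma / math.sqrt(n))

# Hypothetical values: same means and sigma, two different sample sizes.
d = cohens_d(105, 100, 15)
z_small_n = z_statistic(105, 100, 15, n=25)
z_large_n = z_statistic(105, 100, 15, n=100)

print(round(d, 2), round(z_small_n, 2), round(z_large_n, 2))
```

Here d stays at about 0.33 (between Cohen's small and medium conventions) while the z statistic doubles when N goes from 25 to 100.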
97
Statistical Power
Measure of our ability to reject the null hypothesis given that the null hypothesis is false; the probability that we will reject the null hypothesis when we should reject the null hypothesis; the probability that we will not make a Type II Error. Acceptable rate is .80
98
Ways to increase power of a statistical Test
1. Increase the alpha. Take the p level of 0.05 and increase it to 0.10; Side effect of increasing the probability of a Type I error from 5% to 10% 2. Turn a two-tailed hypothesis into a one tailed hypothesis 3. Increase N. Increasing sample size leads to an increase in the test statistic, making it easier to reject the null because a larger test statistic is more likely to fall beyond the cutoff 4. Exaggerate the levels of the independent variable. Example is to add to the length of group therapy if the study is on the effectiveness of group therapy for social phobia 5. Decrease the standard deviation (use reliable measure from the beginning of the study and sampling from a more homogeneous group in which participants' responses are more likely to be more similar to begin with)
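The first three ways can be checked numerically. This sketch computes power for a one-tailed z test in which the true mean lies above the null mean; the function name, the known-sigma setup, and all numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

def z_test_power(mu_null, mu_true, sigma, n, alpha=0.05):
    """Probability of rejecting the null in an upper one-tailed z test
    when the population mean is really mu_true."""
    standard_error = sigma / math.sqrt(n)
    # Raw-score cutoff on the null distribution of means.
    cutoff = mu_null + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Share of the true distribution of means that falls beyond the cutoff.
    return 1 - NormalDist(mu_true, standard_error).cdf(cutoff)

baseline = z_test_power(100, 105, 15, n=25)
larger_alpha = z_test_power(100, 105, 15, n=25, alpha=0.10)
larger_n = z_test_power(100, 105, 15, n=100)
smaller_sd = z_test_power(100, 105, 10, n=25)

print(round(baseline, 2), round(larger_alpha, 2), round(larger_n, 2), round(smaller_sd, 2))
```

Each change raises power above the baseline, but only the larger alpha does so at the cost of more Type I errors.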
99
Meta-analysis
Study that involves the calculation of a mean effect size from the individual effect sizes of many studies
100
Ways of analysing data
Hypothesis Testing | Confidence Intervals | Effect Size | Power Analysis
101
t Distributions
Help us specify precisely how confident we can be in our research findings; The t test, based on t distributions, tells us how confident we can be that our sample differs from the larger population; used instead of a z distribution when sampling requires us to estimate the population standard deviation from the sample standard deviation
102
t Statistic
Indicates the distance of a sample mean from a population mean in terms of the standard error
103
Single-Sample t-test
Hypothesis test in which we compare data from one sample to a population for which we know the mean but not the standard deviation
104
Degrees of freedom
The number of scores that are free to vary when estimating a population parameter from a sample
105
Reporting a t statistic in APA Format
1. Write the symbol for the test statistic 2. Write the degrees of freedom, in parentheses 3. Write an equal sign and then the value of the test statistic, typically to two decimal places 4. Write a comma and then indicate the p value by writing "p =" and then the actual value. Example: t(4) = 2.87, p < 0.05. In a sentence: It appears that counselling centre clients who sign a contract to attend at least 10 sessions do attend more sessions, on average, than do clients who do not sign such a contract, t(4) = 2.87, p < 0.05
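A sketch of how a single-sample t statistic and its degrees of freedom might be computed and formatted. The session counts and the population mean of 4.6 are made-up values for illustration; the p value is left out because the standard library has no t distribution.

```python
import math
import statistics

def single_sample_t(sample, population_mean):
    """Return (t, df): distance of the sample mean from the population
    mean in standard error units, with the SD estimated from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)      # estimated SD, divides by N - 1
    standard_error = s / math.sqrt(n)
    t = (statistics.mean(sample) - population_mean) / standard_error
    return t, n - 1

# Hypothetical number of sessions attended by five clients.
sessions = [6, 6, 12, 7, 8]
t, df = single_sample_t(sessions, population_mean=4.6)

print(f"t({df}) = {t:.2f}")
```

The resulting t would then be compared with the critical value for the same degrees of freedom to decide what to write after the comma.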
106
Dot plot
Graph that displays all the data points in a sample with the range of scores along the x-axis and a dot for each data point above the appropriate value
107
t Test
Types of t tests: (1) Single-sample t test (when we compare a sample mean to a population mean but don't know the population standard deviation) (2) Paired-samples t test, also called a dependent-samples t test (when we are comparing two samples and every participant is in both samples, a within-groups design; before and after comparisons) (3) Independent-samples t test (when we are comparing two samples and every participant is in only one sample, a between-groups design)
108
Assumptions for a paired samples t test
1. The dependent variable is scale 2. The participants were randomly selected 3. The population is normally distributed
109
Order Effects/Practice Effects
Refer to how a participant's behaviour changes when the dependent variable is presented for a second time
110
Counterbalancing
minimises the practice effect by varying the order of presentation of different levels of the independent variable from one participant to the next
111
Example of a Paired Samples t Test
Salaries for the same position in two different cities; scores of 30 students on two different exams; scores on tests before and after interventions
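A before-and-after comparison can be sketched as a paired-samples t test: take each participant's difference score, then treat the differences like a single sample compared with zero. The scores below are invented.

```python
import math
import statistics

def paired_samples_t(before, after):
    """Return (t, df) computed from the difference scores of matched pairs."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    t = statistics.mean(differences) / standard_error
    return t, n - 1

# Hypothetical test scores for the same five students.
before = [10, 12, 9, 14, 11]
after = [12, 15, 9, 17, 12]
t, df = paired_samples_t(before, after)

print(f"t({df}) = {t:.2f}")
```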
112
Independent samples t-test
used to compare two means for a between-groups design, situation in which each participant is assigned to only one condition; difference between means
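A sketch of the independent-samples t statistic using a pooled variance, which is one common way to compute it and assumes the two populations have similar variances. Group scores are made up.

```python
import math
import statistics

def independent_samples_t(group_1, group_2):
    """Return (t, df) for the difference between two independent group means."""
    n1, n2 = len(group_1), len(group_2)
    df1, df2 = n1 - 1, n2 - 1
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_variance = (
        df1 * statistics.variance(group_1) + df2 * statistics.variance(group_2)
    ) / (df1 + df2)
    standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    t = (statistics.mean(group_1) - statistics.mean(group_2)) / standard_error
    return t, df1 + df2

# Hypothetical ratings from two separate groups of participants.
group_1 = [4, 5, 6, 7, 8]
group_2 = [1, 2, 3, 4, 5]
t, df = independent_samples_t(group_1, group_2)

print(f"t({df}) = {t:.2f}")
```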
113
Three Assumptions for Independent Samples t-test
1) The dependent variable is a rating on a liking measure, which can be considered a scale variable 2) We do not know whether the population is normally distributed, there are at least 30 participants 3) Participants are randomly selected
114
Example of independent samples t-test
Group 1: low trust in leader; Group 2: high trust in leader; dependent variable: level of agreement with their supervisor from 1 (strongly disagree) to 7 (strongly agree) | Population 1: women exposed to humorous cartoons; Population 2: men exposed to humorous cartoons; dependent variable: percentage of cartoons characterised as funny (scale); H0: μ1 = μ2 | Are women really more talkative than men? | How long, in minutes, do male and female students spend getting ready for a date? | Can women experience "mother hearing," an increased sensitivity to and awareness of noises, in particular, those of children? Compare mothers and non-mothers.
115
Taylor and Ste-Marie studied eating disorders in 41 Canadian female figure skaters. They compared the figure skaters' data on the Eating Disorder Inventory to the means of known populations, including women with eating disorders. On average, the figure skaters were more similar to the population of women with eating disorders than to those without eating disorders
Single sample t test because we have one sample of figure skaters and are comparing that sample to a population (women with eating disorders) for which we know the mean
116
In an article titled "A Fair and Balanced Look at the News: What Affects Memory for controversial Arguments," Wiley found that people with a high level of previous knowledge about a given controversial topic (abortion, military intervention) had better average recall for arguments on both sides of that issue than did those with lower levels of knowledge
Independent Samples t test because we have two samples, and no participant can be in both samples. One cannot have both high level and low level of knowledge about a topic
117
Engle-Friedman and colleagues studied the effects of sleep deprivation. Fifty students were assigned to one night of sleep loss (students were required to call the laboratory every half-hour all night) and then one night of no sleep loss (normal sleep). The next day, the students were offered a choice of math problems with differing levels of difficulty. Following sleep loss, students tended to choose less challenging problems
We would use a paired-samples t test because we have two samples, but every student is assigned to both samples - one night of sleep loss and one night of no sleep loss
118
Anova Example
Rather than running three experiments to compare Group 1 vs Group 2, Group 1 vs Group 3, and Group 2 vs Group 3, putting all three groups in a single experiment is far more efficient. Scores on the final exam for Group 1 (control group) and Group 3 (take-responsibility group) were the same. Average scores on the final exam for Group 2 (self-esteem group) sank to .37%.
119
Using T tests to compare three groups
Leads to more chances of committing a Type I error: (0.95)(0.95)(0.95) = 0.857 is the probability of making no Type I error across three analyses, so there is almost a 15% chance of having at least one Type I error if we run 3 analyses.
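The arithmetic generalises to any number of tests. A minimal sketch, assuming the tests are independent and each uses the same alpha; the function name is made up.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 3, 6):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests} tests: {rate:.3f}")
```

Three groups need 3 pairwise t tests, but four groups already need 6, which is why a single ANOVA is preferred.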
120
F distributions
Allow us to conduct a single hypothesis test with multiple groups; more complex variations of the z distributions and the t distributions
121
Anova
Analysis of Variance; a hypothesis test typically with one or more nominal independent variables with at least three groups overall and a scale dependent variable
122
F statistic
Ratio of two measures of variance: (1) between groups variance, which indicates differences among sample means, and (2) within-groups variance, which is essentially an average of the sample variances; a way of measuring whether three or more groups vary from one another; an expansion of the z statistic and t statistic
123
Between Groups Variance
An estimate of the population variance based on the differences among the means
124
Within Groups Variance
An estimate of the population variance based on the differences within each of the three (or more) sample distributions
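A sketch of how the F statistic could be computed for a one-way between-groups design: between-groups variance divided by within-groups variance. The function name and the scores are illustrative; it also returns the degrees of freedom and the proportion of variance accounted for (R squared).

```python
from statistics import mean

def one_way_between_groups_anova(groups):
    """Return (F, df_between, df_within, r_squared) for a list of samples."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)  # mean of every score in the study
    group_means = [mean(group) for group in groups]

    # Variability of the sample means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    # Variability of the scores around their own sample mean.
    ss_within = sum(
        (score - m) ** 2
        for group, m in zip(groups, group_means)
        for score in group
    )
    df_between = len(groups) - 1
    df_within = sum(len(group) - 1 for group in groups)

    f = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f, df_between, df_within, r_squared

# Hypothetical scores for three separate groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, df_between, df_within, r_squared = one_way_between_groups_anova(groups)
print(f"F({df_between}, {df_within}) = {f:.2f}, R^2 = {r_squared:.2f}")
```

When the group means are far apart relative to the spread inside each group, the ratio rises well above 1.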
125
When to use z, t, and F statistic
z: one sample; population mean and standard deviation are known | t: one sample, only the population mean is known; or two samples | F: three or more samples
126
One-Way Anova
A hypothesis test that includes one nominal independent variable with more than two levels and a scale dependent variable
127
Within-Groups Anova
A hypothesis test in which there are more than two samples, and each sample is composed of the same participants; also called a repeated-measures ANOVA
128
Between-Groups Anova
A hypothesis test in which there are more than two samples, and each sample is composed of different participants
129
Assumptions for Anova
1. Samples are randomly selected 2. Population distribution is normal 3. All samples come from populations with the same variance
130
External Validity
Ability to generalize beyond the sample
131
Homoscedasticity
Homoscedastic populations are those that have the same variance
132
Heteroscedastic Populations
Those that have different variances
133
Null Hypothesis for Anova
H0: μ1 = μ2 = μ3 = μ4
134
Source Table
Presents the important calculations and final results of an Anova in a consistent and easy-to-read format
135
Grand Mean
The mean of every score in a study, regardless of which sample the score came from
136
R2
Proportion of variance in the dependent variable that is accounted for by the independent variable
137
Planned comparison
A test that is conducted when there are multiple groups of scores, but specific comparisons have been specified prior to data collection
138
Post hoc Test
Statistical procedure frequently carried out after we reject the null hypothesis in an analysis of variance; it allows us to make multiple comparisons among several means; often referred to as a follow-up test
139
A priori comparisons
Guided by an existing theory or a previous finding
140
Choices a researcher can make
Conducting one or more independent samples t tests with a p level of 0.05 Conducting one or more independent-samples t tests using a more conservative p level as determined by a Bonferroni Test
141
Tukey HSD Test
Post-hoc test that determines the differences between means in terms of standard error, comparable to a critical value; sometimes referred to as the q test; Involves (1) calculation of differences between each pair of means (2) division of each difference by the standard error
142
Within Groups Degrees of Freedom for a one-way between-groups ANOVA
df within = df1 + df2 + df3 + df4; find the degrees of freedom for each group by subtracting 1 from the number of people in that sample, then sum them
143
One way within-groups ANOVA
When there's just one nominal or ordinal independent variable (type of beer), the independent variable has more than two levels (cheap, mid-range, and high-end), the dependent variable is scale (ratings of beers), and every participant is in every group (each participant tastes the beers in every category)
144
How to reduce error in a within groups design
Each group includes exactly the same participants, groups are identical on all the relevant variables; same taste preferences, amount of alcohol typically consumed, tendency to be critical or lenient when rating, and so on
145
Steps of Hypothesis Testing for within-groups ANOVA
1. Identify the populations, distribution, and assumptions 2. State the null and research hypotheses 3. Determine the characteristics of the comparison distribution (F distribution; degrees of freedom: df within = (df between)(df subjects); df total = df between + df subjects + df within) 4. Determine critical values or cutoffs (F statistic for a p level of 0.05) 5. Calculate the test statistic 6. Make a decision
146
As social scientists
We should critically examine the research design and, regardless of its merits, call for a replication
147
Problems to watch for when using matched groups
We may not be aware of all of the important variables of interest | If one of the people in a matched pair decides not to complete the study, then we must also discard the data for that person's match
148
Statistical Interaction
Occurs in a factorial design when two or more independent variables have an effect in combination that we do not see when we examine each independent variable on its own
149
Two-way ANOVA
Hypothesis test that includes two nominal independent variables, regardless of their numbers of levels, and a scale dependent variable
150
Factorial ANOVA
A statistical analysis used with one scale dependent variable and at least two nominal independent variables (factors); also called a multifactorial ANOVA
151
Factor
Term used to describe an independent variable in a study with more than one independent variable
152
How to name an ANOVA
Number of IVs: One-way, Two-way, Three-way | Participants in one or all conditions: Between-Groups, Within-Groups, Mixed-Design | Always follows the descriptors: ANOVA
153
Example of ANOVA
Examine (1) the effect of Lipitor versus other medication (2) the effect of grapefruit juice versus other beverages (3) ways in which a drug and a juice might combine to create some entirely new and unexpected effect. Cells of the design (medication: Lipitor, Zocor, Placebo; beverage: grapefruit juice, water): Grapefruit juice: L & G, Z & G, P & G | Water: L & W, Z & W, P & W
154
Main effect
Occurs in a factorial design when one of the independent variables has an influence on the dependent variable
155
Quantitative interaction
An interaction in which one independent variable exhibits a strengthening or weakening of its effect at one or more levels of the other independent variable, but the direction of the initial effect does not change
156
Qualitative interaction
Particular type of quantitative interaction of two (or more) independent variables in which one independent variable reverses its effect depending on the level of the other independent variable
157
Marginal Mean
The mean of a row or a column in a table that shows the cells of a study with a two-way ANOVA design
158
Six Steps of a Two-Way ANOVA
1. Identify the populations, distribution, and assumptions 2. State the null and research hypotheses 3. Determine the characteristics of the comparison distribution 4. Determine the critical values, or cutoffs 5. Calculate the test statistic 6. Make a decision
159
Mixed Design ANOVA
Used to analyse data from a study with at least two independent variables; at least one variable must be between-groups and at least one must be within-groups
160
Multivariate Analysis of Variance (MANOVA)
Form of ANOVA in which there is more than one dependent variable; The word multivariate refers to the number of dependent variables, not the number of independent variables
161
Analysis of Covariance (ANCOVA)
Type of ANOVA in which a covariate is included so that statistical findings reflect effects after a scale variable has been statistically removed
162
Covariate
scale variable that we suspect associates, or covaries, with the independent variable of interest; statistically subtracts the effect of a possible confounding variable
163
Multivariate Analysis of Covariance (MANCOVA)
An ANOVA with multiple dependent variables and the inclusion of a covariate
164
Example of Two-Way Between-Groups ANOVA
An online dating website allows users to post personal ads to meet others. Each person is asked to specify a range from the youngest age acceptable to the oldest age acceptable. Data were randomly selected from ads of 25-year-old people living in the New York City area. Scores represent youngest acceptable ages listed by those in the sample. Groups: 25 y.o. women seeking men | 25 y.o. women seeking women | 25 y.o. men seeking women | 25 y.o. men seeking men. Two independent variables: gender of the seeker (levels: male and female) and gender of the person being sought (levels: male and female); one dependent variable: youngest acceptable age of the person being sought
165
Correlation
Association or relation between two variables; gives new ways to measure behaviour and to distinguish among the influences of overlapping variables
166
Correlation Coefficient
A statistic that quantifies a relation between two variables
167
Positive Correlation
An association between two variables such that participants with high scores on one variable tend to have high scores on the other variable as well, and those with low scores on one variable tend to have low scores on the other variable
168
Three main characteristics of the Correlation Coefficient
1. It can be either positive or negative 2. It always falls between -1.00 and 1.00 3. It is the strength (or magnitude) of the coefficient, not its sign, that indicates how large it is
169
Negative Correlation
An association between two variables in which participants with high scores on one variable tend to have low scores on the other variable
170
Guidelines on Size of Correlation and Correlation Coefficient
Size of correlation and corresponding correlation coefficient: Small = 0.10; Medium = 0.30; Large = 0.50
171
Limitations of Correlation
Correlation is Not Causation | Restricted Range
172
Possible Causal Explanations for a Correlation
1. The first variable (A) might cause the second variable (B) 2. The second variable (B) could cause the first variable (A) 3. A third variable could cause both A and B
173
Effect of an extreme outlier on a correlation
A correlation can be dramatically altered by a restricted range or by an extreme outlier
174
Pearson Correlation Coefficient
Statistic that quantifies a linear relation between two scale variables; a single number is used to describe the direction and strength of the relation between two variables when their overall pattern indicates a straight-line relation
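The definition above can be illustrated with a minimal sketch that computes the coefficient from deviation scores. The function name `pearson_r` and the data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of scale scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of the products of the deviations from each mean
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the two sums of squares
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and quiz score for five people
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 2))  # prints 0.77
```

The sign of the result gives the direction of the relation and its absolute value gives the strength.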
175
Hypothesis Testing with the Pearson Correlation Coefficient
1. Identify the population, distribution, and assumptions 2. State the null and research hypotheses (H0: ρ = 0; H1: ρ ≠ 0) 3. Determine the characteristics of the comparison distribution (df = N - 2) 4. Determine the critical, or cutoff, values: look them up in the r table given the degrees of freedom and the p level 5. Calculate the test statistic 6. Make a decision
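Steps 3 to 6 above can be sketched as a small helper. This is only an illustration: the function name `correlation_decision` and the study values are hypothetical, and the critical value still has to be looked up in an r table by hand.

```python
def correlation_decision(r, n, critical_r):
    """Two-tailed test of the null hypothesis that the population correlation is 0.

    critical_r is the cutoff looked up in an r table for df = n - 2
    and the chosen p level; it is supplied by the caller, not computed here.
    """
    df = n - 2
    # Reject the null hypothesis when r is at least as extreme as the cutoff
    reject_null = abs(r) >= critical_r
    return df, reject_null

# Hypothetical study: r = 0.77 from N = 5 pairs of scores.
# 0.878 is the two-tailed critical r for df = 3 at a p level of 0.05.
df, reject_null = correlation_decision(0.77, 5, 0.878)
print(df, reject_null)  # prints: 3 False
```

With so few participants, even a large correlation fails to reach the cutoff, so we fail to reject the null hypothesis in this example.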
176
Coefficient alpha
Estimate of a test's or measure's reliability, calculated by taking the average of all possible split-half correlations
177
Partial Correlation
Technique that quantifies the degree of association between two variables after statistically removing the association of a third variable with both of those two variables
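One standard way to compute a first-order partial correlation uses the three pairwise Pearson correlations. The sketch below assumes that formula; the function name `partial_r` and the input values are hypothetical.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the association of z with both."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical correlations among three variables
print(round(partial_r(0.5, 0.5, 0.5), 2))  # prints 0.33
```

Here the association between x and y shrinks from 0.50 to about 0.33 once the third variable is accounted for.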
178
Simple Linear Regression
A statistical tool that lets us predict a person's score on the dependent variable from his or her score on one independent variable
179
Regression
A statistical technique that can provide specific quantitative predictions that more precisely explain relations among variables
180
Regression to the mean
Regression of the dependent variable; the tendency of scores that are particularly high or low to drift toward the mean over time
181
Standardized Regression Coefficient
A standardised version of the slope in a regression equation; it is the predicted change in the dependent variable, in standard deviations, for an increase of 1 standard deviation in the independent variable
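As a rough sketch, the standardised coefficient can be obtained by rescaling the raw slope by the square roots of the two sums of squares. The function name `standardized_slope` and the data are hypothetical.

```python
import math

def standardized_slope(x, y):
    """Standardised regression coefficient (beta) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b_raw = sp / ss_x  # raw slope, in original units
    # Rescale the raw slope into standard deviation units
    return b_raw * math.sqrt(ss_x) / math.sqrt(ss_y)

# Hypothetical data
beta = standardized_slope([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(beta, 2))  # prints 0.77
```

With a single independent variable, this value equals the Pearson correlation coefficient for the same data.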
182
Regression Line
The line that best fits the points on the scatterplot; the regression line is the line that leads to the least amount of error in prediction
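A minimal sketch of finding that line by least squares, assuming one independent variable. The function name `regression_line` and the data are hypothetical.

```python
def regression_line(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sp / ss_x
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data
intercept, slope = regression_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
predicted = intercept + slope * 6  # predicted score for someone with x = 6
print(round(intercept, 2), round(slope, 2), round(predicted, 2))  # prints 2.2 0.6 5.8
```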
183
Standard Error of the Estimate
Statistic indicating the typical distance between a regression line and the actual data points; we are concerned with variability around the line of best fit rather than variability around the mean
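One common formula divides the sum of squared prediction errors by N - 2 and takes the square root. The sketch below assumes that formula; the function name `standard_error_of_estimate` and the data are hypothetical.

```python
import math

def standard_error_of_estimate(x, y):
    """Typical distance between the data points and the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    # Squared vertical distances between each actual score and its prediction
    ss_error = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return math.sqrt(ss_error / (n - 2))

# Hypothetical data
print(round(standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))  # prints 0.89
```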
184
Regression to the mean
Occurs because extreme scores tend to become less extreme, that is, they tend to regress towards the mean
185
Proportionate Reduction in Error/Coefficient of Determination
Statistic that quantifies how much more accurate predictions are when we use the regression line instead of the mean as a prediction tool
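A short sketch of the calculation: compare the error when the mean is used for every prediction with the error when the regression line is used. The function name `proportionate_reduction_in_error` and the data are hypothetical.

```python
def proportionate_reduction_in_error(x, y):
    """(SS_total - SS_error) / SS_total for a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    # Error when predicting the mean of y for everyone
    ss_total = sum((b - mean_y) ** 2 for b in y)
    # Error when predicting from the regression line
    ss_error = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return (ss_total - ss_error) / ss_total

# Hypothetical data
print(round(proportionate_reduction_in_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))  # prints 0.6
```

In this example the regression line reduces prediction error by 60% compared with the mean, which matches the squared Pearson correlation for the same data.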
186
Orthogonal variable
An independent variable that makes a separate and distinct contribution in the prediction of a dependent variable, as compared with another variable
187
Multiple regression
A statistical technique that includes two or more predictor variables in a prediction equation
188
Use of Statistical Techniques
1. A way of quantifying whether multiple pieces of evidence really are better than one 2. A way of quantifying precisely how much better each additional piece of evidence actually is
189
Stepwise Multiple Regression
A type of multiple regression in which computer software determines the order in which independent variables are included in the equation
190
Hierarchical multiple regression
type of multiple regression in which the researcher adds independent variables to the equation in an order determined by theory
191
Structural Equation Modeling (SEM)
A statistical technique that quantifies how well sample data "fit" a theoretical model that hypothesises a set of relations among multiple variables
192
Statistical (or Theoretical) Model
Hypothesized network of relations, often portrayed graphically, among multiple variables
193
Path
A term that statisticians use to describe the connection between two variables in a statistical model
194
Path Analysis
A statistical method that examines a hypothesised model, usually by conducting a series of regression analyses that quantify the paths at each succeeding step in the model
195
Manifest Variables
The variables in a study that we can observe and that are measured
196
Latent Variables
The ideas we want to research but cannot directly measure
197
Chi Square Statistic
Allows us to test relations between variables when they are nominal
198
When to use nonparametric test
1. When the dependent variable is nominal (whether or not a woman gets pregnant) 2. When the dependent variable is ordinal 3. When the sample size is small and we suspect that the underlying population of interest is skewed
199
Chi Square Test for Goodness-of-Fit
A nonparametric hypothesis test used with one nominal variable
200
Chi-Square Test for Independence
A nonparametric hypothesis test used with two nominal variables
201
Example of Chi Square
Researchers reported that the best soccer players in the world were more likely to have been born early in the year than later. 52 elite youth players in Germany were born in January, February, or March. Only 4 players were born in October, November, or December
202
Steps to conduct chi-square test for goodness-of-fit for Hypothesis Testing
1. Identify populations, distribution, and assumptions 2. State the null and research hypotheses 3. Determine the characteristics of the comparison distribution (how many degrees of freedom) 4. Determine the critical values or cutoffs (use the chi-square table, based on the degrees of freedom and the p level) 5. Calculate the test statistic 6. Make a decision
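Steps 3 and 5 above can be sketched as follows. The function name `chi_square_goodness_of_fit` and the counts are hypothetical, and the critical value still comes from a chi-square table.

```python
def chi_square_goodness_of_fit(observed, expected_proportions):
    """Return (chi-square statistic, degrees of freedom) for one nominal variable."""
    total = sum(observed)
    # Expected counts if the null hypothesis were true
    expected = [total * p for p in expected_proportions]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return chi_square, df

# Hypothetical counts across four categories, expected to be equally likely
observed = [30, 20, 10, 20]
chi_square, df = chi_square_goodness_of_fit(observed, [0.25, 0.25, 0.25, 0.25])
print(chi_square, df)  # prints: 10.0 3
```

For 3 degrees of freedom and a p level of 0.05 the critical value in the chi-square table is 7.815, so a statistic of 10.0 would lead us to reject the null hypothesis.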
203
Steps in hypothesis testing for chi-square test for independence
1. Identify populations, distribution, and assumptions 2. State the null and research hypotheses 3. Determine the characteristics of the comparison distribution 4. Determine the critical values or cutoffs using the degrees of freedom and the p level 5. Calculate the test statistic 6. Make a decision
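A minimal sketch of steps 3 and 5 above for a table of counts. Expected frequencies are taken as row total times column total divided by N. The function name `chi_square_independence` and the counts are hypothetical.

```python
def chi_square_independence(table):
    """Return (chi-square, df, expected frequencies) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency for each cell if the two variables were independent
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected

# Hypothetical 2 x 2 table: rows are two conditions, columns are two outcomes
observed = [[30, 10],
            [20, 40]]
chi_square, df, expected = chi_square_independence(observed)
print(round(chi_square, 2), df, expected)
```

For 1 degree of freedom and a p level of 0.05 the critical value in the chi-square table is 3.841.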
204
Example of chi-square test for independence
Expected frequencies with totals, arranged in a 2 × 2 table: Pregnant vs. Not Pregnant, crossed with Clown vs. No Clown
205
Relative risk/relative likelihood/relative chance
A measure created by making a ratio of two conditional proportions
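A small sketch of that ratio. The function name `relative_risk` and the counts are hypothetical.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Ratio of the proportion with the outcome in group A to that in group B."""
    proportion_a = events_a / total_a  # conditional proportion for group A
    proportion_b = events_b / total_b  # conditional proportion for group B
    return proportion_a / proportion_b

# Hypothetical counts: 30 of 40 in group A and 20 of 60 in group B had the outcome
print(round(relative_risk(30, 40, 20, 60), 2))  # prints 2.25
```

A value of 2.25 would mean the outcome is 2.25 times as likely in group A as in group B.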
206
Adjusted Standardized Residual
The difference between the observed frequency and the expected frequency for a cell in a chi-square research design, divided by the standard error
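The sketch below assumes one widely used form of the standard error, the square root of E × (1 - row proportion) × (1 - column proportion). The function name `adjusted_residual` and the counts are hypothetical.

```python
import math

def adjusted_residual(table, i, j):
    """Adjusted standardized residual for the cell in row i, column j."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = row_totals[i] * col_totals[j] / n
    # Assumed standard error formula; see the lead-in above
    standard_error = math.sqrt(
        expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
    )
    return (table[i][j] - expected) / standard_error

# Hypothetical 2 x 2 table of observed counts
observed = [[30, 10],
            [20, 40]]
print(round(adjusted_residual(observed, 0, 0), 2))  # prints 4.08
```

A large absolute value flags a cell whose observed count is far from what independence would predict.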
207
Spearman Rank-Order Correlation Coefficient
A nonparametric statistic that quantifies the association between two ordinal variables; the coefficient can range from -1 to +1; it can indicate a strong correlation but does not establish causation
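A minimal sketch using the familiar difference-of-ranks formula, which assumes there are no tied ranks. The function name `spearman_from_ranks` and the ranks are hypothetical.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman coefficient from two lists of ranks (assumes no tied ranks)."""
    n = len(rank_x)
    # Sum of squared differences between each participant's two ranks
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Hypothetical ranks for five participants on two ordinal variables
print(round(spearman_from_ranks([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 2))  # prints 0.8
```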
208
Wilcoxon Signed-Rank Test
Nonparametric hypothesis test used when there are two groups, a within-groups design, and an ordinal dependent variable
209
Hypothesis Testing for Wilcoxon Signed Rank Test
1. Identify the assumptions (differences between pairs must be ranked, random selection, difference scores should come from a symmetric population distribution) 2. State the null and research hypotheses (only in words, not symbols) 3. Determine the characteristics of the comparison distribution (T statistic; decide the cutoff or critical value; one-tailed or two-tailed test; determine the sample size) 4. Determine the critical values (check the table) 5. Calculate the test statistic 6. Make the decision
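Step 5 above can be sketched as follows: rank the absolute difference scores, sum the ranks separately for positive and negative differences, and take the smaller sum as T. The function names and the scores are hypothetical; pairs with a difference of zero are dropped, which is a common convention rather than something stated on the card.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_t(first, second):
    """T statistic: the smaller of the summed ranks for positive and negative differences."""
    diffs = [b - a for a, b in zip(first, second) if b != a]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(positive, negative)

# Hypothetical scores for five participants measured twice
print(wilcoxon_t([10, 12, 9, 15, 11], [12, 11, 13, 20, 14]))  # prints 1.0
```

The resulting T is then compared with the critical value from the table.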
210
Mann-Whitney U Test
Nonparametric hypothesis test used when there are two groups, a between-groups design, and an ordinal dependent variable; U statistic
211
Hypothesis Testing for Mann-Whitney U Test
1. Identify the assumptions 2. State the null and research hypotheses 3. Determine the characteristics of the comparison distribution 4. Determine the critical values, or cutoffs (we want the smaller of the two test statistics to be equal to or smaller than this critical value) 5. Calculate the test statistics 6. Make the decision
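Step 5 above can be sketched by ranking all scores together and computing a U statistic for each group. The function names and the scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_1, group_2):
    """Return (U1, U2); the smaller one is compared with the critical value."""
    n1, n2 = len(group_1), len(group_2)
    ranks = average_ranks(list(group_1) + list(group_2))
    sum_ranks_1 = sum(ranks[:n1])
    sum_ranks_2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - sum_ranks_1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - sum_ranks_2
    return u1, u2

# Hypothetical ordinal scores for two independent groups
u1, u2 = mann_whitney_u([1, 2, 3, 5], [4, 6, 7, 8])
print(u1, u2, min(u1, u2))  # prints: 15.0 1.0 1.0
```

As a check on the arithmetic, the two U values always add up to the product of the two group sizes.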
212
Kruskal-Wallis H Test
A nonparametric hypothesis test used when there are more than two groups, a between-groups design, and an ordinal dependent variable; H statistic
213
Hypothesis Testing for Kruskal-Wallis Test
1. Identify the assumptions 2. State the null and research hypotheses 3. Determine the characteristics of the comparison distribution 4. Determine the critical values, or cutoffs using a table, based on a chi square distribution with a p level of 0.05, and degrees of freedom 5. Calculate the test statistic 6. Make a decision
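Step 5 above can be sketched with the usual rank-sum form of H, without any correction for ties. The function names and the scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return (H, df) for a list of groups of scores (no tie correction applied)."""
    all_scores = [score for group in groups for score in group]
    n = len(all_scores)
    ranks = average_ranks(all_scores)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1

# Hypothetical ordinal scores for three independent groups
h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2), df)  # prints: 7.2 2
```

For 2 degrees of freedom and a p level of 0.05 the chi-square critical value is 5.991, so an H of 7.2 would lead us to reject the null hypothesis.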
214
Bootstrapping
Statistical process in which the original sample data are used to represent the entire population, and we repeatedly take samples from the original sample data to form a confidence interval
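A minimal sketch of the idea, assuming a percentile-based 95% confidence interval for the mean. The function name `bootstrap_mean_ci`, the number of resamples, the seed, and the data are all hypothetical choices.

```python
import random

def bootstrap_mean_ci(data, n_resamples=2000, seed=0):
    """Percentile-based 95% confidence interval for the mean via bootstrapping."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = []
    for _ in range(n_resamples):
        # Sample with replacement from the original sample, same size each time
        resample = rng.choices(data, k=len(data))
        means.append(sum(resample) / len(resample))
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return lower, upper

# Hypothetical sample of ten scores
scores = [12, 15, 9, 20, 14, 11, 18, 16, 13, 10]
lower, upper = bootstrap_mean_ci(scores)
print(lower, upper)
```

The middle 95% of the resampled means forms the interval; more resamples give a more stable estimate of its endpoints.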