Stats Flashcards

(127 cards)

1
Q

2 Basic Mathematical Principles important for EPPP

A

Squaring Decimals

Square rooting Decimals

2
Q

Critical Factor in determining the type of stat test to be used

A

Type of data, particularly for the DV

3
Q

4 Types of Data

*NOIR

A

Nominal
Ordinal
Interval
Ratio

4
Q

Nominal data

A

Unordered categorical data; numbers are assigned for identification purposes only and carry no further meaning
Sex, political party, race
Can compute percentages

5
Q

Ordinal Data

A

Ordered categorical data

Ex-grouped according to SES

6
Q

Interval Data

A

Numerical scores, but no zero score, or zero is not absolute (e.g. temp in Celsius or Fahrenheit)

7
Q

Ratio data

A

Numerical score, has an absolute zero
Ex- money in bank, EPPP score, weight
Means can be calculated, as well as comparisons across values

8
Q

2 Broad classes of statistics

A

Descriptive

Inferential

9
Q

With descriptive stats, the data collected is ____, whereas with inferential stats, the goal is to make inferences about the ___ from the ___

A

simply described
population
sample

10
Q

2 basic groups of Descriptive stats

A
1. Stats on the whole group's data

2. Stats describing an individual's score relative to the group

11
Q

Descriptive stats on group data include

A

measures of central tendency
measures of variability
Graphs

12
Q

Measures of Central Tendency

A

Mean-avg score
Median- score at 50th percentile
Mode-most frequently occurring score
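
A minimal sketch of the three measures, using Python's standard `statistics` module. The `scores` list is hypothetical and chosen only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical raw scores, chosen only for illustration
scores = [2, 3, 3, 4, 8]

average = mean(scores)       # sum of scores divided by the number of scores
middle = median(scores)      # score at the 50th percentile
most_common = mode(scores)   # most frequently occurring score

print(average, middle, most_common)
```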

13
Q

The best measure of central tendency is typically the ___

A

mean

14
Q

If data is skewed (extreme scores present) the most accurate measure of central tendency is ___

A

median

15
Q

Measure of Variability

A

Standard Deviation-avg spread from the mean
Variance-standard deviation squared
Range-diff between lowest & highest score obtained

16
Q

Standard deviation is the __ __ of the variance

A

square root

17
Q

Variance is the standard deviation ___

A

squared
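
The relationship between variance and standard deviation can be checked numerically. This is a sketch with a hypothetical `scores` list, using the population versions (`pvariance`, `pstdev`) from the standard `statistics` module.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical scores, chosen so the arithmetic is easy to follow
scores = [2, 4, 4, 4, 5, 5, 7, 9]

variance = pvariance(scores)  # population variance
sd = pstdev(scores)           # population standard deviation

print(variance, sd)
print(math.isclose(sd, math.sqrt(variance)))  # SD is the square root of the variance
print(math.isclose(variance, sd ** 2))        # variance is the SD squared
```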

18
Q

Data that are not normally distributed are ___ or ___, meaning that scores are not equally distributed above & below the mean

A

skewed, kurtotic

19
Q

In a positive skew, how are measures of central tendency impacted?

A

Mode is lowest, mean is highest

20
Q

In a negative skew, how are measures of central tendency impacted?

A

Mode is highest, mean is lowest
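
The ordering of mode, median, and mean under skew can be illustrated with two small data sets. Both lists are hypothetical and built only to show the pattern.

```python
from statistics import mean, median, mode

# Hypothetical data: a few extreme high scores (positive skew)
positive_skew = [1, 1, 1, 2, 2, 3, 4, 10]
# Hypothetical data: a few extreme low scores (negative skew)
negative_skew = [10, 10, 10, 9, 9, 8, 7, 1]

for label, scores in (("positive", positive_skew), ("negative", negative_skew)):
    # Print mode, median, mean in that order for comparison
    print(label, mode(scores), median(scores), mean(scores))
```

In the positively skewed list the mean is pulled toward the high tail, and in the negatively skewed list toward the low tail.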

21
Q

Leptokurtic distribution

A

Very sharp peak

22
Q

Platykurtic Distribution

A

Flattened

23
Q

Normal Distribution

A

Bell shaped

24
Q

Norm referenced score

A

provides info as to how a person scored relative to the group

25
The most informative norm referenced score is the ___ ___.
Percentile rank
26
Graphs for percentile ranks are ___ or ___
flat, rectangular
27
Standard scores
based on standard deviation of the sample
28
Examples of standard scores
z-scores | T-scores | IQ scores | SAT scores | EPPP scores
29
z-score
most basic standard score; corresponds directly to standard deviation units, with a mean of 0 and SD of 1 | Ex- z score of +2 means the score is 2 SDs above the mean | Shape of z score distribution is always the same as the raw score distribution
30
z-score formula
z = (score - mean) / standard deviation
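A minimal sketch of the z-score calculation. The function name `z_score` and the example values (a score of 130 on a scale with mean 100 and SD 15) are assumptions for illustration. Note that the mean is subtracted before dividing by the standard deviation.

```python
def z_score(score, mean, sd):
    """Return how many standard deviations a score lies from the mean."""
    return (score - mean) / sd


# Hypothetical example: score of 130, mean of 100, SD of 15
print(z_score(130, 100, 15))  # 2.0, i.e. 2 SDs above the mean
```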
31
Parameters vs. Statistics
Population values vs Sample Values
32
mu
population mean
33
sigma
population standard deviation
34
Sampling Error
Samples are not perfectly representative of the population (sample means not identical to pop mean)
35
Standard Error of the Mean
The avg amount of deviation in a distribution of sample means
36
Standard Error of the Mean formula
SD population/square root of N
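The formula can be sketched as a one-line function. The name `standard_error_of_mean` and the example values (population SD of 15, sample size of 25) are assumptions for illustration.

```python
import math


def standard_error_of_mean(population_sd, n):
    """Population SD divided by the square root of the sample size."""
    return population_sd / math.sqrt(n)


# Hypothetical example: SD of 15 and a sample of 25 people
print(standard_error_of_mean(15, 25))  # 3.0
```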
37
Central Limit Theorem
If an infinite number of equal sized samples are drawn from a population, the means of these samples will form a normal distribution | The mean of the means (the grand mean) will equal the population mean | The standard deviation of the means will equal the SD of the population divided by the square root of the sample size (standard error of the mean) | *The shape of a sampling distribution of means approaches normality as sample size increases
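A rough simulation sketch of the theorem. It only approximates the "infinite number of samples" with 5,000 samples, and every choice here is an assumption for illustration: a skewed exponential population (mean 1, SD 1), a sample size of 30, and a fixed seed.

```python
import math
import random
from statistics import pstdev


def sample_means(rng, sample_size, n_samples):
    """Draw repeated equal-sized samples from a skewed population; return each sample's mean."""
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means


rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(rng, sample_size=30, n_samples=5000)

grand_mean = sum(means) / len(means)  # should land near the population mean (1.0)
sd_of_means = pstdev(means)           # should land near SD / sqrt(N)
expected_sem = 1.0 / math.sqrt(30)

print(round(grand_mean, 2), round(sd_of_means, 2), round(expected_sem, 2))
```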
38
Standard Error of the mean helps us to determine
Whether an obtained mean is most likely due to treatment/experimental effects vs chance (sampling error) | Ex: if the SEM of IQ is 3 and testing the effectiveness of an IQ enhancement program yields a mean sample IQ of 103, this difference (1 standard error above the population mean of 100) is likely due to chance, as opposed to a mean sample IQ of 110, which would be more than 3 standard errors away from the mean (meaning that this is likely statistically significant)
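The IQ example can be worked through numerically. This sketch assumes a population mean IQ of 100, and the function name `standard_errors_from_mean` is hypothetical.

```python
def standard_errors_from_mean(sample_mean, population_mean, sem):
    """Distance of a sample mean from the population mean, in standard error units."""
    return (sample_mean - population_mean) / sem


# SEM of 3, assumed population mean of 100
print(standard_errors_from_mean(103, 100, 3))            # 1.0
print(round(standard_errors_from_mean(110, 100, 3), 2))  # 3.33
```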
39
Key concepts in hypothesis testing
Null Hypothesis | Alternative Hypothesis | Rejection Region
40
Null Hypothesis
States that there are no differences between groups; experimental research always hopes to reject the null hyp | *Results are almost always stated in terms of the null hypothesis
41
Alternative Hypothesis
Directly states that there are differences between groups
42
Rejection region/Region of Unlikely Values
The tail end of the curve; unlikely that a researcher will obtain means in this region simply by chance. Suggests that treatment did have an effect & null hyp is rejected
43
Size of the rejection region corresponds to the ___ ___
alpha level | Ex: alpha of .05 indicates that rejection region is 5% of the curve
44
Acceptance/Retention region
No sig diffs between groups, null hyp is accepted
45
2 Factors contributing to conclusions re: stat significance
1. Treatment Effects | 2. Sampling Error
46
The only way to know w/certainty if a tx effect is significant is to:
Replicate study numerous times
47
4 Possible Outcomes in terms of Correctness of Research Findings
Type I Error | Type II Error | Power | Correct Decision w/no name
48
Type I Error
Null is rejected, but later turns out to be a mistake, or diffs are found when they do not actually exist
49
The size of ___ directly corresponds to likelihood of making Type I Error
Alpha
50
Conventional cutoffs for alpha (.05, .01, .001) indicate that:
obtained means are different enough to be attributed to tx effects and not to chance
51
Type II Error
Null is accepted, but this is a mistake, or no diffs are found where differences do actually exist
52
The value of ___ corresponds to the probability of making Type II error
beta
53
Power
Null is rejected, and this is correct | Defined as the ability to correctly reject the null
54
Factors affecting Power
Increased w/: Large sample size | Small random error | Large magnitude of intervention | Parametric statistical test | One-tailed test
55
___ has the most sig measurable effect on power; as ___ increases, so does power.
Beta; Alpha
56
Correct Decision w/no name
Null is accepted and this is correct
57
In determining the appropriate statistical test, you must first determine:
what type of question is being addressed in the research
58
Commonly asked questions in research
Questions of Difference between groups | Questions of Relationship & Prediction | Questions of Structure or Fit
59
Steps to Select the Appropriate Test of Difference
1. Type of Data of the DV (Nominal, Ordinal, Interval, Ratio) 2. Number of IVs and Levels of IVs 3. Sample/Group Independence vs. Correlation
60
If the DV is Nominal or Ordinal, a ___ test will be used
non-parametric, for example chi-square, Mann-Whitney, Wilcoxon
61
If the DV is interval or ratio data, a ___ test will be used
parametric, for example t-test or ANOVA
62
If there is more than one DV (interval or ratio data), a ___ will be the stat test of choice
MANOVA
63
Independent Groups
Subjects randomly assigned to conditions or are grouped based on a pre-existing characteristic (gender or ethnicity)
64
3 Factors Resulting in Correlated Groups
1. Repeated measures 2. Subjects matched prior to assignment to groups (i.e. matched on income, IQ, etc) 3. Inherent relationship between subjects (twins, siblings, spouses)
65
In order to use a parametric test, what 3 assumptions must be met?
1. Data is interval or ratio | 2. Homoscedasticity-similar variability or SDs in the different groups | 3. Data must be normally distributed | *If one of these is not met, stat of choice will typically be one used for ordinal data
66
Assumption for the chi square test
Non parametric test | Answer: Independence of observations (no repeated measures design)
67
Degrees of freedom
# of possible variations in outcome that can be obtained | *calculated differently based on the type of stat test
68
Single Sample Chi Square
Nominal data collected for 1 IV | Ex: 100 psychologists sampled as to their political affiliation (political party seen as columns or groups)
69
Single Sample Chi Square degrees of freedom formula
df= #columns - 1
70
Multiple Sample Chi Square degrees of freedom formula
Nominal data collected for 2 IVs | df= (#rows - 1) x (#columns -1)
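A short sketch of both chi-square df formulas. The function names are hypothetical, and the example sizes (3 columns; a 4x2 table) are assumptions for illustration.

```python
def df_single_sample_chi_square(n_columns):
    """df = #columns - 1"""
    return n_columns - 1


def df_multiple_sample_chi_square(n_rows, n_columns):
    """df = (#rows - 1) x (#columns - 1)"""
    return (n_rows - 1) * (n_columns - 1)


print(df_single_sample_chi_square(3))       # 2
print(df_multiple_sample_chi_square(4, 2))  # 3
```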
71
Standard Error of the mean has a direct relationship with the ____ ____ ____ and an indirect relationship with ___ ___
population standard deviation | sample size | *SEM increases as SD increases and sample size decreases
72
2 Way ANOVA calculates:
3 F ratios (one for each main effect and one for the interaction)
73
df formula for single sample t test
df=N - 1 | N- number of subjects
74
when do we use a one sample t test?
interval or ratio data collected for one group of subjects | Ex-BDI obtained for 30 subjects
75
when do we use a t test for matched or correlated samples?
interval or ratio data collected for 2 correlated groups of subjects | Ex- BDI obtained for 2 matched groups of 15 people (so 30 total)
76
df formula for matched samples t test
df= #pairs - 1
77
when do we use a Multiple sample chi square?
nominal data collected for 2 IVs | Ex- 100 psychologists sampled as to voting pref and ethnicity
78
when do we use a t test for independent samples?
interval or ratio data collected for 2 independent groups of subjects | Ex- BDI obtained for 2 groups of 15 randomly assigned subjects (30 total)
79
df formula for t test for independent samples
df= N -2
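The three t test df formulas can be compared side by side, using the BDI examples from the cards (30 subjects; 15 matched pairs; 2 independent groups totaling 30). The function names are hypothetical.

```python
def df_one_sample_t(n_subjects):
    """Single sample t test: df = N - 1"""
    return n_subjects - 1


def df_matched_samples_t(n_pairs):
    """Matched (correlated) samples t test: df = #pairs - 1"""
    return n_pairs - 1


def df_independent_samples_t(n_total):
    """Independent samples t test: df = N - 2"""
    return n_total - 2


print(df_one_sample_t(30))           # 29
print(df_matched_samples_t(15))      # 14
print(df_independent_samples_t(30))  # 28
```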
80
One Way ANOVA
interval or ratio data collected for more than 2 groups of subjects | Ex- 60 subjects assigned to one of 4 tx groups
81
Formulas for df in one way ANOVA
df total = N - 1 | df between groups = #groups - 1 | df within groups = df total - df between groups
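A sketch of the one way ANOVA df formulas, applied to the example of 60 subjects in 4 treatment groups. The function name `anova_df` is hypothetical.

```python
def anova_df(n_subjects, n_groups):
    """Return (df total, df between groups, df within groups) for a one way ANOVA."""
    df_total = n_subjects - 1
    df_between = n_groups - 1
    df_within = df_total - df_between
    return df_total, df_between, df_within


print(anova_df(60, 4))  # (59, 3, 56)
```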
82
Formula for Expected Frequency in Chi Square when N & the groups are given
Expected freq = N / total # of cells | Ex- 4x2 chi square with a sample of 160: total # of cells is 8, so 160/8 = 20 | Expected freq in each cell = 20
83
Formula for expected freq in any cell when data are given for a chi square
Expected freq for any cell= (sum of the row x sum of the column)/ N
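Both expected frequency formulas can be sketched in a few lines. The function names and the 2x2 `observed` table are hypothetical; the 4x2, N = 160 case is the example from the previous card.

```python
def expected_equal(n, n_rows, n_columns):
    """Expected freq when only N and the table size are given: N / total # of cells."""
    return n / (n_rows * n_columns)


def expected_from_table(observed):
    """Expected freq for each cell: (row sum x column sum) / N."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]


print(expected_equal(160, 4, 2))  # 20.0

# Hypothetical observed counts for a 2x2 table (N = 100)
observed = [[10, 20], [30, 40]]
print(expected_from_table(observed))  # [[12.0, 18.0], [28.0, 42.0]]
```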
84
When do you use a one-way ANOVA?
when more than 2 groups are being compared on one IV | Ex- comparing 4 diff depression txs | Preferable to using multiple t tests, to avoid increasing probability of Type I error
85
Stat for One Way ANOVA
F Ratio | Want to find high variability between groups and low within
86
Formula for F Ratio; Guidelines for significance
F ratio = Mean Square between groups / Mean Square within groups | *Mean square is a measure of avg variability | F ratio = 1, no significance | Typically sig when above 2.0
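A minimal sketch of the F ratio for a one way ANOVA, computed by hand from raw scores. The function name `f_ratio` and the three small groups are hypothetical, and the sketch computes only the ratio, not its significance.

```python
def f_ratio(groups):
    """Mean square between groups divided by mean square within groups."""
    all_scores = [x for group in groups for x in group]
    n_total = len(all_scores)
    grand_mean = sum(all_scores) / n_total

    # Sum of squares between: group size times squared distance of group mean from grand mean
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    # Sum of squares within: squared distance of each score from its own group mean
    ss_within = sum(
        (x - sum(group) / len(group)) ** 2 for group in groups for x in group
    )

    ms_between = ss_between / (len(groups) - 1)      # df between = #groups - 1
    ms_within = ss_within / (n_total - len(groups))  # df within = N - #groups
    return ms_between / ms_within


# Hypothetical scores for 3 groups of 3 subjects each
print(f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

For these made-up scores the between-groups mean square is 13 and the within-groups mean square is 1, so the ratio comes out at about 13.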
87
A significant F Ratio with an ANOVA means:
There are differences between groups, but you do not know which ones. Must perform post hoc analyses
88
Post hoc analyses following significant ANOVA involve:
many pairwise comparisons
89
Possible post hoc tests following sig ANOVA, in order from most to least protection from Type I error
Scheffe | Tukey | Duncan | Dunnett | Newman-Keuls | Fisher's least sig diff | *reverse order for protection from Type II error
90
When to use a Two Way ANOVA & main advantage over 2 separate one way ANOVAs
Groups are being compared on 2 IVs (ex- sex and treatment); examines main effects for each IV and interaction effects
91
In a 2 way ANOVA, if there are sig main & interaction effects, which is interp first?
Interactions
92
To calculate Main & Interaction effects of a 2 Way ANOVA on the test you:
1. Find the sum of each column (if sums are different, there is a main effect for that IV) | 2. Find the sum of each row (if sums are different, there is a main effect for the second IV) | 3. Divide the table into squares and find the diagonal sums for each square (if sums are diff, there is an interaction effect for those IVs)
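The steps above can be sketched for a single 2x2 table of cell means. This follows the exam shortcut only, with no significance testing, and the function name `two_way_effects` and the cell values are hypothetical.

```python
def two_way_effects(table):
    """Apply the column / row / diagonal sum shortcut to a 2x2 table of cell means.

    Returns (column_effect, row_effect, interaction) as booleans.
    """
    (a, b), (c, d) = table
    column_effect = (a + c) != (b + d)  # step 1: compare column sums
    row_effect = (a + b) != (c + d)     # step 2: compare row sums
    interaction = (a + d) != (b + c)    # step 3: compare diagonal sums
    return column_effect, row_effect, interaction


# Hypothetical cell means: a crossover pattern with no main effects
print(two_way_effects([[10, 20], [20, 10]]))  # (False, False, True)
# Hypothetical cell means: both main effects, no interaction
print(two_way_effects([[10, 20], [20, 30]]))  # (True, True, False)
```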
93
When do we use a MANOVA?
When there is more than one outcome measure or DV
94
When an IV is quantitative, how do we analyze the data?
Trend Analysis | Ex: IV is dosage of a drug, length of time, etc | Data is non-linear, so less interested in group diffs than in trends in the data
95
Stats depicting relationships between variables are termed ____, while stats that predict are termed ___ or ___
correlations | regressions/analyses
96
Bivariate correlations
look at relationship between variables, X (predictor) and Y (criterion)
97
Range of Correlation Coefficient
-1.0 to +1.0 (describes strength and direction of the correlation)
98
Graphic depictions of correlations
each data point represents an individual's score on both X and Y; the closer the points are clustered, the stronger the correlation
99
Correlation coefficient tells you
how the variability or spread of Y scores for any given X score compares to the total variability of Y scores | Ex- if there is no correlation at all (coefficient of 0.0), for any given X, the range of possible Y could be anywhere from bottom to top of possible scores
100
Coefficient of Determination
correlation coefficient squared | Represents amount of variability in Y that is explained or accounted for by X | Ex- correlation coefficient of .50 for level of education and income: .50 squared = .25, meaning that 25% of variability in income is explained by education level
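The education and income example can be reproduced in one line. The function name `coefficient_of_determination` is hypothetical.

```python
def coefficient_of_determination(r):
    """Square the correlation coefficient to get the proportion of variability in Y explained by X."""
    return r ** 2


print(coefficient_of_determination(0.50))  # 0.25, i.e. 25% of variability explained
```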
101
Simple Linear Regression Equation
Derived anytime the correlation coefficient is other than 0.0, based on line of best fit through the scatter plot of scores
102
3 basic assumptions of bivariate correlations
Linear relationship between X and Y | Homoscedasticity-similar spread of scores across scatter plot | Unrestricted range of scores on both X and Y
103
Impact of restriction of range
Correlation, reliability and validity are always dramatically lower when the range of either variable is restricted
104
For Bivariate correlations, if both X and Y are interval or ratio data, you use
Pearson r
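A hand-rolled sketch of Pearson r using the standard product-moment formula. The function name `pearson_r` and the two small data sets are hypothetical and chosen to give perfect positive and perfect negative correlations.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 2))  # perfect positive relationship
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 2))  # perfect negative relationship
```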
105
For Bivariate correlations, if both X and Y are ordinal (rank ordered) data, you use
Spearman's rho or Kendall's Tau
106
Zero Order Correlation
most basic correlation | analyzes rel btwn X and Y when no extraneous variables affect the relationship
107
Partial Correlation ( First Order)
examines rel btwn X and Y when the effect of a third, confounding variable is removed | Ex: examine relationship btwn GPA & SAT scores after removing impact of parental education
108
Part (Semipartial) Correlation
examines rel btwn X and Y when the effect of a third, confounding variable is removed from only one of the orig variables
109
Moderator Variable (in Bivariate Corr)
A variable that influences the strength of relationship between predictor & criterion | Ex- relationship between income & smoking may be different strength at diff ages
110
Mediator Variable (in Bivar Corr)
Explains why there is a rel between predictor & criterion | Ex- if effect of education removed from link btwn SES and smoking, corr goes down to almost 0
111
Multivariate Tests of correlation & prediction
Involve several predictors or IVs & one or more criterions or DVs | Multiple R | Multiple Regression | Canonical R & Canonical Analysis | Discriminant Function Analysis | Loglinear Analysis | Path Analysis | Structural Equation Modeling
112
Multiple R
Correlation btwn 2 or more IVs and one DV, where Y is always interval or ratio data and at least one X is interval or ratio data
113
Coefficient of Multiple Determination
Index of amt of variability in criterion Y that is accounted for by all predictors (Xs).
114
Multiple Regression
Uses Multiple R to derive equation that allows prediction of the criterion based on values of the predictors | *To optimally predict, want low corr btwn predictors (Xs) and moderate to high corr btwn each predictor and the criterion | *Compensatory technique b/c low scores on one predictor can be compensated for by high scores on another
115
Multicollinearity
Problem that occurs w/multiple regression equation when predictors are highly correlated with one another
116
2 most common subtypes of multiple regression
Stepwise-computerized, forward or backward | Hierarchical-researcher controls, adds variables to regr analysis in order most consistent w/theory proposed
117
Canonical R & Canonical Analysis
Extension of multiple R | Corr btwn 2 or more IVs (predictor set) and 2 or more DVs (criterion set) | *compensatory approach
118
Discriminant Fx Analysis
Used when there are 2 or more predictors (Xs) and one nominal (categorical) criterion variable | Ex: predicting likelihood of passing or failing EPPP (categorical Y) based on time spent studying and number of practice tests completed | *compensatory
119
Loglinear Analysis
Used to predict categorical criterion (Y) based on categorical predictors (Xs) | Ex: type of grad program (categorical X) and sex (categorical X) used as predictors for passing or failing EPPP (cat Y) | *compensatory
120
2 Approaches that apply correlational techniques to causal modeling
Path Analysis | Structural Equation Modeling
121
Tests of Structure
determine which variables in the set fit best together or form coherent subsets that are relatively independent of one another | Includes: Factor Analysis, Cluster Analysis
122
Factor Analysis
Extracts as many sig factors as possible from the data (strongest to weakest); the stronger the factor, the more it will account for variability in scores
123
Eigenvalue
indicates strength of a factor; factors with eigenvalues less than 1.0 are not interpreted
124
Factor Analysis starts w/___ ___ and computes ___ ___, which are correlations between a variable and the underlying factor
correlation matrix | factor loadings
125
Factor Rotation
Makes factor loadings more distinct & interpretable
126
2 types of factor rotation
Orthogonal (axes remain perpendicular) | Oblique
127
Cluster analysis
Gather data on variety of DVs and look for naturally occurring subgroups in the data, without a priori hypotheses