Stats Flashcards Preview

Flashcards in Stats Deck (127):
1

2 Basic Mathematical Principles important for EPPP

Squaring Decimals
Square rooting Decimals
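
A minimal sketch of both operations, using an illustrative value of 0.2 (not from the deck). It shows the point that usually matters on stats items: squaring a decimal between 0 and 1 makes it smaller, and square rooting makes it larger.

```python
from decimal import Decimal

# Illustrative value only; any decimal between 0 and 1 behaves the same way.
value = Decimal("0.2")

squared = value ** 2     # 0.2 x 0.2 = 0.04 (smaller than 0.2)
root = squared.sqrt()    # square root of 0.04 = 0.2 (larger than 0.04)
```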

2

Critical Factor in determining the type of stat test to be used

Type of data, particularly for the DV

3

4 Types of Data
*NOIR

Nominal
Ordinal
Interval
Ratio

4

Nominal data

Non ordered categorical data, assigned a number for identification purposes but no further meaning to numbers
Sex, political party, race
Can compute percentages

5

Ordinal Data

Ordered categorical data
Ex-grouped according to SES

6

Interval Data

Numerical scores, but no zero score, or zero is not absolute (e.g. temp in Celsius or Fahrenheit)

7

Ratio data

Numerical score, has an absolute zero
Ex- money in bank, EPPP score, weight
Means can be calculated as well as comparisons across values

8

2 Broad classes of statistics

Descriptive
Inferential

9

With descriptive stats, the data collected is ____, whereas with inferential stats, the goal is to make inferences about the ___ from the ___

simply described
population
sample

10

2 basic groups of Descriptive stats

1. Stats on the whole group's data
2. Stats describing ind's score relative to the group

11

Descriptive stats on group data include

measures of central tendency
measures of variability
Graphs

12

Measures of Central Tendency

Mean-avg score
Median- score at 50th percentile
Mode-most frequently occurring score
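
A minimal sketch of the three measures, using Python's standard `statistics` module on a made-up set of scores (the data are an assumption for illustration only).

```python
import statistics

# Hypothetical scores, chosen only for illustration.
scores = [2, 3, 3, 4, 5, 7, 11]

mean = statistics.mean(scores)      # average score: 35 / 7
median = statistics.median(scores)  # middle score (50th percentile)
mode = statistics.mode(scores)      # most frequently occurring score
```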

13

The best measure of central tendency is typically the ___

mean

14

If data is skewed (extreme scores present) the most accurate measure of central tendency is ___

median

15

Measure of Variability

Standard Deviation-avg spread from the mean
Variance-avg squared deviation from the mean (SD squared)
Range-diff between lowest & highest score obtained

16

Standard deviation is the __ __ of the variance

square root

17

Variance is the standard deviation ___

squared
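
A short sketch tying the measures of variability together. The scores are hypothetical, and the population formulas (`pvariance`, `pstdev`) are used on the assumption that the scores are the whole group being described.

```python
import statistics

# Hypothetical scores for one whole group (mean = 5).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(scores)   # average squared deviation from the mean
sd = statistics.pstdev(scores)            # square root of the variance
score_range = max(scores) - min(scores)   # highest score minus lowest score
```

Here the variance is 4 and the SD is 2, so squaring the SD recovers the variance.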

18

Data that are not normally distributed are ___ or ___, meaning that scores are not equally distributed above & below the mean

skewed, kurtotic

19

In a positive skew, how are measures of central tendency impacted?

Mode is lowest, mean is highest

20

In a negative skew, how are measures of central tendency impacted?

Mode is highest, mean is lowest
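
A small sketch of the ordering in both kinds of skew. Both data sets are invented for illustration; the negatively skewed set is just the mirror image of the positively skewed one.

```python
import statistics

def central_tendency(scores):
    """Return (mean, median, mode) for a list of scores."""
    return (statistics.mean(scores),
            statistics.median(scores),
            statistics.mode(scores))

# Hypothetical positively skewed data: a few extreme high scores.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]
# Mirror image (20 - score): a few extreme low scores.
negative_skew = [20 - score for score in positive_skew]

pos_mean, pos_median, pos_mode = central_tendency(positive_skew)
neg_mean, neg_median, neg_mode = central_tendency(negative_skew)
```

In the positive skew the order is mode (2), median (3), mean (5); in the negative skew it reverses to mean (15), median (17), mode (18).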

21

Leptokurtic distribution

Very sharp peak

22

Platykurtic Distribution

Flattened

23

Normal Distribution

Bell shaped

24

Norm referenced score

provides info as to how a person scored relative to the group

25

The most informative norm referenced score is the ___ ___.

Percentile rank

26

Graphs for percentile ranks are ___ or ___

flat, rectangular

27

Standard scores

based on standard deviation of the sample

28

Examples of standard scores

z-scores
t-scores
IQ scores
SAT scores
EPPP scores

29

z-score

most basic standard score
corresponds directly to standard deviation units, mean of 0, SD of 1
Ex- z score of +2 means the score is 2 SDs above the mean
Shape of z score distribution always same as raw score distribution

30

z-score formula

z= (score - mean)/standard deviation
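
A minimal sketch of the z-score calculation. Note the subtraction is done before the division. The example values (score of 130, mean of 100, SD of 15) are assumed for illustration.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score falls above (+) or below (-) the mean."""
    return (score - mean) / sd

# Hypothetical example: a score of 130 on a scale with mean 100 and SD 15.
example_z = z_score(130, 100, 15)
```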

31

Parameters vs. Statistics

Population values vs Sample Values

32

mu

population mean

33

sigma

population standard deviation

34

Sampling Error

Samples are not perfectly representative of the population (sample means not identical to pop mean)

35

Standard Error of the Mean

The avg amount of deviation in a distribution of sample means

36

Standard Error of the Mean formula

SD population/square root of N
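
A minimal sketch of the formula. The population SD of 15 and the sample sizes are assumed values for illustration.

```python
import math

def standard_error_of_mean(population_sd, n):
    """Population SD divided by the square root of the sample size."""
    return population_sd / math.sqrt(n)

# Hypothetical values: population SD of 15, samples of 25 and 100.
sem_n25 = standard_error_of_mean(15, 25)    # 15 / 5
sem_n100 = standard_error_of_mean(15, 100)  # 15 / 10
```

Quadrupling the sample size from 25 to 100 halves the standard error (3.0 to 1.5).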

37

Central Limit Theorem

If an infinite number of equal sized samples are drawn from a population, the means of these samples will form a normal distribution.
The mean of the means (the grand mean) will equal the population mean
The standard deviation of the means will equal the SD of the population divided by the square root of the sample size (standard error of the mean)
*the shape of a sampling distribution of means approaches normality as sample size increases

38

Standard Error of the mean helps us to determine

If an obtained mean is most likely due to treatment/experimental effects vs chance (sampling error)
Ex: if SEM of IQ is 3 and testing the effectiveness of an IQ enhancement program yields a mean sample IQ of 103, this difference (1 standard error above the population mean of 100) is likely due to chance, as opposed to a mean sample IQ of 110, which would be more than 3 standard errors above the mean (meaning that this is likely statistically significant)
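
The IQ example can be sketched as a distance in standard error units. The population mean of 100 is an assumption (the usual IQ mean), and the 1.96 cutoff is the two-tailed z value for alpha = .05 under a normal curve, used here only as a rough benchmark.

```python
def standard_errors_from_mean(sample_mean, population_mean, sem):
    """How many standard errors a sample mean lies from the population mean."""
    return (sample_mean - population_mean) / sem

POPULATION_MEAN = 100  # assumed IQ population mean
SEM = 3                # standard error of the mean from the card's example
CRITICAL_Z = 1.96      # two-tailed cutoff for alpha = .05 (normal curve)

distance_103 = standard_errors_from_mean(103, POPULATION_MEAN, SEM)
distance_110 = standard_errors_from_mean(110, POPULATION_MEAN, SEM)

likely_chance_103 = abs(distance_103) < CRITICAL_Z   # within the retention region
likely_effect_110 = abs(distance_110) >= CRITICAL_Z  # in the rejection region
```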

39

Key concepts in hypothesis testing

Null Hypothesis
Alternative Hypothesis
Rejection Region

40

Null Hypothesis

States that there are no differences between groups, experimental research always hopes to reject the null hyp
*results almost always stated in terms of the null hypothesis

41

Alternative Hypothesis

Directly states that there are differences between groups

42

Rejection region/Region of Unlikely Values

The tail end of the curve; unlikely that a researcher will obtain means in this region simply by chance. Suggests that treatment did have an effect & null hyp is rejected

43

Size of the rejection region corresponds to the ___ ___

alpha level
Ex: alpha of .05 indicates that rejection region is 5% of the curve
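
A sketch of where the rejection region starts for a given alpha, assuming a z test on a standard normal curve (the card does not name a test). `NormalDist` is from Python's standard `statistics` module; `critical_z` is a helper name made up for this example.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """z cutoff beyond which a result falls in the rejection region."""
    # A two-tailed test splits alpha between both tails of the curve.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

two_tailed_cutoff = critical_z(0.05)                    # about 1.96
one_tailed_cutoff = critical_z(0.05, two_tailed=False)  # about 1.645
```

With the same alpha, the one-tailed cutoff is lower because the whole 5% sits in one tail.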

44

Acceptance/Retention region

No sig diffs between groups, null hyp is accepted

45

2 Factors contributing to conclusions re: stat significance

1. Treatment Effects
2. Sampling Error

46

The only way to know w/certainty if a tx effect is significant is to:

Replicate study numerous times

47

4 Possible Outcomes in terms of Correctness of Research Findings

Type I Error
Type II Error
Power
Correct Decision w/no name

48

Type I Error

Null is rejected, but later turns out to be a mistake, or diffs are found when they do not actually exist

49

The size of ___directly corresponds to likelihood of making Type I Error

Alpha

50

Conventional cutoffs for alpha (.05, .01, .001) indicate that:

obtained means are different enough to be attributed to tx effects and not to chance

51

Type II Error

Null is accepted, but this is a mistake, or no diffs are found where differences do actually exist

52

The value of ___ corresponds to the probability of making Type II error

beta

53

Power

Null is rejected, and this is correct
Defined as the ability to correctly reject the null

54

Factors affecting Power

Increased w/:
Large Sample Size
Small random error
Magnitude of intervention is large
Statistical test is parametric
Test is one tailed

55

___ has the most sig measurable effect on power; as ___ increases, so does power.

Alpha; Alpha
*power = 1 - beta, so power rises as beta falls

56

Correct Decision w/no name

Null is accepted and this is correct

57

In determining the appropriate statistical test, you must first determine:

what type of question is being addressed in the research

58

Commonly asked questions in research

Questions of Difference between groups
Questions of Relationship & Prediction
Questions of Structure or Fit

59

Steps to Select the Appropriate Test of Difference

1. Type of Data of the DV (Nominal, Ordinal, Interval, Ratio)
2. Number of IVs and Levels of IVs
3. Sample/Group Independence vs. Correlation

60

If the DV is Nominal or Ordinal, a ___ test will be used

non-parametric, for example chi-square, Mann-Whitney, Wilcoxon

61

If the DV is interval or ratio data, a ___ test will be used

parametric, for example t-test or ANOVA

62

If there is more than one DV (interval or ratio data), a ___ will be the stat test of choice

MANOVA

63

Independent Groups

Subjects randomly assigned to conditions or are grouped based on a pre-existing characteristic (gender or ethnicity)

64

3 Factors Resulting in Correlated Groups

1. Repeated measures
2. Subjects matched prior to assignment to groups (i.e. matched on income, IQ, etc)
3. Inherent relationship between subjects (twins, siblings, spouses)

65

In order to use a parametric test, what 3 assumptions must be met?

1. Data is interval or ratio
2. Homoscedasticity-similar variability or SDs in the different groups
3. Data must be normally distributed
*If one of these is not met, stat of choice will typically be one used for ordinal data

66

Assumption for the chi square test

Non parametric test
Answer: Independence of observations (no repeated measures design)

67

Degrees of freedom

# of possible variations in outcome that can be obtained
*calculated differently based on the type of stat test

68

Single Sample Chi Square

Nominal data collected for 1 IV
Ex: 100 psychologists sampled as to their political affiliation (political party seen as columns or groups)

69

Single Sample Chi Square degrees of freedom formula

df= #columns - 1

70

Multiple Sample Chi Square degrees of freedom formula

Nominal data collected for 2 IVs
df= (#rows - 1) x (#columns - 1)
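
The two chi square df formulas can be sketched as follows. The function names are made up for illustration, and the example sizes (3 political parties; a 4x2 table) are assumed.

```python
def chi_square_df_single(num_columns):
    """Single sample chi square: number of columns (groups) minus 1."""
    return num_columns - 1

def chi_square_df_multiple(num_rows, num_columns):
    """Multiple sample chi square: (rows - 1) x (columns - 1)."""
    return (num_rows - 1) * (num_columns - 1)

df_three_parties = chi_square_df_single(3)      # e.g. 3 political parties
df_four_by_two = chi_square_df_multiple(4, 2)   # e.g. a 4x2 table
```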

71

Standard Error of the mean has a direct relationship with the ____ ____ ____ and an inverse relationship with ___ ___

population standard deviation
sample size
*SEM increases as SD increases and sample size decreases

72

2 Way ANOVA calculates:

3 F ratios (one for each main effect and one for the interaction)

73

df formula for single sample t test

df=N - 1
(N = number of subjects)

74

when do we use a one sample t test?

interval or ratio data collected for one group of subjects
Ex-BDI obtained for 30 subjects

75

when do we use a t test for matched or correlated samples?

interval or ratio data collected for 2 correlated groups of subjects
Ex- BDI obtained for 2 matched groups of 15 people (so 30 total)

76

df formula for matched samples t test

df= #pairs - 1

77

when do we use a Multiple sample chi square?

nominal data collected for 2 IVs
Ex- 100 psychologists sampled as to voting pref and ethnicity

78

when do we use a t test for independent samples?

interval or ratio data collected for 2 independent groups of subjects
Ex-BDI obtained for 2 groups of 15 randomly assigned subjects (30 total)

79

df formula for t test for independent samples

df= N - 2
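
The df rules for the three t tests in the cards above, sketched with the deck's BDI examples (30 subjects; 15 matched pairs; 2 independent groups of 15). The function names are made up for illustration.

```python
def df_single_sample(n_subjects):
    """Single sample t test: N - 1."""
    return n_subjects - 1

def df_matched_samples(n_pairs):
    """Matched (correlated) samples t test: number of pairs - 1."""
    return n_pairs - 1

def df_independent_samples(n_total):
    """Independent samples t test: total N - 2."""
    return n_total - 2

df_one_group = df_single_sample(30)         # 30 subjects in one group
df_matched = df_matched_samples(15)         # 2 matched groups of 15 = 15 pairs
df_independent = df_independent_samples(30) # 2 independent groups of 15
```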

80

One Way ANOVA

interval or ratio data collected for more than 2 groups of subjects
Ex- 60 subjects assigned to one of 4 tx groups

81

Formulas for df in one way ANOVA

df total= N - 1
df between groups= #groups - 1
df within groups= df total - df between groups
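
A sketch of the three df values, using the earlier card's example of 60 subjects assigned to 4 tx groups. The function name is made up for illustration.

```python
def one_way_anova_df(n_subjects, n_groups):
    """Return (df total, df between groups, df within groups)."""
    df_total = n_subjects - 1
    df_between = n_groups - 1
    df_within = df_total - df_between
    return df_total, df_between, df_within

# 60 subjects assigned to one of 4 tx groups.
df_total, df_between, df_within = one_way_anova_df(60, 4)
```

Note that df within also equals N minus the number of groups (60 - 4 = 56).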

82

Formula for Expected Frequency in Chi Square when N & the groups are given

Expected Freq= N/total # of cells
Ex- 4x2 chi square with a sample of 160
total # of cells is 8
160/8=20
expected freq in each cell=20
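
The same arithmetic as a short sketch (the function name is made up for illustration):

```python
def expected_frequency_equal(n, n_rows, n_columns):
    """Expected frequency per cell when N is spread evenly over all cells."""
    total_cells = n_rows * n_columns
    return n / total_cells

# 4x2 chi square with a sample of 160.
expected_per_cell = expected_frequency_equal(160, 4, 2)
```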

83

Formula for expected freq in any cell when data are given for a chi square

Expected freq for any cell= (sum of the row x sum of the column)/ N
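
A sketch of the row-by-column rule applied to every cell. The 2x2 table of observed counts is invented for illustration.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row sum x column sum) / N."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]

# Hypothetical observed counts (rows x columns), N = 100.
observed = [[30, 10],
            [20, 40]]

expected = expected_frequencies(observed)
```

For the top-left cell: row sum 40 x column sum 50 / 100 = 20.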

84

When do you use a one-way ANOVA?

when more than 2 groups are being compared on one IV
Ex- comparing 4 diff depression txs
preferable to using multiple t tests to avoid increasing probability of Type I error

85

Stat for One Way ANOVA

F Ratio
Want to find high variability between groups and low within

86

Formula for F Ratio; Guidelines for significance

F ratio= Mean Square between groups/Mean Square within groups
*Mean square is measure of avg variability
F Ratio= 1, no significance
Typically sig when above 2.0
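
A minimal sketch of how the F ratio is built from raw scores, assuming a one way ANOVA with independent groups. The three small groups are made up for illustration; whether a given F is significant depends on the df, so the 2.0 guideline is only a rule of thumb.

```python
def f_ratio(groups):
    """Mean square between groups divided by mean square within groups."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Variability of group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Variability of scores around their own group mean.
    ss_within = sum((score - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for score in group)

    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(all_scores) - len(groups))
    return ms_between / ms_within

# Hypothetical scores for 3 tx groups.
example_f = f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

Here the mean square between is 13 and the mean square within is 1, giving F = 13.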

87

A significant F Ratio with an ANOVA means:

There are differences between groups, but you do not know which ones. Must perform post hoc analyses

88

Post hoc analyses following significant ANOVA involve:

many pairwise comparisons

89

Possible post hoc tests following sig ANOVA, in order from most to least protection from Type I error

Scheffe
Tukey
Duncan
Dunnett
Newman-Keuls
Fisher's least sig diff
*reverse order for protection from Type II error

90

When to use a Two Way ANOVA & main advantage over 2 separate one way ANOVAs

Groups are being compared on 2 IVs (ex- sex and treatment); examines main effects for each IV and interaction effects

91

In a 2 way ANOVA, if there are sig main & interaction effects, which is interp first?

Interactions

92

To calculate Main & Interaction effects of a 2 Way ANOVA on the test you:

1. Find the sum of each column (if sums are different, there is a main effect for that IV)
2. Find the sum of each row (if sums are different, there is a main effect for the second IV)
3. Divide the table into squares and find the diagonal sums for each square (if the diagonal sums are diff, there is an interaction effect for those IVs)
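
The three steps above can be sketched for a 2x2 table of cell means. This only mirrors the card's shortcut of comparing sums (it is not a significance test), and both example tables are invented for illustration.

```python
def two_by_two_effects(table):
    """Apply the column / row / diagonal sum shortcut to a 2x2 table of cell means."""
    (a, b), (c, d) = table
    return {
        "column_main_effect": a + c != b + d,  # step 1: compare column sums
        "row_main_effect": a + b != c + d,     # step 2: compare row sums
        "interaction": a + d != b + c,         # step 3: compare diagonal sums
    }

# Hypothetical crossover pattern: no main effects, but an interaction.
crossover = two_by_two_effects([[10, 20],
                                [20, 10]])
# Hypothetical additive pattern: both main effects, no interaction.
additive = two_by_two_effects([[10, 20],
                               [30, 40]])
```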

93

When do we use a MANOVA?

When there is more than one outcome measure or DV

94

When an IV is quantitative, how do we analyze the data?

Trend Analysis
Ex: IV is dosage of a drug, length of time, etc
Data is non-linear, so less interested in group diffs than in trends in the data

95

Stats depicting relationships between variables are termed ____, while stats that predict are termed ___ or ___

correlations
regressions/analyses

96

Bivariate correlations

look at relationship between variables, X (predictor) and Y (criterion)

97

Range of Correlation Coefficient

-1.0 to +1.0 (describes strength and direction of the correlation)

98

Graphic depictions of correlations

data point reps ind's score on both X and Y, the closer the points are clustered, the stronger the correlation

99

Correlation coefficient tells you

how the variability or spread of Y scores for any given X score compares to the total variability of Y scores
Ex- if there is no correlation at all (coefficient of 0.0), for any given X, the range of possible Y could be anywhere from bottom to top of possible scores

100

Coefficient of Determination

correlation coefficient squared
Represents amount of variability in Y that is explained or accounted for by X
Ex- correlation coefficient of .50 for level of education and income
.5 squared= .25, meaning that 25% of variability in income is explained by education level
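
The same calculation as a short sketch (the function name is made up for illustration):

```python
def coefficient_of_determination(r):
    """Proportion of variability in Y accounted for by X: r squared."""
    return r ** 2

# Correlation of .50 between level of education and income.
explained = coefficient_of_determination(0.5)
percent_explained = explained * 100
```

Because r is squared, a negative correlation of -.50 explains the same 25% of variability.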

101

Simple Linear Regression Equation

Derived anytime the correlation coefficient is other than 0.0, based on line of best fit through the scatter plot of scores

102

3 basic assumptions of bivariate correlations

Linear relationship between X and Y
Homoscedasticity-similar spread of scores across scatter plot
Unrestricted range of scores on both X and Y

103

Impact of restriction of range

Correlation, reliability and validity coefficients are lowered when the range of either variable is restricted

104

For Bivariate correlations, if both X and Y are interval or ratio data, you use

Pearson r

105

For Bivariate correlations, if both X and Y are ordinal (rank ordered) data, you use

Spearman's rho or Kendall's Tau

106

Zero Order Correlation

most basic correlation
analyzes rel btwn X and Y when no extraneous variables are assumed to affect the relationship (none are controlled for)

107

Partial Correlation (First Order)

examines rel btwn X and Y when effect of a third, confounding variable is removed
Ex: examine relationship btwn GPA & SAT scores after removing impact of parental education
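
The card does not give a formula; as a sketch, the standard first-order partial correlation can be computed from the three zero-order correlations. The correlation values below are invented for illustration (X = GPA, Y = SAT, Z = parental education).

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of X and Y with the third variable Z removed from both."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical zero-order correlations, all .50.
r_gpa_sat_partial = partial_correlation(r_xy=0.5, r_xz=0.5, r_yz=0.5)
```

With these made-up values the correlation drops from .50 to about .33 once parental education is removed.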

108

Part (Semipartial) Correlation

examines rel btwn X and Y when the effect of a third, confounding variable is removed from only one of the orig variables

109

Moderator Variable (in Bivariate Corr)

A variable that influences the strength of relationship between predictor & criterion
Ex- relationship between income & smoking may be different strength at diff ages

110

Mediator Variable (in Bivar Corr)

Explains why there is a rel between predictor & criterion
Ex- if effect of education removed from link btwn SES and smoking, corr goes down to almost 0

111

Multivariate Tests of correlation & prediction

Involve several predictors or IVs & one or more criteria or DVs
Multiple R
Multiple Regression
Canonical R & Canonical Analysis
Discriminant Function Analysis
Loglinear Analysis
Path Analysis
Structural Equation Modeling

112

Multiple R

Correlation btwn 2 or more IVs and one DV, where Y is always interval or ratio data and at least one X is interval or ratio data

113

Coefficient of Multiple Determination

Index of amt of variability in criterion Y that is accounted for by all predictors (Xs).

114

Multiple Regression

Uses Multiple R to derive equation that allows prediction of the criterion based on values of the predictors
*To optimally predict, want low corr btwn predictors (Xs) and moderate to high corr btwn each predictor and the criterion
*Compensatory technique b/c low scores on one predictor can be compensated for by high scores on another

115

Multicollinearity

Problem that occurs w/multiple regression equation when predictors are highly correlated with one another

116

2 most common subtypes of multiple regression

Stepwise-computerized, forward or backward
Hierarchical-researcher controls, adds variables to regr analysis in order most consistent w/theory proposed

117

Canonical R & Canonical Analysis

Extension of multiple R
Corr btwn 2 or more IVs (predictor set) and 2 or more DVs (criterion set)
*compensatory approach

118

Discriminant Fx Analysis

Used when there are 2 or more predictors (Xs) and one nominal (categorical) criterion variable
Ex: predicting likelihood of passing or failing EPPP (categorical Y) based on time spent studying and number of practice tests completed
*compensatory

119

Loglinear Analysis

Used to predict categorical criterion (Y) based on categorical predictors (Xs)
Ex: type of grad program (categorical X) and sex (categorical X) used as predictors for passing or failing EPPP (cat Y)
*compensatory

120

2 Approaches that apply correlational techniques to causal modeling

Path Analysis
Structural Equation Modeling

121

Tests of Structure

determine which variables in the set fit best together or form coherent subsets that are relatively independent of one another
Includes:
Factor Analysis, Cluster Analysis

122

Factor Analysis

Extracts as many sig factors as possible from the data (strongest to weakest); the stronger the factor, the more variability in scores it accounts for

123

Eigenvalue

indicates strength of a factor; factors with eigenvalues less than 1.0 are typically not interpreted

124

Factor Analysis starts w/___ ___ and computes ___ ___, which are correlations between a variable and the underlying factor

correlation matrix
factor loadings

125

Factor Rotation

Makes factor loadings more distinct & interpretable

126

2 types of factor rotation

Orthogonal (axes remain perpendicular)
Oblique (axes need not remain perpendicular; factors may be correlated)

127

Cluster analysis

Gather data on variety of DVs and look for naturally occurring subgroups in the data, without a priori hypotheses