Statistics Flashcards

(129 cards)

1
Q

population

A

the set of all individuals of interest in a study; a numerical value that describes a population is a parameter

2
Q

parameter

A

numerical value that describes a population; can come from a single measurement or a set of measurements

3
Q

sample

A

set of individuals selected from a population, usually intended to be representative of the population in a study; a numerical value that describes a sample is a statistic

4
Q

statistic

A

numerical value that describes a sample; can come from a single measurement or a set of measurements

5
Q

descriptive statistics

A

statistical procedures used to summarize, organize, and simplify data - make raw scores meaningful, e.g. mean, median, mode

6
Q

inferential statistics

A

techniques that allow us to study samples and then make generalizations about the population they came from - infer sample -> population

7
Q

sampling error

A

the discrepancy, or amount of error, that exists between a sample statistic and the corresponding population parameter - important to consider in inferential statistics

8
Q

construct

A

internal attribute or characteristic that cannot be directly observed but is useful for describing and explaining behavior - hypothetical, e.g. happiness

9
Q

operational definition

A

defines a construct in terms of observable behaviors, e.g. intelligence defined as performance on an IQ test

10
Q

nominal scale

A

categorical organization - can only measure qualitative differences, e.g. gender, country of origin, hair color

11
Q

ordinal scale

A

categories organized in an ordered sequence - shows the direction of a difference but not its size, because the amount between one person and the next is not consistent, e.g. class rank, rating scale

12
Q

interval scale

A

ordered categories that are intervals of exactly the same size, with an arbitrary zero point - 0 does not mean the absence of the construct being measured, e.g. temperature on the Celsius scale

13
Q

ratio scale

A

interval scale with absolute zero point - can describe differences between categories in terms of ratios (one thing is 3 times larger than another) e.g. weight, height, speed

14
Q

discrete variables

A

separate, indivisible categories - whole numbers or specific categories - no decimals, e.g. 3 goals scored

15
Q

continuous variables

A

infinite number of possible values that fall between any two observed values - divisible into infinite number of fractional parts e.g. height

16
Q

real limits

A

boundaries of intervals for scores that are represented on a continuous number line - each score has two limits, halfway between adjacent scores (upper real limit, lower real limit), e.g. an observed value of 8 actually represents the range from 7.5 to 8.5 (kind of like rounding)

17
Q

correlational method

A

two variables observed to see if there is a relationship between the two

18
Q

experimental method

A

establishes cause and effect relationship between variables - must manipulate one variable, observe second - controlled research situation

19
Q

non-experimental method

A

groups are determined by a pre-existing variable (e.g. those who have depression vs. those who do not) - the researcher does not manipulate it

20
Q

independent variable

A

manipulated variable - 2+ treatment conditions

21
Q

dependent variable

A

observed for changes to assess effect

22
Q

control

A

condition or group that does not receive the experimental treatment; provides a baseline for comparison

23
Q

quasi-independent variable

A

defines groups that are not created by manipulating an independent variable - participant variable (male/female) - time variable (before/after)

24
Q

summation notation

A

a way to represent the sum of a set of scores: $\sum_{i=1}^{n} x_i$, where i = 1 is the starting point of the scores and n is the stopping point
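A minimal Python sketch of the summation notation on the card above. The scores are made up for illustration, and the last two sums are an added reminder about order of operations rather than something stated on the card.

```python
scores = [3, 1, 7, 4]  # hypothetical scores x_1 .. x_n

# Sum x_i for i = 1 .. n (Python indexes from 0, so x_i is scores[i - 1]).
n = len(scores)
total = sum(scores[i - 1] for i in range(1, n + 1))

# Order of operations matters once squaring is involved.
sum_of_squared_scores = sum(x ** 2 for x in scores)  # square each score, then add
squared_sum = sum(scores) ** 2                       # add the scores, then square

print(total, sum_of_squared_scores, squared_sum)
```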

25
µ
population mean
26
x̄ (also written M)
sample mean
27
σ
population standard deviation
28
s
sample standard deviation
29
σ²
population variance
30
s²
sample variance; SS/(n - 1), where n - 1 is the degrees of freedom (df) for a sample
31
P
population proportion that has a particular attribute
32
p
sample proportion that has a particular attribute
33
ρ
population correlation coefficient
34
r
sample correlation coefficient
35
N
population number of elements
36
n
sample number of elements
37
H0
null hypothesis
38
H1
alternative hypothesis
39
α
alpha; probability of a type 1 error
40
β
beta; probability of a type 2 error
41
type 1 error
incorrectly rejecting a true null hypothesis; a false positive: concluding there is an effect when there isn't one
42
type 2 error
incorrectly retaining a false null hypothesis; a false negative: concluding there isn't an effect when there is one
43
frequency distribution
organized tabulation of the number of individual scores located in each category on the scale of measurement - takes disorganized scores and places them in order from highest to lowest - lets you see the entire set of scores at a glance - categories are based on the measurement scale - can be a graph or a table
44
grouped frequency distribution
used when the data cover a wide range of values and it is unrealistic to list individual scores - rule 1: about 10 class intervals - rule 2: relatively simple width (2, 5, 10) - rule 3: each interval starts with a score that is a multiple of the width - rule 4: all intervals should be the same width
45
bar graph
uses horizontal or vertical bars to show comparisons among categories - nominal/ordinal
46
ogive
curve of the cumulative frequency distribution or cumulative relative frequency distribution - express each simple frequency as a percentage of the total frequency - cumulate and plot these percentages (e.g. the lowest score makes up 5% and the next score makes up 6%, so the cumulative percentage of 11% is what is plotted for the second score)
47
polygon
a line drawn to join the midpoints of the tops of the bars of a histogram - like an ogive, but does not use cumulative frequencies or smooth lines - to convert to an ogive, add up the percentages for all bars up to that point
48
histogram
an area diagram -> bars portray the frequencies of possible values of a variable - used for continuous variables (this is why the bars touch) - set of rectangles along the intervals between class boundaries - areas are proportional to the frequencies in the corresponding classes
49
population distributions
can't find absolute frequencies but can find relative frequencies, e.g. we don't know how many fish make up the population in a lake -> don't know how many trout or salmon there are, but after research can say that there are twice as many trout as salmon
50
percentile
score point below which a specified % of the scores in a distribution fall - compute the percent × N - round this figure so that it ends in .0 or .5, whichever is closer - if the rounded value ends in .5, the desired centile is the next higher value; if it ends in .0, split the difference with the next higher score
51
percentile rank
percent of cases that fall below a specific point in the distribution - write down the exact limits of the interval that contains the score whose rank is to be obtained - interpolate between the cumulative percents to find the desired CR: with exact limit / cumulative % pairs Y/A (upper limit), X/B (the score) and Z/C (lower limit), solve (X - Z)/(Y - Z) = (B - C)/(A - C) for B
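The interpolation on the percentile rank card can be sketched in Python as follows. The function name, argument names, and the numbers are illustrative assumptions; the arithmetic is ordinary linear interpolation between the exact limits of the interval.

```python
def percentile_rank(x, lower, upper, cum_pct_lower, cum_pct_upper):
    """Interpolate the cumulative percent at score x.

    lower/upper are the exact (real) limits of the interval containing x;
    cum_pct_lower/cum_pct_upper are the cumulative percents at those limits.
    """
    fraction = (x - lower) / (upper - lower)  # how far x sits into the interval
    return cum_pct_lower + fraction * (cum_pct_upper - cum_pct_lower)

# Hypothetical interval 4.5 to 9.5 holding cumulative percents 20% to 60%.
rank = percentile_rank(x=7.0, lower=4.5, upper=9.5,
                       cum_pct_lower=20.0, cum_pct_upper=60.0)
print(rank)  # 40.0: the score is halfway through the interval
```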
52
central tendency
descriptive statistical measure that determines a single score defining the center of a distribution - goal: find the one score that is most representative of the group - most common method of summarizing/describing a distribution
53
mean
average; sum of the scores divided by the number of scores: ∑X/N - appropriate when there are no extreme outliers and the scale is not nominal
54
median
the score that divides the distribution of scores exactly in half (N/2 scores on each side) - appropriate when there are extreme outliers or a skewed distribution; not for nominal scales
55
mode
score or category that has the greatest frequency - appropriate when you want your answer to be correct as often as possible, for nominal scales, and for discrete variables (e.g. hair color frequency)
56
how is the mean affected when adding/removing a new score?
will change mean, unless score is the same as the mean
57
how is the mean affected when adding/subtracting a constant to every score?
the same constant is added to/subtracted from the mean, e.g. 1, 2, 3: M = 2; now add 2 to each score: 3, 4, 5: M = 4
58
how is the mean affected when scores are multiplied/divided by a constant?
the mean changes in the same way, e.g. 1, 2, 3: M = 2; now multiply all scores by 2: 2, 4, 6: M = 4
59
central tendency and its relation to symmetrical and skewed distributions
when choosing which measure is most valuable... - normal distribution: mean, median, and mode are all equal - skewed distribution: use the median - negatively skewed: mean < median < mode - positively skewed: mode < median < mean
60
variability
quantitative measure of the degree to which scores in a distribution are spread out or clustered together - no variability: no differences between scores - small variability: small differences - large variability: large differences
61
range
the distance between the largest score and the smallest score; for continuous variables, compute it in terms of real limits - problem: determined solely by the two most extreme scores in the distribution - calculate: subtract the lowest score from the highest score
62
inter-quartile range
ignores extreme outlier scores -> measures the range covered by the middle 50% of the distribution - quartiles separate the scores into 4 equal parts with "cuts" either between or on certain scores - the interquartile range is the distance between Q1 and Q3 (the boundaries of the lowest 25% and the top 25%) - calculate: order the scores from least to greatest, find the median, calculate the median of the lower half (Q1), calculate the median of the upper half (Q3), subtract Q1 from Q3
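A small Python sketch of the median-of-halves calculation described on the card above. Leaving the middle score out of both halves when n is odd is one common convention and is an assumption here; other quartile conventions give slightly different values. The scores are made up.

```python
from statistics import median

def quartiles_by_halves(scores):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two scores; with an odd n the middle score is
    left out of both halves.
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(upper_half)

scores = [2, 4, 5, 7, 8, 10, 11, 30]  # 30 is an extreme score
q1, q3 = quartiles_by_halves(scores)
iqr = q3 - q1        # range covered by the middle 50%
semi_iqr = iqr / 2   # half of the interquartile range
print(q1, q3, iqr, semi_iqr)
```

The extreme score of 30 stretches the plain range to 28 but leaves the interquartile range at 6.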
63
semi-interquartile range
half of the interquartile range: (Q3 - Q1)/2
64
standard deviation (SD)
most commonly used and most important measure of variability - takes into account all values of a variable - uses the mean as a reference point and measures variability by considering the distance between each score and the mean - describes whether scores are generally near or far from the mean, i.e. how much they deviate from it
65
SS (sum of squared deviations) - population
SS = ∑(X - µ)² - find the deviation score X - µ for each score, being mindful of +/- - square each deviation score: (X - µ)² - add up all the squared deviation scores: ∑(X - µ)²; this is SS
66
variance - population
take SS and divide by N: σ² = ∑(X - µ)²/N - a larger value = more variability = scores are more spread out
67
standard deviation - population
take the square root of the variance: σ² = SS/N is the variance; σ = √σ² is the standard deviation
68
SS (sum of squared deviations) - sample
find the deviation score X - M for each score - square each deviation score: (X - M)² - add up all the squared deviation scores: ∑(X - M)²; this is SS
69
variance - sample
take SS and divide by n - 1: s² = ∑(X - M)²/(n - 1)
70
standard deviation - sample
take the square root of the sample variance: s = √s²
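The SS, variance, and standard deviation cards above can be followed step by step in a short Python sketch. The scores are invented for illustration; the same list is treated first as a whole population and then as a sample to show the N versus n - 1 difference.

```python
from math import sqrt

def sum_of_squares(scores):
    """SS: the sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

scores = [1, 6, 4, 3, 8, 7, 6]  # hypothetical scores, mean = 5
ss = sum_of_squares(scores)

# Treated as a population: divide SS by N.
population_variance = ss / len(scores)
population_sd = sqrt(population_variance)

# Treated as a sample: divide SS by n - 1 (the degrees of freedom).
sample_variance = ss / (len(scores) - 1)
sample_sd = sqrt(sample_variance)

print(ss, population_variance, sample_variance)
```

Dividing by n - 1 always gives the larger of the two variances, which is the correction for bias described on the next card.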
71
unbiased statistic - how to correct?
a statistic is unbiased if its average over all possible samples equals the population parameter - dividing by n - 1 in the sample variance corrects for the bias in sample variability (dividing by n would underestimate the population variance)
72
z-score
provides a precise description of a location in a distribution - describes the number of SDs a score is from the mean - describes how common/exceptional a score is compared to others - positive z-score = above the mean, negative z-score = below the mean
73
transforming z-scores
74
standardizing distributions
lets you compare scores across test forms - the standardized distribution has the same shape as the original distribution (scores are renamed, but stay in the same location), e.g. a z-score distribution - when transforming X scores to z-scores, new M = 0 and new s = 1
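A quick Python check of the claim on the card above that a z-score distribution has a mean of 0 and a standard deviation of 1. The scores are hypothetical, and this sketch standardizes with the population standard deviation (`pstdev`); standardizing a sample with s works the same way.

```python
from statistics import mean, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores: mean 5, SD 2
m = mean(scores)
sd = pstdev(scores)

# z = (X - M) / SD: distance from the mean in standard deviation units.
z_scores = [(x - m) / sd for x in scores]

print(z_scores)
print(mean(z_scores), pstdev(z_scores))  # approximately 0 and 1
```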
75
probability
likelihood that something will happen; a way to quantify randomness - smaller # -> less likely over the long run - p = (# of a certain outcome)/(# of all possible outcomes) - finding a probability is similar to finding a percentile rank: the probability of having an IQ below 120 is the same as the percentile rank of X = 120
76
experiment (probability)
an act with an uncertain outcome, e.g. flipping a coin or rolling a die
77
mutually exclusive events
cannot happen at the same time - e.g. rolling a 2 and a 6 on a single roll of a die can't happen simultaneously
78
independent random sampling
probability of being selected is independent of the individuals already selected - each individual in the population has an equal chance of being selected - ensures that the probability of a particular outcome does not depend on previous outcomes
79
sampling with replacement
returning each selection to the population before the next pick - e.g. the probability of picking out a red m&m is 1/10: pick out an m&m and replace it, and the probability is still 1/10 instead of 1/9, 1/8, etc.
80
Unit normal table for probabilities in a normal distribution
transform the score to a z-score: z = (X - M)/s (to go back: X = M + zs) - look up z in the unit normal table; proportions are always positive, even if the z-score is negative - negative z-score: tail is on the left, body on the right - positive z-score: tail is on the right, body on the left
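In place of a printed unit normal table, the body and tail proportions can be sketched with Python's standard-library `statistics.NormalDist`. The helper name `body_and_tail` and the IQ-style numbers (mean 100, SD 15) are assumptions for illustration.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1

def body_and_tail(z):
    """Return (body, tail) proportions for a z-score.

    Proportions are looked up with |z|, so they are always positive;
    the sign of z only says which side the tail is on.
    """
    body = unit_normal.cdf(abs(z))
    return body, 1 - body

z = (130 - 100) / 15  # hypothetical score of 130 on a scale with M = 100, s = 15
body, tail = body_and_tail(z)
print(round(body, 4), round(tail, 4))
```

For z = +2.0 the tail is on the right; for z = -2.0 the same two proportions apply with the tail on the left.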
81
distribution of sample means
set of means from all possible random samples (w/ replacement) of size n from a population - the larger the n, the smaller the standard error of the mean, because there is less error between the sample mean and the population mean: the more people in the study, the less error between the sample and the population - sample means should be centered around the population mean: the expected value of M equals µ - the sample mean is an unbiased estimator of the population mean - the distribution of sample means approaches a normal distribution as n increases, even if the original distribution is skewed
82
standard error of the mean
σM = σ/√n
83
sample mean relationship in distribution of sample means
each sample mean, M, has a location in the distribution of sample means that can be described by a z-score - calculate: z = (M - µ)/σM, i.e. (sample mean - population mean)/standard error of the mean
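A brief Python sketch of the two formulas above, the standard error of the mean and the z-score for a sample mean. Function names and numbers are illustrative assumptions.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def z_for_sample_mean(m, mu, sigma, n):
    """Location of a sample mean M in the distribution of sample means."""
    return (m - mu) / standard_error(sigma, n)

# Larger samples give a smaller standard error (hypothetical sigma = 15).
se_small_sample = standard_error(15, 9)    # 5.0
se_large_sample = standard_error(15, 100)  # 1.5

z = z_for_sample_mean(m=106, mu=100, sigma=15, n=25)  # (106 - 100) / 3
print(se_small_sample, se_large_sample, z)
```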
84
hypothesis testing
using sample data to evaluate a hypothesis about a population: determining whether a difference between the sample and the hypothesized population reflects a real effect or is merely the result of chance
85
null hypothesis
states that there is no difference between groups (no effect) - assume the null hypothesis is true unless the data provide sufficient evidence against it
86
alternative hypothesis
states that there IS a difference between groups (there is an effect)
87
test statistic
the number of standard errors the sample value is from the null value - used to determine whether to reject the null - compares your data with what is expected under the null, e.g. a z-score
88
alpha level
probability of making a type 1 error - decreasing the significance level -> decreases the chance of a type 1 error but increases the chance of a type 2 error
89
critical region
composed of the extreme sample values that are very unlikely to be obtained if the null is true - boundaries are determined by the alpha level - if the sample data fall in the critical region, the null is rejected - calculate: 1. define alpha 2. use the unit normal table to find the z-score boundaries; the sample z must be larger (+) or smaller (-) than these boundaries to fall in the critical region
90
hypothesis testing steps
1. state the hypotheses (one-tailed or two-tailed: a directional prediction such as a lower response vs. simply having an effect)
2. set the criteria - choose the alpha level - find the critical regions
3. collect data and evaluate - calculate the standard error - calculate the z-score
4. make a decision - reject the null -> sample data are in the critical region, the tx had an effect - fail to reject the null -> sample data are not in the critical region, no evidence that the tx has an effect
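The four steps above can be sketched as a two-tailed z-test in Python. This is a minimal illustration that assumes a known population σ; the function name and all numbers are hypothetical, and the critical value comes from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Return (z, critical_z, reject_null) for a two-tailed z-test."""
    # Step 2: set the criteria. alpha is split across the two tails.
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: standard error, then the z-score for the sample mean.
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu) / standard_error
    # Step 4: reject the null only if z falls in the critical region.
    return z, critical_z, abs(z) > critical_z

# Step 1 (hypothetical): H0: mu = 80, H1: mu != 80.
z, critical_z, reject_null = two_tailed_z_test(sample_mean=84, mu=80, sigma=10, n=16)
print(round(z, 2), round(critical_z, 2), reject_null)
```

Here z = 1.6 falls short of the boundary near 1.96, so the decision is to fail to reject the null.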
91
effect size
magnitude of the treatment effect
92
Cohen's d
.2 = small effect, .5 = medium effect, .8 = large effect - calculate: d = (µtx - µno tx)/standard deviation
93
power
probability that the test will correctly reject a false null hypothesis - helps determine the # of participants needed - related to effect size -> larger effect size = higher chance of rejecting the null - to decrease the standard error -> increase the # of subjects - factors that affect power: sample size, alpha level, 1-tailed vs. 2-tailed
94
r²
another way to measure effect size: the percentage of variance accounted for - .01 = small effect, .09 = medium effect, .25 = large effect
95
t - statistic
z statistic: used to test a hypothesis about an unknown population mean when the population standard deviation is known - t statistic: used to test a hypothesis about an unknown population mean when the population standard deviation is unknown - the only difference between t and z is that t uses the estimated standard error - calculate: t = (M - µ)/sM, the difference between the sample mean and the population mean divided by the difference expected by chance
96
hypothesis testing using t - stat
1. set up the hypotheses - H0: µ = hypothesized value; H1: µ ≠ hypothesized value
2. set the criteria - set alpha - find the critical region
3. collect data and evaluate - calculate the variance or SD (s² = SS/(n - 1) = SS/df) - calculate the estimated standard error (sM = s/√n) - calculate the t statistic (t = (M - µ)/sM)
4. make a decision
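Step 3 of the card above, worked through in Python for a single sample. The data and the hypothesized µ are invented; r² and the estimated d are included because other cards in this deck define them with the same quantities. The critical value still has to come from a t table for the matching df.

```python
from math import sqrt

def one_sample_t(scores, mu):
    """Return (t, df, r_squared, estimated_d) for a single-sample t-test."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)
    df = n - 1
    variance = ss / df               # s^2 = SS / df
    estimated_se = sqrt(variance / n)  # sM = s / sqrt(n)
    t = (m - mu) / estimated_se
    r_squared = t ** 2 / (t ** 2 + df)  # percentage of variance explained
    estimated_d = (m - mu) / sqrt(variance)
    return t, df, r_squared, estimated_d

scores = [9, 9, 11, 13, 13, 13, 15, 17, 17]  # hypothetical sample, M = 13
t, df, r_squared, estimated_d = one_sample_t(scores, mu=10)
print(t, df, round(r_squared, 3), estimated_d)
```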
97
percentage of variance explained - r²
r² = t²/(t² + df)
98
independent measures t test
compares the means of 2 independent groups - uses a separate sample for each of the tx populations being compared - examines the difference between the population means of 2 independent groups - assumptions: independent observations -> one observation doesn't affect the probability of other observations; normal distributions; populations have equal variances -> homogeneity of variance
99
hypothesis test for independent measure t-test
1. state H0 and H1 - H0: µ1 = µ2 OR µ1 - µ2 = 0 - H1: µ1 ≠ µ2 OR µ1 - µ2 ≠ 0
2. identify the critical regions based on alpha - calculate total df (df = df1 + df2) - find the critical region boundaries in the t distribution table
3. evaluate assumptions
4. compute statistics - pooled variance - estimated standard error - independent samples t statistic
5. make a decision regarding H0
notes: the estimated standard error gives the total amount of error involved in using 2 sample means to estimate 2 population means - it tells the average distance between the sample mean difference and the population mean difference - it is estimated from the sample standard deviations or variances and, since there are two samples, the two sample variances must be averaged (pooled)
100
pooled variance
a weighted average of the two sample variances, so the larger sample counts more: sp² = (SS1 + SS2)/(df1 + df2); used to compute the estimated standard error
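A Python sketch of the independent-measures computation: pooled variance, estimated standard error, and t. The formulas used are the standard textbook ones, sp² = (SS1 + SS2)/(df1 + df2) and standard error = √(sp²/n1 + sp²/n2), which the cards name but do not all spell out; the two groups are made up.

```python
from math import sqrt

def sum_of_squares(scores):
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def independent_measures_t(group1, group2):
    """Return (t, df, pooled_variance, estimated_se) for two independent groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    df1, df2 = n1 - 1, n2 - 1
    # Pooled variance: combine both SS values over the combined df.
    pooled_variance = (sum_of_squares(group1) + sum_of_squares(group2)) / (df1 + df2)
    estimated_se = sqrt(pooled_variance / n1 + pooled_variance / n2)
    # Under H0 the population mean difference is 0.
    t = (m1 - m2) / estimated_se
    return t, df1 + df2, pooled_variance, estimated_se

treatment = [8, 10, 12, 10]  # hypothetical scores, M = 10
control = [4, 6, 8, 6]       # hypothetical scores, M = 6
t, df, pooled_variance, estimated_se = independent_measures_t(treatment, control)
print(round(t, 3), df, round(pooled_variance, 3))
```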
101
estimated standard error
102
estimated Cohen's d - t-test
measures the size of the treatment effect - mean difference divided by the sample standard deviation (s is used because σ is unknown in a t-test): d = (M - µ)/s
103
repeated measures design
repeatedly measures same individuals to assess change (within-subjects) - same sample, test twice, before/after tx - same subjects are being tested under different conditions
104
hypothesis testing repeated measures t - test
difference score (D): the change in an individual's score between two measures
1. state the null and alternative - H0: µD = 0 - H1: µD ≠ 0
2. select alpha and the critical values
3. compute the t statistic (no pooled variance is needed because there is only one group) - estimated standard error - dependent samples t statistic
4. make your decision
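A sketch of step 3 for the repeated-measures test in Python. Each participant contributes one difference score, and the test is then a single-sample t on those differences against µD = 0. Taking D as second score minus first score is an assumption, and the before/after numbers are made up.

```python
from math import sqrt

def repeated_measures_t(first, second):
    """Return (t, df) from paired scores, testing H0: mean difference = 0."""
    diffs = [b - a for a, b in zip(first, second)]  # D for each participant
    n = len(diffs)
    mean_diff = sum(diffs) / n
    ss = sum((d - mean_diff) ** 2 for d in diffs)
    df = n - 1
    estimated_se = sqrt((ss / df) / n)  # standard error of the mean difference
    return mean_diff / estimated_se, df

before = [10, 12, 9, 11]  # hypothetical scores before tx
after = [13, 14, 12, 15]  # same participants after tx
t, df = repeated_measures_t(before, after)
print(round(t, 3), df)
```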
105
dependent sample t statistic
106
r2 for repeated measures
107
repeated measures (adv./disad.)
advantages - lets the researcher exclude the effects of individual differences (each participant is their own control), which reduces variance - requires fewer participants -> easier to recruit - can study individuals over time
disadvantages - order effects - other things can affect the results over time -> history, maturation, attrition, testing, instrumentation
108
independent measures (adv./disad.)
advantages - order effects are not a problem - does not require as many materials as repeated measures: because different people are being studied, you can reuse materials
disadvantages - individual differences
109
correlation
measures and describes the relationship between two variables
110
pearsons correlation
111
sum of products
definitional: calculate the means of X and Y - find the deviation scores (X - Mx) and (Y - My) - multiply each X deviation by the matching Y deviation - add these products: SP = ∑(X - Mx)(Y - My)
computational: take ∑XY minus (∑X)(∑Y)/n - all together: SP = ∑XY - (∑X)(∑Y)/n
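Both routes to the sum of products can be compared in Python, as sketched below. The Pearson formula used, r = SP/√(SSx · SSy), is the standard one and is an addition here, since the deck's Pearson card has no answer; the paired scores are made up.

```python
from math import sqrt

def sum_of_products(xs, ys):
    """Return SP computed two ways: (definitional, computational)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    definitional = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    computational = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return definitional, computational

def pearson_r(xs, ys):
    """r = SP / sqrt(SSx * SSy); SS is the SP of a variable with itself."""
    sp = sum_of_products(xs, ys)[0]
    ss_x = sum_of_products(xs, xs)[0]
    ss_y = sum_of_products(ys, ys)[0]
    return sp / sqrt(ss_x * ss_y)

xs = [0, 10, 4, 8, 8]  # hypothetical X scores
ys = [2, 6, 2, 4, 6]   # hypothetical Y scores
sp_definitional, sp_computational = sum_of_products(xs, ys)
r = pearson_r(xs, ys)
print(sp_definitional, sp_computational, r)
```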
112
spearman correlation
Spearman uses ranks; used when one or both variables are ordinal - D = difference in rank scores - tied scores? - list scores from smallest to largest - assign ranks - if scores are tied, compute the mean of their ranked positions and assign this value as the final rank for each tied score
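The tie-handling rule on the card above, sketched in Python. The function name and the scores are illustrative assumptions.

```python
def rank_with_ties(scores):
    """Rank scores from smallest (rank 1) upward; tied scores share the mean rank."""
    positions = {}
    for position, value in enumerate(sorted(scores), start=1):
        positions.setdefault(value, []).append(position)
    # Each distinct value gets the mean of the positions it occupies.
    mean_rank = {value: sum(p) / len(p) for value, p in positions.items()}
    return [mean_rank[value] for value in scores]

ranks = rank_with_ties([8, 5, 3, 5])
print(ranks)  # the two 5s occupy positions 2 and 3, so each gets rank 2.5
```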
113
linear equation
line of best fit y = mx + b - m = slope of the line - b = y-intercept
114
least squared error solution
approach in regression for finding the approximate solution of overdetermined systems (sets of equations with more equations than unknowns); the best-fitting line is the one that minimizes the sum of squared errors
115
linear regression equation
all you need is the slope and the y-intercept to create a line of best fit: Y = bX + a, with slope b = SP/SSx
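A short Python sketch of the regression equation above. The slope follows the card (b = SP/SSx); the intercept formula a = My - b·Mx is the standard least-squares result and is an addition not stated on the card. The data are hypothetical.

```python
def regression_line(xs, ys):
    """Return (b, a) for the least-squares line Y = bX + a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x      # slope
    a = my - b * mx    # intercept (assumed standard formula, not from the card)
    return b, a

xs = [0, 10, 4, 8, 8]  # hypothetical X scores, Mx = 6
ys = [2, 6, 2, 4, 6]   # hypothetical Y scores, My = 4
b, a = regression_line(xs, ys)
predicted_at_mean_x = b * 6 + a  # the line passes through (Mx, My)
print(b, a, predicted_at_mean_x)
```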
116
ANOVA
used to evaluate the differences between two or more sample means by comparing variances - ANOVA is used because multiple t-tests -> more error - compares between-tx variance with within-tx variance - advantage: performs all tests with one hypothesis and one alpha, avoiding the problem of inflated experiment-wise alpha - hypotheses: null = all population means are equal; alternative = there is at least one mean difference among the populations
117
ANOVA factors
number of independent variables - between subjects = different subjects used for the different levels of the factor - within subjects = same subjects used for the different levels of the factor
118
ANOVA levels
number of conditions
119
ANOVA between tx variance
measures differences caused by - systematic tx effects - random, unsystematic factors
120
ANOVA within tx variance
measures differences caused by - random, unsystematic factors
121
when are post hoc tests necessary for ANOVAs
post hoc tests are used when significant results are found and additional exploration of the differences among means is needed - they provide specific info on which means are significantly different from each other
122
ANOVA effect size
r² (η²) = SS between/SS total - this is the percentage of variance accounted for by the treatment
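A minimal one-way, between-subjects ANOVA sketch in Python, tying the between-tx and within-tx cards to the effect size formula above. The F-ratio as MS between divided by MS within is the standard definition and is an addition here; the three groups are invented.

```python
def one_way_anova(groups):
    """Return (f_ratio, eta_squared) for independent groups of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(group) / len(group) for group in groups]
    # Between-tx: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-tx: how far scores sit from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)  # SS between / SS total
    return f_ratio, eta_squared

groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]  # hypothetical tx conditions
f_ratio, eta_squared = one_way_anova(groups)
print(f_ratio, eta_squared)
```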
123
Chi-square test
determines the association between 2 categorical variables - nonparametric: can be used when scores violate the assumptions of a parametric test -> not normally distributed -> unequal or unusually high variances -> undetermined or infinite scores - the test determines how well the obtained sample proportions fit the population proportions specified by the null hypothesis, e.g. the relationship between personality and color preference
124
hypothesis test for chi-square test goodness of fit
hypotheses - H0: equal proportions, or no difference from a known population, e.g. men 50%, women 50% - H1: unequal proportions, or a difference from the known population
fo = observed frequency - represents real individuals - always whole numbers
fe = expected frequency (proportion times n) - predicted from the proportions in the null hypothesis and the sample size - defines an ideal, hypothetical sample distribution that would be obtained if the sample proportions were in perfect agreement with the proportions specified in the null
chi-square statistic: χ² = ∑(fo - fe)²/fe, with df = C - 1 (C = # of categories) - use the table to determine whether the statistic is in the critical region
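The goodness-of-fit calculation can be sketched in Python using the standard chi-square statistic, the sum of (fo - fe)²/fe over categories. The observed counts are hypothetical, and the critical value still has to be read from a chi-square table for the matching df.

```python
def chi_square_goodness_of_fit(observed, null_proportions):
    """Return (chi_square, df, expected) for observed category counts."""
    n = sum(observed)
    expected = [p * n for p in null_proportions]  # fe = proportion times n
    chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    df = len(observed) - 1  # C - 1
    return chi_square, df, expected

# Hypothetical sample of 50 people; H0 says men 50%, women 50%.
chi_square, df, expected = chi_square_goodness_of_fit([30, 20], [0.5, 0.5])
print(chi_square, df, expected)
```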
125
differences between fo and fe
small differences - small value for chi-square - conclude there is a good fit between the data and the hypothesis - fail to reject the null; large differences - large value for chi-square - reject the null - a large chi-square value is what you want if you hope to show an effect
126
chi square for independence
variables are independent when there is no consistent, predictable relationship between them - two variables independent -> the frequency distribution for one variable has the same shape at every level of the second variable - if there is no relationship between the 2 variables (null) -> the distributions have equal proportions - each individual is classified on each of the 2 variables - the frequency distribution for the sample is used to test a hypothesis about the corresponding frequency distribution for the population - H0: the distributions are the same (no differences, no relationship)
127
phi-coefficient
.1 = small, .3 = medium, .5 = large
128
Cramér's V
df   small   medium   large
1    .1      .3       .5
2    .07     .21      .35
3    .06     .17      .29
129
percentage of variance accounted for - t-test
r² = t²/(t² + df)