Week 1&2 (Descriptive/Foundations/Experimental Designs/Comparing 2 Means/Inferential Tables/Statistical Software) Flashcards

1
Q

What are the types of biostatistics

A

descriptive statistics
probability
estimate population parameters
hypothesis testing

2
Q

Types of population

A

target and accessible

3
Q

target population definition

A

the LARGER population to which results of a study will be generalized

4
Q

accessible population definition

A

the ACTUAL population of subjects available to be chosen for a study

5
Q

Sample definition

A

a subgroup of the population of interest

6
Q

parameter

A

statistical characteristics of population

7
Q

statistic

A

statistical characteristic of sample

8
Q

descriptive statistic

A

used to describe a sample's shape, central tendency, and variability

9
Q

inferential statistic

A

used to make inferences about a population (t-test, ANOVA, Pearson's r)

10
Q

measures of central tendency

A

mean, median, and mode

11
Q

what is central tendency

A

central value, BEST representative value of the target population

12
Q

what is variability

A

the “spread” of the data
small variability: narrow, spike-like curve
large variability: wide, wave-like curve

13
Q

frequency definition

A

the number of times a value appears in a data set

14
Q

frequency distribution

A

the pattern of frequencies of a variable

15
Q

methods of displaying frequency distributions

A

histogram & stem and leaf plots
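For a concrete sense of what a frequency distribution contains, here is a minimal Python sketch (hypothetical values; assumes numpy is installed) that counts how often each value appears and bins the data the way a histogram would:

```python
from collections import Counter
import numpy as np

values = [3, 5, 5, 6, 7, 7, 7, 8, 9, 10]   # hypothetical data set

# frequency = the number of times each value appears
print(Counter(values))                      # e.g., the value 7 appears 3 times

# the same data grouped into bins, as a histogram would display them
counts, bin_edges = np.histogram(values, bins=4)
print(counts, bin_edges)
```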

16
Q

skewed to the left (image)

A

(image: left-skewed distribution)

17
Q

skewed to the right (image)

A

(image: right-skewed distribution)

18
Q

normal (not skewed) distribution (image)

A

(image: normal distribution)

19
Q

different shapes of distributions

A

normal (B)
skewed to right (A)
skewed to left (C)

20
Q

Skewed to right (words)

A

the “tail” points to the right, away from where the bulk of the curve lies
AKA “positive skew”
mean > median/mode

21
Q

Skewed to left (words)

A

“tail” faces left
AKA “negative skew”
mean < median/mode

22
Q

Measures of Central Tendency: best choice for MEAN

A

best choice for numeric data
(not good for skewed data)

23
Q

Measures of Central Tendency: best choice for MEDIAN

A

best for non-symmetrical data

24
Q

Measures of Central Tendency: best choice for MODE

A

limited utility; nominal or ordinal data
common in surveys

25
Mean: Advantages
easy to calculate; values don't have to be arranged in order; can be used in further calculations (all formulas are possible)
26
Mean: Disadvantages
can't be used with categorical data, affected by extreme values
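As a quick illustration of that disadvantage, here is a minimal Python sketch with made-up scores showing one extreme value pulling the mean while the median stays near the bulk of the data:

```python
from statistics import mean, median

# hypothetical sample with one extreme value (an outlier)
scores = [10, 12, 13, 14, 15, 95]

print(mean(scores))    # 26.5 -- pulled far above the bulk of the data
print(median(scores))  # 13.5 -- stays with the middle of the data
```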
27
Median: advantages
easy, can be used with "ranked" data
28
Median: disadvantages
tedious in a large data set; should be used with ordinal data
29
mode: advantages
easy to understand and calculate
30
Mode: disadvantages
not based on all values; unstable when the data consist of a small number of values; sometimes the data have 2+ modes or no mode at all
31
common measures of variability
range, interquartile range, standard deviation, variance, coefficient of variation
32
range
difference between highest and lowest score
33
percentiles
a score's position within the distribution (divides into 100 parts)
34
quartiles
divides distribution into 4 equal parts
35
interquartile range (IQR)
difference between the 25th and 75th percentiles; often used with the median
36
What is a box plot?
five-number summary of a data set (minimum, 1st quartile, median, 3rd quartile, maximum); box = interquartile range; horizontal line at the median; "whiskers" = minimum and maximum scores
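A minimal Python sketch (made-up data; assumes numpy) of the five-number summary and IQR that a box plot displays:

```python
import numpy as np

data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 12, 15])  # hypothetical scores

q1, q2, q3 = np.percentile(data, [25, 50, 75])
five_number_summary = (data.min(), q1, q2, q3, data.max())
iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data

print(five_number_summary, iqr)
```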
37
coefficient of variation
used for interval and ratio data only; unitless; helpful for comparing variability between two distributions measured on different scales
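A minimal sketch of the coefficient of variation, assuming the standard definition CV = SD / mean (often reported as a percentage); the data below are hypothetical:

```python
import numpy as np

def coefficient_of_variation(x):
    # CV = standard deviation / mean, expressed as a percentage (unitless)
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean() * 100

height_cm = [160, 165, 170, 175, 180]   # hypothetical heights
weight_kg = [55, 60, 70, 80, 95]        # hypothetical weights

# because CV is unitless, variability on different scales can be compared directly
print(coefficient_of_variation(height_cm))
print(coefficient_of_variation(weight_kg))
```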
38
what shape is normal distribution?
bell-shaped
39
constant and predictable characteristics of normal distribution
68% of scores fall within 1 SD of the mean; 95% within 2 SD; 99% within 3 SD
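A quick check of those percentages using the normal CDF (assumes scipy is available):

```python
from scipy.stats import norm

# proportion of a normal distribution within 1, 2, and 3 SD of the mean
for k in (1, 2, 3):
    prop = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {prop:.3f}")   # ~0.683, ~0.954, ~0.997
```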
40
z-scores
a standardized score based on the normal distribution; allows a single score to be interpreted in relation to the distribution of scores
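A minimal z-score sketch with hypothetical numbers (assumes scipy for the percentile lookup):

```python
from scipy.stats import norm

# hypothetical exam: mean 70, SD 10; a student scores 85
score, mu, sd = 85, 70, 10

z = (score - mu) / sd            # standardized score: 1.5 SD above the mean
percentile = norm.cdf(z) * 100   # proportion of the distribution below this score

print(z, round(percentile, 1))   # 1.5, ~93.3
```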
41
probability definition
the likelihood that any one event will occur, given all the possible outcomes; "what is likely to happen"
42
sampling error
difference between sample mean and population mean
43
what is sampling error measured by
standard error of the mean (SEM)
44
standard error of the mean equation
SEM = SD / square root of sample size (n)
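A minimal sketch of that equation, showing how the SEM shrinks with a larger sample and grows with a larger SD (as the next two cards state); the numbers are hypothetical:

```python
import numpy as np

def sem(sd, n):
    # standard error of the mean: SEM = SD / sqrt(n)
    return sd / np.sqrt(n)

print(sem(sd=10, n=25))    # 2.0
print(sem(sd=10, n=100))   # 1.0 -- larger sample size -> smaller error
print(sem(sd=20, n=25))    # 4.0 -- larger SD -> larger error
```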
45
what happens to the SEM if we increase our sample size?
decrease in error
46
What happens to the SEM if we increase our standard deviation?
increase in error
47
what is the standard error of the mean
the standard deviation of the sampling distribution of the mean; allows us to estimate population parameters
48
90% confidence level = z-score of what
1.65
49
95% confidence level = z-score of what
1.96
50
99% confidence level = z-score of what
2.58
51
point estimate
a single value that represents the best estimate of the population value
52
confidence interval
a range of values that we are confident contains the population parameter
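A minimal sketch of a z-based confidence interval around a hypothetical sample mean, using the z-values from the cards above:

```python
import numpy as np

mean, sd, n = 50.0, 10.0, 100        # hypothetical sample statistics
sem = sd / np.sqrt(n)                # standard error of the mean

z = {"90%": 1.65, "95%": 1.96, "99%": 2.58}
for level, zcrit in z.items():
    lower, upper = mean - zcrit * sem, mean + zcrit * sem
    print(level, (round(lower, 2), round(upper, 2)))
```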
53
increased precision (narrowed) by...
larger sample size; less variance (lower s); lower selected level of confidence (90% vs. 95%)
54
null hypothesis means
there is no difference
55
type I error
alpha ("liar"); we say there is a difference but there is no difference; rejecting the null when the null is actually true
56
type II error
beta ("blind"); we say there is no difference but there is a difference; failing to reject the null when the null is actually false
57
normal value of alpha
.05
58
p-value
the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true (the risk of a type I error if we reject the null)
59
if p-value < alpha
reject the null
60
if p-value > alpha
fail to reject the null
61
if we "fail to reject" the null, we attribute any observed difference to
sampling error only
62
if a confidence interval does not contain 0 it means
there is a real difference
63
if a confidence interval does contain 0 it means
there is no difference
64
mistakenly finding a difference
false-positive
65
mistakenly finding no difference
false-negative
66
statistical power formula
1 - beta
67
critical values for a two-tailed test
±1.96
68
one-tailed test is for
directional hypothesis
69
two-tailed test is for
nondirectional hypothesis
70
statistical power
the probability of finding a statistically significant difference if such a difference exists in the real world; i.e., the probability that the test correctly rejects the null hypothesis
71
four pillars of power
alpha, effect size, variance, sample size
72
to increase power
higher alpha, large effect size, LOW variance, large sample size
73
decreased power
lower alpha, small effect size, HIGHER variance, smaller sample size
74
determinants of statistical power
Power (1 - β), Alpha (level of significance), N (sample size), Effect size (mnemonic: PANE)
75
A priori
before data collection / before the study begins
76
Post hoc
after data collection / after the study is completed
77
A priori analysis standard effect sizes: small
d = .20
78
A priori analysis standard effect sizes: medium
d = .50
79
A priori analysis standard effect sizes: large
d = .80
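A hedged a priori sample-size sketch using these conventional effect sizes; it assumes the statsmodels package is available and uses the common alpha = .05 and power = .80 defaults, which are not specified by the deck:

```python
# a priori power analysis sketch (assumes statsmodels is installed)
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# how many subjects per group for a medium effect (d = .50),
# alpha = .05, and power = .80 (i.e., beta = .20)?
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))   # roughly 64 per group
```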
80
True experimental design
RCT = gold standard; IV manipulated by the researcher; at least 2 groups; random assignment to groups
81
Quasi-experimental designs
may lack randomization; may lack a comparison group; may lack both
82
does a posttest-only control group give us all the information we need?
no; we don't have all the information we need
83
same people in each level of the IV
within-subject design
84
single-factor (one-way) repeated measures design
no control group, subjects act as their own controls
85
examples of parametric statistics tests
t-tests, ANOVA, ANCOVA, Correlation, Regression
86
Assumptions of Parametric Test
scale data (ratio or interval), random sampling, equal variance, normality
87
t-test
comparing 2 means from 2 different groups
88
variance (differences) comes from 2 sources:
IV and everything else (error variance)
89
comparing means for INDEPENDENT groups
difference between means / variability within groups
90
comparing means for REPEATED measures
mean of the differences between pairs / standard error of the difference scores
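A minimal sketch of both comparisons using scipy's t-test functions; the scores below are made up for illustration:

```python
from scipy import stats

# hypothetical outcome scores
group_a = [23, 25, 28, 30, 31, 27]   # e.g., treatment group
group_b = [20, 22, 24, 25, 26, 21]   # e.g., control group

# independent (unpaired) t-test: 2 different groups
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

# paired (repeated-measures) t-test: same subjects measured twice
pre  = [60, 62, 58, 65, 70, 63]
post = [64, 66, 60, 70, 74, 65]
t_rel, p_rel = stats.ttest_rel(post, pre)

print(t_ind, p_ind)
print(t_rel, p_rel)
```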
91
if t > 1 then
you have a greater difference between groups
92
if t < 1 then
you have more variability within groups
93
comparing means formula:
t = (treatment effect + error) / error
94
degrees of freedom definition
the number of independent pieces of information that went into calculating the estimate
95
degrees of freedom equation
df = n - 1
96
assumptions of unpaired t-tests
data from ratio or interval scales; samples are randomly drawn from populations; homogeneity of variance; population is normally distributed
97
Effect size for t-test
a measure of the magnitude of the effect the IV has on the DV
98
effect size: small
d = .20
99
effect size: medium
d = .50
100
effect size: large
d = .80
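A minimal Cohen's d sketch, assuming the standard pooled-SD formula (the deck gives only the .20/.50/.80 benchmarks); the data are hypothetical:

```python
import numpy as np

def cohens_d(x, y):
    # Cohen's d = difference between means / pooled standard deviation
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

group_a = [23, 25, 28, 30, 31, 27]
group_b = [20, 22, 24, 25, 26, 21]
print(cohens_d(group_a, group_b))   # interpret with the .20 / .50 / .80 benchmarks
```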
101
assumptions of paired t-tests
data from ratio or interval scales; samples are randomly drawn from populations; population is normally distributed
102
what is the best way to DECREASE the width of the CI?
decrease the confidence level (the percentage associated with the interval)
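A minimal sketch of how the CI width responds to the confidence level (this card) and to sample size (card 53); the numbers are hypothetical:

```python
import numpy as np

def ci_width(sd, n, z):
    # total width of a z-based confidence interval = 2 * z * SEM
    return 2 * z * sd / np.sqrt(n)

# lowering the confidence level (95% -> 90%) narrows the interval
print(ci_width(sd=10, n=100, z=1.96))  # 95% CI width
print(ci_width(sd=10, n=100, z=1.65))  # 90% CI width (narrower)

# so does increasing the sample size
print(ci_width(sd=10, n=400, z=1.96))  # narrower still
```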