extras Flashcards

(91 cards)

1
Q

what to remember when describing a distribution

A
  1. Centre: the median (state explicitly that it IS the median)
  2. Spread: the IQR (the middle 50% of scores lie between x and y), plus the minimum and maximum
  3. Shape: peaks and distribution of scores, plus skewness
  4. Any outliers?
2
Q

Outliers can occur because of?

A

sampling error

participant error

researcher error

random chance

3
Q

probability density functions

A

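Hypothetical population distributions are defined using mathematical formulas known as probability density functions (pdfs). For a continuous variable, a pdf gives the density (relative likelihood) at each value; probabilities correspond to areas under the curve over a range of values.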

total area under the curve defined by a probability density function always equals 1
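
As a rough numerical check of the area claim, the sketch below integrates a normal pdf with the trapezoid rule. The function names, the choice of mean 0 and sd 1, and the integration range of plus or minus 10 are illustrative assumptions, not part of the cards.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x (a density, not a probability)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under(pdf, lo, hi, steps=100_000):
    """Approximate the area under pdf between lo and hi (trapezoid rule)."""
    width = (hi - lo) / steps
    total = 0.5 * (pdf(lo) + pdf(hi))
    for i in range(1, steps):
        total += pdf(lo + i * width)
    return total * width

# Hypothetical check values: the whole curve, and the middle of the curve.
total_area = area_under(normal_pdf, -10, 10)    # should be very close to 1
within_one_sd = area_under(normal_pdf, -1, 1)   # roughly 0.68 for a normal curve
print(round(total_area, 4), round(within_one_sd, 4))
```

The second value shows how a probability is read off as an area over a range rather than as the height of the curve at one point.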

4
Q

normal distribution is a…

A

hypothetical population distribution

5
Q

should you describe a sample as normal?

A

No, it approximates a normal distribution

6
Q

standard normal distribution

A

Normal distribution with μ = 0 and σ = 1

7
Q

If x is an observation from a normal distribution, the z-score of x is…

A

z = (x − μ) / σ

8
Q

Z scores follow what kind of distribution…

A

A normal distribution with μ = 0 and σ = 1 (the standard normal distribution)

9
Q

sampling distribution

A

We can imagine collecting an infinite number of samples of N = 40 Peabody scores, leading to an infinite number of sample means and standard deviations.

If each of these samples came from the same population, then each sample mean is an estimate of the same population mean, μ, and each sample standard deviation is an estimate of the same population standard deviation, σ.

Because of sampling error (not “bias”!), very few, if any, of these mean and standard deviation estimates will exactly equal the true population mean and standard deviation.

Imagine creating a frequency distribution table or graph for the collection of sample means obtained from repeatedly collecting different samples of size N = 40 from the same population. This collection of sample means would form the sampling distribution of the mean.

10
Q

A sampling distribution is the distribution of a …

A

statistic

11
Q

Sampling distributions are ______ ______ distributions

A

theoretical population distributions

12
Q

Central limit theorem

A

Describes the sampling distribution of the mean

also applies to sample regression slope estimates

Central limit theorem: for means calculated from samples of size N drawn from any parent population with mean μ and finite standard deviation σ, the sampling distribution of the mean converges to a normal distribution with mean μ and standard deviation σ/√N as N approaches infinity.
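
A small simulation can make this concrete. The sketch below is only an illustration under assumed settings: the skewed (exponential) parent population, the seed, and the 5,000 repeated samples are choices made here, with N = 40 borrowed from the sampling distribution card.

```python
import math
import random
import statistics

rng = random.Random(0)       # fixed seed so the run is repeatable
N = 40                       # size of each sample
NUM_SAMPLES = 5000           # finite stand-in for "infinitely many" samples

# Assumed parent population: exponential with mean 1 and sd 1 (strongly skewed).
MU, SIGMA = 1.0, 1.0

# One sample mean per simulated sample: an approximate sampling distribution.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(N))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)     # empirical standard error
theoretical_se = SIGMA / math.sqrt(N)            # sigma / sqrt(N)

print(f"mean of sample means: {mean_of_means:.3f} (mu = {MU})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(N) = {theoretical_se:.3f})")
```

Even though the parent population is far from normal, the simulated sample means should centre near the population mean with a spread near σ/√N.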

13
Q

standard error is what

A

The standard error of a statistic is the standard deviation of that statistic's sampling distribution.

For the mean, the standard error is σ/√N, often written σ_x̄.

It is roughly the typical amount by which a sample mean x̄ is expected to differ from the population mean μ.

14
Q

Z score for individual

A

z = (x − μ) / σ

15
Q

zscore for a sample mean

A

z = (x̄ − μ) / (σ / √N)
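
The two z-score formulas differ only in the denominator: σ for an individual score, σ/√N for a sample mean. A minimal sketch, with hypothetical function names and made-up numbers (μ = 100, σ = 15):

```python
import math

def z_individual(x, mu, sigma):
    """z-score of a single observation x."""
    return (x - mu) / sigma

def z_sample_mean(xbar, mu, sigma, n):
    """z-score of a sample mean, using the standard error sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return (xbar - mu) / standard_error

# Hypothetical values: a score of 115, and a mean of 103 from N = 25.
print(z_individual(115, 100, 15))        # 1.0
print(z_sample_mean(103, 100, 15, 25))   # 1.0
```

A sample mean only 3 points above μ is as unusual as an individual score 15 points above μ, because means vary less than individual scores.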

16
Q

point estimate

A

single value used as an estimate of a population parameter

17
Q

what are point estimates influenced by?

A

point estimates are calculated using data from random samples drawn from a much larger population so they are influenced by sampling error

variation of a point estimate from one sample to another represents the extent of sampling error

18
Q

Sampling error and sample size

A

smaller samples have more sampling error than larger samples

point estimates from small samples, more sampling error

Standard error of the mean formula (σ/√N): the bigger N gets, the smaller the standard error gets, so there is less sampling error with larger N

CIs from small samples reflect more sampling error than CIs from larger samples, so they are wider

19
Q

Confidence interval does what?

A

Conveys the degree of sampling error around a point estimate by presenting a range of plausible or reasonable values for the population parameter of interest.

CI is a range of values or an interval that is expected to capture a population parameter of interest with some prespecified level of confidence.

gives the precision of a point estimate

20
Q

What does the Central Limit Theorem tell us about sample means?

A

Sample means can be treated as observations from a normal distribution.

21
Q

Interpretation of a confidence interval

A

This interval captures μ with 95% confidence

22
Q

Factors affecting the width of a confidence interval that are under the researcher’s direct control:

A

level of confidence

sample size
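
To see both factors at work, the sketch below computes a confidence interval for a mean and compares widths. It is a simplified illustration: it assumes σ is known and uses a normal critical value (a t critical value would be used when the standard error is estimated), and all names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """CI for a mean, assuming a known sigma (normal critical value)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def width(interval):
    """Width of an interval given as (lower, upper)."""
    return interval[1] - interval[0]

# Hypothetical sample mean of 100 with sigma = 15.
base = z_confidence_interval(100, 15, 25)                         # 95%, N = 25
bigger_n = z_confidence_interval(100, 15, 100)                    # 95%, N = 100
more_confident = z_confidence_interval(100, 15, 25, confidence=0.99)

for label, ci in [("95%, N=25", base), ("95%, N=100", bigger_n), ("99%, N=25", more_confident)]:
    print(label, [round(v, 2) for v in ci], "width:", round(width(ci), 2))
```

Quadrupling N halves the width, while raising the confidence level widens the interval.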

23
Q

Type I error

A

The rejection of a true null hypothesis. The probability of a Type I error is alpha (α), given that the correct statistical model has been used to test H0.

24
Q

Type II error

A

The failure to reject a false null hypothesis. The probability of a Type II error is beta (β).

25
Power
Power is the probability of rejecting a false null hypothesis. Power is the complement of the probability of a Type II error (power = 1 − β).
26
What is power greater for?
larger sample sizes and for larger effect sizes
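
Both influences on power can be checked with a closed-form calculation. The sketch below assumes the simplest possible case, a one-sided z-test with known σ, which is a simplification chosen here rather than a test from the cards; the function name and effect sizes are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(effect_size, n, alpha=0.05):
    """Power of a one-sided z-test; effect_size is (true mean - null mean) / sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centred on effect_size * sqrt(n).
    return 1 - std_normal.cdf(z_crit - effect_size * math.sqrt(n))

# Hypothetical standardized effect sizes and sample sizes.
for d, n in [(0.5, 20), (0.5, 50), (0.8, 20)]:
    print(f"effect size {d}, N = {n}: power = {power_one_sided_z(d, n):.3f}")
```

When the effect size is zero the null is true, and the same formula returns alpha, the Type I error rate.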
27
statistical model
represents the value of a dependent variable (often symbolized with the letter y) as a function of one or more parameters plus an error term.
28
General Linear Model
A model that expresses the dependent variable as a linear function of the parameter(s); all models examined here take this form.
29
Error variance
Represents the extent to which scores differ from the model's prediction (e.g., how much professor salaries differ from the mean salary). In an intercept-only model, the error variance is equivalent to the variance of the dependent variable.
30
t distribution is used when
The standard error of the mean is estimated from the sample. The t distribution has higher kurtosis (heavier tails) than the normal distribution, which results from the added uncertainty due to estimating the standard error.
31
The particular t distribution used depends on what?
the degrees of freedom
32
When df = infinity, t distribution =
standard normal distribution
33
t stat formula
t = (ȳ − μ0) / s_ȳ, where μ0 is the population mean value given by the null hypothesis and s_ȳ = s/√N is the estimated standard error of the mean
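
A minimal sketch of the t statistic computed from raw scores. The data and the null value are made up for illustration, and the function name is hypothetical.

```python
import math
import statistics

def one_sample_t(data, mu_0):
    """Return (t, df) for a one-sample t-test of the mean against mu_0."""
    n = len(data)
    ybar = statistics.mean(data)
    s_ybar = statistics.stdev(data) / math.sqrt(n)   # estimated standard error
    return (ybar - mu_0) / s_ybar, n - 1

scores = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical sample, mean = 5
t_stat, df = one_sample_t(scores, mu_0=3)
print(f"t({df}) = {t_stat:.2f}")
```

The p-value would then come from the t distribution with N − 1 degrees of freedom, which this sketch does not compute.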
34
One sample T test report
The mean nine-month salary for professors was M = $113,706.46 (SD = 30,289.04), with 95% CI [110,717.90, 116,695.10]. A one-sample t-test confirmed that this mean significantly differs from the U.S. population median salary, t(396) = 41.76, p < .001.
35
Effect size
Magnitude of an association or of the difference between two means
36
Assumptions for a one-sample t-test
1. Independent observations 2. Sample data come from a normally distributed population
37
general linear model
represents the dependent variable as a function of population means
38
Describe confidence interval of a slope estimate
The interval from ___ to ___ captures the population mean difference with 95% confidence
39
CI formula for slope parameter
β̂1 ± t_crit × s_β̂1, where s_β̂1 is the standard error of the slope estimate
40
df for determining t_crit with a binary independent variable
N − 2, because there are two coefficients in the estimated model
41
Standard error estimate of Bhat1 for a binary independent variable
s_β̂1 = √(s²_pooled/n1 + s²_pooled/n2)
42
pooled variance estimate assumes what?
Population variance of the dependent variable is equal across the two groups - homogeneity of variance
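
The pooled-variance standard error and the resulting t statistic can be sketched as follows. The pooled variance is taken to be the usual df-weighted average of the two sample variances, which the cards do not spell out; the data and names are hypothetical.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Return (t, df, se) for a two-sample t-test with a pooled variance."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)
    var2 = statistics.variance(group2)
    # Pooled variance: df-weighted average of the two sample variances.
    s2_pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = math.sqrt(s2_pooled / n1 + s2_pooled / n2)
    slope = statistics.mean(group2) - statistics.mean(group1)   # mean difference
    return slope / se, n1 + n2 - 2, se

control = [1, 2, 3, 4, 5]        # hypothetical scores, mean = 3
treatment = [3, 4, 5, 6, 7]      # hypothetical scores, mean = 5
t_stat, df, se = pooled_t(control, treatment)
print(f"t({df}) = {t_stat:.2f}, SE = {se:.2f}")
```

Pooling is only sensible under homogeneity of variance, since it treats both sample variances as estimates of one population variance.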
43
Two sample t test or binary categorical anova null hypothesis
H0: μ1 = μ2, or equivalently H0: β1 = 0
44
Error term formula for predicted errors from linear model
ê_i = y_i − μ̂_i, where μ̂_i is the estimated mean of the group that observation i belongs to (μ̂1 for group 1, μ̂2 for group 2)
45
error term formula for errors from null model if null is true
ê_i = y_i − ȳ
46
what is the purpose of a statistical model?
describe or explain individual differences or variation in a dependent variable
47
If a model does a good job of accounting for individual differences, what should the variance of errors be like?
The variance of the errors should be small relative to the overall variance of the variable, i.e., the full model has accounted for or explained a portion of the dependent variable variance
48
Proportion reduction in error
R²: represents the proportion of dependent variable variance explained by the model
49
In the context of a single binary independent variable R2 =
eta squared
50
ANOVA
Involves partitioning the total sample variation of the dependent variable into variation explained by the model and error (residual) variation
51
Relation btw sd and variance
sd is the square root of the variance
52
Variance
Sum of squared deviations from the mean, divided by the degrees of freedom (N − 1 for a sample)
53
Numerator and denominator of F statistic
F = MS model / MS error, i.e., variability explained by the model / residual variability
54
SS Total
Sum of squared deviations of observed values of y from the mean of y: Σ (y_i − ȳ)²
55
Model SS for Y
Σ (μ̂_i − ȳ)²
56
Model SS for Y is called variability explained by the model because
it summarizes the predicted variation due to group membership relative to the overall mean
57
Residual SS for Y
The sum of squared residuals across all observations: Σ (y_i − μ̂_i)²
58
Write out the ANOVA table
59
Formula for R^2
R² = SS model / SS total = 1 − (SS resid / SS total)
60
Range of F stat
0 to infinity
61
Distribution of F stat
One-tailed; positively skewed; ranges from 0 to infinity; shape varies by df
62
Formula for T for the difference between two sample means
t = (ȳ2 − ȳ1) / s_(ȳ2 − ȳ1)
63
When numerator df =1 then F =
t^2
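
The sums of squares, R², F, and the F = t² relationship can all be checked on a tiny two-group example. This is a sketch with made-up scores and a hypothetical function name, not output from any dataset in the cards.

```python
import math
import statistics

def one_way_anova(groups):
    """Partition SS total into SS model and SS residual; return a summary dict."""
    all_scores = [y for g in groups for y in g]
    grand_mean = statistics.fmean(all_scores)
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
    # Each observation's predicted value is its own group mean.
    ss_model = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_resid = sum((y - statistics.fmean(g)) ** 2 for g in groups for y in g)
    df_model = len(groups) - 1
    df_resid = len(all_scores) - len(groups)
    ms_resid = ss_resid / df_resid
    return {
        "ss_total": ss_total, "ss_model": ss_model, "ss_resid": ss_resid,
        "ms_resid": ms_resid, "r_squared": ss_model / ss_total,
        "f": (ss_model / df_model) / ms_resid,
    }

g1, g2 = [1, 2, 3, 4, 5], [3, 4, 5, 6, 7]        # hypothetical scores
result = one_way_anova([g1, g2])

# With two groups, MS residual equals the pooled variance, so t can be rebuilt.
se_diff = math.sqrt(result["ms_resid"] / len(g1) + result["ms_resid"] / len(g2))
t_stat = (statistics.fmean(g2) - statistics.fmean(g1)) / se_diff
print(result["f"], t_stat ** 2, round(result["r_squared"], 3))
```

SS model and SS residual add up to SS total, and with one numerator df the F statistic equals the squared t statistic.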
64
Independent samples T test report
“The mean reaction time was significantly greater for those with a reading disorder diagnosis (M = 2039.76 ms, SD = 1128.36) than the control group (M = 1374.68 ms, SD = 625.35), t(36) = 2.28, p = .03. The 95% CI for the mean difference was [72.14, 1258.02].”
65
1. The observations are independent. 2. The dependent variable is normally distributed within each group. 3. Homogeneity of variance: the use of the pooled variance estimate in the formula for the standard error of the regression slope (i.e., the standard error of the sample mean difference) is based on the assumption that the sample variances of the two groups are both estimates of a single population variance.
66
Robustness against non-normality and homogeneity of variance violations when
Sample sizes are large and sample sizes are equal across groups
67
1st dummy variable step
Create J − 1 separate binary dummy variables, where J is the number of categories
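
A minimal sketch of dummy coding: each non-reference category gets its own 0/1 variable, so J categories produce J − 1 dummies. The function name, the labels, and the choice of reference category are illustrative assumptions.

```python
def dummy_code(labels, reference):
    """Build J - 1 binary dummy variables, leaving out the reference category."""
    categories = sorted(set(labels) - {reference})
    return {
        category: [1 if label == category else 0 for label in labels]
        for category in categories
    }

# Hypothetical condition labels for five participants (J = 3 categories).
conditions = ["counting", "rhyming", "adjective", "counting", "adjective"]
dummies = dummy_code(conditions, reference="counting")

for name, column in dummies.items():
    print(name, column)
```

Participants in the reference category score 0 on every dummy, so each slope compares one category's mean with the reference mean.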
68
Null hypothesis of one way anova
H0: β1 = β2 = β3 = β4 = 0 (one slope per dummy variable)
69
APA report one way anova
“The overall proportion of variance explained by the linear model, R² = .45, was significant, F(4, 45) = 9.09, p < .001, indicating that the number of words recalled significantly varied across the five conditions representing different levels of depth of processing.”
70
What does the result of an anova indicate
A significant result indicates that at least one population mean is unlikely to be equal to the other population means.
71
T formula for each slope coefficient estimate
t = β̂_j / s_β̂_j
72
When are anova t-tests valid
As planned comparisons, if the researcher explicitly planned to compare the mean of the reference category with the means of the other categories
73
When to do post hoc
When comparisons not planned a priori or you want to compare group means that do not include the reference group
74
APA report for a priori t tests
“Because the dummy variables in the linear model were defined a priori, the corresponding t-tests represent planned comparisons. The rhyming mean (M = 6.90) did not significantly differ from the counting mean (M = 6.90), t(45) = 0.07, p = .94. But the adjective mean (M = 11.00) was significantly different from the counting mean, t(45) = 2.88, p = .006.” Etc. for the t-tests for the remaining dummy variables.
75
Assumptions for one way ANOVA
1. independent observations 2. normally distributed errors 3. homogeneity of variance
76
What happens if one performs multiple significance tests on the same data without proper adjustments?
The probability that at least one of the tests produces a Type I error is greater than the alpha level used for each test (e.g., .05)
77
Formula for type 1 error accumulation
1 − (1 − α)^c, where c is the number of tests (assumed independent)
78
Tukey's HSD
The experiment-wise Type I error rate is maintained at the α-level used to test the omnibus null hypothesis, regardless of whether the pairwise comparisons were planned a priori.
79
Bonferroni adjustment
the experiment-wise alpha level is simply divided by the number of specific hypothesis tests to be performed.
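
The accumulation formula and the Bonferroni adjustment are easy to tabulate. The sketch below uses hypothetical function names, assumes the tests are independent, and takes α = .05 as the per-test level.

```python
def experimentwise_error_rate(alpha, num_tests):
    """Probability of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_alpha(alpha, num_tests):
    """Per-test alpha after dividing the experiment-wise alpha by the test count."""
    return alpha / num_tests

for c in (1, 3, 9):
    unadjusted = experimentwise_error_rate(0.05, c)
    adjusted = experimentwise_error_rate(bonferroni_alpha(0.05, c), c)
    print(f"c = {c}: unadjusted {unadjusted:.3f}, after Bonferroni {adjusted:.3f}")
```

Without adjustment the error rate climbs quickly with the number of tests; with the Bonferroni per-test level it stays at or just below .05.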
80
Moderation
the second independent variable may moderate the effect of the primary independent variable; for this reason, the second independent variable is often called a moderator
81
What does a population model represent
how these two independent variables combine to explain individual differences in the dependent variable.
82
what does it mean that the main-effects model is likely misspecified
meaning that it is an incorrect model in the sense that it cannot adequately account for the major regularities of the data.
83
Interaction effects
Allow the effect of a smoking-group dummy variable to be moderated by a task-type dummy variable.
84
Null for two way anova
all interaction terms = 0
85
Questions asked by comparing full model with main-effects model
Does smoking group significantly interact with task type? Do the smoking group mean differences significantly vary across the task types? Is the effect of smoking group significantly moderated by task type?
86
MS effect
Compare the main-effects model and the full model. Because the two models differ by the inclusion of the interaction terms, the difference between their RSS values (14857 – 13587) gives the overall interaction sum-of-squares term = 1269.5. Dividing this sum of squares by the interaction degrees of freedom gives MS effect.
87
interaction degrees of freedom
(J – 1)*(K – 1)
88
family-wise error rate
Controls the overall probability of at least one Type I error within each level of the moderator variable. There are three levels of the task moderator, thus there are three families, and three pairwise comparisons within each family. Thus, the p-values are adjusted based on three comparisons. A correction for experiment-wise error rate, on the other hand, would be based on nine comparisons.
89
Simple main effects
Refers to the separate omnibus effects of a focal independent variable within different levels of a moderator variable, e.g., the simple main effect of smoking group within the driving task
90
How to report on simple main effects
simple main-effect of smoking group within the reading task is significant, F
91
Assumptions for t tests and anovas
1. Independent observations 2. Dependent variable is normally distributed within each cell of the study design 3. Homogeneity of variance: The variance of the dependent variable is constant across the cells of the study design.