Flashcards in Lecture 9 Deck (45):
Three common tools/indices to find 'meaningfulness' of statistical analysis?
*Statistical Significance (p-value)
*Confidence Intervals (95% or 99%)
*Magnitude of the effect (effect size)
Tools/indices for meaningfulness also tell us:
How well SAMPLE statistics generalize to the larger TARGET population
Making inferences about larger population based on sample--we should see the same results with a different sample
Statistical significance definition
The probability that a statistic from the sample represents a genuine phenomenon in the population--what we see in the sample we should see in the population
Statistical significance elements
*Null Hypothesis Significance Testing
*Systematic and Unsystematic Variation
*Comparing signal to noise
Null Hypothesis Significance Testing
We test null hypotheses… they are simpler. (H0) The question of interest is simplified into two competing claims (or hypotheses) between which we have a choice (between the null hypothesis and the alternative hypothesis). Special consideration is given to the null hypothesis. (e.g., H0 : there is no difference in symptoms for those receiving the new drug (Tx) compared to the current drug)
Systematic Variation
Variation that is explained by the model (SIGNAL)
Unsystematic Variation
Variation that cannot be explained by the model (NOISE)
Comparing signal to noise
We want the Effect (signal) > Error (noise)
What are the two possible conclusions of a hypothesis test with regard to the null hypothesis?
*Reject the null
*Fail to reject the null
What are the four possible outcomes of a hypothesis test?
*REJECT NULL THAT IS TRUE (Type 1 error, Incorrect Decision)
*REJECT NULL THAT IS FALSE (Correct Decision)
*ACCEPT NULL THAT IS TRUE (Correct Decision)
*ACCEPT NULL THAT IS FALSE (Type 2 error, Incorrect Decision)
The Null Hypothesis (H0)
*"Simpler" and given priority over a more "complicated" theory
*We either REJECT or FAIL TO REJECT
*A statement of what a statistical hypothesis test is set up to establish
Type I error
Rejecting the Null when it is true
**Group differences were found when no actual differences exist
Which is the more serious error: Type I or Type II?
A Type I error is more serious and therefore more important to avoid.
*The test procedure is therefore adjusted so there is a guaranteed "low" probability of making a Type I error
The probability of a Type I error can be precisely computed:
Probability of a Type I error = alpha (the significance level); the null is rejected when the p-value falls below alpha
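The claim that alpha fixes the Type I error rate can be checked by simulation: if the null is true and we test at alpha = .05, about 5% of samples should still (wrongly) reject it. A minimal Python sketch, assuming a two-tailed z-test with known sigma (an illustrative choice, not a procedure from the lecture):

```python
import math
import random

def type_i_error_rate(n=30, trials=20_000, seed=42):
    """Simulate repeated two-tailed z-tests when the null (mu = 0) is TRUE.

    Each trial draws n values from N(0, 1); since sigma is known to be 1,
    z = sample_mean * sqrt(n). We reject when |z| > 1.96 (alpha = .05).
    The fraction of (wrong) rejections estimates the Type I error rate.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(0, 1) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n)
        if abs(z) > 1.96:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()  # should land near .05
```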
The Probability Value (p-value)
The probability of getting a value of the test statistic as extreme as or more extreme than that observed by chance alone, if the null is true
What do small p-values suggest?
The null is unlikely to be true
Common values for significance
.05, .01, or .001
What happens when you decrease the chance of a Type I error?
The chances of a Type II error increase!
Type II error
Accepting the null when it is false
**No group differences were found when group differences do actually exist
What is a Type II error frequently due to?
Sample size being too small
**If we accept the null, it may still be false as the sample might not be big enough to detect the differences between groups
What is the exact probability of a Type II error?
We don't know!
Power
The probability of correctly rejecting a false null
**finding an effect when it exists
**In other words, the probability of NOT committing a Type II error
Max and Min values of Power
Power ranges from 0 (minimum) to 1 (maximum)
Reasons for low power
*Sample sizes too small
*Use of unreliable measures
What is the Power cutoff social scientists often use?
.80; there should be at least an 80% chance of NOT making a Type II error
Why is the Power cutoff (.80) more lenient than the .05 alpha level used in significance testing?
Because greater care should be taken in asserting that a relationship exists (avoiding a Type I error) than in failing to conclude that a relationship exists (a Type II error)
One-Tailed Test of Significance
Researcher has ruled out interest in one of the directions, and the test is the probability of getting a result as strong/stronger only in ONE direction
Two-Tailed Test of Significance
Tests the probability of getting a result as strong/stronger than the observed result in either direction
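The one- vs. two-tailed distinction can be made concrete with the standard normal distribution: a two-tailed p-value counts extreme results in both directions, so it is twice the one-tailed value. A small sketch using the normal CDF built from `math.erf` (the z statistic here is illustrative, not from the lecture):

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_values(z):
    """One- and two-tailed p-values for an observed z statistic.

    One-tailed: probability of a result at least this strong in the
    predicted direction only. Two-tailed: at least this extreme in
    EITHER direction, i.e., twice the one-tailed probability.
    """
    one_tailed = 1.0 - normal_cdf(abs(z))
    two_tailed = 2.0 * one_tailed
    return one_tailed, two_tailed

one, two = p_values(1.96)
```

For z = 1.96 the two-tailed p-value lands at the familiar .05 cutoff, while the one-tailed value is half that.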
Sampling Distribution
A theoretical distribution of a sample statistic, used as a model of what would happen if the experiment were repeated infinitely
Standard Error (SE)
*The standard deviation of the sampling distribution of a given statistic
*AKA: a measure of how much RANDOM variation exists between observed scores and expected scores
Standard Error of the Mean
The average difference between the population mean and any individual sample mean
**How much error can we expect
**How confident the sample represents the population
What characteristics must be examined for the Standard Error of the Mean?
*How large is the sample?
*The standard deviation of the sample
How does sample size affect the Standard Error of the mean?
*Small sample size is related to Type II error (not big enough to detect differences)
*The larger the sample, the less error we should have in the estimate about the population (smaller standard error)
How does standard deviation of the sample affect the standard error of the mean?
*If the scores in my sample are very diverse (i.e., a lot of variation, a large SD), we can assume the scores in the population are also diverse
*The larger the sample SD = the greater the assumed variation of scores in the population = the larger the standard error of the mean
Small samples with large SDs produce large standard errors. Why?
These characteristics make it more difficult to have confidence that the sample accurately represents the population
*Conversely, a large sample with a small SD will produce a small standard error
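The two ingredients of the standard error of the mean (sample size and sample SD) combine in one formula: SE = SD / sqrt(n). A minimal sketch showing that a larger n shrinks the SE (the sample values are hypothetical):

```python
import math
import statistics

def standard_error_of_mean(sample):
    """SE of the mean = sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical scores: same spread of values, different sample sizes.
small = [4, 6, 5, 7, 3]   # n = 5
large = small * 10        # n = 50, the same values repeated
se_small = standard_error_of_mean(small)
se_large = standard_error_of_mean(large)
# The larger sample yields the smaller standard error.
```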
Effect Size
A measure of the strength or magnitude of an experimental effect
A way of expressing the difference between conditions using a common metric
Why do we use effect size rather than other significance testing?
*When examining effects using small sample sizes, significance testing can be misleading because it is subject to Type II errors
*When examining effects using large samples, significance testing can be misleading because even small or trivial effects are likely to produce statistically significant results
Formulas for effect size
Mean difference divided by the pooled standard deviation
Cohen's d
The effect size that expresses the difference between two means in standard deviation units
Cohen's d cut-offs
.2 = small effect
.5 = medium effect
.8 = large effect
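The formula above (mean difference divided by the pooled standard deviation) translates directly into code. A sketch with hypothetical symptom scores for two groups (the data are made up for illustration):

```python
import math
import statistics

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical symptom scores: lower = fewer symptoms
tx = [12, 14, 11, 13, 15]        # new drug
control = [16, 18, 15, 17, 19]   # current drug
d = cohens_d(tx, control)
# |d| well above .8, so this would count as a large effect
```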
Confidence Interval (CI)
A range of values within which the true difference between groups is likely to lie
What p-values do a 95% CI and a 99% CI correspond to?
95% = .05
99% = .01
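The CI-to-p-value correspondence comes from the critical z values: 1.96 for 95% (p = .05) and 2.576 for 99% (p = .01). A sketch of an approximate CI for a mean, built from the standard error of the mean defined earlier (the sample data are hypothetical):

```python
import math
import statistics

def confidence_interval(sample, z=1.96):
    """Approximate CI for the mean: mean +/- z * standard error.

    z = 1.96 gives a 95% CI (p = .05); z = 2.576 gives a 99% CI (p = .01).
    """
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * se, mean + z * se

low, high = confidence_interval([4, 6, 5, 7, 3])
# The interval is symmetric around the sample mean of 5
```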