Statistical Analysis of Quantitative Data Flashcards

(153 cards)

1
Q

Purposes of Statistical Analysis in Quantitative Research

A
  1. To describe the data (ex: sample characteristics)
  2. To estimate population values
  3. To test hypotheses
  4. To provide evidence regarding measurement properties of quantified variables
2
Q

Levels of Measurement from Lowest to Highest

A

Nominal
Ordinal
Interval
Ratio

3
Q

Nominal Level

A

Lowest level

involves using numbers simply to categorize attributes

Named

ex: eye color

4
Q

Ordinal level

A

2nd Lowest Level

Ranks people on an attribute - a hierarchy, but the distance between levels cannot be known or quantified

Named and Natural Order

ex: Level of satisfaction

5
Q

Interval Level

A

2nd highest level

Ranks people on an attribute AND specifies the distance between them - often used interchangeably with ratio

Named, Natural Order, and Equal distance between intervals

ex: Temperature

6
Q

Ratio Level

A

highest level

ratio scales, unlike interval scales, have a meaningful zero and provide information about the absolute magnitude of the attribute

Named, Natural Order, Equal distance between intervals, and a “True Zero” so ratios between values can be calculated

ex: Height

7
Q

Nominal = ___

A

Names

ex: Male = 1; Female = 2, etc.

8
Q

Nominal Level is more like taking ____ data

A

qualitative

9
Q

In many experiments, the independent variable is what level?

A

Nominal!!

10
Q

Numeric Pain Scale is what level

A

Ordinal

11
Q

Age is what level

A

Ratio

12
Q

Hours studied for a test is what level

A

Ratio

13
Q

Most biophysiologic data like pulse is what level

A

Ratio

14
Q

Amount of money in bank account is what level

A

Ratio

15
Q

What level are the following 4 things:

  1. Time of Day
  2. Completion time for running (hr/time)
  3. Runner registration # in a race
  4. Finish order for a race
A
  1. Interval (0 does not mean absence of time, so it's not ratio)
  2. Ratio
  3. Nominal
  4. Ordinal
16
Q

What level is gender

A

nominal

17
Q

What level is height, weight, and pulse

A

ratio

18
Q

What level is Grade in School

A

Ordinal

19
Q

What level is temperature

A

interval

20
Q

What level is zip code

A

nominal

21
Q

What level is dates on a calendar

A

interval

22
Q

Descriptive Statistics

A

Used to describe and synthesize data

Describes the data and what the sample looks like

Involves parameters and statistics

23
Q

What sets parameters and statistics apart

A

Parameters are descriptors for a population

Statistics are descriptive indexes from a sample

24
Q

Inferential Statistics

A

Used to make inferences about the population based on sample data

25
How do descriptive and inferential statistics differ
Descriptive stats describe just the group in front of you, but inferential stats make inferences that generalize to the wider population
26
Frequency Distribution
A systematic arrangement of numeric values on a variable from lowest to highest and a count of the number of times (and/or percentage) each value was obtained
27
Frequency distributions can be described in terms of what 3 things
1. Shape 2. Central Tendency 3. Variability
28
In what ways can frequency distributions be presented
1. In a table (Ns and percentages) | 2. Graphically (ex: frequency polygons)
29
Frequency distributions can be described by their ____
symmetry
30
What is normal symmetry of a frequency distribution called
Normal Distribution (Bell Curve)
31
Skewed/Asymmetric Frequency Distribution
A distribution either skewed positively or negatively
32
Positive Skew
Long tails point right ex: Income
33
Negative Skew
Long tail points left ex: age at death (few deaths occur in youth)
34
Modality
number of peaks in a frequency distribution can be unimodal, bimodal, multimodal
35
Unimodal
1 peak
36
Bimodal
2 peaks can include normal distribution if averaging 2 peaks into a bell shaped curve
37
Multimodal
2+ peaks
38
Central Tendency
index of "typicalness" of a set of scores that comes from center of the distribution Includes mode median and mean
39
Mode
Measure of central tendency that is the most frequently occurring score in a distribution ex: 2333456789 - Mode = 3
40
Median
measure of central tendency where the point in a distribution above which and below which 50% of cases fall ex: 23334|56789 - Median = 4.5
41
Mean
measure of central tendency that equals the sum of all the scores divided by the total number of scores ex: 2333456789 Mean = 5
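The mode, median, and mean cards all use the same ten scores, so they can be checked together. A minimal sketch using Python's standard `statistics` module (the variable names are illustrative and not part of the cards):

```python
from statistics import mean, median, mode

# Scores from the flashcard examples: 2 3 3 3 4 5 6 7 8 9
scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]

score_mode = mode(scores)      # most frequently occurring score
score_median = median(scores)  # midpoint; averages the two middle scores (4 and 5)
score_mean = mean(scores)      # sum of the scores (50) divided by their count (10)

print("mode:", score_mode)
print("median:", score_median)
print("mean:", score_mean)
```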
42
What measure of central tendency is most useful for when scores are skewed
Median
43
What measure of central tendency is seen most frequently
Mean
44
Which measures of central tendency are least helpful and most helpful when using standard deviation
Mode - least helpful Mean - most helpful
45
Why is median helpful for skewed results
because it is not pulled toward the tail by extreme scores the way the mean is, so it offsets the skew
46
Variability
the degree to which scores in a distribution are spread out or dispersed: homogeneity v heterogeneity
47
Homogeneity
Little variability in a frequency distribution sample Makes for a taller and less wide spike
48
Heterogeneity
Great variability in a frequency distribution sample
49
What are the 2 indexes of variability not seen in something like the mean
Range Standard Deviation (SD)
50
Range
The highest value minus the lowest value; shows variability but can be distorted by outliers
51
Standard Deviation (SD)
average deviation of scores from the mean in a distribution; shows variability - preferred to range
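A short sketch of both variability indexes, reusing the ten scores from the central tendency cards. The extra score of 40 is a hypothetical outlier added only to show how strongly a single extreme value moves the range.

```python
from statistics import stdev

scores = [2, 3, 3, 3, 4, 5, 6, 7, 8, 9]
with_outlier = scores + [40]  # hypothetical extreme score, for illustration only

def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

range_plain = value_range(scores)
range_outlier = value_range(with_outlier)

# stdev() is the sample SD (n - 1 in the denominator)
sd_plain = stdev(scores)
sd_outlier = stdev(with_outlier)

print("range:", range_plain, "-> with outlier:", range_outlier)
print("SD:", round(sd_plain, 2), "-> with outlier:", round(sd_outlier, 2))
```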
52
What is the Rule when it comes to standard deviations?
Rule of 68, 95, 99.7 (for a normal distribution): 68% of all data fall within +/- 1 SD; 95% of all data fall within +/- 2 SD; 99.7% of all data fall within +/- 3 SD
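The three percentages can be recovered from the normal curve itself. A minimal sketch using `statistics.NormalDist` from the standard library (the helper name `share_within` is made up for this example):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {share_within(k):.1%}")
```

The rule only applies when the distribution is approximately normal; a skewed distribution will not follow these percentages.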
53
What is important to know about the tails of Standard Deviation in a Normal Distribution
the 0.3% outside 3 SD means that the tails never truly touch 0 - so there is always a theoretical possibility for outliers any distance out
54
Bivariate Descriptive Statistics
Used for DESCRIBING the relationship between 2 variables Approaches: Crosstabs (Contingency Table) or Correlation Coefficients
55
Correlation Coefficient
Describes the intensity and direction of a relationship Ranges from -1 to 1
56
Negative Correlation Coefficient Relationship
-1 to 0 One variable increases in value as the other decreases ex: amount of exercise and weight
57
Positive Correlation Coefficient Relationship
0 to 1 Both variables increase/decrease ex: Calorie consumption and weight
58
What does a correlation coefficient of 0 mean
there is no (linear) relationship between the two variables
59
The greater the absolute value of the correlation coefficient...
the stronger the relationship ex: r=-.45 is stronger than r=.40
60
If there are multiple variables and you want to see all of the correlations/relationships what can be displayed
A correlation matrix
61
Pearson's r
the product-moment correlation coefficient computed with continuous measurements; r is used for interval or ratio level scales
62
Spearman's rho
used for correlations between variables measured on an ordinal scale (lower level) as compared to pearson's r being ratio
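A sketch of both coefficients written out by hand, so the arithmetic is visible. The exercise and weight numbers are hypothetical, and the `ranks` helper assumes there are no tied values (real ordinal data often has ties, which need averaged ranks).

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

def ranks(values):
    """Rank each value from 1 (lowest) upward; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: weekly hours of exercise and body weight in kg
exercise_hours = [1, 2, 3, 4, 5]
weight_kg = [90, 85, 86, 78, 70]

r = pearson_r(exercise_hours, weight_kg)
rho = spearman_rho(exercise_hours, weight_kg)
print("Pearson r:", round(r, 2))
print("Spearman rho:", round(rho, 2))
```

Both values are negative here, matching the card on negative relationships: as exercise goes up, weight goes down.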
63
Clinical Decision Making in EBP involves the calculation of what ?
Risk indexes - so that decisions can be made about relative risks for alternative treatments or exposures ex: Absolute Risk, Absolute Risk Reduction (ARR), Odds Ratio (OR), Number Needed to Treat (NNT)
64
Absolute Risk
The proportion of people in a group who experience the undesirable outcome; used a lot in clinical decision making to decide whether doing an intervention will actually reduce poor outcomes
65
Absolute Risk Reduction (ARR)
Compares the absolute risk in the group exposed to the intervention with the absolute risk in the group not exposed - the estimated proportion of people spared the undesirable outcome because of their exposure to the intervention
66
Odds Ratio (OR)
Odds = the proportion of those with the adverse outcome relative to those without it; the OR compares the odds that the experimental group v the control group develop the undesirable outcome. Often seen in media/lay terms
67
Numbers Needed To Treat Risk Index
Estimation of how many people need to get an intervention before we see the prevention of one true undesirable outcome. So if 3.3 people need a smoking intervention before 1 quits smoking, we can take this into account for budgeting purposes
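The four risk indexes can all be derived from one 2x2 table of counts. The sketch below assumes the common formulas (ARR = control risk minus treated risk, NNT = 1 / ARR), which the cards do not spell out. The counts are hypothetical, chosen so the NNT lands near the 3.3 mentioned above.

```python
def risk_indexes(treated_events, treated_n, control_events, control_n):
    """Risk indexes from counts of people with the undesirable outcome.

    Assumes 0 < events < n in both groups and a nonzero risk difference.
    """
    ar_treated = treated_events / treated_n   # absolute risk, intervention group
    ar_control = control_events / control_n   # absolute risk, control group
    arr = ar_control - ar_treated             # absolute risk reduction
    odds_treated = treated_events / (treated_n - treated_events)
    odds_control = control_events / (control_n - control_events)
    return {
        "AR treated": ar_treated,
        "AR control": ar_control,
        "ARR": arr,
        "OR": odds_treated / odds_control,
        "NNT": 1 / arr,
    }

# Hypothetical trial: 30 of 100 treated and 60 of 100 controls had the bad outcome
result = risk_indexes(treated_events=30, treated_n=100,
                      control_events=60, control_n=100)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```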
68
Inferential Statistics
Used to make objective decisions about population parameters using sample data. Provides a means for drawing inferences about a population, given data from a sample ex: deciding to take Tylenol relies on the assumption that the trial's results generalize to you
69
Inferential stats is based on ...
the laws of probability
70
Why is sampling error a big issue for inferential statistics
Because fluctuation in samples/Unrepresentative samples do not allow accurate generalizability to the greater population A math program will assume we used best methods, but if our convenience sample under or overrepresented the population then it still runs the numbers assuming this and gives false results - this is why we should remain skeptical
71
Inferential statistics uses the concept of...
theoretical distributions (for the entire population) ex: Sampling distribution of the mean
72
What do the stats/sampling distributions of inferential samplings act as a proxy for
Since we do not have the time or means to do infinite sampling we can assume principles of stats to assume what the general population mean would be
73
Inferential statistics (parametric tests) generally assume that the population is...
normally distributed
74
What is the standard deviation called in inferential statistics
Standard Error of the Mean (SEM) So the SEM is estimated from the SD of the actual sample
75
The ____ the SEM the better the generalizability
smaller
76
What improves accuracy of the estimate and shrinks SEM
larger sample size
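A sketch of why a larger sample shrinks the SEM. It assumes the usual formula SEM = SD / sqrt(n), which the cards imply but do not state, and a hypothetical sample SD of 10.

```python
from math import sqrt

def standard_error(sample_sd, n):
    """Standard error of the mean estimated from a sample SD and sample size."""
    return sample_sd / sqrt(n)

sample_sd = 10  # hypothetical sample standard deviation
sem_by_n = {n: standard_error(sample_sd, n) for n in (25, 100, 400)}

for n, sem in sem_by_n.items():
    print(f"n = {n:>3}: SEM = {sem}")
```

Because n sits under a square root, quadrupling the sample size only halves the SEM.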
77
alpha represents...
threshold of risk (ex: a 5% chance of error, i.e., of falling outside the 95% confidence limits)
78
Why is it important to not under/over represent in sampling as it impacts SEM
because we can end up in the tails of the distribution without knowing it sometimes it is not our fault but we have to prevent the times it is
79
Alpha states there is a 5% risk...
that we reject the null hypothesis when it is actually true (our results came from chance)
80
2 Purposes of Inferential Statistics
Point Estimation / Interval estimation Hypothesis Testing
81
Point Estimation
a single descriptive statistic that estimates the population value ex: a mean, percentage, or OR ex: mean BP, mean score on a scale, etc
82
Interval Estimation
a range of values within which a population value probably lies; involves computing a confidence interval (CI)
83
Confidence Intervals reflect...
how much risk of being wrong researchers take in interval estimation
84
Confidence Intervals
indicate the upper and lower confidence limits and the probability that the population value is between those limits Confidence Limit is the estimate for a population range
85
What are the 2 main confidence interval numbers seen
99% 95%
86
What does a 95% CI of 40-50 for a sample mean of 45 indicate
that we can be 95% confident that the population mean is between 40 and 50
87
How do 95% and 99% CI differ
95% = narrower interval but less confidence, allows for a more precise estimate 99% = less risk of being wrong and less tolerance for risk, but naturally means the estimate is not as precise
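The trade-off between 95% and 99% can be seen by computing both intervals for the same sample. This sketch assumes a normal-based (z) interval, mean +/- z x SEM, rather than a t-based one. The SEM of 2.55 is hypothetical, picked so the 95% interval roughly reproduces the 40-50 example for a sample mean of 45.

```python
from statistics import NormalDist

def confidence_interval(sample_mean, sem, confidence=0.95):
    """Normal-based confidence limits: mean +/- z * SEM."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * sem, sample_mean + z * sem

sample_mean = 45
sem = 2.55  # hypothetical standard error of the mean

ci_95 = confidence_interval(sample_mean, sem, confidence=0.95)
ci_99 = confidence_interval(sample_mean, sem, confidence=0.99)

print("95% CI:", [round(limit, 1) for limit in ci_95])
print("99% CI:", [round(limit, 1) for limit in ci_99])
```

The 99% interval is wider than the 95% one: more confidence is bought with less precision.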
88
Hypothesis testing helps researchers...
to make objective decisions about whether results are likely to reflect chance differences or hypothesized effects
89
We can only ever ___ or ___ the ___ hypothesis with statistical decisions from hypothesis testing
accept or reject the null hypothesis | we can never prove or accept the research hypothesis
90
Decisions in hypothesis testing are always made regarding which hypothesis
the null (accept or reject)
91
Rejecting the null implies..
there is a difference large enough between groups to say they are different from intervention rather than just general differences between the groups
92
If the value of the test statistic indicates that the null hypothesis is improbable then...
results are statistically significant
93
Nonsignificant results mean...
that any observed difference or relationship could have happened by chance
94
Statistical decisions (sig or not) are either ___ or ____
correct or incorrect
95
When can we know if a stat decision was correct or not
Not in the initial research but rather after enough replication
96
Type I Error
" False Positive " Rejection of the null when it should not be rejected - thought we saw something when there was not
97
Any stat decision in an initial trial has some level/risk of...
type I or II error
98
Telling a man he is pregnant would be what type of error
Type I Error
99
Risk of Type I and II error is controlled by ...
the level of significance (Alpha) ex: Alpha = 0.05 or 0.01
100
alpha is usually ____
0.05 the probability of rejecting the null hypothesis when it is true - if your p value is less than the alpha you reject the null
101
Type II Error
"False Negative" Failure to reject a null hypothesis when it should be rejected
102
Telling a pregnant woman she is not pregnant is what error
Type 2 -false negative
103
A type 1 error can only occur..
with statistically significant results
104
Power
the ability of a test to detect true relationships increases with larger samples --> larger power
105
Power needs to be at least...
0.80
106
Does Type I and II error mean an error was made necessarily?
No it means there was risk for making that error based on the conclusion
107
Hypothesis Testing Procedure
1. Select an appropriate Stat Test 2. Specify level of significance (ex: alpha = 0.05) 3. Compute a test statistic with actual data 4. Determine Degrees of Freedom (df) for the test stat (made by program) 5. Compare computed test stat to a theoretical value - decide if significant or not
108
Important Bivariate Stat Tests
t tests ANOVA chi squared test correlation coefficients effect size indexes
109
t-test
tests the difference between 2 means 2 types: independent groups between subjects and dependent (paired) groups within subjects
110
t test for independent groups: between subjects test
tests difference of means for 2 independent groups ex: men and women IV is nominal DV is continuous
111
t test for paired groups: within subjects test
to test the difference of means of a paired group ex: pretest v post test for same people IV is nominal DV is continuous
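A sketch of the two t statistics computed by hand. It assumes the pooled-variance form for independent groups and the mean-difference form for paired groups; the scores are hypothetical. The standard library has no t distribution, so this stops at the test statistic and degrees of freedom, which a stats program would then convert to a p value.

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and df for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se_diff, df

def paired_t(before, after):
    """t statistic and df for paired scores (same people measured twice)."""
    diffs = [post - pre for pre, post in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical scores
t_ind, df_ind = independent_t([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
t_pair, df_pair = paired_t(before=[10, 12, 9, 11], after=[12, 15, 10, 13])

print("independent groups: t =", round(t_ind, 2), "df =", df_ind)
print("paired groups:      t =", round(t_pair, 2), "df =", df_pair)
```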
112
p-value
probability of seeing a difference between the means this large if the null hypothesis is true. So p = 0.01 means there is a 1% chance that the difference in means is explained by regular normal variation
113
alpha v p-value
Alpha is the 5% risk for error we agree to accept, but the p value (ex: 1%) is the chance that the difference is from regular error. If the p value is smaller than the alpha you can reject the null hypothesis. Error does not mean mistake here; it means normal random variation - the opposite of bias
114
ANOVA (Analysis of Variance)
Tests the difference between more than 2 means (3+ independent groups) IV - Nominal DV - continuous Can be one way (3 groups) Multifactor/Two Way, or Repeated measures ANOVA (within subjects)
115
What does ANOVA sort out
the variability of an outcome variable into 2 components: 1. variability due to the IV 2. Variability due to all other sources ex: Variation between groups is contrasted with variation within groups
116
What is the statistic yielded with ANOVA
F Ratio Statistic (it is the variation between groups contrasted with the variation within groups)
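The F ratio can be sketched directly from that description: between-group variation divided by within-group variation, each scaled by its degrees of freedom. This is a one-way ANOVA on three hypothetical groups; it returns the statistic and df only, not a p value.

```python
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F ratio with its between and within degrees of freedom."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)

    # Variability due to the IV (group membership)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability due to all other sources (spread inside each group)
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical outcome scores for three independent groups
f, df_between, df_within = f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print("F =", round(f, 2), "df =", (df_between, df_within))
```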
117
Chi Squared Test
Tests the difference in proportions in 2+ independent groups. Uses a contingency table - comparing observed frequencies in each cell with expected frequencies (the frequencies expected if there was no relationship) IV - Nominal (or ordinal) DV - NOMINAL!!! (or ordinal in some)
118
Chi Squared Tests are the inferential statistics version of a...
crosstab table
119
Test stat for Chi Squared Tests
χ² (chi-squared)
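A sketch of how the chi-squared statistic compares observed with expected cell frequencies in a contingency table. It assumes the standard formula, the sum over cells of (observed - expected)^2 / expected, with expected = row total x column total / grand total. The 2x2 counts are hypothetical.

```python
def chi_squared(observed):
    """Chi-squared statistic and df for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected in this cell if there were no relationship
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Hypothetical crosstab: rows = group, columns = outcome present / absent
table = [[30, 70],
         [60, 40]]
chi2, df = chi_squared(table)
print("chi-squared =", round(chi2, 2), "df =", df)
```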
120
What are test statistics
values used to compare in a table to get the p value - not used much anymore
121
If p is lower than the alpha..
results are statistically significant
122
Correlation Coefficients can be used in both...
inferential and descriptive statistics IV and DV -Continuous
123
What are the 3 things needed for any inferential statistic test
Test Statistic Number P Value Degrees of Freedom (DF) *could also include effect size*
124
Effect Size Indexes
summarize the magnitude of the effect of the IV on the DV - how much effect on the outcome measured an important concept in power analysis
125
In a comparison of two group means (ex. in a t test situation) the effect size is represented by...
Cohen's d
126
d < or equal to .20 means...
small effect
127
d = 0.50 means...
moderate effect
128
d > or equal to .80 means...
large effect
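A sketch of Cohen's d for two group means, assuming the pooled-SD version of the formula (difference in means divided by the pooled standard deviation). The cards give benchmarks only at .20, .50, and .80, so the cut-offs used in `effect_label` for values in between are a simplifying assumption. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference between two group means in pooled-SD units."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = sqrt(((n_a - 1) * variance(group_a)
                      + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd

def effect_label(d):
    """Rough label based on the .20 / .50 / .80 benchmarks (cut-offs assumed)."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    return "small"

# Hypothetical scores for two groups
d = cohens_d([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
print("d =", round(d, 2), "->", effect_label(d))
```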
129
Multivariate Stat Analysis
stat procedure for analyzing relationships among 3 or more variables simultaneously ex: Multiple regression, ANCOVA, logistic regression
130
Multiple Regression
used to predict a DV based on 2 or more IVs (predictors) IV - continuous (interval or ratio) or dichotomous DV - continuous (interval or ratio level data) ex: What are things that affect birth weight: Grams at Birth - what is the number of IVs determining that ex: maternal age, income in dollars, maternal weight, SBP, smoking etc
131
What is the stat used in multiple regression
the Multiple Correlation Coefficient symbolized as R
132
Multiple Correlation Coefficient (R)
The correlation index for a DV and 2 or more IVs, represented by R; does not have negative values, so it shows strength of relationships - not direction
133
R sees ___ not ___
strength not direction
134
R^2
an estimate of the proportion of variability in the DV accounted for by all predictors (multiple regression)
135
ANCOVA (Analysis of Covariance)
Extends ANOVA by removing the effect of confounding variables (covariates) before testing whether mean group differences are stat significant IV - Nominal (group status) Covariates - cont./dichotomous Individual differences variability due to all other sources
136
Logistic Regression
analyzes relationships between a nominal-level DV and more than 2 IVs yields an ODDS RATIO - the risk of an outcome occurring given one condition versus the risk of it occurring given a different condition
137
Reliability Assessment Tests
Test Retest Reliability Interrater Reliability Internal Consistency Reliability
138
Validity Assessment Tests
Content Validity Construct Validity Criterion Validity
139
Reliability
Consistency of results (how free the scores are from measurement error)
140
Test Retest Reliability
Give the same test over and over and hope to see similar results in that person
141
Interrater Reliability
Extent to which 2 raters will assign the same score to some attribute
142
Internal Consistency Reliability
Extent to which various components all measure the same thing - ex: Cronbach's alpha
143
Content Validity
Multiple item scales whether content measures constructs of interest
144
Criterion Validity
How consistent measurements on a scale are with a gold standard criterion; assessed with sensitivity and specificity
145
Sensitivity
ability to correctly ID a case
146
Specificity
Ability to correctly rule out noncases
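A sketch of both indexes computed from a screening result checked against a gold standard. The counts and parameter names are hypothetical; the formulas are the usual ones (true positives over all real cases, true negatives over all real noncases).

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of real cases the measure correctly identifies."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of real noncases the measure correctly rules out."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical results: 50 people with the condition, 100 without
sens = sensitivity(true_positives=45, false_negatives=5)
spec = specificity(true_negatives=80, false_positives=20)

print("sensitivity:", sens)
print("specificity:", spec)
```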
147
Construct Validity
Extent to which measurement really measures the true construct done via hypothesis testing
148
When reading a research article and its hypothesis testing, what things are important to look for
1. The Test Used 2. The value of the calculated statistic 3. Degrees of freedom 4. Level of statistical significance (p-value)
149
A researcher measures the weight of people in a study involving obesity and Type 2 diabetes. What type of measurement is being employed? A. Nominal B. Ordinal C. Interval D. Ratio
D. Ratio Rationale: Many physical measures, such as a person’s weight, are ratio measures. Gender is an example of a nominally measured variable. A measurement of ability to perform ADLs is an example of ordinal measurement, and interval measurement occurs when researchers can rank people on an attribute and specify the distance between them, e.g., psychological testing.
150
T/F: A bell shaped Curve is also called a normal distribution
True Rationale: A special distribution called the normal distribution (a bell shaped curve) is symmetric, unimodal, and not very peaked
151
The researcher subtracts the lowest value of data from the highest value of data to obtain: A. Mode B. Median C. Mean D. Range
D. Range Rationale: The range is calculated by subtracting the lowest value of data from the highest value of data. The mode refers to the most frequently occurring score. The median refers to the point in a distribution above which and below which 50% of the cases fall. The mean is the sum of all the scores divided by the total number of scores.
152
T/F: A correlation coefficient of -.38 is stronger than a correlation coefficient of +.32
True Rationale: For a correlation coefficient, the greater the absolute value of the coefficient, the stronger the relationship. So, the absolute value of −.38 is greater than the absolute value of +.32 and thus is stronger.
153
Which test would be used to compare the observed frequencies with expected frequencies within a contingency table? A. Pearson's r B. Chi squared test C. t test D. ANOVA
B. Chi Squared Test Rationale: The chi-squared test evaluates the difference in proportions in categories within a contingency table, comparing the observed frequencies with the expected frequencies. Pearson’s r tests that the relationship between two variables is not zero. The t-test evaluates the difference between two means. The ANOVA tests the difference between more than two means.