Exam 1 Flashcards
1) Explain the difference between the INDEPENDENT VARIABLE and the DEPENDENT VARIABLE in a study
2)
- Which one (IV or DV) is manipulated by the researchers?
- Which one is being measured?
1)
Independent variable: the INTERVENTION or OUTSIDE variable that is introduced to change the DV in a scientific experiment. You test the IV to see its CHANGE or INFLUENCE on the DV.
Dependent variable: the MAIN thing being tested/MEASURED in the experiment. It RESPONDS to the independent variable. It is what you are trying to affect/impact/improve.
2)
Independent = manipulated by researcher
Dependent = what is measured (what the IV changes)
Does the independent or dependent variable have UNITS (i.e., can it be measured)?
Dependent
What is VALIDITY in a study?
How useful, accurate, truthful, and meaningful the study results are.
How sound and well-conducted the research methods were.
1) Explain the difference between INTERNAL validity and EXTERNAL validity?
2) What are the criteria for knowing whether a study/research/experiment is INTERNALLY valid or not?
1)
INTERNAL: Was the research/study done correctly? Did the researchers use blinding, randomized selection, and good instruments? Did they avoid bias, excessive attrition, and fudging the #’s? *** Is there correlation between the IV and DV?
EXTERNAL: Are the study and its results APPLICABLE to the general public? Is it valid for others outside of the study group participants (for our patients)? It’s the “So what”: does this RELATE to my patients, and does the same result happen in different settings, groups, or tests/studies?
2)
- Those listed above … but also:
- If the independent variable has a definite effect on the dependent variable, then the study is internally valid (CORRELATION).
- ** If other factors influence the dependent variable, then the study’s internal validity is questioned.
Explain these different types of VALIDITY:
1) Face:
2) Content:
3) CRITERION (next two)
- Concurrent:
- Predictive:
4) Construct:
5) What is a construct?
*** Construct: is it testing what it claims to test? Content: is it measuring all the little aspects? Construct: does an IQ test actually test my IQ? Content: does it include all aspects of my intelligence?
1) Face: Does it measure what it is supposed to? It is the degree to which a study/test/research appears effective in terms of its stated goal. Ex: a goniometer measures ROM; it doesn’t measure blood pressure. Face validity is determined subjectively, most often by expert opinion.
2) Content: Does the test (survey, intervention) measure the desired “content” you are trying to assess? It must be relevant to the desired content domain: the content of the measure needs to match and represent the content of what is being measured. Ex: if you want to test knowledge of world geography, all your questions can’t be U.S. geography questions. Likewise, a survey to assess LE function/pain should not include questions about the UE, back, or head. A teacher’s test whose questions don’t reflect what the students should have learned is an example of BAD content validity.
3)
- Concurrent: (***) How the measure relates to the GOLD STANDARD (or reference point). Ex: ROM measured with a goniometer should be compared against X-ray.
- Predictive: The test can be used to predict a future score/outcome (can we do one test to predict another outcome … do a TUG to predict a person’s fall risk)
4) Construct: The construct is the BIG abstract idea; the content is all the pieces that build up to and relate to the construct. So, does the test interrelate with other tests the way a measure of this construct should? Construct validity is used to ensure that the measure/survey/intervention actually measures what it is intended to measure (i.e., the construct), and not other variables. Using a panel of “experts” familiar with the construct is one way this type of validity can be assessed: the experts examine the items and decide what each specific item is intended to measure. Students can be involved in this process to obtain their feedback.
Example: A women’s studies program may design a cumulative assessment of learning throughout the major. The questions are written with complicated wording and phrasing. This can cause the test inadvertently to become a test of reading comprehension, rather than a test of women’s studies. It is important that the measure is actually assessing the intended construct, rather than an extraneous factor.
5)
WHAT IS A CONSTRUCT:
When you’re talking about a construct in relation to testing and construct validity, it has nothing to do with the way a test is designed or constructed. A construct is something that happens in the brain, like a skill, level of emotion, ability or proficiency. For example, proficiency in any language is a construct.
It is abstract … a skill, attribute, ability, or proficiency.
How is VALIDITY measured?
What are value ranges?
Pearson’s r =
Pearson’s r is Correlation. (Remember that validity measures the relationship between the IV and DV.)
r ranges from -1 to +1: 0 means no correlation (no validity), +1 means a perfect positive correlation between the IV and DV, and -1 a perfect negative correlation. For validity, the closer |r| is to 1, the stronger the relationship.
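To make this concrete, here is a minimal Python sketch (not from the course materials) that computes Pearson’s r with numpy; the therapy/ROM numbers are invented:

```python
import numpy as np

# Hypothetical paired data: IV = weeks of therapy, DV = knee ROM (degrees)
weeks_of_therapy = np.array([1, 2, 3, 4, 5, 6])
knee_rom = np.array([90, 95, 99, 104, 110, 113])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(weeks_of_therapy, knee_rom)[0, 1]
print(f"Pearson's r = {r:.2f}")  # close to +1 -> strong positive correlation
```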
1) What is RELIABILITY
1A) How is reliability measured:
2) Is there a degree of error with any measurement?
3) Is Reliability and Validity the same? How are they different?
4) Can a measurement be valid and not reliable, or reliable and not valid?
1) When you do multiple tests / studies / measures / measurements / interventions over time, you want them to be reliable and produce CONSISTENT repeated measures. SO reliability is CONSISTENCY in test measurements
1A) ICC (Intraclass Correlation Coefficient). Closer to 1 means high reliability (above 0.75 is good), 0.50-0.75 is moderate reliability, and less than 0.50 means poor reliability.
2) Yes, we are humans and not machines, so there is a degree of error. That SEM will be there, but the more consistent your measurement is, the more reliable it is.
3)
RELIABILITY refers to the CONSISTENCY and repeatability of findings / test outcomes / measurements.
VALIDITY refers to the CREDIBILITY or BELIEVABILITY of the research.
**
Reliability is another term for CONSISTENCY. If one person takes the same personality test several times and always receives the same results, the test is reliable.
A test is VALID if it measures what it is supposed to measure, is done in a professional way, and the IV actually impacts the DV being tested. If the results of the personality test claimed that a very shy person was in fact outgoing, the test would be invalid.
4)
Reliability and validity are NOT interchangeable. A measurement that is valid is typically also reliable, but just because something is reliable and consistent does NOT mean it is valid. Suppose your bathroom scale was reset to read 10 pounds lighter. The weight it reads will be reliable (the same every time you step on it) but will not be valid, since it is not reading your actual weight.
Validity of an assessment is the degree to which it measures what it is supposed to measure. This is not the same as reliability, which is the extent to which a measurement gives results that are consistently repeated.
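The bathroom-scale idea is easy to simulate. A minimal sketch, assuming numpy; all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
true_weight = 150.0  # hypothetical true weight in pounds

# Reliable but NOT valid: a mis-set scale reads ~10 lb light, very consistently
biased_scale = true_weight - 10 + rng.normal(0, 0.1, size=5)
# Roughly valid on average but NOT reliable: an unbiased scale with big random error
noisy_scale = true_weight + rng.normal(0, 8, size=5)

print("biased:", biased_scale.round(1))  # tight cluster near 140: consistent but wrong
print("noisy: ", noisy_scale.round(1))   # scattered around 150: right on average, inconsistent
```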
1) Explain two different types of reliability:
2) How can you remember the difference
1) Single Day vs. Multiple Days (you can do different measures on the same day, or over different days: am I reliable when doing measurements on the same day vs. on different days?)
Interrater vs. Intrarater.
2)
Interrater = between 2 testers (has RR so it is between two people)
Intrarater = between SAME person (has one R so it is a single person).
How is reliability measured?
What is range / values?
What is Pearson’s r … and how is it different than ICC?
Reliability is measured as a correlation: the ICC. (If you see ICC, that is RELIABILITY; it is between 0 and 1 as well, and higher/closer to 1 is better, i.e., more reliable.)
Greater than 0.75 is GOOD
0.50-0.75 is moderate
Less than 0.50 is poor
Pearson’s r: is Validity, the CORRELATION between 2 variables (IV and DV). Pearson’s r ranges from -1 to +1. If one measure increases in value and the second measure also increases, that is a POSITIVE CORRELATION (r closer to +1); the same holds when both decrease together. If one variable increases while the other decreases, that is a NEGATIVE correlation (r closer to -1). If one changes and the other doesn’t follow any pattern, there is NO correlation (r near 0).
****** Pearson’s r is used between 2 variables, the IV and DV, but the ICC would be used for 3+ different raters or researchers trying to establish reliability across repeated tests. *********
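For illustration only, here is a numpy sketch of one common ICC form, ICC(2,1) (two-way random effects, absolute agreement, single rater, per Shrout & Fleiss); the class may use a different ICC variant, and the goniometry data below are invented:

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1) for an (n subjects x k raters) matrix of scores."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)  # per-subject means
    col_means = ratings.mean(axis=0)  # per-rater means

    ss_rows = k * np.sum((row_means - grand) ** 2)  # between-subjects
    ss_cols = n * np.sum((col_means - grand) ** 2)  # between-raters
    ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols  # residual

    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Hypothetical data: 5 patients' ROM (degrees) measured by 3 raters
rom = np.array([[110, 112, 111],
                [ 95,  96,  94],
                [120, 118, 121],
                [101, 103, 102],
                [ 88,  87,  89]])
print(f"ICC(2,1) = {icc_2_1(rom):.2f}")  # close to 1 -> high interrater reliability
```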
If you did a test with an ICC of 0.96 for WB vs. a second test with an ICC of 0.92 for NWB (both using a digital inclinometer), which would you use?
0.96 for WB because it is more reliable (higher consistency between different tests)
What is AGREEMENT in a study
How is Agreement measured?
Two or more PTs need to AGREE on what is “normal.”
Agreement is measured with a KAPPA statistic “k” (also between 0 and 1): 0 means agreement no better than chance, and 1 means perfect agreement.
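As a quick illustration, scikit-learn’s cohen_kappa_score computes kappa for two raters; the ratings below are invented:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings: two PTs classify 8 patients' joint mobility
pt_a = ["hypo", "normal", "hyper", "normal", "hypo", "normal", "hyper", "hypo"]
pt_b = ["hypo", "normal", "hyper", "hypo",   "hypo", "normal", "normal", "hypo"]

kappa = cohen_kappa_score(pt_a, pt_b)
print(f"kappa = {kappa:.2f}")  # ~0.61 here: moderate agreement beyond chance
```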
How would you interpret these Interrater Agreement ROM/Pain variables:
- Lumbar side-bending is a kappa of 0.6
- Lumbar rotation is a kappa of 0.17
- 0.6 means a moderate score: side-bending is an activity I can measure well (fingers down to the knees), so PTs AGREE there is a more-than-moderate chance that the movement is actually lumbar movement (rather than compensation).
- 0.17 says the PTs do NOT AGREE as much: there is more of a “chance” explanation, since other muscles can compensate and help out with trunk rotation.
Hypomobility vs. hypermobility
Hypo = joints are not as flexible, limited ROM, ligaments too short or tight (stiffness, pain, contractures)
Hyper = joints are flexible, lots of ROM, ligaments can stretch more than normal. (Hyperextension)
Explain “changes over time”
It’s IMPROVEMENT / PROGRESS in therapy. If you do a measurement and ROM improves 1 degree over a week, is that really a big difference? You need larger “changes over time” to ensure pt is progressing for their confidence and goals, to ensure PT (clinician) is making a difference and interventions are working, for documentation, and for insurance compensation.
1) What is Minimal Detectable Change (MDC)?
2) Does a change in measurement need to be more than the MDC to be significant?
3) Does MDC provide clinical significance?
4) Are MDC and MCID different?
1) The smallest amount of change an INSTRUMENT can accurately measure that corresponds with a noticeable CHANGE in the patient’s ability. So it is NOT just SEM or the clinician’s measurement error, but enough of a recorded change (minimal detectable change) to indicate some progress. It may NOT yet reach the MCID (a change important enough for the Dr. and patient to call it improvement), but it is at least large enough that it is NOT DUE TO MEASUREMENT ERROR/SEM.
2) Yes
3) No (**)
4) YES
Not every research article gives you minimal detectable change (MDC), but if you have standard error of measurement (SEM), can you find MDC?
MDC = SEM × 1.96 × √2
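The formula is easy to check by hand; a minimal Python sketch with an invented SEM value:

```python
import math

def mdc95(sem: float) -> float:
    # MDC at the 95% confidence level: SEM x z-score (1.96) x sqrt(2 test occasions)
    return sem * 1.96 * math.sqrt(2)

print(round(mdc95(1.5), 1))  # SEM of 1.5 degrees -> MDC of about 4.2 degrees
```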
1) If the MDC is 4, then if you measured a 3, what does that tell you? But if you measure a 5, what does that tell you?
2) Is a 5 measurement with a MDC of 5 indication of a MCID?
3) So, a measurement outside / above the MDC tells you what?
1)
3: probably a measurement error (SEM), or no improvement
5: change is probably NOT due to error (SEM), but shows improvement.
2) No. MCID could be an 8 for the Dr. and patient to really care or see progress.
3) It tells you it is significant and NOT due to measurement error. But may not be to the point of MCID or clinical significance yet though.
What would be the better test:
Reliability of ICC = 0.85, and a MDC of 8 degrees
Another test with a reliability of ICC = 0.96 and an MDC of 4 degrees
The ICC of 0.96 is better (more reliable), and the MDC of 4 degrees is much better (a smaller real change can be detected).
1) What is Minimal Clinically Important Difference (MCID)
2) Give examples:
Pain and ROM MCID’s:
1) Smallest difference that CLINICIANS and especially PATIENTS would care about to show actual IMPROVEMENT/progress/healing. So what amount of improvement (measure) is actually significant.
2) Example: PAIN, rated from 0 to 10. If pain goes down from a 6 to a 5.5, that is a MINIMAL difference (the Dr. and patient don’t care and don’t see a real difference). For it to be a significant difference, it needs to change by more than 2 points. So 6 to 5.5 is not significant, but 6 to 3.5 is a big difference in pain (or whatever you are measuring).
• MCID Pain scales = 2 points
• MCID for ROM range = about 5 degrees
***** So if someone comes in and pain doesn’t decrease by 2 points, or ROM doesn’t change by 5 degrees, then there is no real change: the MCID is not exceeded.
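Putting MDC and MCID together: here is a small hedged helper (the function name is invented; the pain MCID of 2 points is from the card above, and the MDC of 1 point is assumed for illustration):

```python
def classify_change(change: float, mdc: float, mcid: float) -> str:
    """Interpret a measured change against MDC (error) and MCID (importance)."""
    if abs(change) < mdc:
        return "within measurement error (SEM): no real change detected"
    if abs(change) < mcid:
        return "real change (exceeds MDC), but not clinically important yet"
    return "real AND clinically important change (exceeds MCID)"

# Pain on a 0-10 scale: assumed MDC ~1 point, MCID = 2 points
print(classify_change(6 - 5.5, mdc=1, mcid=2))  # within measurement error
print(classify_change(6 - 3.5, mdc=1, mcid=2))  # exceeds MCID
```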
What is the ceiling and floor effect?
Ceiling effect: the instrument does not register a further increase in score for high-scoring individuals. UNABLE TO DETECT differences among HIGH PERFORMERS. The ceiling (top of the scale) is set too low.
Floor effect: the instrument does not register a further decrease in score for low-scoring individuals. UNABLE TO DETECT differences among LOW PERFORMERS. The floor (bottom of the scale) is set too high.
The highest or lowest score you can get on the scale
There might be a scale for ADL (Activities for Daily Living) or a Sports scale. I might score high on ADL but score low on Sports scale.
So if an outcome measure has a low ceiling, then you don’t have a way to show improvement.
Can a study have statistical significance but no clinical significance (and visa versa)?
Do we care more about clinical significance or statistical significance?
Yes … and Yes
Ideally we want both, but we for sure care MORE about clinical significance.
EXPLAIN EACH:
- Null Hypothesis vs. Research Hypothesis
Statistical Significance Values:
- p values (and measurement value)
- Type I and Type II errors
- Precision (and CI’s)
Explanation of Clinical Significance is down a few slides.
- Research Hypothesis (or alternative hypothesis), Ha = the independent variable will cause a change in the dependent variable.
- Null Hypothesis, Ho = the independent variable will NOT cause a change in the dependent variable (NO effect or NO correlation).
- p-values: a small p-value (typically ≤ 0.05) indicates strong evidence AGAINST the null hypothesis, so you REJECT the null hypothesis (it means the IV is impacting the DV). A large p-value (> 0.05) indicates WEAK evidence against the null hypothesis, so you do NOT reject it (low correlation between IV and DV). The cutoff is usually 0.05, but a p-value tells us NOTHING about clinical significance. (See the sketch after this list.)
- Type I error: FALSE POSITIVE (“you are pregnant,” said to a man). You REJECT the null hypothesis when it was actually true. (These are less common.)
- Type II error: FALSE NEGATIVE (“you are not pregnant,” said to a pregnant woman). You do NOT reject the null hypothesis when it was actually false. (These are more common.)
|                  | Ho True      | Ho False      |
|------------------|--------------|---------------|
| Reject Ho        | Type I error | Correct       |
| Do NOT reject Ho | Correct      | Type II error |
- Precision: precision of measurement is how confident one is in the ACCURACY of repeated measures (reliability is reproducibility and consistency, while precision is pinpoint accuracy repeated); it is expressed as the STANDARD ERROR OF MEASUREMENT (SEM) in the unit of measure. A CI is a confidence interval: it says that 68% / 95% / 99% of the data will fall within this range.
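To ground the p-value and CI ideas, here is a minimal scipy sketch (the group data are invented and the test choice is just an example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical ROM gains (degrees): treatment group vs. control group
treatment = rng.normal(loc=8, scale=3, size=20)
control = rng.normal(loc=5, scale=3, size=20)

# Independent-samples t-test: Ho says the groups do not differ
t_stat, p_value = stats.ttest_ind(treatment, control)
if p_value <= 0.05:
    print(f"p = {p_value:.3f}: reject Ho (the IV appears to affect the DV)")
else:
    print(f"p = {p_value:.3f}: fail to reject Ho")

# 95% CI for the treatment group's mean gain, using the t distribution
lo, hi = stats.t.interval(0.95, df=len(treatment) - 1,
                          loc=treatment.mean(), scale=stats.sem(treatment))
print(f"95% CI for mean treatment gain: ({lo:.1f}, {hi:.1f})")
```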
1) What is difference between variability and variance?
2) Is variability the same as variance?
1)
VARIABILITY is the extent to which data points in a statistical distribution or data set diverge/vary from the average, or mean.
VARIANCE is the average of the SQUARED deviations from the mean. The STANDARD DEVIATION is the square root of the variance.
2) No. Variability is the general concept of spread in the data (range, deviations from the mean). Variance is one specific, squared measure of that spread.
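A quick numpy check of the variance/standard-deviation relationship (sample numbers invented):

```python
import numpy as np

data = np.array([4.0, 7.0, 9.0, 10.0, 5.0])  # hypothetical measurements

deviations = data - data.mean()
variance = np.mean(deviations ** 2)  # average of the SQUARED deviations
std_dev = np.sqrt(variance)          # standard deviation = sqrt(variance)

assert np.isclose(variance, np.var(data))  # matches numpy's population variance
assert np.isclose(std_dev, np.std(data))
print(f"variance = {variance:.2f}, std dev = {std_dev:.2f}")
```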
Measuring data collected can be classified or categorized into one of 3 types:
1) Categorical
2) Ordinal
3) Continuous
- ** Categorical: Gender, blood type, injury, married, etc.
- Order does NOT matter
- ** Ordinal: Order of numerical classification is important
- ORDER MATTERS
- ** Continuous: Data is on scale that can be continuously broken down
- Examples: weight, height, age
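A small pandas sketch (column names invented) showing how the three data types differ in code:

```python
import pandas as pd

# Hypothetical patient records illustrating the three data types
df = pd.DataFrame({
    # Categorical: labels with NO meaningful order
    "blood_type": pd.Categorical(["A", "O", "B"]),
    # Ordinal: order matters, but the spacing between levels does not
    "pain_level": pd.Categorical(["mild", "severe", "moderate"],
                                 categories=["mild", "moderate", "severe"],
                                 ordered=True),
    # Continuous: a scale that can be subdivided indefinitely
    "knee_flexion_deg": [112.5, 98.0, 121.3],
})
print(df.dtypes)
```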