Flashcards in Exam 1 Deck (109):
1) Explain difference between the INDEPENDENT VARIABLE and the DEPENDENT VARIABLE in a study
- Which one (IV or DV) is manipulated by the researchers?
- Which one is being measured?
Independent variable: the INTERVENTION or OUTSIDE variable that is introduced to change the DV in a scientific experiment. Testing IV to see CHANGE or INFLUENCE on DV.
Dependent variable: the MAIN thing being tested / MEASURED in the experiment. It RESPONDS to the Independent Variable. It is what you are trying to affect/impact/improve.
Independent = manipulated by researcher
Dependent = what is measured (what the IV changes)
Does the independent or dependent variable have UNITS (Can be measured)?
What is VALIDITY in a study?
How useful, accurate, truthful, and meaningful the study results are.
How upright and GOOD the research methods were.
1) Explain the difference between INTERNAL validity and EXTERNAL validity?
2) What are the criteria to know whether a study/research/experiment is INTERNALLY valid or not?
INTERNAL: Was the research/study done correctly? Did the researchers do blinding and randomized selection, avoid bias and attrition, use good instruments, not fudge the #'s, etc.? *** Is there correlation between the IV and DV?
EXTERNAL: Is the study and results APPLICABLE to the generalized public? Is it valid to others outside of the study group participants (to our patients)? It's the "So what" - does this RELATE to my patients, or does the same result happen in different settings, groups, or tests/studies?
- Those listed above ... but also:
- If independent variable has definite effect on dependent variable then study is internally valid (CORRELATION)
*** - If other factors influence the dependent variable, then study’s internal validity is questioned.
Explain these different types of VALIDITY:
3) CRITERION (next two)
5) What is a construct?
*** Construct: does the test measure the abstract idea it claims to? Content: does it measure all the little aspects of that idea? Construct ... does an IQ test actually test my IQ? Content ... does it include all aspects of my intelligence?
1) Face: Does it measure what it is supposed to? It is the degree to which a study / test / research appears effective in terms of its stated goal. Ex: A goniometer measures ROM, it doesn't measure blood pressure. Face validity is determined subjectively and most often by expert opinion.
2) Content: Does the test (survey, intervention) measure the desired "content" you are trying to assess (relevant to the desired content domain)? The content needs to match and be relevant to what is being measured. Does the content of the measure represent the content of what is being measured? Ex: if you want to test knowledge of world geography, all your ?'s can't be U.S. geography ?'s. Likewise, a survey to assess LE function/pain should not include ?'s about the UE, back, or head. If a teacher's test questions don't reflect what the students should have learned, that is an example of BAD content validity.
- Concurrent: (***) How the measure relates to the GOLD STANDARD (or reference point). Ex: using a goniometer to measure ROM should be compared to X-ray.
- Predictive: The test can be used to predict a future score/outcome (can we do one test to predict another outcome ... do a TUG to predict a person's fall risk)
4) Construct: Construct is the BIG abstract idea. The content is all the pieces that build up to and relate to the Construct. So, does the test interrelate with other tests as a measure of this construct should? Construct Validity is used to ensure that the measure/survey/intervention actually measures what it is intended to measure (i.e. the construct), and not other variables. Using a panel of “experts” familiar with the construct is a way in which this type of validity can be assessed. The experts can examine the items and decide what each specific item is intended to measure. Students can be involved in this process to obtain their feedback.
Example: A women’s studies program may design a cumulative assessment of learning throughout the major. The questions are written with complicated wording and phrasing. This can cause the test inadvertently to become a test of reading comprehension, rather than a test of women’s studies. It is important that the measure is actually assessing the intended construct, rather than an extraneous factor.
WHAT IS A CONSTRUCT:
When you're talking about a construct in relation to testing and construct validity, it has nothing to do with the way a test is designed or constructed. A construct is something that happens in the brain, like a skill, level of emotion, ability or proficiency. For example, proficiency in any language is a construct.
It is abstract ... a skill, attribute, proficiency, skill
How is VALIDITY measured?
What are value ranges?
Pearson's r =
Pearson's r is Correlation. (remember that validity measures the relationship between the IV and DV)
-1 to +1 (with 0 being no correlation or validity, +1 being perfect positive correlation, and -1 being perfect negative correlation between the IV and DV).
1) What is RELIABILITY
1A) How is reliability measured:
2) Is there a degree of error with any measurement?
3) Is Reliability and Validity the same? How are they different?
4) Can a measurement be valid and not reliable, or reliable and not valid?
1) When you do multiple tests / studies / measures / measurements / interventions over time, you want them to be reliable and produce CONSISTENT repeated measures. SO reliability is CONSISTENCY in test measurements
1A) ICC (Intraclass Correlation Coefficient). Closer to 1 means high reliability (higher than 0.75 is good), from 0.50-0.75 is moderate reliability, less than 0.50 means poor reliability.
2) Yes, we are humans and not machines, so there is a degree of error. That SEM will be there, but the more consistent your measurement is, the more reliable it is.
RELIABILITY refers to the CONSISTENCY and repeatability of findings / test outcomes / measurements.
VALIDITY refers to the CREDIBILITY or BELIEVABILITY of the research.
Reliability is another term for CONSISTENCY. If one person takes the same personality test several times and always receives the same results, the test is reliable.
A test is VALID if it measures what it is supposed to measure, done in a professional way, and the IV impacts the DV being tested. If the results of the personality test claimed that a very shy person was in fact outgoing, the test would be invalid.
Reliability and validity are independent of each other. A measurement that is valid typically is also reliable. But just because something is reliable and consistent does NOT mean it is valid. Suppose your bathroom scale was reset to read 10 pounds lighter. The weight it reads will be reliable (the same every time you step on it) but will not be valid, since it is not reading your actual weight.
Validity of an assessment is the degree to which it measures what it is supposed to measure. This is not the same as reliability, which is the extent to which a measurement gives results that are consistently repeated.
1) Explain two different types of reliability:
2) How can you remember the difference
1) Single Day vs. Multiple Days (You can do different measures on the same day, or over different days. Am I reliable when doing measurements on the same / different day?)
Interrater vs. Intrarater.
Interrater = between 2 testers (has RR so it is between two people)
Intrarater = between SAME person (has one R so it is a single person).
How is reliability measured?
What is range / values?
What is Pearson's r ... and how is it different than ICC?
Correlation ... as an ICC (If you see ICC … that is RELIABILITY, and it is between 0 and 1 as well (higher and closer to 1 is better or more reliable).)
Greater than .75 is GOOD
0.51-0.75 is moderate
Less than .50 is poor
Pearson's r: is Validity or CORRELATION between 2 variables (IV and DV). Pearson's r ranges from -1 to +1. As one variable increases, the other increases (POSITIVE correlation ... closer to +1). As one variable increases, the other decreases (NEGATIVE correlation ... closer to -1). If the variables don't move together at all, there is NO correlation (r close to 0).
********************** Pearson's r is used between 2 variables, IV and DV. But ICC would be used for 3+ different raters or researchers trying to get reliability over different tests. *********************************
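A minimal sketch of Pearson's r in plain Python (no libraries); the IV/DV data values below are made up just to show perfect positive and negative correlation:

```python
from math import sqrt

def pearson_r(x, y):
    # r = covariance of x and y divided by the product of their spreads
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

iv = [1, 2, 3, 4, 5]
dv = [2, 4, 6, 8, 10]                     # DV rises with IV
print(round(pearson_r(iv, dv), 2))        # 1.0  (perfect positive)
print(round(pearson_r(iv, dv[::-1]), 2))  # -1.0 (perfect negative)
```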
If you did a test with results of an ICC of 0.96 for WB vs. a second test of an ICC of 0.92 for NWB ...(using a digital inclinometer), which would you use?
0.96 for WB because it is more reliable (higher consistency between different tests)
What is AGREEMENT in a study
How is Agreement measured?
Two or more PT's need to AGREE on what is the "normal"
Agreement is measured with a KAPPA statistic "k" (also between 0-1). 0 is by chance, and 1 is perfect.
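The kappa statistic can be sketched from a 2x2 agreement table between two raters; the counts below are hypothetical (kappa = observed agreement minus chance agreement, divided by 1 minus chance agreement):

```python
def kappa(both_yes, r1yes_r2no, r1no_r2yes, both_no):
    n = both_yes + r1yes_r2no + r1no_r2yes + both_no
    p_observed = (both_yes + both_no) / n
    # expected agreement by chance, from each rater's marginal "yes" rate
    r1_yes = (both_yes + r1yes_r2no) / n
    r2_yes = (both_yes + r1no_r2yes) / n
    p_expected = r1_yes * r2_yes + (1 - r1_yes) * (1 - r2_yes)
    return (p_observed - p_expected) / (1 - p_expected)

# Two PT's rate 50 patients; they agree on 40 of them
print(round(kappa(20, 5, 5, 20), 2))  # 0.6 -> moderate agreement
```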
How would you interpret these Interrater Agreement ROM/Pain variables:
- Lumbar side-bending is a kappa of 0.6
- Lumbar rotation is a kappa of 0.17
- 0.6 is a MODERATE score: the PT's AGREE at a better-than-chance rate that the movement is actually lumbar side-bending (it is easy to see and measure ... fingers down toward knees).
- 0.17 says the PT's do NOT AGREE as much ... agreement is close to chance, likely because other muscles can compensate and help out with trunk rotation, so true lumbar rotation is hard to isolate.
Hypomobility vs. hypermobility
Hypo = joints are not as flexible, limited ROM, ligaments too short or tight (stiffness, pain, contractures)
Hyper = joints are flexible, lots of ROM, ligaments can stretch more than normal. (Hyperextension)
Explain "changes over time"
It's IMPROVEMENT / PROGRESS in therapy. If you do a measurement and ROM improves 1 degree over a week, is that really a big difference? You need larger "changes over time" to ensure pt is progressing for their confidence and goals, to ensure PT (clinician) is making a difference and interventions are working, for documentation, and for insurance compensation.
1) What is Minimal Detectable Change (MDC)?
2) Does a change in measurement need to be more than MDC to be significant
3) Does MDC provide clinical significance?
4) Are MDC and MCID different?
1) Smallest amount of change an INSTRUMENT can accurately measure that corresponds with a noticeable CHANGE in patient's ability. So it is NOT just SEM or measurement error of the clinician, but enough of a change (minimal detectable change) recorded to indicate some progress. May NOT yet be MCID or important enough change to Dr. and patient to suggest improvement, but it at least is MDC enough that it is NOT DUE TO MEASUREMENT ERROR /SEM.
3) No (****)
Not every research article gives you minimal detectable change (MDC), but if you have standard error of measurement (SEM), can you find MDC?
MDC = SEM * 1.96 *√2
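The formula above as a one-liner; the SEM value below is hypothetical, and 1.96 is the z-score for a 95% confidence level:

```python
from math import sqrt

def mdc_95(sem):
    # MDC95 = SEM * 1.96 * sqrt(2)
    return sem * 1.96 * sqrt(2)

print(round(mdc_95(1.5), 1))  # 4.2 (degrees, if SEM is in degrees)
```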
1) If the MDC is 4, then if you measured a 3, what does that tell you? But if you measure a 5, what does that tell you?
2) Is a 5 measurement with a MDC of 5 indication of a MCID?
3) So, a measurement outside / above the MDC tells you what?
3: probably a measurement error (SEM), or no improvement
5: change is probably NOT due to error (SEM), but shows improvement.
2) No. MCID could be an 8 for the Dr. and patient to really care or see progress.
3) It tells you it is significant and NOT due to measurement error. But may not be to the point of MCID or clinical significance yet though.
What would be the better test:
Reliability of ICC = 0.85, and a MDC of 8 degrees
Another reliability of ICC = 0.96 with MDC of 4 degrees
ICC of 0.96 is much better, and MDC of 4 degrees is much much better.
1) What is Minimal Clinically Important Difference (MCID)
2) Give examples:
Pain and ROM MCID's:
1) Smallest difference that CLINICIANS and especially PATIENTS would care about to show actual IMPROVEMENT/progress/healing. So what amount of improvement (measure) is actually significant.
2) Example is PAIN. Range from 0 to 10, and if pain goes down from a 6 to a 5.5 … that is MINIMAL difference (Dr. and patient don't care or see real difference). For it to be a significant difference, it needs to be more than 2 points. So from 6 to 5.5 is not significant, but from 6 to 3.5 is a big difference in pain (or whatever you are measuring).
• MCID Pain scales = 2 points
• MCID for ROM range = about 5 degrees
***** So if someone comes in and pain doesn’t decrease by 2 points, or ROM doesn’t change by 5 degrees, then no real change or won't exceed MCID.
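The MDC/MCID logic from these cards can be sketched as a small classifier; the thresholds are this deck's example values (MDC = 4, MCID = 8 for a hypothetical ROM measure):

```python
def interpret_change(change, mdc, mcid):
    # below MDC: can't rule out measurement error (SEM)
    if change <= mdc:
        return "likely measurement error (SEM)"
    # above MDC but below MCID: real, but not clinically important yet
    if change < mcid:
        return "real change, but not clinically important yet"
    return "real AND clinically important change"

print(interpret_change(3, mdc=4, mcid=8))  # likely measurement error (SEM)
print(interpret_change(5, mdc=4, mcid=8))  # real change, but not clinically important yet
print(interpret_change(9, mdc=4, mcid=8))  # real AND clinically important change
```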
What is the ceiling and floor effect?
Ceiling effect: instrument does not register a further increase in score for high scoring individuals. UNABLE TO DETECT HIGH PERFORMERS. High bar (standard) is set too low.
Floor effect: instrument does not register a further decrease in score for low scoring individuals. UNABLE TO DETECT LOW PERFORMERS. Low bar (standard) is set too high.
The highest or lowest score you can get on the scale
There might be a scale for ADL (Activities of Daily Living) or a Sports scale. I might score high on ADL but score low on the Sports scale.
So if outcome measure has a low ceiling, then you don’t have a way to improve.
Can a study have statistical significance but no clinical significance (and visa versa)?
Do we care more about clinical significance or statistical significance?
Yes ... and Yes
Ideally we want both, but we for sure care MORE about clinical significance.
- Null Hypothesis vs. Research Hypothesis
Statistical Significance Values:
- p values (and measurement value)
- Type I and Type II errors
- Precision (and CI's)
Explanation of Clinical Significance is down a few slides.
- Research Hypothesis (or alternative hypothesis) is Ha = independent variable will cause a change in the dependent variable.
Null Hypothesis is Ho = independent variable will NOT cause a change in dependent variable (have NO effect or NO correlation).
- p values: a small p-value (typically ≤ 0.05) indicates strong evidence AGAINST the null hypothesis, so you REJECT the null hypothesis (it means the IV is impacting DV). A large p-value (> 0.05) indicates WEAK evidence against the null hypothesis, so you do NOT reject the null hypothesis (low correlation between IV and DV). Usually it is less than 0.05, but it tells us NOTHING about clinical significance.
- Type I errors: FALSE POSITIVE ('you are pregnant' to a man). You REJECT null hypothesis when it was true. (These are less common)
- Type II errors: FALSE NEGATIVE ('you are not pregnant' to a pregnant woman). You do NOT reject null hypothesis when it was false. (These are more common)
                  Ho True    Ho False
Reject Ho         Type I     Correct
Do NOT Reject Ho  Correct    Type II
- Precision: Precision of measurement is how confident one is in the ACCURACY of repeated measures (reliability is reproducibility and consistency ... precision is pinpoint accuracy repeated), expressed as the STANDARD ERROR OF MEASUREMENT (SEM) in the unit of measure. A CI is the confidence interval: the range in which the true value is expected to fall with 68% / 95% / 99% confidence.
1) What is difference between variability and variance?
2) Is variability the same as variance?
VARIABILITY is the extent to which data points in a statistical distribution or data set diverge/vary from the average, or mean.
VARIANCE is the average of the SQUARES of the deviations from the mean. The STANDARD DEVIATION is the square root of the variance.
2) No. Variability is the general spread of the data around the mean (e.g., range). Variance is one specific, squared measure of that spread.
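A quick check of these definitions using Python's standard statistics module; the data set is made up (pvariance/pstdev are the population versions, i.e. the plain average of squared deviations):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)           # 5
variance = statistics.pvariance(data)  # 4 = average of squared deviations
sd = statistics.pstdev(data)           # 2 = square root of the variance
print(mean, variance, sd)
```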
Measuring data collected can be classified or categorized into one of 3 types:
*** Categorical: Gender, blood type, injury, married, etc.
- Order does NOT matter
*** Ordinal: Order of numerical classification is important
- ORDER MATTERS
*** Continuous: Data is on scale that can be continuously broken down
- Weight, Height, Age
- SPin vs. SNout:
- Odds Ratio (OR):
- Likelihood Ratio (LR):
- Sample size:
- Effect Size (cohen's d =):
- Numbers Needed to Treat (NNT):
- Relative Risk (RR):
Relative Risk Reduction (RRR):
- Absolute Risk Reduction (ARR):
- CER and EER:
                       Has condition    Doesn't have cond.
Positive Test Score    True Pos (A)     False Pos (B)
Negative Test Score    False Neg (C)    True Neg (D)
- SENSITIVITY: is the ability of a test / intervention to CORRECTLY identify those WITH the condition (disease/diagnosis/injury). The true positive rate above. HIGHER scores close to 1 are better (so with a sensitivity of 0.87, a NEGATIVE result is really good at ruling OUT the condition ... SnNout).
- SPECIFICITY: is the ability of the test / intervention to CORRECTLY identify those WITHOUT the condition (disease/injury). The true negative rate above. Higher score close to 1 the better (so with a specificity of 0.87, a POSITIVE result is really good at ruling IN the condition ... SpPin).
Sensitivity = HAVING THE CONDITION (True +)
Specificity = NOT HAVING THE CONDITION (True -)
**** "SPIN and SNOUT." SPIN stands for "SPecific tests rule IN the condition when they're POSITIVE." SNOUT stands for "SeNsitive tests rule OUT the condition when they're NEGATIVE." ******
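Sensitivity and specificity fall straight out of the 2x2 counts; the TP/FP/FN/TN numbers below are hypothetical:

```python
# hypothetical counts from a 2x2 table
tp, fp, fn, tn = 87, 10, 13, 90

sensitivity = tp / (tp + fn)  # true positive rate: 0.87
specificity = tn / (tn + fp)  # true negative rate: 0.90

print(round(sensitivity, 2), round(specificity, 2))
# High sensitivity -> a NEGATIVE result helps rule OUT (SnNout)
# High specificity -> a POSITIVE result helps rule IN (SpPin)
```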
- Odds Ratio (OR): a measure of association between an exposure and an outcome. The OR represents the odds that an outcome will occur given a particular exposure/intervention, compared to the odds of the OUTCOME occurring in the absence of that EXPOSURE. OR = (odds of developing disease in exposed patients) / (odds of developing disease in unexposed patients)
OR = 1 implies that the event is equally likely in both groups
OR > 1 implies that the event is more likely in the exposed group
OR < 1 implies that the event is less likely in the exposed group, and more likely in the unexposed group
- Likelihood Ratio (LR): likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists. It provides a direct estimate of how much a test result will change the odds of having a condition. IT HELPS PREDICT THE POST-TEST ODDS. So if sensitivity and specificity are high, then you'll get a high +LR, which will give you a high post-test probability which tells you the test will most likely help you know what condition/diagnosis is (rule it in).
+LR = sensitivity / (1-specificity)
+LR > 10 is a test result with a LARGE effect on increasing the probability of having the disease/condition
+LR between 5-10 is a test that has a MODERATE effect on increasing the probability of having the disease/condition
+LR < 5 indicates a SMALL effect on increasing the probability of having the disease/condition
-LR = (1-sensitivity) / specificity
-LR < 0.1 indicates that the result has a LARGE effect on decreasing the probability of having the disease/condition
-LR between 0.1-0.5 indicates that the test has a MODERATE effect on decreasing the probability of having the disease/condition
-LR > 0.5 indicates a SMALL effect on decreasing the probability of having the disease/condition
- A likelihood Ratio of 1 (or close to 1) means that THIS TEST HAS VERY LITTLE INFLUENCE ON THE FACT THAT THE PATIENT DOES / DOES NOT have the condition. It means the test has a bad sensitivity and specificity. In other words, the test was USELESS.
- A likelihood ratio of greater than 1 indicates the test result is associated with the disease. A likelihood ratio less than 1 indicates that the result is associated with absence of the disease.
- Sample Size: how large the group being studied is. The larger, the better (a larger sample gives more statistical power and more trustworthy results)
- Effect Size: Don't need to calculate, but it is Cohen's D. It is NOT sample size, but size of difference of INTERVENTION between two groups. Effect of the intervention. Small effect size is d = 0.2, and large effect size is d = 0.8 +. Effect size is the magnitude, or SIZE OF AN EFFECT of an intervention, or size of difference between two groups or interventions.
- Numbers Needed to Treat (NNT): The NNT is the average number of patients who need to be treated to prevent one additional bad outcome / number needed to be treated for one of them to benefit from intervention (it is the inverse of the absolute risk reduction ... or 1/ARR). Number of patients that must be treated in order to achieve one additional favorable outcome
NNTB = Numbers needed to treat to get a Benefit. NNTH = Numbers needed to treat to cause Harm
It's the inverse of ARR (so NNT = 1/ARR)
- Relative Risk: Risk in the experiment (E) group / risk in the control (C) group. So: EER/CER. Risk of developing disease for people with known exposure compared to risk of developing disease without exposure.
The ratio of ... risk in the exposed group (cases) to the risk in the non-exposed group (controls). Exposed group risk / nonexposed group risk. Or … risk in the experiment (E) group / risk in the control (C) group. So: EER/CER
A RR = 1 means risk is equal in both groups.
If RR > 1, the risk in the exposed group is greater than the non-exposed group (positive association)
If RR < 1, the risk is greater in the non-exposed group (negative association)
Ex: Smokers (exposed group) have a RR of 1.61 to developing CVD to non-smokers (non-exposed group) ... or smokers are 1.61 times more likely to get CVD than non-smokers.
Relative Risk Reduction (RRR):
Percentage that the treatment reduces risk compared to control. RRR = (1-RR) * 100 (goal is to get it to 100%).
75% RRR means 75% less likely to have an ACL tear if they do this program, or you’ve reduced the risk by 75%
- Absolute Risk Reduction: CER - EER. Risk in control group - risk in treatment group. If 100 children were treated, 8 would be prevented from developing bad outcomes. Another way of expressing this is the number needed to treat (NNT). If 8 children out of 100 benefit from treatment, the NNT for one child to benefit is about 13 (100 ÷ 8 = 12.5).
The absolute arithmetic difference in event rates between control and experimental groups. Decrease in risk of treatment in relation to a control treatment. ARR= CER−EER (control group event rate minus experiment group event rate … or risk of control group minus risk of experiment group).
It is the inverse of NNT … so 1/NNT
- CER: Control group event rate
- EER: Experimental group event rate
Ok websites to get checklists for article review / appraisal:
PEDro scale checklist: https://s3.amazonaws.com/libapps/
1) Effect Size means what:
2) Ideally you want your effect size to be close to what number?
1) The size or strength of the intervention. Response to intervention or strength of treatment effect. You want a large effect size to show the intervention/treatment did make a difference in experimental group.
2) Effect size closer to 1 (d = 0.8 or higher) shows the intervention had a LARGE effect on participants.
Ceiling and Floor effect
Ceiling: A ceiling effect is said to occur when a high proportion of subjects in a study have maximum scores on the observed variable
Floor: In statistics, a floor effect (also known as a basement effect) arises when a data-gathering instrument has a lower limit to the data values it can reliably specify.
Ceiling effect: instrument does not register a further increase in score for high scoring individuals. UNABLE TO DETECT HIGH PERFORMERS. High bar (standard) is set too low.
Floor effect: instrument does not register a further decrease in score for low scoring individuals. UNABLE TO DETECT LOW PERFORMERS. Low bar (standard) is set too high.
If an outcome measure has a low ceiling, then you don't have a way to show improvement. If most people are getting high scores, then I can't really find a way to see the high performers.
Remember the ICF model .... review the main topics/boxes in this:
Health Condition (disability / disease)
Body Function / Structure (impairments)
Activity (limitations)
Participation (restrictions)
Environmental (stairs, family)
Personal (Age, Gender, status)
1) When you see ICC it means what?
2) If you see a K (or k) of 0.60 - what does that mean?
3) So if you see a MDC of 4.5 degrees ROM and you measure a 4.1 ... what does that tell you? If you measure a 4.8, what does that tell you?
4) With PAIN, a MCID significant change needs to be more than about how much?
5) With ROM, a MCID significant change needs to be more than about how much?
6) If most people are getting high scores and I can't really find way to see high performers, this is called:
7) If most people are getting low scores and I can't really find way to see low performers, this is called:
8) If you see d= 0.2 or d = 0.8 and d = -0.3 ... it means what?
9) What is a GOOD +LR =
10) What is a GOOD -LR =
*** 11) A likelihood Ratio of 1 means what?
1) Reliability (consistency). Closer to 1 means the test was more reliable. It is consistency between intrarater or interrater testers, or between multiple tests done in the same day or over different days. You want consistency.
2) K is the Kappa statistic which is AGREEMENT BETWEEN TESTERS (INTER-RATER AGREEMENT between two PT's). It says two different PT's AGREE at 0.60 rate for this test/intervention. The higher to 1 the better or more agreement, closer to 0 is more chance / less agreement.
3) 4.1 means probably error in measurement
4.8 says measurement wasn't an error, but notable improvement is measured (but not MCID yet).
4) 2 points
5) 5 degrees
6) Ceiling effect
7) Floor effect
8) d = effect size. A large effect size is close to 1, small effect size is closer to 0. A negative effect size indicates a decrease. Larger the effect size, means the larger the difference in the effect of the intervention.
9) Value greater than 10
10) Value less than 0.1
11) A likelihood Ratio of 1 (or close to 1) means that THIS TEST HAS VERY LITTLE INFLUENCE ON THE FACT THAT THE PATIENT DOES / DOES NOT have the condition. It means the test has a bad sensitivity and specificity. In other words, the test was USELESS. A likelihood ratio of greater than 1 indicates the test result is associated with the disease. A likelihood ratio less than 1 indicates that the result is associated with absence of the disease. Tests where the likelihood ratios lie close to 1 have LITTLE practical significance as the post-test probability (odds) is little different from the pre-test probability. In summary, the pre-test probability refers to the chance that an individual has a disorder or condition prior to the use of a diagnostic test. The Likelihood ratio then factors in sensitivity and specificity to help you know or predict the post-test probability to help rule in / out a condition. So LR impacts the post-test probability.
Ok, imagine you have that 2x2 table. A and B are the top row (reading left to right). C and D are the bottom row (reading left to right).
         Has condition    Doesn't have cond.
Test +   A                B
Test -   C                D
How do you calculate:
PPV: a / (a+b)
NPV: d / (c+d)
Sensitivity: a / (a+c)
Specificity: d / (d+b)
- +LR: Sensitivity / (1-specificity)
- -LR: (1-Sensitivity) / Specificity
PPV: probability patient with a POSITIVE test actually HAS the disease
NPV: probability patient with a NEGATIVE test actually has NO disease
LR: likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that that same result would be expected in a patient without the target disorder
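All of the formulas from this card in code, using hypothetical a/b/c/d counts from the 2x2 table:

```python
a, b, c, d = 45, 5, 5, 45   # TP, FP, FN, TN (made-up counts)

ppv  = a / (a + b)          # prob. a positive test = truly has disease
npv  = d / (c + d)          # prob. a negative test = truly no disease
sens = a / (a + c)
spec = d / (d + b)
pos_lr = sens / (1 - spec)  # how much a + result raises the odds
neg_lr = (1 - sens) / spec  # how much a - result lowers the odds

print(ppv, npv, round(pos_lr, 1), round(neg_lr, 2))
# 0.9 0.9 9.0 0.11
```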
Difference between Mean, Median, Mode:
Mean: the average (sum of all #'s divided by how many there are)
Median: # in the middle
Mode: # that occurs the most
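A quick illustration with Python's standard statistics module; the scores are made up:

```python
import statistics

scores = [3, 5, 5, 7, 10]
mean = statistics.mean(scores)      # the average: 6
median = statistics.median(scores)  # the # in the middle: 5
mode = statistics.mode(scores)      # the # that occurs the most: 5
print(mean, median, mode)
```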
Explain difference between Sensitivity and Specificity
Explain difference between PPV and NPV
Sensitivity: That test results will be POSITIVE in patients WITH the disease.
Specificity: That test results will be NEGATIVE in patients WITHOUT the disease
PPV: Probability patient with a positive test actually has the disease.
NPV: Probability patient with a negative test actually doesn't have disease
Numbers needed to treat:
Number of patients that must be treated in order to achieve one additional favorable outcome / or prevent the bad outcome.
What is PEDro:
What is the PEDro scale
The Physiotherapy Evidence Database, abbreviated PEDro, is a bibliographic EBP database containing randomized trials, clinical practice guidelines, and systematic reviews in the field of Physical Therapy.
The scale is a checklist you can use when reviewing research to ensure the research is valid and good research (prove its validity). The scale has you look for things in the research like: eligibility criteria specified, subjects randomly allocated, blinding, etc.
Below are flashcards from the Treatment Outcomes power point
What is the NAGI disablement model
(how to remember)
Pathology -> Impairment -> Functional limitation -> disability -> quality of life decreased
(NAGI is big elephant on Zootopia who "remembers" everything ... but really she is disabled)
In the past, outcome measurements have focused on what? Give an example:
Now, we need outcome measures to focus on:
Measurable outcomes in physical therapy have traditionally focused on IMPAIRMENTS
- Impairment = Limited ROM
- Treatment goal = increase ROM
- Outcome measure = amount of ROM restored
Paradigm shift underway to not only measure outcomes and impairments, but how those impairments lead to functional/activity limitations, which impact participation restrictions, which impact quality of life. So more than impairments, we need outcome measures also to focus on:
- Functional limitations (activity)
- Participation restrictions
- Quality of Life
1) What are some Outcome Measurements:
2) Difference between objective and subjective outcome measures:
3) T or F: Clinician will focus more on impairments, disability, disease (objective measurable items). Patient will focus more on function, activities, and quality of life (subjective items).
4) Another term for these outcome measures the Dr would use, and the patient would use:
5) So how do we marry both the clinicians need to document objective findings (impairments) to get reimbursed, and the patients desire to see improved function / ADL's? Give examples:
6) T or F: Clinician typically reports patient progress higher than the patient
- Functional movements
- Pain / inflammation
Objective: something you can measure (ROM)
Subjective: patient report of their perceived improvement. "I can walk up stairs better" or "my pain is better" is really subjective.
4) DOE vs. POE: Disease oriented evidence vs. Patient oriented evidence.
5) Ensure our documented progress and GOALS include DOE and POE. So, show how ROM limitations are tied to an ADL or function the patient cares about. Ex: It's not about limited shoulder ROM, it is about being unable to lift the arms up to put groceries away in the cupboard, or wash hair in the shower, etc. It isn't just about limited knee flexion, it is about not being able to walk up stairs and get down on the ground to play with grandkids.
1) What is the SF-36 Questionnaire
- What are Region-Specific Health Questionnaires?
- What are some examples:
3) The OSWESTRY form is for what
4) DASH stands for =
5) What is SANE:
Examples of a SANE:
6) What is FABQ
1) SF-36 is a GENERIC 36-item, patient-reported survey of their OVERALL patient health. Asks about physical function, body pain, emotional health, mental health, etc. So the questionnaire asks how a patient's overall health is (in whatever category) on a rating scale compared to when they started treatment.
- Same concept, but questions are TARGETED to a specific body region or issue or injury.
- Examples: FAAM, DASH, LEFS
3) OSWESTRY = LBP
4) DASH: Disabilities of Arm, Shoulder, and Hand
5) SANE = Single Assessment Numeric Evaluation. How would you rate your pain? What % of normal function is your joint experiencing?
Likert Scale = a rating scale, e.g., 1-10 "rate your pain" (rate your pain, mood, function, etc.) or strongly disagree to strongly agree
VAS = Visual Analog Scale, a VISUAL way to rate your pain or improvement or exhaustion (happy faces)
6) FABQ = Fear-Avoidance Beliefs Questionnaire. A questionnaire Dr's can give patients in pain ... the scale quantifies the person’s beliefs about the adverse consequences of movement with general activities and how it relates to work.
1) What are the 2 subsets of Construct Validity:
2) Difference between Convergent and Discriminative Validity
3) Do you need both in order to obtain Construct Validity?
1) Convergent and Discriminative Validity
2) To establish CONVERGENT validity, you need to show that measures that should be related are in reality related.
To establish DISCRIMINATIVE validity, you need to show that measures that should not be related are in reality NOT related.
3) Yes. You need both.
Why did they do this study on 5STS:
See if 5STS is a reliable and valid test/measure to determine LE strength and Exercise Capacity in COPD patients.
Below are flashcards on the Librarian lecture
What are Controlled Vocabularies
There are SO many terms that are inter-related and similar. Controlled Vocabulary takes one term and uses it as the SUBJECT HEADING and everything related to it falls under that subject heading.
Ex: You can search "child" in the search bar, but it would fall under the subject heading of "pediatrics"
What are the Boolean operators
Which one makes your search NARROWER
Which one makes your search BROADER
Boolean operators: AND, OR, NOT
o AND makes it narrower (cause it has to include both)
o OR makes it more broad and includes more (can use either)
Using an * in your search does what
Use an asterisk (*) in your search. It is a wild card. So, if you want to search swim, swimmer, swimming, etc. you'd search "swim*" and it will include all of it.
o therapy or therapies could be: therap*
o child* for child and children or children's
o teen* for teen and teenager and teenagers
o YOU CAN ALSO DO THIS FOR COMMONLY MISSPELLED WORDS
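A quick way to see how the wildcard and the Boolean operators behave (a Python sketch, not from the lecture - the titles list and result IDs are made up for illustration):

```python
from fnmatch import fnmatch

# Hypothetical mini-list of article titles (made up for illustration)
titles = ["swim training", "swimmer endurance", "swimming drills", "running gait"]

# The * wildcard matches any ending, so swim, swimmer, and swimming all hit
hits = [t for t in titles if any(fnmatch(w, "swim*") for w in t.split())]
print(hits)  # the three swim-related titles

# Boolean operators work like set logic on search results:
a = {1, 2, 3}   # pretend these are result IDs for "therapy"
b = {2, 3, 4}   # result IDs for "child*"
print(a & b)    # AND -> narrower (must match both): {2, 3}
print(a | b)    # OR  -> broader (either matches):   {1, 2, 3, 4}
print(a - b)    # NOT -> excludes the second term:   {1}
```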
Is it better to search 1 or many databases?
And what are some common databases?
PEDro is what?
What is RefWorks
MANY. Always best to search many databases
PubMed, MEDLINE, CINAHL, and EBSCO are huge databases for finding articles
PEDro is Physiotherapy Evidence Database specifically for PT students / PT's to find PT related research.
Bibliographic tools … or an electronic file cabinet. A bibliography is your list of all the articles you used / referenced / read / cited for or in your paper.
Helps you bring all the references you've found into one place to use in your paper / research … and then you can use Microsoft Word to write the paper, and it will help you cite those articles in your research paper.
What was the main point of the 5STS test as a functional outcome measure of COPD
1) Test-retest and interobserver reliability of the 5STS
2) Convergent validity with established measures of exercise capacity and lower limb muscle strength, health status and composite indices of mortality in patients with COPD
3) Discriminative validity by demonstrating significantly reduced exercise capacity and lower limb muscle strength in patients unable to complete the 5STS
4) Responsiveness of the 5STS to pulmonary rehabilitation
5) MCID of the 5STS in patients with COPD
What were some of the limitations of this 5STS study:
- Not clear if the same participants were used in each portion of the study
• Reliability, cross-sectional, pulmonary rehabilitation
• Arbitrary score of 60 s assigned to individuals who could not complete the 5STS
• All subjects completed test in 120 sec
• Pulmonary rehab program not well described or referenced
• High drop out rate from pulmonary rehab
• No learning effect? Only performed 5STS two times
Below are flashcards on 'Case Reports'
What is an extended baseline
You extend baseline to make sure they actually have the condition. If someone gets a cold and drinks a juice and then gets better in 2 days - was it because of the juice or just the immune system got them better?
Is a case report retrospective or prospective
Is a case-control study retrospective or prospective
A case report can be both; a case-control study is retrospective.
Does a CASE REPORT study just a person?
Which one is experimental ... a case report or a case-control study?
Who are the 'cases' and who are the 'controls'
A case report studies just one person; a case-control study compares many participants (cases vs. controls). Neither is truly experimental - both are observational, since the researchers don't assign an intervention.
Cases = those with the condition
Controls = those without the condition
1) Let's imagine a pyramid with all the different types of studies we can do. Order them from the bottom (worst) to the top (best):
T or F: Each of these have pros and cons
Case Report (bottom / weakest)
Case-Control Study (retrospective)
Cohort Study
Randomized Controlled Trial ***** (gold standard)
Systematic Review with Meta-Analysis (top / best)
True: each of these has pros and cons.
1) What is a Case Report:
2) What are pros and cons of a case report?
3) MUST READ THIS ... how case report spur other research
1) It is an article that describes and interprets findings from a single case / intervention / study with ONE patient. It has no control group, no experimental factor, etc. Just studies one person's specific case / intervention / outcomes. It uses ICF model to study a single person's case.
2) Cons: they give us no statistical proof and no external validity; they don't compare groups/interventions; there is bias; there is no cause-and-effect comparison; you can't generalize from them.
Pros: they typically spur other, more advanced research; share clinical experiences; develop hypotheses to test; and suggest areas for further research.
3) A case report was done in 2000, then other big clinical studies followed as a result of the initial case report. It starts with someone doing a case report, and other people read it, get interested, and then study it more.
*** Just because there is NOT evidence for something, does NOT mean it does NOT work. Remember - the key is, did your patient get better? And … Just because there is evidence does not mean all evidence is good evidence that needs to be applied.
1) What is the best type of study ... what is worst
2) What is a Meta-Analysis:
3) What is a Systematic Review:
4) So what is best single study for a control group?
1) Systematic review with a Meta-analysis = best
Case Report = worst
2) A meta-analysis uses a statistical approach to combine the results from multiple studies in an effort to increase power (over individual studies).
3) A systematic review answers a defined research question by collecting and summarizing all empirical evidence that fits pre-specified eligibility criteria. A meta-analysis is the use of statistical methods to summarize the results of these studies.
4) For clinical trials with control groups, though, RCT's are the "Gold Standard"
Is a high +LR better or worse. It tells us what?
So remember the NOMOGRAM with Pre-test Probability % on left, LR in middle, and Post-test Probability # on right ... explain it
High +LR is better. It provides a direct estimate of how much a positive test result will increase the odds of having the condition. A high +LR (driven mainly by high specificity, since +LR = sensitivity / (1 − specificity)) results in a higher post-test probability, which helps rule the condition IN.
If pre-test probability is 24% and +LR is 8.5, then post-test probability % will be high (75%) cause of high +LR.
What is pre-test probability:
o It is the general prevalence in the population (you find this info from the literature / research articles)
Positive Pivot Shift:
o What does it mean? With a +LR of 8.5, you are 8.5 times more likely to have an ACL tear (or whatever the condition is) when the test is positive.
o So now you are ~74% "sure" you have an ACL tear.
o Since 74% is pretty sure but not certain, maybe we do another test to push that % higher and closer to 100% (although 100% will only come from surgery or an MRI compared against the gold standard).
o What is "enough" post-test probability? It depends. It is a judgement call. Closer to 100% the better obviously.
So what is post-test probability
How high do you want this to be?
What is 100% going to come from?
Probability that the condition is present after considering pre-test probability and likelihood ratio of a special test.
As close to 100% as you can
Surgery, MRI, some 'gold standard'
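The nomogram is really just doing odds arithmetic. A minimal Python sketch of that math (the probability-to-odds conversion is standard; the 24% / 8.5 numbers come from the card above):

```python
def post_test_probability(pre_test_prob, lr):
    """Probability -> odds, multiply by the likelihood ratio, back to probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

def positive_lr(sensitivity, specificity):
    """+LR = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)

# The card's example: 24% pre-test probability, +LR of 8.5
p = post_test_probability(0.24, 8.5)
print(round(p * 100))  # -> 73 (the ~74-75% in the card is an eyeballed nomogram read)
```

So the nomogram's straight-edge trick is just this probability → odds → probability conversion.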
Let's review ... what is MDC
Is MDC clinically relevant/significant?
What is MCID
MDC: the smallest amount of change an instrument can accurately detect beyond its measurement error.
- Changes must exceed the MDC to be considered real change rather than measurement error
- It does NOT provide context about clinical meaningfulness (so no, exceeding the MDC is not automatically clinically relevant/significant)
MCID: Smallest difference that clinicians and patients would care about to consider actual improvement.
A review ... read about pros and cons of case reports
** Case reports are MOST applicable to patient care since they come from actual patient care (vs. some randomized controlled trial or research study). They are so applicable because it is a real setting with patient and PT, whereas a controlled trial happens in a controlled study environment with protocols and a control group.
Case reports are LEAST RIGOROUS. We sacrifice internal validity because a lot of uncontrolled factors probably played into the results. There are fewer controls than in a randomized controlled trial.
Case reports represent that person/patient in that situation - they lack external validity and don't represent the whole population. But do randomized controlled trials really represent each individual person? So case reports and randomized controlled trials are both good and have purpose, just for different reasons.
o Thus, there is no such thing as a perfect study. They each have good and bad - you as clinician need to just judge and apply what is good for you and your patients.
But main benefit of a case study is it helps spur other research.
T or F: Quality guidelines do not exist for case reports
T or F: ICF model is the typical structure for a case report
T or F: You can determine a cause and effect with a case report
T or F: There is a control group with a case report to compare interventions against
Review the Case Report Format ... what are main sections, and what is in each:
Introduction:
- Purpose statement and background
- Review of literature ... what research is out there
- Quick overview of methods, results, and discussion/conclusion
- CONVINCE the reader the topic is important and why you are doing it
Methods (Case Description):
- Describe the patient (demographics, PMH, diagnosis)
- Explain in detail what was done (clinical decision making and interventions / treatment) so others can replicate
- Detail the intervention done
- Inclusion/exclusion criteria
Results (Outcomes):
- Provide the data, statistics, and findings in tables, graphs, etc.
- Provide the facts and findings / outcomes ... not interpretation of the facts
Discussion and Conclusion:
- Provide context and interpretation to the results
- Compare it to other research
- Relate to MDC and MCID
- Make recommendation to advance research
- NO making cause and effect leap with case reports
Below are a few flashcards on the article about Low Back Pain (LBP)
1) What was the purpose of that article?
2) What type of study was it?
3) Pros and cons of this approach:
4) Biggest con of this article on LBP:
5) How is a cohort study different than a randomized control group?
1) To explore LBP retrospectively in many patients to explore prognostic variables associated with RISK STRATIFICATION for patients with LBP who have received PT.
2) Secondary Analysis of a Cohort study (just looking at statistics from a past study)
3) It is good for finding trends and you can get a LOT of data ... but bad because the researchers can just fit the data to what they want to study/prove.
4) They talk about impacting PT for LBP, but took data on everyone with any musculoskeletal pain. Not relevant data.
5) A cohort study is observing a cohort/group of people who already have condition (disease, intervention, drug) and studying them prospectively or retrospectively over time to see impact / correlation. RCT's are done where researchers give an intervention to an experiment group and compare against a control group to see different effects or see if exposure to some intervention causes the disease (or heals).
1) What is FOTO:
2) Why should we care about or use FOTO:
1) FOTO: Focus on Therapeutic Outcomes
A database of peer-reviewed, published data and articles focused on therapeutic outcomes during rehab. It synthesizes research and data that researchers are interested in for studying how to impact clinical outcomes.
2) Why track this stuff? We need data and research to know how to improve care / outcomes, what treatments are working, etc.
What is regression
Regression is basically a statistical CORRELATION assessing relationships between variables (IV and DV).
They are just looking at relationship between these variables to PREDICT somebody's outcome.
What was summary of the LBP article:
Obviously lower risk patients (younger, no surgery history, payer type other than medicare/Medicaid, etc.) suggest or predict a better outcome with fewer visits. SO, certain criteria of patient data help determine the RISK reduction or HIGH risk of a patient on whether PT will help and they'll progress/improve. RISK STRATIFICATION
Major limitations of the LBP study:
Positive aspect of the study:
It was retrospective (just looking back in time)
It was a secondary analysis of a cohort study (= manipulation of data). It was just data re-synthesized to what authors wanted. And data was self reported by patients.
It was NOT really about LBP when they include all the musculoskeletal issues.
Positive: Large dataset. 6,379 was a lot of people to study
Below are flashcards on 'Outcomes Database' powerpoint
Why track clinical outcomes
We absolutely need to track outcomes to look back at the data to identify patterns, see where we can improve, see what works, see patient progress, insurance companies won't pay unless you do, avoid certain interventions that don't have good outcomes, etc.
What is a FFS model?
Why is FFS good and bad?
FFS is fee-for-service ... a way to crank out as many appointments as you can, since volume = money. It's about QUANTITY over quality.
FFS is good cause it is efficient, cost effective, and revenue producing. But just cause you see a lot of people doesn't mean they are getting good quality care. So insurance says they want to see performance (which is why you have to track outcome measures).
What is a MIPS:
Pros and Cons to this payment management system:
Merit-Based Incentive Payment System:
Payment for services based on performance (not just fee for service)
- Quality matters based on several categories.
- The downside of this type of system is … there are factors out of your control that limit if/how much someone can improve.
What is Risk-Adjustment:
Accounts for variabilities that influence outcome measures in patients. Examples: Age, acuity, comorbidities, medication use, etc
You can't compare an 80 yr. old woman s/p to an 18 yr. old male athlete. They will progress differently - their outcome measures and speed of progress are so different - so there is a risk adjustment based on various factors.
As a PT, it helps you adjust your prognosis and goals for patient dependent on circumstance of patient.
You change treatments / interventions dependent on the patient. A 25 yr. old with LBP and 75 yr old woman with LBP will be treated very differently.
What is the PT Outcomes Registry:
It is DATA from the profession, for the profession. It takes data from EHR's to provide data to help PT clinical decision making.
- A centralized registry … data from the profession for the benefit of the profession.
- A database where we can all get together and determine what we are doing well, and how we can improve, what interventions work, what outcome measures to implement.
- Data that helps dictate our clinical decisions (instead of your bias and experience)
- And you can use the data to see where you are strong and/or weak based on outcome measures of your patients. Use data to help you know what you can improve, how clinic can improve, as a PT or clinic, focus on what you are good at.
- In short, use EVIDENCE BASED PRACTICE where this registry helps guide what you do in practice based on data, clinical findings, etc.
Let's say an article showed a pain scale go from 3.2 to 1.6 (a difference of 1.6 points). Is it clinically significant if the MCID is 1.5?
What if MCID was 3.8?
Usually with PAIN you want to see about a 2-point difference (that is a typical MCID). But if the MCID is 1.5, then YES - the 1.6-point change exceeds it, so it IS clinically significant.
If the MCID were 3.8, then this change is NOT clinically significant.
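A sanity-check sketch of that comparison (note 3.2 − 1.6 is a 1.6-point change):

```python
def clinically_significant(before, after, mcid):
    """A change matters clinically only if it meets or exceeds the MCID."""
    return abs(before - after) >= mcid

print(clinically_significant(3.2, 1.6, 1.5))  # True  (1.6-point change >= 1.5)
print(clinically_significant(3.2, 1.6, 3.8))  # False (1.6-point change <  3.8)
```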
Typical pain MCID is:
Typical ROM MCID is:
If p value is less than 0.05 it means what
If p value is more than 0.05 it means what
A p value of less than 0.05 is statistically significant ... it means the result is unlikely to be due to chance alone, so you can reject the null hypothesis.
If above 0.05, you can NOT reject the null hypothesis - the data do not show a statistically significant relationship between the variables (which is not the same as proving no relationship exists).
What is FFS:
What is MIPS:
Fee for service payment system. Clinicians are paid on VOLUME, not on value provided.
A Merit-Based Incentive Payment System ... more based on quality of care provided, improving patient abilities, advancing care, etc.
Below are the flashcards for the last lecture on "EBP Overview"
There are 3 somewhat competing and yet should be overlapping aspects to clinical decision making. What are they
T or F: Sometimes people use statistics as a drunk uses a lamppost. Explain
True - like the drunk who uses the lamppost for support rather than illumination. Researchers can use stats to verify and SUPPORT a pre-conceived hypothesis, whether it is factual / EBP or not.
Remember the A's of EBP
Define clinically relevant question (ask)
Search for the best evidence (acquire)
Critically appraise the evidence (appraise)
Apply the evidence (apply)
Evaluate the performance of EBM (assess)
Remember the example of the parachute ... what was the lesson about EBP
It speaks to the point of … do we need evidence for everything? Some things (like parachutes working) we don't need trials to prove, right!
Is it true that we as Dr's sometimes do things NOT because there is evidence per se, but because it helps the patient?
Dr's keep doing treatments that have no evidence that they actually help. Examples: injections for back pain, or OA (which never goes away). Sometimes Dr's see something working in their practice despite what is published nationally. And, it takes a LONG time to change insurance and procedures (lag time) to change things. Maybe arthroscopic surgery does nothing for OA long term, but if it helps in the short term relieve pain then that may be important to patient.
Lag time for when research comes out to when it is broadly accepted
Lag time is about 10 years from when something is published to actually see the industry adopting it.
**** Key to remember about EBP (for me, not for this class) .... remember conversation with Dustin who got his PhD. How he said people do "research" and publish statistics to prove their point, but it often is hyped, overinflated, biased, etc.
So take research articles with a grain of salt.
EBP should be used and integrated, but never replace clinical practice, experience, and reasoning.
But, you'll find a lot of research on many topics, and some of it confirms an intervention works, and other studies actually refute that.
Most researchers fudge numbers to prove their hypothesis anyway.
Are there times you reject the clear cut evidence and go against it?
The evidence shows that walking is the BEST treatment approach for improved endurance with those with COPD, but if a patient has bilateral hallux valgus (feet hurt), they won't walk. So you disregard the evidence and put them on a bike. Why? Because if they have to walk, they will NOT do it and thus won't improve. So for that patient, you do a bike because it won't make their feet hurt, they thus will do it, and improve.
Point: Be willing to adapt. Follow evidence, but be adaptable.
1) Does a test's reliability imply validity?
2) Does validity imply reliability?
1) Not necessarily. Reliability does not imply validity. That is, a reliable measure is measuring something consistently, but you may not be measuring what you want to be measuring.
2) Validity implies reliability but, again, not vice versa.
What is risk stratification
Risk stratification is a tool for identifying and predicting which patients are at high risk (or likely to be at high risk) and prioritizing the management of their care in order to prevent worse outcomes.
The process of separating patient populations into high-risk, low-risk, and the ever-important rising-risk groups is called risk stratification. Having a platform to stratify patients according to risk is key to the success of any population health management initiative.
What would a secondary analysis study be:
Secondary analysis is the re-analysis of either qualitative or quantitative data already collected in a previous study, by a different researcher normally wishing to address a new research question
Explain what DOE and POE are:
DOE: Disease oriented evidence
POE: Patient oriented evidence
DOE outcome measures provide info about a patient's pathology (what the Dr. cares about). Examples would be: blood sugar levels, hypertension, urine sample, ROM, etc.
POE are outcome measures the PATIENT cares about (quality of life, participation, activities, function, ADL's)
What entity created the ICF model?
So, the WHO model is what:
What is a similar, but different model to this one:
Which one is more universally used?
WHO: World Health Organization.
The WHO model is the ICF: the International Classification of Functioning, Disability and Health.
A similar but different model is Nagi's model (explained above).
The ICF is the more universally used of the two.
T or F: We should incorporate the ICF model into any patient / patient situation we encounter?
T or F: Clinicians want to use DOE outcome measures because they are objective and measurable?
T or F: Patient's don't really care about DOE outcome measures, they just care about POE or daily life functions?
T or F: If clinicians had to use a POE type reporting system to rate their patient's improvement, or subjectively assess their patient's outcome or level of disability, they will typically always rate it higher than the patient will?
1) Most common type of POE is:
2) Share some examples of these:
3) General or global health measures focus more on:
4) An example of #3 would be:
5) More effective health measure focus on:
6) Examples of #5 would be:
7) What are examples of "single-item outcome measures"?
1) Patient self-report
2) It can be a one-answer verbal report, or a form filled out. The form can be generic, or a region/injury-specific self-report.
3) General overall health (not specifics)
4) SF-36, SF-12
5) Region specific, injury specific pathologies.
6) DASH, UEFS, AKP (anterior knee pain scale), LEFS, etc.
7) Clinician asking "how are you" or "rate your pain from 1-10" - A SANE like a RPE likert scale, pain from 1-10, how are you, rate this, etc.
1) The best clinical decisions implement 3 things:
2) T or F: The best research evidence ideally consists of patient-oriented evidence from well-conducted clinical trials?
- Best available research evidence
- Clinician experience / expertise
- Patient input (values and circumstances)
2) True.
The "gold standard" for CLINICAL trials are:
RCT's = randomized controlled trials
Is a p value of less than 0.05 statistically significant?
Does it mean the stats / data are clinically meaningful or useful?
Yes, p < 0.05 is statistically significant. But no - statistical significance does not automatically mean the result is clinically meaningful (that is what the MCID addresses).
Explain what a confidence interval is:
So a 95% CI means what:
Wider CI's are associated with larger/smaller standard deviations, and large/small sample sizes?
An estimate of dispersion, or variability, around a point or estimate (usually the mean).
We are 95% confident that the true population mean lies within this certain confidence interval (about the mean).
Wider CI's come from LARGER SD's and SMALLER sample sizes (and the opposite is true). So the MORE people you get in the study, the smaller the STANDARD ERROR (SD/√n) gets, and thus the narrower the CI - the SD itself doesn't necessarily shrink, but the precision of the mean estimate improves.
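A minimal sketch of that CI arithmetic (a z-based 95% CI; for small samples you'd really use a t-multiplier instead of 1.96 - the data are made up):

```python
from statistics import mean, stdev
from math import sqrt

data = [12, 15, 11, 14, 13, 16, 12, 15]  # hypothetical sample
n = len(data)
m = mean(data)                           # 13.5
se = stdev(data) / sqrt(n)               # standard error: shrinks as n grows
lo, hi = m - 1.96 * se, m + 1.96 * se    # approximate 95% CI around the mean
```

Add more people and `se` shrinks, so the interval (lo, hi) narrows - exactly the point on the card.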
What is an EFFECT SIZE:
Effect sizes greater than _____ are STRONG effects
Effect sizes lower than ______ are WEAK effects
Provides an estimate of the strength of a treatment effect ... an indication of the meaningfulness of the results. By Cohen's common benchmarks, an effect size (d) around 0.8 or greater is STRONG, around 0.5 is moderate, and around 0.2 or lower is WEAK.
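One common effect-size statistic is Cohen's d (mean difference divided by the pooled SD). A sketch with made-up groups, assuming Cohen's usual benchmarks (~0.2 weak, ~0.5 moderate, ~0.8+ strong):

```python
from statistics import mean, stdev
from math import sqrt

def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled

treatment = [7, 8, 9, 8, 7, 9]  # hypothetical outcome scores
control   = [5, 6, 5, 7, 6, 5]
d = cohens_d(treatment, control)  # ~2.7 here, far above the "strong" benchmark
```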
T or F: Peer-reviewed published papers are held in higher regard than professional presentations at professional meetings?
Why would presenting research at a professional meeting be advantageous over going straight to a peer reviewed published journal?
It allows the researchers to get feedback, be picked apart, and solidify argument. Then they can go back and refine research before submitting to article for review.
A few widely used databases for research papers are: PubMed, MEDLINE, CINAHL, EBSCO (and PEDro for PT-specific research).
Review the main sections of a professional research paper:
Title, Abstract, Introduction, Methods, Results, Discussion / Conclusion
T or F: Titles to papers should be short and brief?
Abstracts are usually how long?
What section of paper are the specific details listed so the research can be replicated?
What section provides the data, answers, graphs
False: They should be highly descriptive without being exceedingly long.
The METHODS section lists the specific details so the research can be replicated.
The RESULTS section provides the data, answers, and graphs.
1) What is VARIANCE:
2) What is the Standard Deviation
3) What is VARIABILITY:
4) 4 different types of variability
5) What is ANOVA:
1) Variance is the average of the squared differences between each score/value and the mean.
2) Standard Deviation is the SQUARE ROOT of the variance.
3) Variability is the extent to which data points in a statistical distribution or data set diverge from the average (mean) value, as well as the extent to which the data points differ from each other.
4) Four commonly used measures of variability: range, interquartile range, variance, and standard deviation.
5) ANOVA: Analysis of variance is a collection of statistical models (developed by statistician Ronald Fisher) used to analyze differences among group means. The observed variance in a variable is partitioned into components attributable to different sources of variation (e.g., between-group vs. within-group variation). In its simplest form, ANOVA tests whether the means of several groups are equal - it generalizes the t-test to three or more groups.
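The variance/SD definitions in code (Python's `statistics` module computes the n−1 sample versions; the data are made up):

```python
import statistics

data = [4, 8, 6, 5, 3, 7]

m = statistics.mean(data)        # 5.5
var = statistics.variance(data)  # sample variance: sum of squared deviations / (n - 1)
sd = statistics.stdev(data)      # standard deviation = square root of the variance
rng = max(data) - min(data)      # range: the simplest measure of variability
```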
What is a T-Test:
T-tests are handy hypothesis tests in statistics when you want to compare means. You can compare a sample mean to a hypothesized or target value using a one-sample t-test. You can compare the means of two groups with a two-sample t-test. If you have two groups with paired observations (e.g., before and after measurements), use the paired t-test.
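A sketch of the paired version (before/after measurements on the same patients), computing just the t statistic with made-up scores - you'd then compare |t| against a t-table critical value (about 2.571 for df = 5 at the two-tailed 0.05 level):

```python
from statistics import mean, stdev
from math import sqrt

def paired_t(before, after):
    """Paired t statistic: mean of the differences / (SD of differences / sqrt(n))."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))

before = [6, 7, 5, 8, 6, 7]  # hypothetical pre-treatment pain scores
after  = [4, 5, 3, 5, 4, 4]  # post-treatment scores, same patients in order
t = paired_t(before, after)  # ~11 here, well past the critical value
```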
GOOD REVIEW OF EBP:
*** Remember: a lot of research is hyped, overinflated, or biased by researchers / professors / students / employees trying to get published, write a dissertation, etc. But if I find a study with HIGH sensitivity and HIGH specificity, a statistically significant (large) sample size, high correlation / ICC / effect size, NO bias, random sampling and allocation, blinded participants and researchers / therapists, and the study has been replicated multiple times in multiple settings ... IF ALL OF THIS IS TRUE, I have confidence it is a valid study with enough data to apply in my practice with patients. Otherwise … clinical experience typically will trump EBP.