Flashcards in Exam 2 / Final Deck (85):
Which of the following is NOT a purpose of case reports?
a. Build problem solving skills
b. Determine cause and effect
c. Develop hypotheses for research
d. Share clinical experiences
Answer is: B
a. This is one purpose of case reports, as they help other clinicians with similar cases think
critically about possible treatments.
b. This is not a purpose of case reports and cause and effect cannot be determined because case
reports are typically retrospective and do not have a control group.
c. This is one purpose of case reports, as researchers ask questions while reading reports that can develop into hypotheses for future research.
d. This is one purpose of case reports as they are written about unique cases.
A study found that the most common injury in baseball players is labral tears. The results
yielded an ICC value of .88. Which of the following is a true interpretation of the results?
a. This study has excellent reliability.
b. This study has poor reliability.
c. This study showed that 88% of its participants had a labral tear.
d. This study showed that there was a ceiling effect of 88%.
a. An ICC value greater than .75 is considered to have excellent reliability. Therefore, this is the correct answer.
b. An ICC value less than .40 is considered to have poor reliability. Therefore, this answer is incorrect.
c. ICC has nothing to do with the percentage of participants that have labral tears. Therefore, this
answer is incorrect.
d. ICC has nothing to do with a ceiling effect. Therefore, this answer is incorrect.
The purpose of this prospective case series was to determine the MCID in the Functional Gait Assessment
(FGA) for older community-dwelling adults relative to patients’ and physical therapists’ estimates of
change and the extent of agreement between patients’ and physical therapists’ estimates of change.
After completing this case series, the kappa statistics score was 0.163 (weighted kappa = 0.163). The
MCID was 4 points, (SN = 0.66, SP = 0.84, LR+ = 4.07, LR- = 0.40). Interpret these findings.
a. A patient whose FGA score improved by 7 points would be considered to have a clinically meaningful change
b. Prospective case series has higher level of evidence on the clinical research design compared to
a cohort study
c. The kappa value in this series shows a moderate level of agreement
d. The LR+ shows a strong shift in probability of the patient having the disease or complaint
a. This answer is correct because it applies the found MCID (4 points) to a clinical example. The
patient in this answer had a change of 7 points, which is greater than the required 4-point MCID.
b. This answer is incorrect because a cohort study is higher on the hierarchy of studies compared
to the case series.
c. This answer is incorrect because kappa values do show agreement; however, the level of
agreement is incorrect. The correct interpretation of this kappa value (0.163) would be slight agreement.
d. This answer is incorrect because the positive likelihood ratio was 4.07 making it have a small
shift in probability.
An elementary school health fair takes the weight of a child at the fair three times, on the same scale
each time, within the hour he/she spends at the fair and gives an intraclass correlation coefficient of
0.76. This intraclass correlation coefficient indicates which of the following?
a. At minimum, 76 children needed to participate in this health fair to show any clinical meaning.
b. Not able to interpret the meaning of this value because a control group was not used.
c. The reliability of this scale measurement is good.
d. The weight of the child increased by 0.76 pounds during the three measurements.
a. Intraclass correlation coefficient does not provide any information about the number of children
who needed to participate in the health fair study to show clinical meaningfulness.
b. Intraclass correlation coefficient requires continuous data to calculate reliability, not a control group.
c. An intraclass correlation coefficient of >0.75 is deemed good reliability.
d. Intraclass correlation coefficient does not provide insight into changes in the specific weight
variable being measured.
Study A is evaluating the relationship between drinking an energy drink or sugar water and the
corresponding blood pressure. Group I will drink one 16 fl oz can of RockStar energy drink per day.
Group II will drink one 16 fl oz glass of water with 3g of sugar per day. Group III will be the control group
and will drink one 16 fl oz glass of water per day. Blood pressure will be measured using a blood
pressure cuff and stethoscope by a licensed physician at 5pm every day for five days. A volunteer sample
of 25 individuals from the University of Minnesota Biology department is randomly assigned to one of
the three groups. The average age of participants is 24.4 with an average pre-screen blood pressure
reading of 120/80. How would you describe the validity of this study?
a. The blood pressure reading measurement does not have face validity because the measurement
tools do not measure blood pressure.
b. The strength of the internal validity is increased because the researchers use independent
variables to measure a difference in a dependent variable.
c. This study has strong external validity because it uses a convenience sample, which allows the
results to be generalized to the real world.
d. This study is not internally valid because the study design does not allow the results to be
generalizable to the real world.
a. Having a licensed physician use a blood pressure cuff and stethoscope is the correct way to
measure blood pressure. Thus, this measurement has face validity.
b. Internal validity deals with the study’s experimental design. Since the researchers are using
independent variables to manipulate a dependent variable, the study’s internal validity is strengthened.
c. External validity has to do with how generalizable the results are to the real world. Since the study
uses a convenience sample of seemingly healthy college-aged individuals, it has weak external validity.
d. Internal validity does not refer to how generalizable the results are to the real world.
The following contingency table is provided in the results section of Study A:
Labrum Tear Present Labrum Tear Absent
Labrum Test Pos: A = 10 (true pos) B = 30 (false pos)
Labrum Test Neg: C = 20 (false neg) D = 15 (true neg)
The researchers state that the sensitivity for this Labrum test is 0.33. Is the sensitivity value correct?
a. No, sensitivity does not have to do with contingency tables.
b. No, because the sensitivity equation is A/(A+B), which equals 0.25.
c. Yes, because sensitivity is D/(B+D), which equals 0.33.
d. Yes, because the equation for sensitivity is A/(A+C), which equals 0.33.
a. Sensitivity is calculated using contingency tables for test results.
b. A/(A+B) equals 0.25. This is not the sensitivity equation.
c. This does not use the correct equation for sensitivity. This is the equation for specificity.
d. This uses the correct equation to calculate sensitivity.
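The arithmetic for this card can be checked with a short sketch (a minimal example; the variable names simply mirror the contingency-table cells above):

```python
# Contingency-table cells from the card above.
A = 10  # true positives
B = 30  # false positives
C = 20  # false negatives
D = 15  # true negatives

# Sensitivity: proportion of those WITH the tear who test positive.
sensitivity = A / (A + C)
# Specificity: proportion of those WITHOUT the tear who test negative.
specificity = D / (B + D)

print(round(sensitivity, 2))  # 0.33
print(round(specificity, 2))  # 0.33
```

Note that in this particular table both values happen to equal 0.33, which is why option c looks tempting despite using the specificity formula.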
Study Corp is researching anterior knee pain in runners. Study Corp used the Visual Analog Scale during
their research and wanted to analyze the results. The participant responses are as follows:
• Mean 1 = 3.7
• Mean 2 = 6.1
• Pooled SD = 4.2
Using the equation d = (Mean 2 – Mean 1)/Pooled SD, the effect size of Study Corp’s research study
would be considered:
a. Correct. The effect size is equal to 0.57, and effect sizes are considered “moderate” if they are
between 0.5 and 0.8.
b. Incorrect. The effect size is equal to 0.57, and effect sizes are considered “small” if they are
between 0.2 and 0.5.
c. Incorrect. The effect size is equal to 0.57, and effect sizes are considered “strong” only if they are above 0.8.
d. Incorrect. The effect size is equal to 0.57, and effect sizes are considered “weak” only if they are below 0.2.
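The effect-size calculation can be reproduced in a few lines (a minimal sketch using the means and pooled SD from the card):

```python
# Values from the card above.
mean_1 = 3.7
mean_2 = 6.1
pooled_sd = 4.2

# Effect size (Cohen's d): standardized difference between the two means.
d = (mean_2 - mean_1) / pooled_sd

print(round(d, 2))  # 0.57 -> "moderate" (between 0.5 and 0.8)
```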
You create a new test to measure BMI and want to see how your calculated values compare to the
values calculated by the Bod Pod. What type of validity are you referring to?
a. Concurrent validity looks to see if a test measure correlates with a gold standard measure. In
this case, you are looking to see if your calculated values correlates with values found using
the gold standard measure (Bod Pod). The closer your value is to the gold standard value, the
higher the concurrent validity.
b. Content validity looks to see if a measure represents all constructs of the measure. For example,
an AP Physics Exam (measurement) represents all of an individual’s AP Physics knowledge.
c. Face validity looks to see if a specific measure actually measures what it is designed to measure.
For example, IQ tests are supposed to measure intelligence. It would be valid if it accurately
measured intelligence. However, if an IQ test has a picture of a tennis ball missing from a tennis
court and asks what is missing, it could be biased against the poor, who may not have seen a
tennis court. This invalidates the test because it does not actually measure intelligence in some populations.
d. Predictive validity looks to see whether a test can be used to predict a future score or outcome.
For example, tests could be administered to job applicants and after those individuals work this
job for a year, these scores could be correlated with their first-year job performance scores to
see if the test scores accurately predicted how well these individuals would perform in this job.
A Timed Up and Go (TUG) test was conducted on 2,985 patients in various national Parkinson
foundation centers across the country. An initial TUG test was performed and measured, followed by
two weeks of physical therapy, with a second test then being conducted. The MCID was 1.8 seconds. This
MCID can be interpreted as indicating what?
1. A high intrarater reliability among the two tests, helping improve the validity of the study.
2. The agreed upon range of measurement error allowed between two tests.
3. The minimum amount of change in a patient’s score that ensures the change was not the result
of measurement error.
4. What the physical therapist or patient would consider as a smallest amount of change needed
to indicate improvement.
1. Intrarater reliability indicates consistency of data being measured and recorded by the same
individual over two or more trials/tests. The question asks about MCID, which is interested in
improving scores, not consistent measurements among two different tests.
2. This is talking about the Standard Error of Measurement (SEM), not MCID. SEM is simply the
amount of error that can be attributed to measurement error.
3. This answer is talking about Minimal Detectable Change (MDC), not MCID. The difference
between MDC and MCID is that MDC is the smallest amount of change that is not due to SEM,
whereas MCID is the smallest amount of change the clinician or patient considers significant
enough to suggest improvement.
4. This is the correct answer because MCID represents the smallest amount of change (or
improvement) in a measurement that a clinician or patient would consider significant enough
to indicate improvement.
In manual muscle testing, muscles are given a grade of 3 (Fair) if the patient can take the muscle/muscle
group through a full range of motion against only the resistance of gravity. In MMT, reliability has been
measured “…among examiners and in successive tests with the same examiner, the results should be
within one half of a grade (or within a plus or minus of the base grade).” In a study, reliability for this
grade was reported as ICC= 0.43. What does this measure tell us about the reliability of this grade?
a. The reliability of the grade is poor.
b. The reliability of the grade is fair.
c. The reliability of the grade is moderate.
d. The reliability of the grade is good.
a. This is correct. Per ICC classifications, reliability is poor for < 0.50
b. This is incorrect. There is no fair category in ICC classifications.
c. This is incorrect. Per ICC classifications, reliability is moderate for 0.51-0.75
d. This is incorrect. Per ICC classifications, reliability is good for > 0.75
A 13-year-old patient comes in with ankle pain. The physical therapist utilizes the Ottawa Ankle Rules
guidelines to determine whether the patient has a fracture. The test comes out negative. Knowing that the
Ottawa Ankle Rules have a high specificity but a low sensitivity, the PT can…
a. Can confidently rule in but cannot rule out the condition
b. Cannot rule in but can confidently rule out the condition
c. Cannot rule in and cannot rule out the condition
d. Unable to determine given the results
a. To confidently rule in, the test must be positive and have a high specificity.
b. To confidently rule out, the test must be negative and have a high sensitivity.
c. Since the test does not have a high sensitivity, the PT cannot rule out the condition and since
the test was not positive the PT cannot rule in the condition.
d. There can be a decision made based on the results.
There are several checklists that are used in clinical trials to help researchers follow guidelines and help
others critically appraise research. Which checklist allows quantification of the quality of a research study?
A study examined the five time sit to stand test (5STS) as a functional outcome measure for patients
with COPD. The results showed that the test-retest and inter-observer ICCs were 0.97 and 0.99,
respectively. The 5STS scores correlated significantly with other measures of function or impairment
such as the ISW, QMVC, SGRQ, ADO and iBODE (r=−0.59, −0.38, 0.35, 0.42 and 0.46, respectively; all
p<0.001). The MCID for the 5STS was determined to be 1.7 s in this population. Which of the following is
a correct interpretation of these results?
a. For each measure correlated to the 5STS, the risk of a Type I error is less than 1%
b. The smallest detectable change in measurements (not due to error) for the 5STS is 1.7s
c. The 5STS scores are more strongly correlated with scores of the SGRQ than the ISW
d. This test is considered to have good validity because the ICC scores are above 0.75
a. The p value describes the risk for a Type I error
b. This describes an MDC, not the MCID, for the test
c. An r of -0.59 shows a stronger correlation than 0.35 because +/- only describe the direction of
the linear relationship, not its strength.
d. ICC scores are a measure of reliability, not validity
A treatment effect is found to have moderate strength. What is a possible effect size?
a. Effect sizes between 0.2 and 0.5 are considered “small.”
b. Effect sizes between 0.5 and 0.8 are considered “moderate.”
c. Effect sizes above 0.8 are considered “strong.”
A study examined vegetarians and their frequency of ACL tears over two years. One of the participants
had a +LR of 3.57. A +LR of 3.57 indicated which of the following?
a. Strong probability of an ACL tear.
b. Moderate probability of an ACL tear.
c. Small probability of an ACL tear.
d. Very small probability of ACL tear.
a. For the answer to be a strong probability of having an ACL tear there would need to be a
positive likelihood ratio greater than 10.
b. For the answer to be a moderate probability of having an ACL tear there would need to be a
positive likelihood ratio between 5 and 10.
c. A small probability of having an ACL tear would mean the positive likelihood ratio would need to
be between 2 and 5. 3.57 fits this requirement and is the correct answer.
d. For the answer to be a very small probability of having an ACL tear there would need to be a
positive likelihood ratio between 1 and 2.
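The bands used in this rationale can be written out as a small helper (an illustrative sketch; `interpret_positive_lr` is not a standard function, and the cutoffs are the ones stated above):

```python
def interpret_positive_lr(lr: float) -> str:
    # Cutoffs stated in the rationale above: >10 large, 5-10 moderate,
    # 2-5 small, 1-2 very small shift in probability.
    if lr > 10:
        return "large shift"
    if lr >= 5:
        return "moderate shift"
    if lr >= 2:
        return "small shift"
    return "very small shift"

print(interpret_positive_lr(3.57))  # small shift
```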
A new study looked at the effectiveness of aquatic therapy in treatment of knee and hip osteoarthritis.
In the study, the researchers determined an ARR=0.50 and an NNT=2 for the reduction of pain in the
aquatic therapy group. Based on this data, one could expect:
a. that 50% of the population in this study falls between the corresponding range of scores
b. to reduce knee pain in 1 out of 2 patients who participate in the aquatic therapy program
c. the absolute risk of knee pain was reduced by 20% for those that participated in the aquatic therapy program
d. the patient to improve their knee pain after completing two sessions of aquatic therapy
a. This answer deals with the confidence interval, which was not stated in the question and would
most likely be much higher than .50
b. This is the correct answer. This follows the direct meaning of the numbers needed to treat and
what can be determined from that.
c. Absolute risk reduction relates to this answer, but it should be 50%, not 20%
d. If the tester is not sure of what NNT measures, then seeing the number 2 may lead them to
think that it deals with number of sessions needed.
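The NNT arithmetic follows directly from the ARR (a minimal sketch using the card's numbers):

```python
# Value reported in the card above.
arr = 0.50  # absolute risk reduction

# NNT is the reciprocal of the ARR: how many patients must be treated
# for one patient to benefit.
nnt = 1 / arr

print(nnt)  # 2.0 -> expect pain reduction in 1 of every 2 patients treated
```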
A 38-year-old male comes into your clinic following cervical injury surgery, looking to return to
activity. Through the interview portion of your assessment, you discover he is a professional football
player for the Indianapolis Colts. During your tests and measures portion, you observe the client
achieving the highest measurements possible and at times exceeding the instrument’s available range.
What phenomenon do you note?
A. Basement effect
B. Ceiling effect
C. Floor effect
D. Positive likelihood ratio
a. Basement effect is synonymous with floor effect; therefore, measurement is reading at the
lower limit or not being registered.
b. The ceiling effect refers to when there is a limitation of measurement where the score is at the
top or near the top of the maximum for the tool or test used for measurement.
c. The floor effect refers to when the achieved measurement would read lower than the
instrument can record or near the lower limit of the instrument.
d. The positive likelihood ratio is how much more likely a pathology is present following a positive test result.
A physical therapist measured the ROM for a patient who was treated for a knee injury. The reliability
had an ICC of 0.67. What does this reliability value show of this particular measurement?
a. Estimates of reliability for measures of continuous data are not reported as ICCs.
b. Good chance of reproducibility.
c. Moderate chance of reproducibility.
d. Poor chance of reproducibility.
a. This statement is false. Estimates of reliability for measures of continuous data are typically
reported as intraclass correlation coefficients (ICCs), which range from 0-1.
b. ICCs above 0.75 are considered a good chance of reproducibility.
c. ICCs in-between 0.51-0.75 are a considered moderate chance of reproducibility.
d. ICCs below 0.50 are considered a poor chance of reproducibility.
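The ICC bands used in this card's rationale can be sketched as a helper (illustrative only; note that other cards in this deck label >0.75 as “excellent” rather than “good”):

```python
def interpret_icc(icc: float) -> str:
    # Bands used in this card: poor < 0.50, moderate 0.51-0.75, good > 0.75.
    if icc > 0.75:
        return "good"
    if icc > 0.50:
        return "moderate"
    return "poor"

print(interpret_icc(0.67))  # moderate
```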
A baseball player comes into your clinic and presents with signs and symptoms of a rotator cuff tear. You
decide to use a disablement framework as a basis for this case and you chose the ICF model. Which of
the following would be a possible participation limitation?
a. Lack of a support group at home
b. MRI confirmed tear in the right supraspinatus tendon
c. Pitching in a baseball game
d. Reaching for an item in an overhead cabinet
a. This answer represents an environmental factor. These are factors that can affect one’s home,
work and attitudinal environment.
b. This answer represents the impaired body functions and structures. This section includes altered
physiologic and anatomic structures/ functions.
c. This is the correct answer. Participation limitations include performance of activities in a social
setting such as a sporting event.
d. This answer represents an activity limitation, which are similar to functional limitations.
According to the Centre for Evidence-Based Medicine (CEBM), what study design is considered to be level 3 evidence?
a. Case series
b. Case control studies
c. Expert opinion and disease-oriented evidence
d. Prospective cohort studies
1a = Systematic reviews of RCTs
1b = RCTs
2a = Systematic review of cohort studies
2b = Cohort study
3 = Case-control study
4 = Case report / case series
5 = Expert opinion
During a study conducted on collegiate soccer players, the positive likelihood ratio of a new screening
technique for medial meniscus tears was determined to be 11.3. The negative likelihood ratio is 0.08.
What is the most accurate statement concerning this new technique?
a. It is highly indicative that the target condition exists when a positive test occurs.
b. It is a poor indicator that the target condition exists when a positive test occurs, due to its low
negative likelihood ratio.
c. It is a poor measurement because when the patient tests negative for the condition, it is often
an erroneous result and the person has been misinformed.
d. It is difficult to say because the details of the incidence of the condition were not disclosed.
Rationale: Option A is most correct because a positive likelihood ratio of over >10 means that a positive
test is strongly correlated with an increased probability that the target condition exists. Option B is
incorrect because the negative likelihood ratio applies to a situation in which the examination findings
are negative. Option C is incorrect because a low negative likelihood ratio (<0.1) means that a negative
finding can correctly rule out that the condition is present. Option D is incorrect because while
prevalence is important for determining likelihood ratios, incidence is not.
A soccer player has received a left hamstring strain during practice for the second time this year. Due to
the injury he will not be playing in the team’s championship game in two weeks. This is an example of
which of the following:
a. Injury Rate
b. Injury Risk
c. Prevalence of injury
d. Time loss injury
A Receiver Operating Characteristic (ROC) Curve examines the tradeoff between sensitivity and
specificity when you select different cutoff values for a specific test. If the area under the ROC curve has
an area of 1.00, what can be inferred from the test?
a. The test has one false positive and zero false negatives
b. The test has zero false positives and zero false negatives
c. The test is classified as good since the area under the curve is 1.00
d. The test is no better at identifying true positives than flipping a coin
a. If there was one false positive the area under the curve would not be 1.00
b. The area under the ROC curve is 1.00 and thus implies the test has a specificity and
sensitivity of 100%, meaning that there should be zero false positives and zero false negatives.
c. The test should be classified as excellent (.90-1.00), classification as good would mean the area
under the curve is between .80-.90
d. If the ROC curve had an area under the curve of 0.5 or less there would be no predictive value,
thus meaning the test is no better at identifying true positives than flipping a coin
A physical therapist wants to test a patient’s lumbar spine ROM and pain. He narrows down his tests of
measurement to two movements: sidebending or rotation. Both movements can test the ROM and pain of
the lumbar spine. He knows that sidebending has a kappa statistic of 0.60 and rotation has a kappa statistic
of 0.17. These kappa statistics indicate which of the following?
a. Sidebending has a higher percent agreement than rotation.
b. The kappa statistic of sidebending is substantial and the kappa statistic of rotation is fair.
c. There is a 60% chance that sidebending will be a type 1 error.
d. There is a 17% disagreement for rotation and 60 % disagreement for sidebending to check lumbar
spine ROM and pain.
a. Sidebending has a higher kappa statistic than rotation, therefore giving it a higher percent agreement. This is the correct answer.
b. The kappa statistic for substantial is .61-.80 and the kappa statistic for fair is .21-.40.
c. Kappa statistic does not calculate p-values and does not indicate chance of type 1 error.
d. Kappa statistic does not calculate disagreement rate.
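The kappa bands cited in rationale b can be written out as a quick check (an illustrative sketch; the cutoffs follow the scale commonly attributed to Landis & Koch, which matches the ranges quoted above):

```python
def interpret_kappa(kappa: float) -> str:
    # Bands: 0.81-1.00 almost perfect, 0.61-0.80 substantial,
    # 0.41-0.60 moderate, 0.21-0.40 fair, 0.00-0.20 slight, <0 poor.
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    if kappa > 0.00:
        return "slight"
    return "poor"

print(interpret_kappa(0.60))  # moderate (just below the substantial band)
print(interpret_kappa(0.17))  # slight
```

This shows why option b fails: a kappa of 0.60 falls in the moderate band, not the substantial band.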
You have conducted a study looking at the effects of a new manual therapy technique on shoulder pain.
After collecting and analyzing your data, you get a p-value of 0.25 (p=0.25). Which of the following is
the most likely explanation for your finding?
a. You committed a Type I error, likely because your effect size was too small
b. You committed a Type I error, likely because your variance was too large
c. You committed a Type II error, likely because your sample size was too small
d. You committed a Type II error, likely because your statistical power was too large
a. With a p-value of 0.25, you would not reject the null hypothesis, meaning you could not commit a
Type I error. Further, a small effect size would not produce a Type I error.
b. With a p-value of 0.25, you would not reject the null hypothesis, meaning you could not commit a
Type I error
c. Type II error is not rejecting the null hypothesis when it is actually false. Studies with few
participants are sometimes unable to detect a change even when there is one
d. Type II error is not rejecting the null hypothesis when it is actually false. Type II errors occur
when a study does not have enough statistical power
A study comparing the flexibility of older adults who stretch versus those who do not was
conducted. The researchers reported a p value of 0.03. Which of the following is true regarding
this p value?
A. The researchers must fail to reject their null hypothesis
B. The results are clinically significant
C. The results are neither clinically nor statistically significant
D. The results are statistically significant
A. Researchers can reject their null hypothesis if the p value is less than 0.05
B. P values do not indicate if the data is clinically significant. Other values such as effect
size or MCID would need to be reported to determine if it is clinically significant.
C. The p value is less than 0.05 indicating that it is statistically significant, but there is no
information indicating that the research is clinically significant.
D. A p value less than 0.05 indicates that the difference between the two groups is statistically significant.
In the article “The five-repetition sit-to-stand test as a functional outcome measure in COPD,” the
same 50 patients were measured simultaneously for the 5 time sit-to-stand test by two observers
on the same occasion. What type of reliability does this demonstrate?
a. Concurrent is a type of validity that assesses whether a test correlates with the gold
standard. This has nothing to do with the type of reliability demonstrated.
b. Content is a type of validity that assesses whether or not the measure represents all
constructs of the measure. This has nothing to do with the type of reliability demonstrated.
c. Intrarater means that the test was administered by a single rater. This test was done by
two observers, so it is wrong.
d. Interrater means that the test was administered by two or more raters. Since two observers
measured the same patients on the same occasion, this is the correct answer.
If a researcher is designing a study and is trying to make sure their measurements have concurrent
validity, what should the researcher be looking for?
a. How well their measures represent the constructs they are supposed to measure.
b. How well the measures correlate with the existing gold standard measure.
c. The ability of different testers to repeatedly produce consistent measurements.
d. The property of whether their specific measures assess what they are designed to measure.
Answer: B
Rationale: where are you getting your information?
a. From the Research Methods textbook, page 144, this is not the right answer because it is the
definition of content validity.
b. From the Research Methods textbook, page 144, this is the definition of concurrent validity.
c. From the Research Methods textbook, page 145, this is not the right answer because it is the
definition of interrater reliability.
d. From the Research Methods textbook, page 144, this is not the right answer because it is the
definition of face validity.
A therapist is seeing a patient for knee pain and wants to rule out whether the patient’s pain is
caused by an ACL tear. The therapist decides to use the Drawer Test and obtains a positive test, along
with the following data: +LR = 11, -LR = 0.3. Based on this information, what can the therapist safely conclude?
a. The therapist can definitively rule in the possibility that the patient has an ACL tear.
b. The therapist can definitively rule out the possibility that the patient has an ACL tear.
c. The test revealed a large likelihood that the patient has an ACL tear.
d. The test revealed a small likelihood that the patient has an ACL tear
Rationale: where are you getting your information?
a. From the Research Methods textbook, page 229, these data results cannot definitively rule in
the condition of an ACL tear because a negative likelihood ratio of 0.3 means that there is still a
small possibility that there is another cause for the knee pain, besides an ACL tear.
b. From the Research Methods textbook, page 229, the data given cannot definitely rule out the
condition of an ACL tear because the test came back positive, which means there is a high
likelihood that the condition does exist.
c. From the Research Methods textbook, page 229, a positive likelihood ratio of greater than 10
often means that there is a large likelihood that the condition is present.
d. From the Research Methods textbook, page 230, a positive likelihood ratio would need to be
around 1 or 2 for there only to be a small likelihood that the condition is present.
A 40-year-old male patient is being seen for a sore right knee. This patient has difficulty standing or
walking for long periods of time and is unable to reach full knee extension (decreased ROM). The patient
is a construction worker, frequently plays basketball and also enjoys hiking with his wife. What
information from above belongs to the “activity (limitations)” component of the ICF model?
a. Decreased ROM
b. Difficulty Walking
c. Hiking with his wife
d. Playing basketball
a. Decreased ROM is a body structure/function impairment
b. Walking and standing for long periods of time are examples of activity limitations
c. Hiking is a participation restriction
d. Playing basketball would be a participation restriction
A given special test used for identifying a torn rotator cuff has a specificity of 0.93 and a sensitivity of
0.02. A physical therapist performs this special test on a patient and finds the results to be positive.
What does this test result reveal to the therapist?
a. Can confidently rule in a torn rotator cuff
b. Can confidently rule out a torn rotator cuff
c. There is a 2% chance of a torn rotator cuff
d. There is a 93% chance of a torn rotator cuff
a. A highly specific test with a positive result means you can confidently rule in a condition
b. The test was positive and has a low sensitivity, thus this is not correct
c. The 0.02 sensitivity value does not equal the predictive value
d. The 0.93 specificity value does not equal the predictive value
The gold-standard of radiography is used to establish validity of goniometric measurements.
Which type of validity correlates with this gold standard?
a. Concurrent Validity
b. Construct Validity
c. Content Validity
d. Face Validity
a. Concurrent validity is a type of criterion-related validity that helps determine if a
measuring instrument is comparable to a well-established gold-standard measurement.
b. Construct validity refers to the ability of an instrument to measure an abstract theory or single
idea (ex. amount of ROM related to the ability to perform a task or functional activity).
c. Content validity refers to if an instrument measure represents the concept of the variable of
interest (ex. functional abilities of the knee joint).
d. Face validity refers to whether an instrument measures what it is supposed to measure.
A 54-year-old female gets referred to PT for fibromyalgia. Which of the following is true
regarding implementation of an exercise regimen for her?
a. Exercise that the physical therapist implements should be daily, low-level, aerobic exercise
b. Fatigue should NOT be taken into consideration when creating an exercise program
c. Stretching should NOT be part of the exercise regimen
d. The physical therapist should implement vigorous exercise that should be done twice a
day, three days a week
a. This is correct because there should be daily, low level aerobic activity
implemented with patients with fibromyalgia. These patients are more vulnerable
to overuse syndromes, so a slower, longer rehab process is key.
b. This is incorrect because patients with fibromyalgia fatigue quickly and have a low
tolerance for exertion.
c. This answer is incorrect because stretching should be included in the exercise regimen.
d. This answer is incorrect because patients with fibromyalgia should be started on low
intensity exercise programs done once a day throughout the week, not twice a day, three
days a week.
You are assessing a patient for pain in their lower back. At the initial evaluation, the patient
stated that their pain was an 8.5/10. What value is appropriate to conclude that there is a
significant improvement or deterioration in the status of the patient’s pain?
a. The patient is sedated and unable to give an accurate rating
b. The patient reports their pain at a 7.5/10
c. The physical therapist makes a conclusion that the patient is better without using the patient's pain rating
d. The rated pain from the patient is 5/10
a. This answer is not correct because the patient is not able to give a reading in the first
place because of sedation.
b. This answer is incorrect because the change is only 1 point, and the MCID needs to be 2
points in order to represent a significant change in functional status.
c. This answer is not correct because this data must be collected from the patient, not the therapist.
d. This answer is correct because the 3.5-point change is more than the 2-point MCID, so the
PT can conclude that there is a significant change in functional status.
A measure that addresses all aspects of the object or item being measured is represented by which type of validity?
a. Concurrent validity
b. Construct validity
c. Content validity
d. Convergent validity
a. Concurrent validity is how well a measure correlates with the gold standard (pg 144)
b. Construct validity is how well a measure or scale reflects the abstract construct it is intended to measure (pg 144)
c. Content validity is defined as the amount that a particular measure represents all facets of the
constructs it is supposed to measure (pg 144)
d. Convergent validity is a property of measurement showing if a measure is correlated to other
related measures (pg 145)
A physical therapist found a change of 5 degrees of ROM in the elbow after treating a patient
for 4 weeks and wants to know if she can document this as clinically meaningful based on
minimal clinical important difference (MCID). What information will the physical therapist receive
from the literature on MCID related to elbow ROM?
a. Degree of elbow ROM improvement that is meaningful to patients
b. MCID will not give the therapist any information relevant to elbow ROM
c. The p-value is < 0.05 and therefore the ROM is clinically meaningful
d. The therapist can say that 5 degrees of ROM is clinically and statistically significant
a. The MCID is a measure used to show what improvements are meaningful to patients
b. If the therapist searches literature that is related to elbow ROM she will find a MCID value and
will be able to know if the change she sees in her patient is clinically meaningful or not
c. The MCID does not produce p-values and is not related to significance values; p-values are
not related to clinically meaningful findings
d. The MCID does not provide any information about statistical significance
You are evaluating a patient in the clinic with a script from the physician that says “treat patient
for partial rotator cuff tear.” Upon taking a history, systems review, and tests and measures, the
patient, Ruby, tells you that her main complaint is that her shoulder injury prevents her from
playing tennis with her granddaughter. Your assessment reveals that Ruby has limited ROM,
pain with flexion, extension, abduction, and lateral rotation. According to the International
Classification of Functioning, Disability and Health (ICF) model, what is Ruby's participation restriction?
a. Inability to play tennis with granddaughter
b. Inability to swing the racket
c. Limited ROM during the evaluation
d. Pain with movement on and off the court
a. Inability to play tennis with granddaughter is Ruby’s participation restriction
because it prevents her from participating in her normal societal roles.
b. Inability to swing the racket is an activity limitation.
c. Limited ROM during the evaluation is a body structure and function impairment.
d. Pain with movement on and off the court is also a body structure and function impairment.
A study found that low intensity training with blood flow restriction can improve peripheral blood
circulation in elderly adults. Data looking at muscle strength levels for a one rep max leg press
indicate a difference between the blood flow restriction group and the non-blood flow restriction
group of (F = 11.7, P < 0.01). How would you interpret the P value?
a. Don’t reject the null hypothesis because the result is due to chance
b. Since P < 0.01 there is little if any correlation between the groups
c. There is a statistically significant difference between groups
d. There is not a statistically significant difference between groups
a. If P > 0.05 then we don’t reject the null, but the P-Value in the question is <0.01 so we
can reject the null.
b. This interpretation would be correct if we were interpreting r values.
c. Since the P < 0.05 the result is statistically significant.
d. If P > 0.05 then the result would not be statistically significant.
As a physical therapist and researcher, you are developing a study that examines a new special
test for ACL tears. You decide to test for the correlation between the special test results and MRI
results. By doing this, you are adhering to what type of validity?
a. Concurrent validity refers to how well one measure is correlated with the gold standard
b. Content validity is the amount that a particular measure represents all facets of the construct being measured
c. External validity relates to the generalizability of the results
d. Internal validity refers to the actual study design and the control of variables
The Berg Balance Scale has a SEM of 2.3 for institutionalized older adults. If an institutionalized
older adult improved their Berg Balance Scale score from 28 to 35 points, is that considered to
be due to error and what is the MDC?
MDC = 1.96 x SEM x (square root of 2)
a. MDC = 9.02, Yes the increase in score could be from measurement error
b. MDC = 9.02, No the increase is not from error and is true improvement
c. MDC = 6.3, Yes the increase in score could be from measurement error
d. MDC = 6.3, No the increase is not from error and is true improvement
a. Wrong because the MDC was calculated incorrectly: from the MDC equation, (1.96 x SEM)
was multiplied by 2 instead of by the square root of 2.
b. Wrong because the MDC was calculated incorrectly in the same way, (1.96 x SEM) x 2
instead of x square root of 2. Also, even if the MDC really were 9.02, the 7-point increase
would be less than it, so the change could not be called true improvement.
c. Wrong because the increase in score was 7 points and the MDC was 6.3, so the change is
large enough to be confident it is not from measurement error. The MDC is calculated correctly,
but the interpretation is wrong.
d. Correct because the MDC is calculated correctly and the change in score is more than the
MDC making us confident that the change is not due to error.
Jones S, Kon S, Canavan J, et al. determined the MCID for the five-repetition sit-to-stand (5STS)
was 1.7 seconds. How would this be interpreted for clinical application?
a. If your patient improved by 1.7 seconds it is clinically significant.
b. If your patient’s score was 1.7 seconds slower than norm values they were at risk of falling.
c. You can expect to have a 1.7 second difference between testers.
d. Your patient’s score must be 1.7 seconds faster than norm values to be clinically significant.
a. This is the correct answer because the MCID is the minimal clinically important difference and
therefore tells us how much the score needs to change to be clinically significant.
b. This answer is incorrect because this would be an example of a cut-off score. A cut-off
score means the patient hit a certain threshold that has been correlated with, and indicates, a
higher risk of falling or decreased lower extremity strength.
c. This answer is incorrect because this would be an example of inter-rater reliability. In order to
interpret this concept we would need to have an ICC value.
d. This answer is incorrect because while it seems that the MCID would be measured from the norm values, it actually refers to the improvement or setback the patient has from their own, personal past scores.
A patient comes in to see you at the clinic. The patient is a 78-year old male with Parkinson’s
disease. His wife has come in because she has been worried about him falling and concerned
about his walking ability. The following are tests and measures specific to the Parkinson's population:
- Five-time sit-to-stand (test/retest reliability ICC= 0.76, inter-/intrarater reliability ICC=0.99,
sensitivity= 0.89, specificity=0.47, cut-off score for risk of fall is >16 seconds)3,5,6
- Four Step Square Test (test/retest reliability ICC= 0.78, inter-/intrarater reliability ICC=0.99,
sensitivity= 0.73, specificity=0.57, cut-off score for risk of fall is 9.68 seconds.)2
- Functional Reach Test (test/retest reliability ICC= 0.84, inter-/intrarater reliability ICC=0.64-
0.74, sensitivity= 0.52, specificity=0.53, cut-off score for risk of fall is <25.4 cm)1,10
- Timed Up and Go Test (test/retest reliability ICC= 0.85, inter-/intrarater reliability ICC=0.99,
sensitivity= 0.69, specificity=0.62, cut-off score for risk of fall is >16 seconds.)4,8,9
Which test is the safest and most appropriate to administer to assess the patient's fall risk and walking ability?
a. Five-time sit-to-stand
b. Four Step Square Test
c. Functional Reach Test
d. Timed Up and Go Test
All of these tests would be possible to administer on this patient. I created this question to test
clinical and statistical reasoning of tests that we should know the protocol for.
a. This answer is incorrect because, while statistically sound, the 5STS is not appropriate
because gait is not being specifically tested.
b. This answer is incorrect because, while statistically sound, the FSST is not appropriate
because gait is not being specifically tested.
c. This answer is incorrect because statistically it has the lowest combination of sensitivity
and specificity, and it does not specifically test gait.
d. This answer is correct because it specifically tests gait and balance and has moderate to
good specificity and sensitivity.
You are reading an article about an imaginary test and find that this particular test has an MCID
of 10 and an MDC of 5. If your patient's score were to change by 8 points on this test, what could
you conclude from this information?
a. The patient has had a change that is due to measurement error and this change is
clinically significant.
b. The patient has had a change that is due to measurement error and this change is not
clinically significant.
c. The patient has had a change that is not due to measurement error and this change is
clinically significant.
d. The patient has had a change that is not due to measurement error and this change
is not clinically significant.
a. MCID: 8 is less than 10 so there is no clinically important difference; MDC: 8 is greater
than 5 so there is a detectable change that is not due to measurement error. Therefore,
both parts of this answer are wrong.
b. The first part is incorrect.
c. The second part is incorrect.
d. Both are correct.
BELOW ARE FLASHCARDS ON CLINICAL REASONING
Quiz 2 flashcards are all on Clinical reasoning, so review those flashcards.
1) What is Clinical Reasoning:
2) Clinical reasoning is the thinking or the decision part of patient care?
3) T or F: Because the evidence was a systematic review, good specificity, good reliability, huge sample size and statistically significant = intervention is good for every patient.
4) T or F: Interventions as a PT for pt's almost always include exercise?
5) What are some other synonymous terms for clinical reasoning, but have slightly different meanings:
6) T or F: You can decide/act without clinical (critical) thinking, and you can do critical thinking without acting.
1) It is the thinking, working out problems, consulting patient/family, using judgement, consulting other health professionals, relying on experience, consulting literature, etc. ... and then processing all the info in order to make a decision that is best for the pt.
2) Both. It is the sum of thinking and decision-making processes associated with clinical practice.
3) False. It is important to know what the evidence says, but doesn’t mean that works for every person. You need to be creative, use your clinical reasoning skills, and individualize POC for each pt.
- Critical thinking: the COGNITIVE piece of thinking
- Clinical decision making: the ACTION or decision
- Clinical reasoning: broad term that encompasses BOTH the thinking AND decision making.
1) Clinical reasoning is just one piece of the big picture in patient management. What else goes into it (in addition to clinical reasoning skills)?
2) An example of this would be ... you know how to do a transfer of a pt, but how does clinical reasoning come into play:
3) T or F: There is only one correct way with clinical reasoning (and decision making)
4) T or F: Clinical reasoning requires creativity and innovation?
5) Clinical reasoning is a destination that you arrive at and master at some point?
PT knowledge and education
PT values, care, compassion
Get patient to MOVE (movement system / anatomy)
Clinical reasoning is just one important component of the process.
2) Making a transfer is easy, but what if the pt has a tibial fracture, a catheter, is bipolar, is NWB, lacks cognition, etc.? Now you need to know which leg is involved (so you know which way to transfer), whether they are WB or NWB, the strength of the good leg (will you need to block?), and their cognitive ability (can they follow directions?). You can have all the knowledge in the world, but given the pt's condition/situation, you need good clinical reasoning skills, care, compassion, and knowledge, and you still need to get them to MOVE. You have to do ALL of these. Most importantly, how do you take all the info and complexities and use your clinical reasoning?
3) False. There may not be just one correct way. There may be multiple ways; take in all the info and use clinical reasoning to create a plan.
5) False. It always develops, changes, is challenged, and needs improvement (and is different for every situation).
*** KEY POINT
1) T or F: Clinical reasoning is a continuous REFLECTIVE process? Meaning:
2) What are the 3 different types of REFLECTION that take place within clinical reasoning:
2A) Of those 3 reflection options, one is more for deductive and one is more for inductive (the other in between). Which is which
3) Generally, which reflection would the novice do, which would the expert do?
1) True. YOU HAVE TO REFLECT for clinical reasoning to go well. If you never reflect on what went well, or what you can improve, your clinical reasoning skills will never improve.
- Reflection-on-action: reflecting on experience AFTER event
- Reflection-in-action: reflecting on experience DURING event
- Reflection-for-action: Anticipating changes for FUTURE interactions
2A) Reflection-on-action is more deductive, and reflection-for-action is more inductive
- Novice = reflection-on-action
- Expert = reflection-for-action
There is Deductive Reasoning and Inductive Reasoning.
1) What is another term for Deductive Reasoning:
2) What is another term for Inductive Reasoning:
3) WHO uses the deductive reasoning process:
4) Explain generally what the DEDUCTIVE reasoning process is:
5) WHO uses the inductive reasoning process:
6) Explain generally what the INDUCTIVE reasoning process is:
7) T or F: Experts never use deductive reasoning?
8) Would deductive reasoning or inductive be used for clinicians who need a structured process to help them in patient care?
9) Would deductive reasoning or inductive be used for clinicians who formulate a hypothesis and then test it during patient care?
10) Novice students use deductive reasoning and need a structure. What are some examples of processes or structures they follow:
11) Can a new clinician use inductive reasoning skills?
12) Novice clinicians use tools like the ICF Model and Clinical Practice Guidelines. What tools do experts use who use inductive reasoning:
1) Hypothetico-deductive reasoning
2) Pattern recognition
3) NOVICE (new) clinicians ... the rookies
- Generate a hypothesis, then do tests/measures to test hypothesis
- Lots of thinking it through
- Lots of structure, processes, systems – and they need structure and to follow a process.
- The expert recognizes patterns from experience
- Pattern recognition, intuitive thinking, experience
- It is quicker
7) False. Experts do go back to the 'drawing board' and use deductive reasoning if they don't recognize a pattern.
8) Deductive (novices/new clinicians)
9) Deductive (novices/new clinicians)
- ICF model
- Patient Management Model
- Clinical Practice Guidelines
- Clinical Prediction Rules/Guides
11) No, not really. You can’t really use inductive reasoning unless you have experience and can recognize patterns from seeing MANY many patients with the same issues.
- Reflection (on Experience)
- Open ended questions
What are the 3 domains that impact learning:
This relates to you as a PT student, but how does it relate to PATIENTS:
- Cognitive (knowledge)
- Behavior (action)
- Affective (attitude)
Learning is not only in the mind … you have to put it into action, and you have to CARE. If patients want to learn, they have to DO and they have to CARE.
Where does Clinical Reasoning break down and not work:
- When you are not informed (lack of knowledge)
- Bad data / information
- Inability to interpret or comprehend data/info
- Bad attitude
- Don't believe
- Not considering the patient's needs
- Not consulting and factoring in the research
- Not reflecting
- Coming to an answer too early without enough information (not ruling in/out other issues).
BELOW are flashcards on Qualitative Inquiry
Review flashcards for quiz 3 ... that one was all on qualitative inquiry
1) Main difference between QUANTITATIVE and QUALITATIVE research:
2) Which one is more objective, which is subjective?
3) Which one cares more about the #'s and stats?
4) Which one has an independent variable (intervention) and a dependent variable you are measuring
5) Which one is done in a lab (typically), which one is in a more natural environment?
6) Which one focuses on the WHAT, and which one focuses on the WHY
7) Which one typically has a larger sample size:
8) Which one is a case-control study approach
9) QuaLitative Inquiry is also known as:
10) Coding is done in this one:
11) NUMBERS are used as results in _________, WORDS are used as results in __________
12) Surveys are used in which approach:
13) How might a survey differ in a QuaNtitative vs. a QuaLitative study:
14)
- Which one has a broad research question where you come up with a hypothesis?
- Which one already has the hypothesis and you want to test / prove the hypothesis?
15) *** Remember:
- Validity for quaLitative research =
- Internal Validity for quaLitative research =
- External validity in quantitative =
- Reliability in quaLitative research =
16)
- ____________ sampling is all about numbers, randomized, non-biased, so you can generalize to the whole population.
- With _________, you get a more specific person/group, fewer people, a smaller sample size, but you still need inclusion and exclusion requirements on who to use.
17)
- ________ results you try to generalize to all patients / the population.
- ________ results you try to generalize to a THEORY to try and generate a hypothesis that results in more research.
18) T or F: There IS bias in any research, and you can bias data or words (quantitative or qualitative) to get whatever result you want. You want to do everything you can to reduce bias.
19) Which one has a control group, which one typically does not?
- QuaNtitative is more objective, focuses on stats and numbers and statistical significance from a large sample size.
- QuaLitative focuses less on the numbers and objective measures, but is more subjective research on real life patients
2) QuaNtitative is objective, QuaLitative is subjective
- Lab: QuaNtitative
- Natural: QuaLitative
QuaNtitative research is typically NOT done in a natural setting; it is done in a lab in a controlled environment. That is why the randomized controlled trial design is the highest level of quaNtitative research: it tries to mimic true patients and real settings.
- What: QuaNtitative
- Why: QuaLitative
- Numbers: QuaNtitative
- Words: QuaLitative
- QuaLitative uses more interviews, focus groups, open-ended questionnaires, participant observation
- QuaNtitative uses more surveys (closed-ended), collected data, numbers
- Validity = Confirmability (triangulation)
- Internal Validity for quaLitative research = credibility
- External validity in quantitative = transferability in qualitative
- Reliability in quaLitative research = dependability
19) QuaNtitative has a control group typically, where quaLitative does not.
1) What is Triangulation:
2) How do you get triangulation data?
1) Triangulation is gathering info to generate VALIDITY in a QuaLitative study. It is a powerful technique that facilitates validation of data through cross-verification from two or more sources. In particular, it refers to the application and combination of several research methods in the study of the same phenomenon.
2) Interview, observation, artifact data – get it from all areas. You can get triangulation data from your research team, from an outside research team, from another research article.
A research project where there are focus groups, individual interviews, observations, documents, etc. And they come from academic settings, clinical sites, and from all over the country. This is triangulation.
1) What is CODING:
2) T or F: With QuaNtitative sampling, you can be analyzing data as you go along, and then change your data collection approach.
1) After data is collected (interview, document, focus group, observation) … you then need to categorize the data/info from research to make sense of it. It involves reading the data and giving labels or codes to the themes and ideas that you find. You create categories as you work through the data.
You read it, interpret what you think it is saying, and then pull out key words and categorize / organize it into categories.
Coding is an analytical process in which data, in either quantitative form (such as questionnaire results) or qualitative form (such as interview transcripts), is categorized to facilitate analysis. Coding can also mean the transformation of data into a form understandable by computer software.
2) False. In QuaLitative you CAN. But with QuaNtitative you can NOT. QuaNtitative is just collect all the data, then see results.
1) Data reduction =
2) With QuaLitative studies, you are constantly doing data
collection and data analysis. Just remember the line between them changes as you go (in QuaLitative studies) and coding and cateogorizing and hypothesis change throughout.
Dr. Gail Jensen talked about how a PT needs 3 things (similar concept to cognitive, behavioral, and affective). What were her 3:
1) The excellence of quaLitative research is in large part in the excellence of the _________
2) Is coding making sense of numbers or words?
3) Where does quaLitative data come from?
4) What errors are made when coding?
5) 5 step approach for analyzing quaLitative data (called what):
- Open ended questions
- Focus Groups
- Case studies
- Just listing everything without categorizing it or analyzing it
- Listing info that could identify the person
- Taking comments and generalizing to the whole (quaLitative analysis and studies do NOT generalize to the whole).
- Have a bias or interpretation of comments that benefit researcher or study
5) Called: Content Analysis
- Get to know your data (spend lots of time ... read and re-read over and over, transcribe data, etc.)
- Focus the analysis
- Categorize the information (identify patterns/themes, organize into categories, give it labels). Also called CODING
- Identify patterns within the various categories
- Interpretation (bring it all together, what did you learn)
Below are flashcards on EPIDEMIOLOGY
1) Define Epidemiology:
1) Study of what befalls a population (related to health); the study of diseases or conditions among a population: their etiology and risk factors, their impact on a population, the prognosis, preventative measures, etc.
1) In study designs, there are 2 major ones. Explain
2) Of the 2 listed in #1 question, which is the Gold Standard and why?
3) Explain how a prospective study works
3A) Explain how a retrospective study works
- A statistical analysis of baseline measures between groups is what study?
- A statistical analysis of measures between groups
- Prospective: The now and into the future. Who NOW has or will become injured / get the condition / get cured.
- Retrospective: Looking into the past. Who already had the condition / injury.
Prospective: study that watches for outcomes (development of a disease) during the study period and relates this to other factors such as suspected risk or protection factor(s). The study usually involves taking a cohort of subjects and watching them over a long period.
Retrospective: study that looks BACKWARDS and examines exposures to suspected risk or protection factors in relation to an outcome that is established at the start of the study. Retrospective investigations are often criticized because confounding factors can't be measured or accounted for.
2) Prospective. Because when looking in the past (retrospective), there are so many why's that could contribute that you can't measure or account for. Can't determine cause and effect in retrospective studies.
3) Participants are all measured at the beginning - AT BASELINE - and then followed over time. Then it is recorded on who develops the condition and who doesn't, or who gets healed and who doesn't. It is then a comparison of individuals who develop the condition versus individuals who do not develop the condition.
3A) You have 2 groups at the start … those WITH the disease and those WITHOUT the disease/condition. It is a case-control study since some have the 'case' and some are the 'control' group.
1) What is a 'cohort'
2) What is the most common example of a cohort study?
1) A “cohort” is a group of individuals followed over time. A cohort could be people from same region, age, school, demographic aspect, condition, disease, injury, etc.
2) Framingham Heart Study (generations of people followed over time observing lifestyle conditions to determine cardiovascular disease development).
(*** There will be a test question on case-control studies)
1) What is a case-control study
- The “cases” are those people who _____ the condition (or the case).
- The “controls” are the people who _____ the condition
3) T or F: Case control studies are only observational. Why?
4) T or F: Case control studies have an intervention implemented?
5) Another name for case-control studies
6) In a case-control study, can you determine risk factors or what caused the disease/condition? Give an example:
1) A study that compares patients who have a disease or outcome of interest (cases) with patients who do not have the disease or outcome (controls), and looks back RETROSPECTIVELY to compare how frequently the exposure to a risk factor is present in each group to determine the relationship between the risk factor and the disease.
- HAVE
- Do NOT have
- Case control studies are observational because NO INTERVENTION is implemented/attempted and no attempt is made to alter the course of the disease. The goal is to retrospectively determine (or observe) the exposure to the risk factor of interest from each of the two groups of individuals: cases and controls.
5) Retrospective studies
6) NO. An example: if you are looking at pt's with OA retrospectively, you don't know if the OA was caused by obesity, injury, quad weakness, etc. If you are overweight, you are more at risk for knee osteoarthritis; OR did the knee osteoarthritis lead to them being overweight? It is a chicken-and-egg thing and you don't know what led to what.
1) Explain the difference between endemic, epidemic, and pandemic
(*** There will be a test question on this)
**** IT's ALPHABETICAL
- Endemic: presence of a disease within a specific geographic region
- Epidemic: the % of people with the condition goes up in that region (basically the disease rises above its normal level in that region)
- Pandemic: it spreads world-wide
Differentiate between incidence and prevalence:
- Incidence: The number of NEW CASES of pathology in a given PERIOD OF TIME (expressed as a # / period of time). The injuries / exposures. 3.2 injuries per year (or per athlete exposures).
- Prevalence: Proportion of population sample with pathology at given point in time (expressed as a % of the population). 2.7% of population has CHF
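The two examples above can be computed directly (a sketch; the 2.7% CHF and 3.2-injuries-per-year figures come from the definitions, while the raw counts feeding them are invented for illustration):

```python
def prevalence(cases: int, population: int) -> float:
    """Proportion of a population sample with the pathology at a point in time."""
    return cases / population

def incidence(new_cases: int, time_periods: float) -> float:
    """NEW cases per period of time (or per exposures)."""
    return new_cases / time_periods

print(f"{prevalence(27, 1000):.1%}")   # 2.7% (e.g., of the population has CHF)
print(incidence(32, 10))               # 3.2 (e.g., injuries per year over 10 years)
```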
1) T or F: There are usually many risk factors that contribute to a person developing a disease / condition
2) Explain and give examples of these risk factors:
3) For Knee Osteoarthritis, what are some examples of risk factors for ALL the 4 categories above:
1) TRUE. It is not just 1 risk factor, it is usually many that play into getting condition.
- Intrinsic: within the person (internal)
Ex: Sex, skeletal alignment, flexibility, previous injury
- Extrinsic: outside of person or within the environment
Ex: Playing surface, protective equipment, aggression
of other athletes, accidents, rules, etc.
- Non-Modifiable (can’t change):
Ex: Age, gender, genetics
- Modifiable (CAN change)
Ex: Occupation, obesity, joint injury, muscle strength
- Intrinsic: gender, age, skeletal alignment
- Extrinsic: running surfaces
- Non-Modifiable: age, genetics
- Modifiable: obesity, activities participating in, strengthening muscles, improving ROM, occupation
For each of these below, you will NOT need to calculate them, but do know about them:
Sensitivity (Sn =):
Specificity (Sp =):
SPin vs. SNout:
Effect Size (d=):
Type I and Type II Errors:
Statistical Significance vs. Clinical Significance:
Numbers Needed to Treat (NNT):
Agreement (kappa, k):
Multiple Settings and Studies:
Relative Risk (RR):
Relative Risk Reduction (RRR):
Absolute Risk Reduction (ARR):
Odds Ratio (OR):
Likelihood Ratio (LR):
Sensitivity (Sn =): Ability of a test / intervention to CORRECTLY identify those WITH the condition
Specificity (Sp =): Ability of the test / intervention to CORRECTLY identify those WITHOUT the condition
SPin and SNout:
- SPIN stands for SPecific tests rule IN the condition when the test is POSITIVE.
- SNOUT stands for SeNsitive tests rule OUT the condition when they're NEGATIVE.
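Sn and Sp fall straight out of a 2x2 table of test results versus the true condition. A minimal sketch with hypothetical counts (the numbers are invented for illustration):

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of people WITH the condition the test correctly flags."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Proportion of people WITHOUT the condition the test correctly clears."""
    return tn / (tn + fp)

# Hypothetical 2x2 table: 90 true positives, 10 false negatives,
# 80 true negatives, 20 false positives.
sn = sensitivity(tp=90, fn=10)   # 0.9 -> high Sn: a NEGATIVE result helps rule OUT (SNout)
sp = specificity(tn=80, fp=20)   # 0.8 -> high Sp: a POSITIVE result helps rule IN (SPin)
print(sn, sp)                    # 0.9 0.8
```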
Validity: How useful, accurate, truthful, and meaningful the study results are. Internal validity is how well the study was done (bias, randomization, blinding), and external validity is how applicable the results are to the general population (other patients). Correlation between IV & DV (Pearson's r=): If the Independent Variable (IV) impacts or creates change in the Dependent Variable (DV), then there is a high correlation. A high correlation is closer to 1 (an r = 0.1 is a low correlation between IV and DV).
Reliability (ICC): When you do multiple tests / studies / measurements / interventions over time, you want them to be reliable and produce CONSISTENT repeated measures. Whether two tests over same day, or over multiple days, or through an intra-rater (same person doing multiple tests) or inter-rater (different testers) … Reliability is CONSISTENCY in test measurements over multiple tests.
Effect Size (d=): Effect size is the magnitude, or SIZE OF AN EFFECT of an intervention, or size of difference between the two groups or interventions (experimental vs. the control group).
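The deck doesn't name a specific effect-size formula, but a common one is Cohen's d: the difference in group means scaled by the pooled standard deviation. A sketch with invented outcome scores:

```python
import statistics

def cohens_d(group1: list, group2: list) -> float:
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

experimental = [12, 14, 13, 15, 16]   # hypothetical outcome scores
control      = [10, 11, 12, 11, 10]
print(round(cohens_d(experimental, control), 2))   # 2.53 -> a large effect
```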
P-value (p=): A small p-value (typically ≤ 0.05) indicates strong evidence AGAINST the null hypothesis, so you REJECT the null hypothesis (it suggests the IV is impacting the DV). A large p-value (> 0.05) indicates WEAK evidence against the null hypothesis, so you do NOT reject the null hypothesis. The usual threshold is 0.05, and the p-value tells us NOTHING about clinical significance.
Type I and Type II Errors:
- Type I errors: FALSE POSITIVE ('you are pregnant' to a man). You REJECT null hypothesis when it was true. (These are less common)
- Type II errors: FALSE NEGATIVE ('you are not pregnant' to a pregnant woman). You do NOT reject null hypothesis when it was false. (These are more common)
Sample Size: How large the group being studied is. Obviously the larger, the better (more statistical power).
Statistical Significance vs. Clinical Significance: Just because you have statistical significance does NOT mean it is clinically relevant (and vice versa).
Numbers Needed to Treat (NNT): Number of patients that must be treated in order to achieve one additional favorable outcome / prevent a bad outcome. It's the inverse of ARR (so NNT = 1/ARR)
NNTB = Numbers needed to treat to get a Benefit. NNTH = Numbers needed to treat to cause Harm
Agreement (kappa, k): K is the Kappa statistic, which measures INTER-RATER AGREEMENT (e.g., two PT's agreeing on a test/intervention) beyond what chance alone would produce. So a kappa of 0.60 reflects much stronger agreement than 0.17. The closer to 1, the better / more agreement; closer to 0 means the agreement between testers is no better than chance.
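For concreteness, Cohen's kappa can be computed from a 2x2 agreement table of two raters. A sketch, with invented counts (imagine two PT's each rating the same 50 patients positive/negative):

```python
def cohens_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two raters from a 2x2 agreement table.
    both_pos:    both raters said positive
    a_pos_b_neg: rater A positive, rater B negative
    a_neg_b_pos: rater A negative, rater B positive
    both_neg:    both raters said negative
    """
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    po = (both_pos + both_neg) / n  # observed agreement
    # Agreement expected by chance alone (product of each rater's marginal rates):
    pe_pos = ((both_pos + a_pos_b_neg) / n) * ((both_pos + a_neg_b_pos) / n)
    pe_neg = ((a_neg_b_pos + both_neg) / n) * ((a_pos_b_neg + both_neg) / n)
    pe = pe_pos + pe_neg
    return (po - pe) / (1 - pe)

# Invented counts: observed agreement 0.70, chance agreement 0.50 -> kappa 0.40
k = cohens_kappa(20, 5, 10, 15)
```

Note how kappa (0.40) comes out well below raw agreement (0.70): kappa credits only the agreement beyond chance.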
Bias: When researchers select participants in a biased way, or allocate / assign participants to a group (control or experimental) – this bias reduces the validity of the study.
Blinding: Internal validity is improved when both the participants of the study, and the researchers / therapists are blinded to who is chosen for what group (control or experimental).
Multiple Settings & Studies: Obviously the more the study is replicated, and then produces similar findings, the more valid and applicable it is. If the study was replicated by many different groups over time, in many different settings = more validity.
Relative Risk (RR): The ratio of risk in the exposed group (cases) to the risk in the non-exposed group (controls). RR = exposed group risk / non-exposed group risk.
- RR = 1 means risk is equal in both groups.
- RR > 1 means the risk in the exposed group is greater than in the non-exposed group (positive association).
- RR < 1 means the risk is greater in the non-exposed group (negative association).
Ex: Smokers (exposed group) have an RR of 1.61 for developing CVD relative to non-smokers (non-exposed group) ... or smokers are 1.61 times more likely to get CVD than non-smokers.
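The RR calculation is just a ratio of two risks. A minimal sketch in Python; the counts below are invented to match the smoking example's RR of 1.61, not real data:

```python
def relative_risk(exposed_events, exposed_n, unexposed_events, unexposed_n):
    # RR = risk in exposed group / risk in non-exposed group
    risk_exposed = exposed_events / exposed_n
    risk_unexposed = unexposed_events / unexposed_n
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 161 of 1000 smokers vs 100 of 1000 non-smokers develop CVD
rr = relative_risk(161, 1000, 100, 1000)  # 0.161 / 0.100 = 1.61
```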
Relative Risk Reduction (RRR): The percentage by which the treatment reduces risk compared to control. RRR = (1 − RR) × 100 (the closer to 100%, the better). A 75% RRR means patients are 75% less likely to have an ACL tear if they do this program, i.e., you’ve reduced the risk by 75%.
Absolute Risk Reduction (ARR): The absolute arithmetic difference in event rates between control and experimental groups. The decrease in risk of a treatment in relation to a control treatment. ARR = CER − EER. It is the inverse of NNT … so ARR = 1/NNT.
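Putting RR, RRR, ARR, and NNT together with made-up event rates (a sketch, not data from any real trial):

```python
cer = 0.20            # control event rate (hypothetical)
eer = 0.05            # experimental event rate (hypothetical)

arr = cer - eer       # ARR = CER - EER = 0.15
rr = eer / cer        # RR = 0.25
rrr = (1 - rr) * 100  # RRR = 75% -> treatment cuts the risk by 75%
nnt = 1 / arr         # NNT = 1/ARR ~ 6.7 -> treat ~7 patients to prevent one event
```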
Odds Ratio (OR): A measure of association between an exposure and an outcome. The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure.
- OR = 1 implies that the event is equally likely in both groups.
- OR > 1 implies that the event is more likely in the exposed (first) group.
- OR < 1 implies that the event is less likely in the exposed (first) group.
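The OR comes from a 2x2 table of exposure vs. outcome. A minimal sketch with invented case-control counts:

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    # Odds of the outcome given exposure vs. odds of the outcome without exposure:
    # OR = (exposed_cases / exposed_controls) / (unexposed_cases / unexposed_controls)
    #    = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

# Hypothetical counts: 30 exposed cases, 70 exposed controls,
# 10 unexposed cases, 90 unexposed controls
odds = odds_ratio(30, 70, 10, 90)  # (30*90)/(70*10) ~ 3.86 -> positive association
```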
Pre-test Probability: The probability of a patient having the target disorder BEFORE a diagnostic test result is known.
Post-test Probability: The probability of a patient having the target disorder AFTER a diagnostic test result. (** To calculate post-test probability, use scale to set pre-test probability, then draw dot at the LR, then draw line through two dots to determine post-test probability).
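The nomogram described above is the graphical method; algebraically it does the same thing as post-test odds = pre-test odds × LR. A sketch of that calculation (the 25% pre-test probability and LR of 4 below are hypothetical numbers):

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    # Same math the nomogram does graphically:
    # convert probability to odds, multiply by the LR, convert back to probability
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical: 25% pre-test probability, positive test with +LR = 4
p = post_test_probability(0.25, 4)  # ~0.57 -> probability rises to about 57%
```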
Likelihood Ratio (LR):
+LR = sensitivity / (1 − specificity)
- +LR > 10 is a test result with a LARGE effect on increasing the probability of the disease/condition
- +LR between 5-10 is a test that has a moderate effect on increasing the probability of the disease/condition
- +LR < 5 indicates a small effect on increasing the probability of the disease/condition
-LR = (1 − sensitivity) / specificity
- -LR < 0.1 indicates that the result has a large effect on decreasing the probability of the disease/condition
- -LR between 0.1-0.5 indicates that the test has a moderate effect on decreasing the probability of the disease/condition
- -LR > 0.5 indicates a small effect on decreasing the probability of the disease/condition
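Both LRs are simple functions of sensitivity and specificity. A sketch using a hypothetical test with Sn = 0.90 and Sp = 0.80:

```python
def positive_lr(sn, sp):
    # +LR = sensitivity / (1 - specificity)
    return sn / (1 - sp)

def negative_lr(sn, sp):
    # -LR = (1 - sensitivity) / specificity
    return (1 - sn) / sp

plr = positive_lr(0.90, 0.80)  # 4.5   -> < 5: small effect on ruling IN
nlr = negative_lr(0.90, 0.80)  # 0.125 -> between 0.1-0.5: moderate effect on ruling OUT
```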
*** Most research is fabricated by researchers / professors / students / employees trying to get published, write a dissertation, prove their hypothesis, etc. But if I find a study that has HIGH sensitivity with HIGH specificity, a sufficiently large sample size, a high correlation between independent and dependent variables, high reliability (ICC) between different tests and testers, a large effect size, strong agreement (k), etc. … AND there was NO bias, random sampling and allocation were done, participants and researchers / therapists were blinded, and the study was replicated multiple times in multiple settings … IF ALL OF THIS IS TRUE, then I have confidence that it is a valid study with enough data to implement in my clinical practice with patients. Otherwise … clinical experience > EBP.
1) What is the hierarchy of studies from BEST to WORST evidence:
2) Review below the different types of studies:
- Systematic Reviews (and Meta-Analysis)
- Randomized Controlled Trials
- Cohort Studies
- Case Control Studies
- Case Reports
- Ideas, editorials, opinions
- Animal research
- In vitro research
- Systematic Reviews: are the best studies as they summarize multiple studies on a topic into one paper. Instead of having to read 50 articles on the topic, a systematic review will bring all that research, data, and conclusions from all 50 into one succinct paper to not only save the clinician time, but to also provide evidence from multiple RCT’s on outcomes and interventions (and clinical practices) that are best for the patient based on research / evidence.
- Meta-Analysis: These will thoroughly examine a number of valid studies on a topic and mathematically combine the results (data) using accepted statistical methodology to report the results as if it were one large study.
- Randomized Controlled Studies: **** Gold Standard for Clinical Trials. These are carefully planned experiments that introduce a treatment or exposure to study its effect on real patients. They include methodologies that reduce the potential for bias (randomization and blinding) and that allow for comparison between an experiment group (gets intervention) and a control group (who doesn’t get intervention) to see if IV will impact the DV being studied. A randomized controlled trial is a planned experiment and can provide sound evidence of cause and effect.
- Secondary Analysis of a Cohort Study:
Researchers just take the data from a previously completed Cohort Study and analyze it. Typically researchers do this to manipulate the data toward whatever their hypothesis is.
- Cohort Study: Identify a group of patients who are already taking a particular treatment or have an exposure, follow them forward over time (prospective), or looking back in time (retrospective), and then compare their outcomes with a similar group that has not been affected by the treatment or exposure being studied. Cohort studies are observational and not as reliable as randomized controlled studies, since the two groups may differ in ways other than in the variable under study.
- Case Control Studies: Studies in which patients who already have a specific condition (the 'cases') are compared with people who do not have the condition (the 'controls'). The researcher looks back to identify factors or exposures that might be associated with the illness. They often rely on medical records and patient recall or surveys for data collection. These types of studies are often less reliable than randomized controlled trials and cohort studies because showing a statistical relationship does not mean that one factor necessarily caused the other. Researchers could gather certain data, or manipulate it, to prove a hypothesis.
- Case Report: Studies of a single patient. It follows the ICF model, but has no control group, no experimental factor, etc. Just studies that one person's specific case / intervention / outcomes. It doesn’t suggest cause and effect, but it does typically lead to further research studies.
1) Can you calculate RR in a case-control study?
2) When or what study is RR used in?
3) OR can be used in what study(s)
1) NO. You cannot, because you do not know the incidence in the exposed population — all individuals in the case group were selected because they already have the condition.
2) RR used for a PROSPECTIVE study. *** I can't determine the risk of you getting a condition if you already have it (retrospective). That is why RR can only be calculated in a prospective study, and case-control studies are retrospective.
3) OR can be used in both case-control and prospective studies
Difference between Mortality and Morbidity
Mortality: # of deaths / population. Provides insight into disease severity.
Morbidity: # of people in a population who become ill (where mortality is about actual deaths)
- Outcome measures for the clinician will typically be:
- Outcome measures for the patient will typically be:
2) How does randomization relate (or not relate) to bias and blinding
3) What is a "historical control"
4) Another term for case-control study participants:
5) What's the difference between efficacy and effectiveness:
- Clinician: Objective. DOE. (ROM, MMT)
- Patient: Subjective. POE. Functional abilities (patient self report) verbally or through SF-36, DASH, LEFS
2) Randomization should limit bias, but does not necessarily mean there is blinding. Blinding keeps the experimenters (and participants) from knowing group assignments; randomization just assigns people to the control vs. experimental group at random.
3) A “Historical Control” is a control group drawn from the past, e.g., previously treated patients or data from earlier studies/records, used for comparison instead of recruiting a concurrent control group.
4) “Utilizers vs. Non-Utilizers” … some with LBP go to PT, and some do NOT go to PT (went somewhere else or didn’t go in). And compare them.
*** Effectiveness relates to how well a treatment works in practice, as opposed to efficacy, which measures how well it works in clinical trials or laboratory studies.
***Sometimes the line between the two is gray.
- Efficacy: Determined in a controlled experimental situation/environment with ideal conditions (very controlled). Usually a smaller, very tightly controlled study.
- Effectiveness: Examines outcomes in real-life situations. More real-world, less standardized, bigger, and less controlled.
Below is power point on Clinical Prediction Guides
1) What are clinical prediction guides:
1A) Best example of a clinical prediction guide:
2) What are other names for a clinical prediction guide:
2A) What are a few reasons why we have these clinical prediction guides:
2B) What do we need to know in order to implement a clinical prediction guide:
2C) What are clinical practice guidelines
3) Clinical prediction guides vs. Clinical Practice Guidelines:
4) Another name for Clinical Practice Guidelines:
5) Are clinical prediction guides and clinical practice guidelines the same thing:
6) Are Clinical Practice Guidelines intended to be prescriptive?
7) Criteria for good CPG's:
8) Main key points about CPG's:
9) Do CPG's tell clinicians how to treat and practice?
1) Decision making tools for clinicians including 3 or more variables. Quantifies individual results from various components of the history and physical exam to help the clinician make a diagnosis, prognosis, or likely response to treatment in an individual patient.
They are the combination of clinical findings (cluster) that provide meaningful predictions about an outcome (diagnosis, treatment) of interest.
1A) Ottawa Ankle Rules
- Clinical prediction rule (but they are NOT rules, just guides)
- Clinical decision rule
- Helps improve clinician decision making
- It improves patient outcomes
- It improves prediction and helps establish diagnoses better
- It improves patient care / satisfaction
- It reduces cost
- Helps PT know when to refer pt
- Who the guide is intended for (what type of pt or condition)
- Performance characteristics
- Conditions of when to apply it
*** We need to know who the guideline is intended for (not everyone fits into the same treatment bucket).
2C) General industry wide statements that include recommendations intended to optimize patient care that are informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options. *** Make recommendations for best practice
- Clinical prediction rules: derived from original research involving many patients and mathematical analysis.
- Clinical Practice Guidelines: Consensus among many experts and much more broad comprehensive advice/guidelines.
** CLINICAL PREDICTION GUIDES are not clinical practice guidelines. The guide is a single investigation and focused. The guidelines are a systematic summary of findings and more comprehensive.
4) Position Statement
- Should be <5 years old or must be revised every 5 years
- Sponsoring organization should not be biased, but rather should be objective.
- Recommendations from industry experts (expert panel of clinician-scholars) to optimize patient care
- Informed by MANY systematic reviews of evidence
- Updated regularly to be up to date
- Synthesize the best evidence / research to improve care for patients
- Assist clinicians in making choices
- Much more comprehensive than a single study
- Not prescriptive
9) No. CPG's are broad recommendations for the practice of physical therapy. These guidelines do NOT tell you specifically what to do, they give broad general guidelines.
When you see "relationship" think:
Regression or correlation
So values close to 1 mean the variables are more strongly related / correlated with each other.
1) Why are clinical prediction guides so helpful for NEW clinicians / PT's:
2) What is a common example of this (done in an ER of the hospital):
3) T or F: Typically these clinical prediction guides outperform clinician judgement
- These Clinical Prediction Guides are helpful tools and resources to new clinicians who need help to get started.
- There are patterns, historical experience, proven methods, etc. that other clinicians have been through – and these guidelines are their experience and advice, and it is helpful.
- They can enhance accuracy and efficiency of decision making.
- They help to make a diagnosis
- Give structure
The Goldman Algorithm:
- Basically a flow chart for doctors to use when people come into the ER with chest pain – a systematic way to determine whether they are having an MI or not. It is just a guideline that gives you a plan or pattern for telling whether a patient has the disease.
- So if a person met the 4 criteria (ECG, pain, fluid, BP), the algorithm had a high sensitivity for predicting who was having an MI.
1) What is a high or good +LR
2) What is a good -LR
3) So a +LR of 1.77 and a -LR of 0.02 means what?
4) So how do you get the post-test probability
5) If we used OAR (Ottawa Ankle Rules) to see if someone had an ankle injury and got a +LR of 1.57 and -LR of 0.08 ... what would that mean?
1) 10 or higher (LR > 10 indicates that the test result has a large effect on increasing the probability of disease presence)
2) Less than 0.1 (a LR of <0.1 indicates that the result has a large effect on decreasing the probability of disease presence)
3) Shows this test is NOT good at predicting that a patient HAS the injury (since the +LR is only 1.77), but it is REALLY GOOD at predicting that a patient does NOT have the condition (since the -LR is 0.02)
4) To calculate post-test probability, use scale to set pre-test probability, then draw dot at the LR, then draw line through two dots to determine post-test probability.
5) OAR's are NOT good at ruling IN an ankle injury since +LR is 1.57, and OAR is very good at ruling OUT an ankle injury since -LR is 0.08
Below are flashcards on the "Article Summary" power point
1) Review what a Systematic Review and a Meta-Analysis study is:
2) How is evidence / studies graded:
3) T or F: It is not possible to have bias in a Systematic Review:
4) T or F: A systematic review is always applicable to patient care:
- Systematic Reviews: are the best studies as they summarize multiple studies on a topic into one paper. They focus on a specific question, but instead of having to read 50 articles on the topic, a systematic review will bring all that research, data, and conclusions from all 50 into one succinct paper to not only save the clinician time, but to also provide evidence from multiple RCT’s on outcomes and interventions (and clinical practices) that are best for the patient based on research / evidence.
- Meta-Analysis: These will thoroughly examine a number of valid studies on a topic and mathematically combine the results (data) using accepted statistical methodology to report the results as if it were one large study. It is a quaNtitative synthesis of the data of the systematic review of these multiple studies.
2) Obviously systematic reviews are better than RCT's, and RCT's are better than case-control studies, etc. But you also have sub-grades within each category.
So JOSPT and Modified CEBM scales are other ways to grade evidence.
3) False. Researchers can still pick and choose what studies to include. There can still be bias.
4) False. Just because there is statistical significance does NOT mean there is clinical significance. Plus, if the systematic review was done 20 years ago, it is out of date. Good systematic reviews are current within the last few years (but those are hard to find).
1) Big difference between a systematic review and a literature / narrative review:
2) What is a Meta-Analysis:
3) Who/What are the "subjects" in a Meta-Analysis study:
4) Meta-Analysis studies must have strict inclusion and exclusion criteria. What does that mean?
5) Give examples from #4 above:
6) The process of identifying a study for a Meta-Analysis is similar to others ... go to a database, search terms, pick a timeline, select studies, etc. What are two main search databases to do this:
7) T or F: In a Meta-Analysis, authors typically use a flow chart to show readers what process they went through to find, choose, and filter studies.
1) Basically, a systematic review starts with a question and follows a formal process. Traditional (narrative or literature) reviews typically do NOT have a clear methodological approach and offer subjective summaries (possibly from a biased author/expert).
**** A literature review is often limited because it supports a position taken by the author, whereas a systematic review begins with an answerable question and works through a planned process.
- Literature reviews: don't start with a ?, may be biased, are more broad, not evidence based, and offer a quaLitative or subjective position taken by the author.
- Systematic reviews: start with a focused clinical ?, follow a formal process, provide results from multiple studies, are evidence based, etc.
2) It is a type of systematic review ... pools the data to give a quaNtitative estimation of the magnitude of the effect. Basically it is a type of systematic review where the DATA in multiple studies are pooled.
3) The “subjects” in these Meta-Analysis studies are NOT people, they are the actual STUDIES being reviewed.
4) Need to make sure all "subjects" (studies) are similar. Ideally each study is a RCT, had blinding, and use same methods and measures.
5) In order to do a Meta-Analysis, the studies have to be similar enough. You can’t compare a knee-OA study where people exercised for 3 weeks with a different study where they exercised for 40 weeks. Those two studies are NOT similar in their methods, so it is comparing apples to oranges.
You also can’t get several RCT’s and combine those with a bunch of case-reports and those all go into Meta-Analysis. Or comparing 18 yr olds in one study, to 65 yr olds in another study. Or one study does the TUG and another does the 10m walk test for balance … those two results can’t be combined and averaged. That won’t work since they aren’t similar studies. The studies have to be similar enough (there is a little bit of gray, but need to be similar enough).
So "inclusion" criteria for each study used needs to be similar with: participants, demographics, interventions, measurements, timing, etc.
PRISMA‐ Preferred Reporting Items of Systematic Reviews and Meta‐Analyses
MOOSE‐ Meta‐Analysis of Observational Studies in Epidemiology
PRISMA and MOOSE are reporting guidelines (checklists) for how these studies should be written up and reported; they are not search databases.
1) **** RRR and NNT are both dependent upon the ____________ of the injury.
2) I don't get the NNT concept ... If NNT is 90, it doesn't mean that 89 have the condition ????
1) incidence rate
2) Correct, an NNT of 90 does NOT mean 89 of the 90 have the condition. It means that, on average, 90 patients must be treated for ONE additional patient to get a favorable outcome (or avoid a bad one) compared to control.
Below are REVIEW BIG PICTURE CONCEPTS:
- Lack of evidence does not mean NO EVIDENCE
- Current evidence should be critically evaluated and peer reviewed
- Remember the parachute example ... do we need to have evidence for everything? Do we need to do a study to see if using a parachute prevents deaths compared to not using one?
- Clinicians should use EBP, but combine it with clinical experience, their own knowledge, patient needs/requests, other health professionals, etc. But the big 3 are: EBP, clinician experience, patient values
- Everything we do as a PT is doing the scientific method: observe, ask a question, research/study and create evidence, create a hypothesis, test the hypothesis, evaluate, come to a conclusion
What is the difference between a CASE REPORT and a GRAND ROUND:
What must a GOOD grand round include:
- Case Reports: Most likely your CI will have you go look at the literature and come up with a plan on how to treat a patient. Or if you get experience with a difficult case, then publish it to help other clinicians in the future who see that case. Can be retrospective or prospective.
- Grand Rounds = An important teaching tool and ritual of medical education and inpatient care: a student or resident presents the medical problems and treatment of a particular patient to an audience of doctors, residents, and medical students, reviews the literature, and gets input from the more senior members of the group based on their experience.
Grand rounds depends on:
- Willing participants who can provide GOOD feedback
- Respect for everyone involved
- A review of the literature
- Get the student to think (not just be given the answers)
An expert clinician combines all these things:
Knowledge, clinical experience, clinician values (care, compassion), patient needs, and movement/exercise/treatment.