Statistics Flashcards

(95 cards)

1
Q

What is nominal data?

A
  • Categories without order
    • eye colour
    • marital status
  • Discrete data
  • Qualitative
  • Non parametric
2
Q

What is ordinal data?

A
  • Ordered categories
    • e.g. Ficat grades
  • Discrete data
  • Qualitative
  • non parametric
3
Q

What is integer data?

A
  • Number of counts
    • papers published
  • Discrete
  • Quantitative
  • parametric or non parametric
4
Q

What is Ratio data?

A
  • Zero at origin
  • value dependent on units
    • e.g. age, distance
  • Continuous data
  • Quantitative
  • Parametric/non parametric
5
Q

What is interval data?

A
  • Distances between units are of known size
    • e.g. hours spent revising
  • Continuous
  • Quantitative
  • Parametric
6
Q

What are the different types of distribution curves?

A
  • Normal distribution- bell shaped curve
  • Skewed distribution
    • positive
    • negative
  • Leptokurtic distribution
  • Platykurtic distribution
7
Q

What is a skewed distribution?

A
  • Asymmetrical
  • Has a longer tail on one side
  • Positive (tail to the right) or negative (tail to the left)
8
Q

What is used in skewed distribution to measure the central tendency?

A
  • Median or mode
9
Q

What is kurtosis?

A
  • Measure of the relative peakedness or flatness of a distribution compared with the normal distribution
10
Q

What is leptokurtosis?

A
  • Positive kurtosis
  • indicates a relatively peaked distribution
11
Q

What is platykurtosis?

A
  • Negative kurtosis
  • indicates a relatively flat distribution
12
Q

What is the name given to the process by which non-normal data can be normalised to allow parametric testing?

A
  • transformation
13
Q

What is the mean?

A
  • The average of the data
  • measured by dividing the sum of all observations by the number of observations
14
Q

What is the median?

A
  • The central value of the data
  • used for ordinal data
15
Q

What is the mode?

A
  • The most frequently occurring data value
  • used for nominal data
16
Q

In perfectly normally distributed data, what is significant about the mean, median and mode?

A
  • They are the same
17
Q

What is the range?

A
  • The lowest and highest values of data
  • the range does not give much information about the spread of the data
18
Q

What are percentiles?

A
  • Grouping of data into brackets of 1%, 10%, or more commonly 25% (quartiles)
19
Q

What is variance?

A
  • A measure of spread, used where the mean is the measure of central tendency
  • variance is the corrected sum of squares about the mean
  • Σ(x − mean)² / (n − 1)
20
Q

What is the standard deviation?

A
  • The square root of the variance
  • for reasonably symmetrical, bell-shaped data, the mean ± 1 SD contains roughly 68% of the data, ± 2 SD roughly 95%, and ± 3 SD roughly 99.7%
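The variance and standard deviation definitions can be checked with a short Python sketch. The sample values and the variable names are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math
import statistics

# Hypothetical sample (made-up numbers), e.g. time to union in weeks.
weeks = [10, 12, 14, 16, 18]

mean = statistics.mean(weeks)

# Sample variance: corrected sum of squares about the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in weeks) / (len(weeks) - 1)

# Standard deviation: the square root of the variance.
sd = math.sqrt(variance)

# The standard library functions should agree with the hand calculation.
assert math.isclose(variance, statistics.variance(weeks))
assert math.isclose(sd, statistics.stdev(weeks))

print(mean, variance, round(sd, 2))
```

Note that `statistics.variance` and `statistics.stdev` use the n − 1 denominator, matching the formula on the variance card.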
21
Q

What is the normal distribution defined by?

A
  • 2 parameters
  • the mean
  • the standard deviation
  • symmetrical, so mode = median = mean
22
Q

What is the coefficient of variation?

A
  • SD/mean x 100
  • indicates how big the SD is in comparison with the mean
  • if SD high then the data are highly variable
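As a rough illustration, the coefficient of variation can be computed as below. The function name and both data sets are hypothetical; the two samples share a mean of 100 but differ in spread.

```python
import statistics

def coefficient_of_variation(values):
    """Return SD / mean x 100, i.e. the SD as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical data: same mean, very different spread.
tight = [98, 100, 102]   # sample SD = 2
loose = [50, 100, 150]   # sample SD = 50

print(coefficient_of_variation(tight))
print(coefficient_of_variation(loose))
```

The second sample gives a much larger coefficient, reflecting more variable data relative to the mean.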
23
Q

What is the standard error of the mean?

A
  • The SD divided by the square root of the sample size
  • used in relation to a sample rather than the population as a whole
  • the formula does not assume a normal distribution
  • it measures how closely the sample mean approximates the population mean
24
Q

What are confidence intervals?

A
  • Ranges on either side of a sample mean giving a rapid visual impression of significance
  • CIs are the values between the confidence limits, which are set a given number of standard errors either side of the estimate
  • for a large sample size, 95% CIs are approx 2 SEMs either side of the mean
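A minimal sketch of the SEM and the "approx 2 SEMs" rule for a 95% CI follows. The function names and the sample are hypothetical, and the factor of 2 is the large-sample approximation stated on the card (small samples would need a t-distribution multiplier instead).

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: SD divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

def approx_95_ci(values):
    """Approximate 95% CI: mean plus or minus 2 SEMs (large samples only)."""
    m = statistics.mean(values)
    half_width = 2 * sem(values)
    return (m - half_width, m + half_width)

# Hypothetical sample of 100 measurements (made-up numbers).
sample = [48, 52] * 50

low, high = approx_95_ci(sample)
print(round(sem(sample), 3), round(low, 2), round(high, 2))
```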
25
Why are CIs preferred to p values?
* CIs relate to sample size
* a range of values is provided
* CIs provide a rapid visual impression
* CIs have the same units as the variables
26
What is a null hypothesis?
* **A primary assumption that any difference seen occurred purely by chance**
* collected data are then tested to disprove the null hypothesis
* if the result is significant the null hypothesis is rejected on the basis that it is wrong
27
What is the p value?
* The probability that the difference seen was due to chance
* the significance threshold is often set at p = 0.05
* if the p value is < 0.05 then this suggests the probability of the difference seen being due to chance is less than 5%
28
What is a type 1 error?
* **When a difference is found**
* **but in reality there is not a difference**
* i.e. a false positive
* the null hypothesis is rejected incorrectly
* this is the 5% of cases where the difference occurred by chance
* i.e. convicting an innocent man
29
How can we protect against type 1 errors?
* Reducing the significance level (although this increases type 2 errors)
* reducing the p value threshold reduces type 1 errors
* but bigger sample sizes are then required to protect against type 2 errors
30
What is a type 2 error?
* **When no difference is found but in reality a difference does exist**
* i.e. a false negative
* therefore the null hypothesis is falsely accepted
* i.e. failing to convict a person guilty of the crime
31
What is a type 2 error the result of?
* **Small sample size**
* NB important to perform a **power analysis before** undertaking the study
* protect against type 2 errors by ensuring adequate statistical power
* type 2 errors are common in ortho studies
32
What is a type 3 error?
* occurs rarely
* **when the researcher correctly rejects the null hypothesis but incorrectly attributes the cause**
* i.e. the researcher misinterprets the cause and effect
33
What is a statistical power analysis?
* A method for determining the number of subjects needed in a study in order to have a reasonable chance of showing a difference if one exists
34
What is statistical power?
* **The probability of demonstrating a true effect or statistically significant difference**
* 1 − β
* expressed as a %
* the probability that the test will correctly reject the null hypothesis
* if the power of the experiment is low then there is a good chance the experiment will be inconclusive or give a type 2 error
35
What are the factors affecting power analysis?
1. **Size of the difference between the means**
   * the larger the difference, the easier it is to detect and the greater the power
2. **Spread of the data**
   * the larger the spread, the less likely a difference will be detected
3. **Acceptable level of significance**
   * i.e. the p value set
4. **Sample size**
   * power increases with increasing sample size
36
What is an observational study?
* **The investigator observes rather than alters events.**
* e.g. review of PE after THR
37
What is an experimental study?
* **The investigator applies a manoeuvre and then observes the outcome**
* e.g. a surgeon may conduct an RCT comparing warfarin & heparin on the prevalence of DVT in pts with THR
38
What are the different study timelines?
* **Retrospective** study
* **Prospective** study
* **Cross sectional** study
39
What is a retrospective study?
* **The outcome of interest has already occurred and the pt or cohort (group) is followed forward in time from a point in the past**
40
What is a prospective study?
* **Follows pt or cohort forward in time**
* _stronger than a retrospective study_
41
What is a cross sectional study?
* **Examines pts or events at 1 point in time without follow-up**
* used when looking at the prevalence of a condition or describing the distribution of variables
42
What are the type 1 and type 2 error rates set at when conducting a study?
* type 1 = 0.05
* type 2 = 0.20
* i.e. power of 80%
43
When performing multiple tests there is a risk of what? How can this be corrected?
* **Increased risk of type 1 error**
* so the **p value** threshold may have to be **decreased**
* or perform a **Bonferroni correction**
44
What is the Bonferroni correction?
* if one is testing n independent hypotheses, then one should use a significance level of 0.05/n
* e.g. with 2 independent hypotheses a result would be declared significant if p is less than 0.025
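The correction is simple arithmetic, sketched below in Python. The function names and the example p values are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Significance level to use when testing n independent hypotheses."""
    return alpha / n_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p value as significant or not at the corrected threshold."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [p < threshold for p in p_values]

# Hypothetical p values from two independent tests.
p_values = [0.03, 0.01]

print(bonferroni_threshold(len(p_values)))
print(significant_after_bonferroni(p_values))
```

With two tests the threshold becomes 0.025, so a p value of 0.03 that would pass an uncorrected 0.05 cut-off is no longer declared significant.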
45
What are the features of a parametric test?
* Assumes data were sampled from a **normal population**
* observations must be **independent**
* populations must have the **same variance**
* can use the **absolute difference between data points**
* **increased power for a given sample size n**
* rarely exists in orthopaedics
46
What are the features of non parametric tests?
* **No assumptions** are made about the origins of data
* **no limitations on types of data**
* **rank order** of values
* **less likely to be significant**
* **decreased power for a given n**
* cannot relate back to parametric properties of data
47
When is a paired t test used?
* When there is a **pair of observations on a single subject**
* e.g. blood pressure before and after application of a tourniquet
* a form of Student's t-test
48
Where there are multiple observations in a normal distribution what test is used?
* One-way analysis of variance (**ANOVA**)
* **it determines the probability that 2 or more samples were drawn from the same parent population**
49
When is an unpaired t test used?
* used to **compare 2 random samples provided they both follow a normal distribution**
* samples can be of differing size but should be independent
* i.e. no chance that a subject could appear in both of the groups being tested
50
When is a chi squared test used?
* for **qualitative data**
* used only on **actual numbers of occurrences** (frequencies) but not proportions/%/means or derived statistics
* the test compares the **distribution of a categorical variable in a sample with the distribution of a categorical variable in another sample**
* it assesses whether the observed data
51
What does correlation measure?
* the **degree of association between 2 parameters**, with the correlation coefficient r being anywhere between -1 and +1
52
What is Pearson's coefficient?
* a measure of linear, i.e. parametric, association
* if one parameter increases as the other does, then the correlation coefficient is positive
53
What is the data for a correlation coefficient always plotted on?
* A scatter plot
* if a curved line is needed to express the relationship, then a more complicated measure of correlation must be used, e.g. Spearman's rank for non parametric data
54
What is the regression coefficient?
* regression is a straight line drawn over the scatter plot using the equation y = a + bx
* the regression coefficient b is the **slope (gradient and direction)** of the regression line
55
What does regression show?
* shows **how one variable changes on average with another**
* it can be used to find out what one variable is likely to be when the other is known
* regression relationships may be linear, multiple or logistic
56
What does the regression value r² show?
* **Indicates how much of the variance in the dependent variable is related to variance in the independent variable.**
* i.e. if knee pain correlates with walking distance with r² = 0.6 then 60% of the variation in walking distance can be explained by variation in knee pain. The remaining 40% of variability is not explained
57
List the types of studies and their levels of evidence?
* _Level 1_
  * **Meta-analysis/systematic reviews of RCTs**
  * **Randomised controlled trials**
* _Level 2_
  * **Prospective cohort study**
  * **Systematic review of level 2 trials**
* _Level 3_
  * **Case control study**
  * **Retrospective study**
  * **Systematic review of level 3 trials**
* _Level 4_
  * **Case series**
* _Level 5_
  * **Expert opinion**
58
What are expert opinions?
* What experts in their field have to say on a given subject
* level 5 evidence
59
What are case series?
* Level 4 evidence
* the outcomes of the group are reported, but there is **no comparison group/control group**
* **weak in relation to causation**
* should act as a stimulus for more powerful studies
60
What are case-control studies?
* **Retrospective studies** where cases are gathered with a **certain outcome** and then **compared with controls that did not have the same outcome in order to look back at the effects of interventions/tx**
* **quick and cheap to perform**
* **limited by methodological bias**
61
What are cohort studies?
* 2 groups, **one** of which has undergone an **intervention** or tx, are **followed up over time** in order to **compare outcomes** such as onset of disease or adverse effects
* useful in identifying incidence and establishing relative risk
62
What are the disadvantages of cohort studies?
* **Diagnostic access bias** - due to preselection
* **expense**
* **decreased validity** - due to loss to follow up
63
What are RCTs?
* Gold standard
* **groups of patients are randomised to either receive or not receive an intervention or tx, and the outcomes are compared in a prospective manner**
64
What are the criteria of an RCT?
* Randomisation
* Generalisability
* Sample selection
* Outcome selection
* Bias
* Confounding factors
* Masking/blinding
* Ethics
* Publication
* Sequential analysis
* Equivalence study
65
What is randomisation?
* it ensures that all prognostic variables, both known and unknown, will probably be distributed equally amongst the tx groups
* this **avoids bias in treatment assignment**
66
What should the outcome measures be?
* Valid
* Reproducible
* responsive to change
* choice of outcome clinically relevant
67
What is intention to treat?
* i.e. if a subject drops out during the study/tx the subject should still be included in the analysis
* the opposite is per protocol analysis
68
What is bias? How can it be reduced?
* Refers to flaws in impartiality that introduce systematic error into the methodology and results of a study
* Reduced by
  * **Randomization**
  * **masking** (blinding)
  * meticulous attention to **study protocol**
69
Name the types of bias?
* **Experimental**
  * during either selection or tx
  * reduced by randomisation
* **Observational**
  * errors in measurement or classification of disease
  * use of hip and knee scoring systems
* **Patient bias**
* **Publication bias**
70
What are the confounding factors?
* Independent variables that interfere with the drawing of statistically valid conclusions from a study
* these factors may not be distributed equally between groups, leading to **confounding bias**
71
How can confounding bias be reduced?
* Matching e.g. age
* Stratification
72
What is an equivalence study?
* An RCT in which 2 treatments are expected to have the same outcome
* the research hypothesis is that there is a difference between the 2 groups, aka the alternative hypothesis, cf null hypothesis
73
What is the difference between a meta-analysis and a systematic review?
* Meta-analysis: the aim is to find **relevant evidence** from **several studies** in an unbiased manner and to appraise each paper in all RCTs for methodological quality. Results are reported as a common estimate with confidence intervals
* Systematic review (SR): differs in that there is no common estimate with confidence intervals
* the Cochrane Collaboration organises and publishes highly detailed systematic reviews in its database
74
What are the screening criteria?
* "IATROGENIC"
* **_Important_** condition with known **Incidence**
* **_Accepted_** and effective tx
* **_Treatment_** and diagnostic facilities available
* **_Recognizable_** latent and early symptomatic stages, with consideration given as to whether early pick up at the latent stage leads to intervention and whether intervention improves outcome
* **_Opinions_** on who to tx are agreed
* **_Guaranteed_** safety, sensitivity and specificity of test
* **_Examination_** &/or tx are acceptable to pt
* **_Natural history_** of condition known
* **_Inexpensive_** tests, simple to perform
* **_Cost effective screening_** with a policy drawn up on whom to tx; it should be **_Continuously_** rolled out and repeated at intervals
75
What is epidemiology?
* Is the study of **frequency** and **cause of diseases** in human populations
76
What is incidence?
* **The rate of occurrence of new disease in a population previously free of disease**
* the rate is found by dividing the number of new cases in the study period by the number of individuals at risk at the beginning of the study period
77
What is prevalence?
* **The frequency of a disease at a given time**
* found by dividing the no of patients with the disease by the sum of the number of patients with the disease and the number of patients at risk
78
What is sensitivity?
* **The ability of the test to exclude false negatives**
* i.e. the ability of the test to pick up all cases of disease
* **no of true positives / (true positives + false negatives)**
79
What is specificity?
* **Ability of the test to exclude false positives**
* i.e. ability to exclude the disease
* no of true negatives / (true negatives + false positives)
80
What is the positive predictive value?
* **The probability that a subject who tests positive is truly positive**
* i.e. the PPV indicates the significance of a positive test
* PPV = true positives / (true positives + false positives)
81
What is the negative predictive value?
* The probability that a subject who tests negative is truly negative
* i.e. the NPV indicates the significance of a negative test
* NPV = true negatives / (true negatives + false negatives)
82
What is accuracy?
* **Gives an idea of how often a test is correct**
* (true positives + true negatives) / (true positives + false positives + true negatives + false negatives)
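The five 2x2 table measures (sensitivity, specificity, PPV, NPV and accuracy) can be sketched together in Python. The function name and the counts are hypothetical; results are returned as fractions between 0 and 1.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return the standard 2x2 table measures as fractions between 0 and 1."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives among all diseased
        "specificity": tn / (tn + fp),   # true negatives among all healthy
        "ppv": tp / (tp + fp),           # true positives among positive tests
        "npv": tn / (tn + fn),           # true negatives among negative tests
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for a screening test (made-up numbers).
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV divides by false positives while NPV divides by false negatives, which is the pair most easily mixed up.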
83
What is odds ratio?
* Used in case control studies
* **the ratio of the odds that an event will occur in one group to the odds that the event will occur in the other group**
* OR = (c/d) / (a/b)
* = cb/ad
84
How is relative risk reduction measured?
* **(Success rate of tx group − success rate of control group) / success rate of control group**
* success rate of tx group = c/(c+d)
* success rate of control group = a/(a+b)
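A small sketch of the odds ratio and relative risk reduction formulas follows, assuming (as the success-rate formulas imply) that a and b are the control group's successes and failures and c and d are the tx group's successes and failures. The function names and counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = (c/d) / (a/b), which simplifies to cb/ad."""
    return (c / d) / (a / b)

def relative_risk_reduction(a, b, c, d):
    """(tx success rate - control success rate) / control success rate."""
    control_rate = a / (a + b)
    tx_rate = c / (c + d)
    return (tx_rate - control_rate) / control_rate

# Hypothetical trial (made-up numbers):
# control: 40 successes, 60 failures; tx: 60 successes, 40 failures.
a, b, c, d = 40, 60, 60, 40

print(round(odds_ratio(a, b, c, d), 2))
print(round(relative_risk_reduction(a, b, c, d), 2))
```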
85
What is validity?
* The extent to which a test or outcome measure actually measures what it purports to measure
* tests have to be **precise** (consistency of repeated measures) and **accurate** (represent what they mean to represent)
86
What are the different types of validity?
* **Construct validity**
  * the extent to which a measure corresponds to theoretical concepts or constructs concerning the phenomenon of interest
* **Content validity**
  * the extent to which a measure represents the domain of interest
* **Criterion or concurrent validity**
  * correlating scores on a new instrument or test with external criteria known or believed to measure the attribute
87
What does reliability assess?
* The random error of a measure
* important to consider reliability within the same assessor (**intra-observer**) and between different assessors (**inter-observer**)
88
What is kappa analysis?
* Involves adjusting the observed proportion of agreement in relation to the proportion of agreement expected by chance
* used for categorical data
* a value of 1.0 = complete agreement
* a value of 0 = agreement can be explained purely by chance
* a negative value = systematic disagreement
* can be **weighted or unweighted**
* weighted kappa statistics allow for the measuring of observer agreement in rank scales, taking into account agreement by chance and bringing the magnitude of disagreement into the calculation
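The unweighted case can be sketched as Cohen's kappa for two observers, which is one common form of kappa analysis and is assumed here. The function name and the gradings are hypothetical, and the sketch does not handle the edge case where expected agreement equals 1.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted kappa: (observed - expected) / (1 - expected) agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    n = len(rater_a)
    # Observed proportion of cases where the two observers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each observer's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical gradings of four radiographs by two observers.
print(cohen_kappa(["I", "I", "II", "II"], ["I", "I", "II", "II"]))
print(cohen_kappa(["I", "I", "II", "II"], ["I", "II", "I", "II"]))
print(cohen_kappa(["I", "I", "II", "II"], ["II", "II", "I", "I"]))
```

The three calls illustrate complete agreement, agreement no better than chance, and systematic disagreement respectively.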
89
What is survival analysis?
* A study in which the **outcome of an intervention is plotted over time, which allows for variable dates of entry and for patients to be followed up for different lengths of time**
* analysed continuously = **actuarial method**
* times at failure = **Kaplan-Meier product limit method**
* combo of both = **life table analysis**
90
How do you construct a life table for joint replacements?
* **Define end point / outcomes**
  * success
  * failure
  * death
  * revision
* For each joint replacement **the number of joints** being followed and the **no of failures** are determined for **each year** after operation
* at each time point the **no of pts at risk, the no of failures and the no of pts withdrawn** (death/LTFU) are recorded
* pts who complete the trial and deaths = successful withdrawals / censored data / non endpoints
  * these don't count as failures and only affect the no of pts at risk
* each year the no of pts at risk is calculated as the no of pts at the beginning of the year minus 1/2 the no of withdrawals
91
How is the percentage failure rate for each year calculated?
* **no of failures/ no of pts at risk during period**
92
How is the cumulative estimated survival calculated?
* Cumulative estimated survival = **100% − cumulative probability of failure**
93
How is the annual survival rate calculated?
* By cumulating the success rate for all previous years and year in question
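The life table steps can be put together in a short Python sketch. It assumes that "cumulating the success rate" means multiplying the annual success rates together, which is the usual actuarial approach; the function name, dictionary keys and counts are hypothetical.

```python
def life_table(rows):
    """Build a life table from (number_at_start, failures, withdrawals) per year."""
    table = []
    cumulative_survival = 1.0
    for year, (n_start, failures, withdrawals) in enumerate(rows, start=1):
        # Number at risk: number at start of year minus half the withdrawals.
        at_risk = n_start - withdrawals / 2
        failure_rate = failures / at_risk
        success_rate = 1 - failure_rate
        # Assumption: cumulative survival is the product of annual success rates.
        cumulative_survival *= success_rate
        table.append({
            "year": year,
            "at_risk": at_risk,
            "failure_rate": failure_rate,
            "success_rate": success_rate,
            "cumulative_survival": cumulative_survival,
        })
    return table

# Hypothetical series (made-up numbers): 100 joints at the start,
# year 1 has 2 failures and 4 withdrawals, leaving 94 to start year 2.
table = life_table([(100, 2, 4), (94, 1, 6)])
for row in table:
    print(row["year"], row["at_risk"], round(row["cumulative_survival"], 4))
```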
94
When are survival rates measured in a life-table analysis cf a Kaplan-Meier analysis?
* Survival rates are calculated annually for life table analysis
* recalculated every time a failure occurs for a Kaplan-Meier analysis
* the steps in the graph represent failures at each time point
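A minimal sketch of the Kaplan-Meier idea, recalculating survival only when a failure occurs, is shown below. The function name and the follow-up data are hypothetical, and the sketch omits confidence intervals.

```python
def kaplan_meier(observations):
    """observations: list of (time, failed) pairs; failed=False means censored.

    Returns a list of (failure_time, estimated_survival) steps.
    """
    survival = 1.0
    curve = []
    # Survival is only recalculated at times when a failure occurs.
    failure_times = sorted({t for t, failed in observations if failed})
    for t in failure_times:
        at_risk = sum(1 for time, _ in observations if time >= t)
        failures = sum(1 for time, failed in observations if time == t and failed)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up in years for four joints (made-up numbers):
# failures at 1 and 3 years, censored observations at 2 and 4 years.
steps = kaplan_meier([(1, True), (2, False), (3, True), (4, False)])
print(steps)
```

Censored observations never create a step; they only reduce the number at risk for later failures.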
95
What is the survivorship curve?
* **The cumulative estimate of survival plotted with 95% CI**
* upward blips or solid circles are used to represent censored data on graphs
* when reporting survival analysis one must include 95% CI, best and worst case scenarios and the no of pts left at longer follow-up