Week 5 Flashcards
Quantitative data: Identifying the appropriate classification scheme for variables can help determine what?
what types of statistical methods are appropriate for describing data or for making inferences.
Quantitative data: Identifying the appropriate classification scheme for variables can help determine what types of statistical methods are appropriate for describing data or for making inferences. Give 2 examples of these schemes and describe each.
1) Discrete
Variables take a limited, countable set of values (e.g., a “yes” or “no” value, or a whole-number count)
Alive or dead
Number of hospitalizations
2) Continuous
Infinite number of values within a given range
Age
Body weight
What are the exceptions to quantitative data rules?
Exceptions to the rules:
Ordinal data may imply an underlying continuous scale when large numbers of categories are present.
Example: using a pain scale of 0 – 100
Interval/ratio data may be discrete if the variable can only take on integer values.
List 3 measures of central tendency (Quantitative Data)
Mean – average; applicable with interval and ratio values
Median – represents central tendency better than mean if outliers are present
Mode – not commonly used in clinical research
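A minimal sketch of the three measures, using Python's standard `statistics` module. The lengths of stay are hypothetical values chosen so that one outlier pulls the mean away from the median.

```python
import statistics

# Hypothetical lengths of stay in days; the 30 is a deliberate outlier.
stays = [2, 3, 3, 4, 5, 30]

mean_stay = statistics.mean(stays)      # arithmetic average, pulled upward by the outlier
median_stay = statistics.median(stays)  # middle of the sorted values, robust to the outlier
mode_stay = statistics.mode(stays)      # most frequently occurring value

print(mean_stay, median_stay, mode_stay)
```

Here the median (3.5) sits close to the typical stay, while the mean (about 7.8) does not.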
List and describe some measures of dispersion (Quantitative Data)
Range – difference between the highest value and the lowest value in the data
Interquartile range (IQR) – restricted to values that lie within the middle 50% of the distribution
Variance – how far the values of a variable lie from the mean, calculated as the average squared deviation from the mean
Standard deviation – square root of variance
Coefficient of variation – standard deviation / mean
Skewness – indicates that the data are not evenly distributed around the mean, in other words, more of the data are concentrated to either the right or the left of the mean value and the “tail” on the opposite side of the mean is longer
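The dispersion measures above can be sketched with the standard library as follows. The data values are hypothetical, the variance and standard deviation use the sample (n - 1) formulas, and the quartiles depend on the interpolation method (here the `statistics.quantiles` default), so other software may report a slightly different IQR.

```python
import statistics

# Hypothetical measurements.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)          # highest minus lowest
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
iqr = q3 - q1                                   # spread of the middle 50%
variance = statistics.variance(values)          # sample variance (divides by n - 1)
sd = statistics.stdev(values)                   # square root of the sample variance
cv = sd / statistics.mean(values)               # coefficient of variation

print(data_range, iqr, variance, sd, cv)
```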
Quantitative Data: List some ways to organize and visualize
Tables: Frequency table
Plots: Box & whisker plot
Graphs
Charts: Bar Chart, Pie Chart
Describe box and whisker plots
IQR = box = middle 50% of the distribution
Median noted by line in the middle of the box
Minimum and maximum values noted by whiskers
Mean noted by diamonds
Useful when making comparisons across different groups that may not have equivalent underlying distributions
Data: Define rates and give examples
Proportions over a specific time-period with a base (multiplier)
Morbidity and mortality
Incidence and prevalence
Describe morbidity and mortality rates
1) Centers for Disease Control and Prevention’s National Center for Health Statistics Data:
-2,437,163 deaths in the United States in 2009
-US population of 307,024,820 yields a mortality rate of 793.8 deaths per 100,000 population in the United States in 2009
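The arithmetic behind this rate can be checked with a short sketch: divide deaths by population, then multiply by the base (multiplier). The variable names are illustrative only.

```python
# Figures from the 2009 example above.
deaths = 2_437_163
population = 307_024_820
base = 100_000  # multiplier: report the rate per 100,000 population

mortality_rate = deaths / population * base

print(round(mortality_rate, 1))
```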
Define incidence and prevalence
1) Incidence: measures the risk of developing an outcome
2) Prevalence: measures the probability of having an outcome
-More appropriate to say “point prevalence” compared to “prevalence rate” since this is a proportion and not a rate
True or false: You can’t have a test that’s 100% specific and 100% sensitive
True
Describe diagnostic data
1) How do you truly identify patients that have an outcome (true positive) and avoid detection of an outcome in patients that do not have it (false positive)?
2) Sensitivity (true positive rate)
3) Specificity (true negative rate)
1 – false positive rate = specificity (true negative rate)
4) Most tests used to diagnose diseases have a sensitivity of 80% and specificity of 90%
-Sensitivity is gained at the expense of specificity and vice versa
-Receiver operating characteristic curves further define this tradeoff
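As a rough illustration of these definitions, sensitivity and specificity can be computed from the four cells of a 2x2 table. The counts below are hypothetical and were chosen to reproduce the 80% and 90% figures mentioned above.

```python
def sensitivity(true_pos, false_neg):
    """True positive rate: share of patients with the outcome that the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """True negative rate: share of patients without the outcome that the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 100 patients with the outcome, 100 without.
tp, fn = 80, 20
tn, fp = 90, 10

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
false_positive_rate = fp / (fp + tn)  # equals 1 - specificity

print(sens, spec, false_positive_rate)
```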
Describe the ROC curve figure (curve ROC-1, points A and B)
The area under the curve for ROC-1 is closer to 1 and further from the chance diagonal. It would identify a larger number of true positives.
Point A: larger proportion of patients with the outcome are detected and there are more false positives
Point B: detects fewer true positives and fewer false positives
Diagnostic data: Describe the ADA Standards of Care (2022)
Is a BMI ≥ 25 kg/m2 an appropriate risk factor for diabetes for all ethnicities?
“…BMI cut points fall consistently between 23 and 24 kg/m2 (sensitivity of 80%) for nearly all Asian American subgroups (with levels slightly lower for Japanese Americans).”
“An argument can be made to push the BMI cut point to lower than 23 kg/m2 in favor of increased sensitivity; however, this would lead to an unacceptably low specificity (13.1%).”
Describe Normal distribution (bell curve)
1) z distribution / z scores
-Mean = 0
-SD = 1
-Continuous variables
2) t distribution / Student’s t distribution
-Modification to the standard normal distribution (or z distribution) when the sample size is relatively small
-Useful whenever the actual population standard deviation is not known or when a good estimate is not available
-Continuous variables
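A z score rescales a value to the standard normal distribution (mean 0, SD 1) by subtracting the mean and dividing by the standard deviation. The sketch below assumes a hypothetical body weight of 85 kg in a population with mean 70 kg and SD 10 kg; the `z_score` helper is illustrative.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical values: one body weight, population mean and SD in kg.
z = z_score(85, 70, 10)

# Proportion of a standard normal distribution that falls below this z score.
p_below = NormalDist().cdf(z)

print(z, round(p_below, 4))
```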
Statistical distributions (other than the standard normal distribution); describe each:
1) Binomial distribution
2) F distribution
3) Poisson distribution
4) Gamma distribution
1) Mutually exclusive categories of data
2) Analysis of variance (ANOVA) and linear regression analysis
3) Useful for detecting rare outcomes
4) Variable of interest is interval or ratio but is very highly skewed
Stat. distribution: List the 3 main bullet points of the Central limit theorem
1) The mean of all sample means will equal the population mean
2) The standard deviation of the sampled means is equal to the standard error of the mean
3) As the sample size increases, the distribution of the sample means will approach a normal distribution regardless of the underlying distribution of the variable
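The three points can be illustrated with a small simulation. This is a sketch under stated assumptions: the underlying variable is drawn from a right-skewed exponential distribution with population mean 1 and SD 1, and the sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

sample_size = 50
num_samples = 2000

# Draw many samples from a skewed (exponential) population and keep each sample mean.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)   # should be close to the population mean, 1
sd_of_means = statistics.stdev(sample_means)    # should be close to the standard error
expected_sem = 1.0 / math.sqrt(sample_size)     # population SD / sqrt(n)

print(mean_of_means, sd_of_means, expected_sem)
```

A histogram of `sample_means` would look roughly bell-shaped even though the underlying variable is strongly skewed.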
Define and describe statistical inference. When are Latin or Greek letters used?
1) Process of analyzing data from a sample and using those results to infer the related values in the source or target population
2) Data related to the population of interest are referred to as parameters and are usually represented by Greek letters
3) Data from a sample are referred to as statistics and are usually represented by Latin letters
Inference:
1) Define stat. estimation
2) Define hypothesis testing
1) Process by which estimates of the population parameters are generated from sample statistics with a focus on generating precise estimates with minimal bias
2) Making a conclusion about a hypothesized difference or relationship using observations from the sample
Inference: Define and describe Statistical estimation
1) Mean, median, and standard deviation are basic examples of estimation
2) Point estimation: One single value is estimated for the statistical quantity of interest (e.g., mean)
3) Interval estimation
-Confidence intervals
-Inference of the precision of an estimate
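A minimal sketch of point and interval estimation for a mean. The blood pressure readings are hypothetical, and the interval uses the normal (z) approximation for simplicity; with a sample this small, the t distribution would normally be used and would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg).
sbp = [118, 122, 125, 130, 131, 135, 138, 140, 142, 149]

n = len(sbp)
point_estimate = statistics.mean(sbp)        # point estimation: a single value
sem = statistics.stdev(sbp) / math.sqrt(n)   # standard error of the mean

# Critical value for a 95% interval from the standard normal distribution (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

# Interval estimation: a range that conveys the precision of the point estimate.
lower = point_estimate - z * sem
upper = point_estimate + z * sem

print(point_estimate, round(lower, 1), round(upper, 1))
```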
Hypothesis testing: What makes a good hypothesis?
Declarative
Describes a relationship between two or more variables
Hypothesis testing: What is the null hypothesis (H0)? Describe
1) No relationship or no difference between the variables of interest
2) Conclusions of studies are made with respect to the null hypothesis
Reject
Fail to reject
Hypothesis testing: what is the opposite of the null hypothesis?
Alternative hypothesis (HA)
Hypothesis testing: What are the 2 types of errors? What do they have in common?
Type I error (α)
Type II error (β)
Both of these errors represent quantities for which the researcher sets acceptable levels when designing the study, before (a priori) data are analyzed.