quiz 5 (pt. 2) Flashcards
(20 cards)
What is the simplest measure of the relationship between two scalar variables?
correlation
Why would we want to compare two or more scalar variables in research?
- Examine potential patterns or relationships
- Identify shared variability
What does Pearson’s coefficient of correlation (r) measure?
Strength and direction of the linear relationship between two variables.
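A minimal sketch of how r is computed, in plain Python with no libraries. The function name `pearson_r` is a hypothetical helper. It implements the usual definition r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sign of r gives the direction, and its distance from 0 gives the strength.

```python
import math

def pearson_r(x, y):
    """Pearson's r: co-deviation of x and y, scaled by the spread of each variable."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: how x and y deviate from their means together
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of each variable's sum of squares
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```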
What are the main limitations of Pearson’s correlation coefficient?
- Parametric: its significance test assumes the variables are approximately normally distributed
- Sensitive to outliers
- Assumes a linear relationship
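A small demonstration of two of these limitations, using made-up numbers chosen only for illustration. A single extreme point can flip r from +1 to negative, and a perfect but curved (U-shaped) relationship can give r = 0 because r only measures *linear* association. `pearson_r` is the same hypothetical helper as a plain definition of r.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations around the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# 1) A clean linear pattern
r_clean = pearson_r([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
# 2) The same pattern plus one extreme point at (6, -20)
r_outlier = pearson_r([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, -20])
# 3) A perfect but curved relationship: y = x**2 on a symmetric range
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_clean)              # 1.0
print(round(r_outlier, 2))  # -0.53
print(r_curved)             # 0.0
```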
What is the difference between correlation and causation?
- Correlation: refers to a statistical relationship between two variables, but it does not imply that one variable causes the other.
- Causation: suggests that one variable directly influences the other.
What is an example of a situation where correlation does not imply causation?
Number of pirates and global temperatures. As the number of pirates decreased over time, global temperatures increased, but this doesn’t mean pirates caused global warming; it’s a spurious correlation.
What type of correlation would you expect between height and weight in a population?
Positive correlation; both tend to increase together in a population.
What is Spearman’s rank correlation, and when is it used?
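Non-parametric version of correlation that ranks the values of each variable and then correlates the ranks instead of the raw values. When there are no ties, this reduces to a formula based on the difference between each observation’s two ranks: ρ = 1 − 6Σd² / (n(n² − 1)). It is used when the data don’t meet the assumptions of Pearson’s correlation (for example, non-normal data, outliers, or a relationship that is monotonic but not linear).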
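A minimal sketch of Spearman’s ρ as “Pearson’s r on the ranks”, in plain Python. The helpers `average_ranks` and `spearman_rho` are hypothetical names, and ties get the average of the ranks they span (a common convention). The made-up data follow y = x³, which is monotonic but curved: Pearson’s r falls below 1, while Spearman’s ρ is exactly 1 because the ranks agree perfectly.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations around the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]           # y = x**3: monotonic but not a straight line
print(round(pearson_r(x, y), 3))  # about 0.943, below 1 because the pattern is curved
print(spearman_rho(x, y))         # 1.0, because the ranks agree perfectly
```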
What happens to the value of r if the axes are switched in a correlation plot?
If the axes are switched in a correlation plot, the value of r remains the same because correlation is symmetrical. The relationship between the variables is the same, regardless of which variable is placed on the x-axis or y-axis.
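The symmetry can be read straight from the formula for r: swapping x and y only swaps the order of the factors in each product, so the value cannot change.

```latex
r_{xy} = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^2\,\sum_{i}(y_i-\bar{y})^2}}
       = \frac{\sum_{i}(y_i-\bar{y})(x_i-\bar{x})}{\sqrt{\sum_{i}(y_i-\bar{y})^2\,\sum_{i}(x_i-\bar{x})^2}}
       = r_{yx}
```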
What is an important consideration when interpreting significant correlations in observational studies?
In observational studies, even if a correlation is significant, we cannot assume that one variable causes the other. It’s important to consider potential confounding variables and the possibility of reverse causality or other underlying factors that may influence the observed relationship.
What is linear regression used for in research?
Linear regression is used to build a mathematical model that describes the relationship between one or more predictor variables (independent variables) and a response variable (dependent variable). It helps to predict the value of the response variable based on the predictors.
What is bivariate linear regression?
Bivariate linear regression involves using a single predictor variable and an intercept to explain or predict the variation in a response variable. For example, height might be used to predict weight using a straight-line model.
How does multiple linear regression differ from bivariate linear regression?
Multiple linear regression involves two or more predictor variables. It accounts for the combined effect of multiple predictors, whereas bivariate linear regression uses only one predictor to explain the response variable.
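A minimal sketch of fitting a multiple linear regression with two predictors by ordinary least squares, in plain Python. It solves the normal equations (XᵀX)b = Xᵀy with Gaussian elimination; `solve` and `fit_ols` are hypothetical helpers (in practice a library such as NumPy or statsmodels would do this). The data are made up and generated without noise from y = 1 + 2·x1 − 0.5·x2, so the fit should recover those coefficients.

```python
def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """OLS with an intercept; rows[i] lists the predictor values for observation i.
    Returns [b0, b1, ..., bk] by solving the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Made-up data generated without noise from y = 1 + 2*x1 - 0.5*x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a - 0.5 * b for a, b in zip(x1, x2)]

coef = fit_ols(list(zip(x1, x2)), y)
print([round(c, 6) for c in coef])  # approximately [1.0, 2.0, -0.5]
```

Each slope is interpreted as the change in y for a one-unit change in that predictor while the other predictor is held fixed, which is how the model captures their combined effect.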
What does the formula y = mx + b represent in linear regression?
The formula y = mx + b represents the linear equation of a line, where y is the predicted response, m is the slope (indicating the rate of change), x is the predictor variable, and b is the y-intercept (the value of y when x is zero).
What is the purpose of using Ordinary Least Squares (OLS) in regression?
Ordinary Least Squares (OLS) is used to minimize the sum of the squared differences between the observed data points and the values predicted by the model. This helps to find the line that best fits the data.
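For a single predictor, minimizing the sum of squared errors has a closed-form solution: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. A minimal sketch in plain Python, with hypothetical helper names and made-up data, also checks that nudging either coefficient away from the OLS estimate makes the sum of squared errors larger.

```python
def fit_line(x, y):
    """Closed-form OLS estimates for y = m*x + b (one predictor plus an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx            # slope
    b = mean_y - m * mean_x  # intercept: the OLS line passes through (mean_x, mean_y)
    return m, b

def sse(x, y, m, b):
    """Sum of squared differences between observed y and the line's predictions."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]  # made-up illustrative data
y = [2, 4, 5, 4, 5]
m, b = fit_line(x, y)
print(round(m, 3), round(b, 3))   # 0.6 2.2
print(round(sse(x, y, m, b), 3))  # 2.4
# Moving either coefficient away from the OLS estimate increases the SSE
print(sse(x, y, m + 0.1, b) > sse(x, y, m, b))  # True
print(sse(x, y, m, b + 0.1) > sse(x, y, m, b))  # True
```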
What is R², and why is it important in linear regression?
R² is the coefficient of determination: it measures the proportion of variation in the response variable that can be explained by the predictors. A higher R² indicates a better fit of the model to the data.
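A minimal sketch of computing R² = 1 − SS_res / SS_tot for a one-predictor OLS line, in plain Python with made-up data. SS_tot is the total variation of y around its mean, and SS_res is the variation the line leaves unexplained. The function name `r_squared` is a hypothetical helper.

```python
def r_squared(x, y):
    """R^2 = 1 - SS_res / SS_tot for a one-predictor OLS line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                    # total variation around the mean
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]  # made-up illustrative data
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 3))  # 0.6: the line explains about 60% of the variation in y
```

With a single predictor, R² equals the square of Pearson’s r for the same data (here r ≈ 0.775 and r² = 0.6).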
What is the role of ANOVA in regression analysis?
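ANOVA (Analysis of Variance) splits the total variation of the response around its mean into the part explained by the regression model and the residual (unexplained) part. Its F-test compares the explained variance with the residual variance to determine whether the model significantly improves prediction compared with a simple null model (intercept only, which predicts the mean for every observation).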
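A minimal sketch of the regression ANOVA for one predictor, in plain Python with made-up data: SS_reg = SS_tot − SS_res, and F = (SS_reg / k) / (SS_res / (n − k − 1)), where k is the number of predictors. The function name `regression_anova` is a hypothetical helper. The p-value would come from an F distribution with (k, n − k − 1) degrees of freedom, via a table or a statistics package (for example SciPy’s `scipy.stats.f.sf`); that step is not shown here.

```python
def regression_anova(x, y):
    """ANOVA quantities for a one-predictor OLS model vs. the intercept-only (null) model."""
    n, k = len(x), 1  # k = number of predictors
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                    # variation around the mean
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))  # left unexplained
    ss_reg = ss_tot - ss_res                                        # explained by the model
    ms_reg = ss_reg / k              # mean square for regression
    ms_res = ss_res / (n - k - 1)    # mean square for residuals
    return ss_reg, ss_res, ms_reg / ms_res

ss_reg, ss_res, f_stat = regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(ss_reg, 3), round(ss_res, 3), round(f_stat, 3))  # 3.6 2.4 4.5
```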
What is the Akaike Information Criterion (AIC), and how is it used in model selection?
The Akaike Information Criterion (AIC) is a measure used to compare different regression models. It considers both the goodness of fit and the number of parameters in the model. A lower AIC indicates a better model, balancing fit and simplicity.
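A minimal sketch comparing two models on the same made-up data with AIC, in plain Python. It assumes least-squares models with normally distributed errors and uses the common form AIC = n·ln(SSE / n) + 2k, which drops an additive constant; k counts the estimated coefficients. The fit term rewards a smaller SSE, and the 2k term penalizes extra parameters. The function name `aic_gaussian` is a hypothetical helper.

```python
import math

def aic_gaussian(sse, n, k):
    """AIC for a least-squares model with normal errors, up to an additive constant:
    n * ln(SSE / n) + 2k, where k is the number of estimated coefficients."""
    return n * math.log(sse / n) + 2 * k

x = [1, 2, 3, 4, 5]  # made-up illustrative data
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Null model: intercept only, predicts mean_y for every observation (k = 1)
sse_null = sum((yi - mean_y) ** 2 for yi in y)

# Line model: OLS slope and intercept (k = 2)
m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum((xi - mean_x) ** 2 for xi in x)
b = mean_y - m * mean_x
sse_line = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))

aic_null = aic_gaussian(sse_null, n, k=1)
aic_line = aic_gaussian(sse_line, n, k=2)
print(round(aic_null, 2), round(aic_line, 2))  # 2.91 0.33: the line has the lower AIC
```

Statistics packages usually compute AIC from the full log-likelihood, so their absolute values differ from this form; only differences between models fitted to the same data are meaningful.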
Why is it important to avoid over-parameterization in regression models?
Over-parameterization occurs when too many predictor variables are included in a model, leading to overfitting. This can result in a model that fits the sample data well but performs poorly on new or untested data. It’s important to include only significant predictors to improve generalizability.
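A minimal sketch of why in-sample fit alone can mislead: with OLS, adding predictors never lowers R² on the data used to fit the model, even when those predictors are pure random noise. The data are simulated with a fixed seed, and `solve`, `fit_ols`, and `r_squared` are hypothetical helpers written in plain Python.

```python
import random

def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """OLS with an intercept via the normal equations; returns [b0, b1, ..., bk]."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def r_squared(rows, y):
    """In-sample R^2 of the OLS fit of y on the predictors in rows."""
    coef = fit_ols(rows, y)
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 10
x1 = [float(i) for i in range(n)]
y = [3.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x1]  # y truly depends only on x1
noise = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]  # 4 unrelated predictors

r2_small = r_squared([[xi] for xi in x1], y)
r2_big = r_squared([[xi] + extra for xi, extra in zip(x1, noise)], y)
print(f"1 predictor:  R^2 = {r2_small:.3f}")
print(f"5 predictors: R^2 = {r2_big:.3f}")  # never lower, despite 4 junk predictors
```

The higher in-sample R² of the larger model does not mean it will predict new data better, which is why criteria that penalize extra parameters (such as AIC) or checks on new data are used when choosing predictors.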
How can linear regression be applied to predict heart disease-related outcomes, like hypertension?
Linear regression can be used to predict heart disease-related outcomes, such as hypertension, by modeling the relationship between predictors like BMI, age, and cholesterol levels and a continuous measure such as systolic blood pressure. For example, a model might use BMI as a predictor of systolic blood pressure, allowing clinicians to estimate a patient’s hypertension risk.