Correlation and Regression Analysis Flashcards
What does correlation measure in statistics?
Correlation measures the strength and direction of the relationship between two variables.
Explain the difference between positive and negative correlation.
Positive correlation means that as one variable increases, the other also tends to increase, while negative correlation means that as one variable increases, the other tends to decrease.
How is the strength of correlation determined?
The strength of correlation is determined by the absolute value of the correlation coefficient: the closer it is to 1, the stronger the relationship, and the closer it is to 0, the weaker.
What does a correlation coefficient of 0 indicate?
A correlation coefficient of 0 indicates no linear relationship between the variables.
What is the range of values for the Pearson correlation coefficient?
The Pearson correlation coefficient ranges from -1 to 1 inclusive, where -1 and 1 indicate a perfect negative or positive linear relationship.
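Pearson's r is the covariance of the two variables divided by the product of their standard deviations; by the Cauchy-Schwarz inequality this ratio can never leave [-1, 1]. A minimal Python sketch computed directly from that definition (the function name `pearson_r` is chosen here for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))  # co-variation of x and y
    sxx = sum(a * a for a in dx)              # variation of x
    syy = sum(b * b for b in dy)              # variation of y
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * xi + 1 for xi in x])     # exact increasing line
r_down = pearson_r(x, [10 - 3 * xi for xi in x])  # exact decreasing line
print(r_up, r_down)  # 1.0 -1.0
```

Data lying exactly on a line reach the two endpoints of the range; real data with scatter fall strictly between them.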
When should you use Spearman’s rank correlation coefficient instead of Pearson’s?
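Spearman’s rank correlation coefficient is used when the relationship between variables is monotonic but not necessarily linear, when the data are ordinal, or when outliers or strongly non-normal data make Pearson’s coefficient unreliable. Because it is computed on ranks, it measures how well the relationship is described by a monotonic function.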
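Spearman's coefficient is Pearson's r applied to the ranks of the data. The Python sketch below assumes the common convention of giving tied values the average of their ranks; the helper names are illustrative. A curved but strictly increasing relationship (y = x³) gets a Pearson value below 1 but a Spearman value of exactly 1.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [xi ** 3 for xi in x]  # monotonic but strongly curved
p, s = pearson_r(x, y), spearman_rho(x, y)
print(round(p, 3), s)      # Pearson below 1, Spearman 1.0
```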
Describe the process of linear regression analysis.
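Linear regression analysis models the relationship between a dependent variable and one or more independent variables by fitting a straight line (or, with several independent variables, a plane or hyperplane) to the data points. The fit is usually found by ordinary least squares, which chooses the coefficients that minimize the sum of squared residuals.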
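With a single independent variable, the least-squares line has a closed form: slope b1 = Sxy / Sxx and intercept b0 = ȳ - b1·x̄. A minimal Python sketch on made-up illustrative data (`fit_line` is a name chosen here):

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept: the line passes through (mean x, mean y)
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # illustrative values only
b0, b1 = fit_line(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} * x")
```

In practice a library routine would be used, but the closed form makes clear what "fitting a line" optimizes.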
What is the difference between simple linear regression and multiple linear regression?
Simple linear regression involves one independent variable, while multiple linear regression involves two or more independent variables.
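Multiple regression generalizes the same least-squares idea to a design matrix with one column per independent variable plus a column of ones for the intercept. A sketch using NumPy's `np.linalg.lstsq`, on illustrative noise-free data generated from known coefficients so the recovered values can be checked:

```python
import numpy as np

# Illustrative data: two independent variables (columns) and one response
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]  # generated from known coefficients, no noise

# Prepend a column of ones so the first coefficient is the intercept
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef)  # approximately [1.0, 2.0, -0.5]
```

Simple linear regression is the special case in which `X` has a single column.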
How do you interpret the slope coefficient in regression analysis?
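The slope coefficient represents the expected change in the dependent variable for a one-unit increase in the independent variable; in multiple regression, it is the change when all other independent variables are held constant.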
What does the intercept term represent in a regression equation?
The intercept term represents the predicted value of the dependent variable when all independent variables are set to zero. It may have no practical meaning if zero lies outside the range of the observed data.
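Both interpretations follow directly from the regression equation. Written here for two independent variables (the coefficient symbols b_0, b_1, b_2 are notation chosen for this check): raising x_1 by one unit while x_2 stays fixed changes the prediction by exactly b_1, and setting both variables to zero leaves only b_0.

```latex
\begin{aligned}
\hat{y} &= b_0 + b_1 x_1 + b_2 x_2 \\
\hat{y}(x_1 + 1,\, x_2) - \hat{y}(x_1,\, x_2)
  &= \bigl[b_0 + b_1 (x_1 + 1) + b_2 x_2\bigr] - \bigl[b_0 + b_1 x_1 + b_2 x_2\bigr] = b_1 \\
\hat{y}(0,\, 0) &= b_0
\end{aligned}
```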
What assumptions must be met for regression analysis?
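The main assumptions are linearity of the relationship, independence of errors, homoscedasticity (constant error variance), and normality of errors; multiple regression additionally requires that no independent variable be a perfect linear combination of the others.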
What is the purpose of residual analysis in regression?
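Residual analysis examines the residuals, the differences between observed and predicted values, to check the model’s assumptions (linearity, constant variance, normality, and independence of errors) and to spot outliers. A systematic pattern in the residuals signals that the model is missing structure in the data.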
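A small Python illustration of why the pattern matters, assuming made-up data y = x² for x = 1..5: a straight line fitted to this curved relationship leaves residuals that are positive at both ends and negative in the middle, the signature of unmodelled non-linearity.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

x = [1, 2, 3, 4, 5]
y = [xi ** 2 for xi in x]   # curved relationship
b0, b1 = fit_line(x, y)     # straight-line fit: y_hat = -7 + 6x
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]: positive at the ends, negative in the middle
```

The residuals still sum to zero, as they always do for least squares with an intercept, so the problem only shows up in their pattern, not their average.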
What are influential points in regression analysis?
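Influential points are data points whose removal would substantially change the estimated regression coefficients. They typically combine high leverage (an unusual value of the independent variable) with a large residual, and their influence is commonly measured with Cook’s distance.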
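Cook's distance for observation i is D_i = e_i² / (p · MSE) · h_ii / (1 - h_ii)², where e_i is the residual, p the number of coefficients, MSE the residual mean square, and h_ii the leverage. The Python sketch below covers only the one-predictor case, where h_ii = 1/n + (x_i - x̄)² / Sxx; the data are illustrative. Cut-offs such as D > 1 or D > 4/n are rules of thumb, and conventions vary.

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a one-predictor least-squares fit."""
    n, p = len(x), 2  # p = number of estimated coefficients (intercept + slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    lev = [1 / n + (a - mx) ** 2 / sxx for a in x]  # leverage h_ii
    return [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]

x = [1, 2, 3, 4, 5, 15]   # last point: far from the other x values (high leverage)
y = [2, 4, 6, 8, 10, 5]   # ...and well off the trend of the other points
d = cooks_distance(x, y)
print([round(v, 2) for v in d])
```

The last point's residual is modest only because it drags the fitted line toward itself; its leverage is what makes its Cook's distance far larger than the others'.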
How do you assess the goodness of fit in regression analysis?
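Goodness of fit is commonly assessed using R-squared, which indicates the proportion of variance in the dependent variable explained by the independent variables (between 0 and 1 for least-squares models with an intercept). Adjusted R-squared penalizes additional independent variables, which makes it better suited to comparing models with different numbers of predictors.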
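R-squared equals 1 - SS_res / SS_tot, the share of total variation around the mean that the model accounts for. A minimal Python sketch, reusing the curved toy data y = x² and the fitted values of its least-squares line:

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((a - my) ** 2 for a in y)                # total variation around the mean
    return 1 - ss_res / ss_tot

y = [1, 4, 9, 16, 25]
y_hat = [-1, 5, 11, 17, 23]  # fitted values of the least-squares line y_hat = -7 + 6x for x = 1..5
r2 = r_squared(y, y_hat)
print(round(r2, 3))  # 0.963
```

A straight line scores about 0.96 here even though the relationship is clearly curved, which is why R-squared is usually read alongside residual plots rather than on its own.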
What is multicollinearity, and why is it problematic in regression?
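Multicollinearity occurs when independent variables in a regression model are highly correlated with each other. It inflates the standard errors of the affected coefficients, making their estimates unstable and hard to interpret individually, although the model's predictions as a whole can remain accurate. It is commonly diagnosed with variance inflation factors (VIFs).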
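The VIF of predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the others. A NumPy sketch with simulated data (the function name `vif` is illustrative; thresholds of 5 or 10 are rules of thumb, not fixed limits):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # regress column j on the rest, with intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)                  # unrelated predictor
v = vif(np.column_stack([x1, x2, x3]))
print(np.round(v, 1))
```

The two near-duplicate predictors get VIFs in the region of 100, while the unrelated one stays close to 1.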
Explain the concept of homoscedasticity in regression analysis.
Homoscedasticity means that the variance of the errors is constant across all levels of the independent variables.
What is autocorrelation, and how does it affect regression analysis?
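Autocorrelation occurs when the errors in a regression model are correlated with each other, which is common in time-series data and violates the assumption of independence of errors. The coefficient estimates can remain unbiased, but their standard errors are typically misestimated (positive autocorrelation usually makes them too small), so confidence intervals and significance tests become unreliable. The Durbin-Watson statistic is a common check for first-order autocorrelation.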
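The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. It ranges from 0 to 4: values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. A Python sketch on two made-up residual series:

```python
def durbin_watson(resid):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

# Residuals that drift slowly (positive autocorrelation): statistic well below 2
dw_pos = durbin_watson([1.0, 0.9, 0.8, 0.5, -0.2, -0.6, -0.9, -1.0])
# Residuals that flip sign at every step (negative autocorrelation): statistic well above 2
dw_neg = durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
print(round(dw_pos, 3), round(dw_neg, 3))
```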
What is heteroscedasticity, and how does it differ from homoscedasticity?
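Heteroscedasticity means that the variance of the errors is not constant across levels of the independent variables (for example, the spread of the residuals grows with the fitted values), whereas homoscedasticity means that it is constant. Under heteroscedasticity, ordinary least-squares coefficients remain unbiased, but the usual standard errors are unreliable.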
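A rough way to see the difference, in the spirit of the Goldfeld-Quandt test but not the formal test itself, is to compare the residual variance in the upper half of x with that in the lower half. The NumPy sketch below uses simulated data; the helper `spread_ratio` is a hypothetical name chosen here. A ratio near 1 is consistent with homoscedasticity, while a ratio far from 1 hints at heteroscedasticity.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)
y_homo = 2 + 3 * x + rng.normal(scale=1.0, size=x.size)        # constant error variance
y_hetero = 2 + 3 * x + rng.normal(scale=0.3 * x, size=x.size)  # error spread grows with x

def spread_ratio(x, y):
    """Residual variance in the upper half of x divided by that in the lower half."""
    b1, b0 = np.polyfit(x, y, 1)  # least-squares line; polyfit returns the slope first
    resid = y - (b0 + b1 * x)
    order = np.argsort(x)
    half = len(x) // 2
    low, high = resid[order[:half]], resid[order[half:]]
    return high.var() / low.var()

ratio_homo = spread_ratio(x, y_homo)
ratio_hetero = spread_ratio(x, y_hetero)
print(round(ratio_homo, 2), round(ratio_hetero, 2))  # near 1, then well above 1
```

For an actual analysis, a formal test such as Breusch-Pagan or a residuals-versus-fitted plot is the usual choice.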
When should you use logistic regression instead of linear regression?
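Logistic regression is used when the dependent variable is binary (for example, yes/no); extensions such as multinomial and ordinal logistic regression handle categorical outcomes with more than two categories. Unlike linear regression, it models the probability of an outcome, which always lies between 0 and 1.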
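The key difference is the link: logistic regression passes the linear predictor b0 + b1·x through the logistic (sigmoid) function, so predictions always land between 0 and 1, whereas a straight line can predict values outside that range. A Python sketch with hypothetical coefficients chosen only for illustration:

```python
import math

def predict_probability(b0, b1, x):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    z = b0 + b1 * x  # linear predictor, on the log-odds scale
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients b0 = 0, b1 = 1, for illustration only
for xv in (-4, 0, 4):
    print(xv, round(predict_probability(0.0, 1.0, xv), 3))
```

The printed probabilities rise from near 0 to near 1, passing 0.5 where the linear predictor equals 0.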
What is the purpose of diagnostic plots in regression analysis?
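Diagnostic plots, such as residuals versus fitted values and normal Q-Q plots of the residuals, help identify potential problems with the regression model, such as non-linearity, heteroscedasticity, non-normal errors, or influential points.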
How do you detect outliers in regression analysis?
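Outliers, observations with unusually large residuals, can be detected by examining residual plots or standardized (studentized) residuals. Leverage statistics complement these by flagging observations with unusual values of the independent variables.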
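A standardized residual divides each residual by its estimated standard deviation, e_i / sqrt(MSE · (1 - h_ii)), so points can be compared on a common scale. The Python sketch below handles the one-predictor case only; the data are made up with one deliberately shifted point, and flagging |r| > 2 (some texts use 3) is a rule of thumb.

```python
import math

def standardized_residuals(x, y):
    """Internally standardized residuals for a one-predictor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - 2)
    lev = [1 / n + (a - mx) ** 2 / sxx for a in x]  # leverage h_ii
    return [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, lev)]

x = list(range(1, 11))
y = [2.1, 3.9, 6.0, 8.2, 9.8, 12.1, 25.0, 16.1, 17.9, 20.0]  # point at x = 7 is shifted upward
r = standardized_residuals(x, y)
flagged = [xi for xi, ri in zip(x, r) if abs(ri) > 2]
print(flagged)
```

With only ten points the largest possible internally standardized residual is sqrt(n - 2), about 2.83, which is one reason small samples often use the lower cut-off.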
What are the assumptions of logistic regression?
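Assumptions include a linear relationship between continuous independent variables and the log-odds of the outcome (linearity of the logit), independence of observations, absence of severe multicollinearity, and absence of strongly influential points.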
How do you interpret odds ratios in logistic regression?
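An odds ratio is the factor by which the odds of the outcome are multiplied for a one-unit increase in the independent variable, holding the other variables constant; it equals e raised to the power of the coefficient. An odds ratio above 1 means the odds increase, below 1 that they decrease, and exactly 1 that they are unchanged.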
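Because the logistic model is linear on the log-odds scale, a one-unit increase in x adds b1 to the log-odds and therefore multiplies the odds by exp(b1). A Python check with hypothetical coefficients (chosen only for illustration):

```python
import math

b0, b1 = -1.0, 0.7  # hypothetical fitted logistic coefficients

def odds(x):
    """Odds p / (1 - p) of the outcome at x under the logistic model."""
    p = 1 / (1 + math.exp(-(b0 + b1 * x)))
    return p / (1 - p)  # equals exp(b0 + b1 * x)

odds_ratio = math.exp(b1)
print(round(odds_ratio, 3))         # 2.014
print(round(odds(3) / odds(2), 3))  # 2.014: a one-unit increase multiplies the odds
```

The same ratio holds between any x and x + 1, which is why a single odds ratio summarizes the effect of the variable.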
What is the Akaike Information Criterion (AIC), and how is it used in regression analysis?
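AIC is a measure of the relative quality of statistical models fitted to the same data, defined as AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. It rewards goodness of fit while penalizing complexity; among candidate models, the one with the lowest AIC is preferred, and its absolute value has no meaning on its own.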
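For least-squares models with normally distributed errors, -2 ln(L) reduces to n · ln(SSE / n) plus a constant that is the same for every model fitted to the same data, so AIC can be compared using n · ln(SSE / n) + 2k. A Python sketch with hypothetical SSE values for two candidate models on the same 50 observations:

```python
import math

def aic_gaussian(sse, n, k):
    """AIC for a least-squares model with normal errors, up to an additive constant.

    k counts the estimated coefficients. Some texts also count the error variance;
    that adds 2 to every model's AIC, so comparisons are unaffected.
    """
    return n * math.log(sse / n) + 2 * k

# Hypothetical numbers for two models fitted to the same n = 50 observations
aic_small = aic_gaussian(sse=120.0, n=50, k=2)  # fewer parameters, slightly worse fit
aic_large = aic_gaussian(sse=115.0, n=50, k=5)  # three extra parameters, slightly better fit
print(round(aic_small, 2), round(aic_large, 2))
```

Here the smaller model has the lower AIC: the modest drop in SSE does not pay for three extra parameters.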