CHAPTER 5 Regression for Describing and Forecasting Flashcards
(87 cards)
What is regression?
Finding the line of best fit through data to describe the relationship between two or more variables.
What are the two primary uses of regression?
- Description
- Forecasting
What is overfitting in regression?
A problem that arises when a model is too complex and fits the noise in the data rather than the underlying relationship.
What does a regression equation express?
A linear relationship between a dependent variable and an independent variable.
What are the regression parameters in a regression equation?
- α (intercept)
- β (slope)
What does the intercept (α) represent in regression?
The predicted number of crimes on a day when the average temperature is 0 degrees Fahrenheit.
What does the slope (β) represent in regression?
The amount that predicted crime increases with each degree Fahrenheit.
What is the sum of squared errors (SSE)?
A measure of how well a regression line fits the data, calculated by summing the squared errors of each observation.
What is ordinary least squares (OLS) regression?
A method to find the values of α and β that minimize the sum of squared errors.
What are OLS regression coefficients?
The values of the parameters α and β that minimize the sum of squared errors.
How do we interpret the OLS regression line?
It estimates the average number of crimes based on temperature and shows how crime changes with temperature.
True or False: The regression line is always the best fit for all types of data.
False
What is a conditional mean function?
A function that tells you the mean of some variable conditional on the value of another variable.
What is the relationship between the regression line and conditional means?
The regression line is the best linear approximation to the conditional means.
What would you minimize to find the conditional median?
The sum of the absolute values of the errors.
What was a historical reason for using the sum of squared errors?
It is computationally easier to calculate using linear algebra.
What does the regression line tell us about crime and temperature?
It predicts the average number of crimes based on temperature.
What does it mean to regress crime on temperature?
To run an ordinary least squares regression where crime is the dependent variable and temperature is the independent variable.
What is the significance of the slope of the regression line?
It indicates how much crime changes as the temperature changes.
Fill in the blank: The line of best fit minimizes the _______.
sum of squared errors
What is the graphical representation of errors in regression?
Vertical lines from data points to the regression line.
In the context of regression, what does ‘parsimonious summary’ mean?
A concise representation of the data using the least number of parameters.
What happens if we use an arbitrary line in regression?
It will yield poor forecasts.
What is the role of statistical software in OLS regression?
To calculate the values of α and β that minimize the sum of squared errors.