IFN580 Week 5: Regression Modelling (11%) Flashcards
(15 cards)
Logistic regression is a ________ regression technique that is used to model data
having a _____ outcome.
linear, binary (linear in the log-odds; the predicted probability itself is an S-shaped function of the inputs)
Simple regression assumes a __________ relationship between the input attribute(s)
and the output attribute.
linear
What is the equation for simple linear regression?
y = b0 + b1x
where b0 is the y-intercept, b1 is the slope and x is the predictor value
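A minimal sketch of how b0 and b1 could be estimated by ordinary least squares, using only the Python standard library. The function name fit_simple_linear and the sample data are made up for illustration and are not part of the course material.

```python
from statistics import mean

def fit_simple_linear(xs, ys):
    """Return (b0, b1) for y = b0 + b1*x using ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: covariance of x and y divided by the variance of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
b0, b1 = fit_simple_linear(xs, ys)
print(f"y = {b0} + {b1}x")
```

Because the sample points lie exactly on a line, the fit recovers the intercept and slope that generated them.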
What type of data are regression models used on?
Linear regression is used when the output is a continuous numerical variable; logistic regression is used for binary classification
Nonlinear regression fits what type of line?
A curve: it fits a non-linear function to the data
What is R²?
The coefficient of determination: the proportion of the total variance (“noise”) in the output that the model explains
Higher is better
What is Root Mean Squared Error (RMSE)?
The square root of the average squared residual: the typical size of the prediction error, in the same units as the output
Lower is better
What does it mean if RMSE is high but R² is also high?
The model explains the trend well, but absolute prediction errors are large
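The R² and RMSE cards above can be illustrated with a short sketch. The function names and the house-price figures are hypothetical, chosen so that the predictions follow the trend closely while each one is still off by a large absolute amount.

```python
import math

def r_squared(observed, predicted):
    """Proportion of the variance in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained variation
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)              # total variation
    return 1 - ss_res / ss_tot

def rmse(observed, predicted):
    """Square root of the mean squared prediction error."""
    mse = sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)

# Hypothetical house prices in dollars: every prediction is off by 10,000
observed = [100_000, 200_000, 300_000, 400_000, 500_000]
predicted = [110_000, 190_000, 310_000, 390_000, 510_000]

r2 = r_squared(observed, predicted)
error = rmse(observed, predicted)
print(f"R^2 = {r2:.3f}, RMSE = {error:.0f}")
```

Here R² is close to 1 because the errors are small relative to the spread of prices, yet the RMSE is still 10,000 dollars because RMSE is measured in the units of the output.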
What is polynomial regression?
A form of regression where the relationship between the independent and dependent variables is modeled as an 𝑛th-degree polynomial
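As a rough sketch, a polynomial can be fitted by least squares by solving the normal equations. The helper fit_polynomial and the sample data below are assumptions made for illustration; in practice a numerical library would normally be used instead of hand-written elimination.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., bn] for y = b0 + b1*x + ... + bn*x^n."""
    n = degree + 1
    # Normal equations (X^T X) b = X^T y, where X has columns 1, x, x^2, ...
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
                rhs[row] -= factor * rhs[col]
    return [rhs[i] / a[i][i] for i in range(n)]

# Hypothetical data generated from y = 1 + 2x + 3x^2
xs = [-2, -1, 0, 1, 2]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print(coeffs)
```

The model is still linear in its coefficients, which is why the same least-squares machinery used for a straight line applies.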
What’s the formula for a residual?
Residual = yi - ŷi
where yi = observed value
ŷi = predicted value
When do we use logistic regression?
When the outcome is binary
when the output must always lie between 0 & 1
when the predicted value is a probability
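A minimal sketch of why logistic regression suits these cases: the linear combination b0 + b1x is passed through the sigmoid function, which squeezes any real number into the range 0 to 1. The coefficients and the 0.5 cut-off below are hypothetical, not fitted values.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, b0, b1):
    """Predicted probability of the positive class for input x."""
    return sigmoid(b0 + b1 * x)

# Hypothetical coefficients (not fitted to any data)
b0, b1 = -4.0, 1.0
probs = [predict_probability(x, b0, b1) for x in [0, 4, 8]]

# Turn each probability into a binary label with an assumed 0.5 threshold
labels = [1 if p >= 0.5 else 0 for p in probs]
print(probs, labels)
```

Whatever the input, the predicted value stays between 0 and 1, so it can be read as a probability and thresholded into one of two classes.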
What are the assumptions for Linear Regression?
- Linearity = there’s a linear relationship b/w the two variables
- Independence = Residuals are independent
- Homoscedasticity = residuals have constant variance @ every level of x
- Normality = residuals follow a normal distribution, centered at 0
What are the assumptions for Polynomial Regression?
- Curvilinearity = there’s a curvilinear relationship b/w the two variables
- Independence = Residuals are independent
- Homoscedasticity = residuals have constant variance @ every level of x
- Normality = residuals follow a normal distribution, centered at 0
Which R² value indicates that the line fits the data perfectly?
1
0 would mean that there is no linear relationship
The objective of a support vector machine is to find an optimal ________ that best
separates the classes in a dataset
hyperplane
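A small sketch of what "separating classes with a hyperplane" means in two dimensions. It does not train an SVM; the weights w, the offset b, and the sample points are all hypothetical, and the code only shows how a given hyperplane classifies points and how its margin is measured.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical hyperplane x1 + x2 = 3 (a straight line in two dimensions)
w, b = [1.0, 1.0], -3.0

# Hypothetical labelled points: (features, class)
points = [([0.0, 1.0], -1), ([1.0, 1.0], -1), ([2.0, 2.0], 1), ([3.0, 1.0], 1)]
predictions = [classify(w, b, x) for x, _ in points]

# Margin: distance from the hyperplane to the closest correctly classified point
norm_w = math.sqrt(sum(wi * wi for wi in w))
margin = min(label * decision_value(w, b, x) / norm_w for x, label in points)
print(predictions, round(margin, 4))
```

An SVM searches over all separating hyperplanes for the one that makes this margin as large as possible.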