Linear Regression Flashcards
(39 cards)
What is gradient descent?
a generic optimization algorithm that iteratively tweaks the model's parameters in the direction that most reduces a cost function
What is batch gradient descent?
it computes the gradient over the whole training set at every step, updating all parameters in one go rather than computing each partial derivative individually
what are the downsides to batch gradient descent?
because it uses the whole set at each step, it is very slow on large sets
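A minimal NumPy sketch of batch gradient descent (the data here is hypothetical, generated so that y ≈ 4 + 3x):

```python
import numpy as np

# Hypothetical dataset: y ≈ 4 + 3x plus noise
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.normal(0, 0.5, (100, 1))

X_b = np.c_[np.ones((100, 1)), X]       # add bias column
theta = rng.standard_normal((2, 1))     # random initialization
eta = 0.1                               # learning rate
n_iterations = 1000

for _ in range(n_iterations):
    # gradient of the MSE over the WHOLE training set at every step
    gradients = 2 / len(X_b) * X_b.T @ (X_b @ theta - y)
    theta -= eta * gradients

print(theta.ravel())  # close to [4, 3]
```

Note that every iteration touches all 100 rows; this is exactly why batch gradient descent slows down badly on very large training sets.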
what does a higher learning rate mean with gradient descent?
the algorithm may overshoot the minimum and diverge, jumping further and further away and failing to find a good solution
what does a lower learning rate mean with gradient descent?
the algorithm needs many more iterations to converge, so training takes much longer
how does gradient descent perform with features with different scales?
it takes longer to reach the minimum, making the algorithm slower
how does gradient descent perform with features with same scales?
it goes directly to the minimum without jumping around, making the algorithm faster
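One common way to put features on the same scale is standardization (zero mean, unit variance per feature); a minimal sketch with made-up numbers on very different scales:

```python
import numpy as np

# Hypothetical features on very different scales (e.g., room count vs. price)
X = np.array([[3.0, 250_000.0],
              [4.0, 310_000.0],
              [2.0, 180_000.0],
              [5.0, 420_000.0]])

# Standardize each feature: subtract its mean, divide by its std,
# so gradient descent descends a round bowl instead of an elongated valley
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.mean(axis=0))  # ≈ [0, 0]
print(X_scaled.std(axis=0))   # → [1, 1]
```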
what algorithm is better to use with Linear Regression out of -> Gradient Descent or Normal Equation when you have a larger dataset? Why?
Gradient Descent, because it scales well to large datasets, while the Normal Equation must invert an n × n matrix and becomes very slow as the number of features grows
out of gradient descent and normal equation, which is faster, why?
Gradient descent, because its stochastic and mini-batch variants can process just a few instances at a time, while the normal equation must process the entire training matrix at once
what is a cost function?
a measure of how poorly the model fits the training data; training means finding the parameters that minimize it (for linear regression, typically the MSE)
what is stochastic gradient descent?
as opposed to batch-gd which uses the whole set at each step, sgd picks one random instance at each step and computes the gradient on that single instance, making it much faster and better suited to big sets
which gd algorithm is better for large sets?
sgd because it handles instances one at a time, instead of using the whole set at each step like bgd
what happens when you reduce the sgd’s learning rate slowly?
the algorithm may jump around the minimum for a long time and end up with a suboptimal solution
what happens when you reduce the sgd’s learning rate quickly?
the algorithm may get stuck in a local minimum, or end up frozen halfway to the minimum
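The two cards above can be combined into one sketch: SGD on hypothetical data (y ≈ 4 + 3x) with a learning schedule that decays gradually, i.e. neither too slowly nor too quickly. The `t0`, `t1` hyperparameters are illustrative choices, not canonical values:

```python
import numpy as np

rng = np.random.default_rng(1)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.normal(0, 0.5, (100, 1))
X_b = np.c_[np.ones((100, 1)), X]

t0, t1 = 5, 50  # hypothetical schedule hyperparameters

def learning_rate(t):
    # decays gradually: fast enough to settle, slow enough not to freeze early
    return t0 / (t + t1)

theta = rng.standard_normal((2, 1))
n_epochs = 50
m = len(X_b)

for epoch in range(n_epochs):
    for step in range(m):
        i = rng.integers(m)                 # pick ONE random instance
        xi, yi = X_b[i:i + 1], y[i:i + 1]
        gradients = 2 * xi.T @ (xi @ theta - yi)
        theta -= learning_rate(epoch * m + step) * gradients

print(theta.ravel())  # settles near [4, 3]
```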
what is mini-batch gradient descent?
computes the gradients on small random sets of instances called mini-batches; a compromise between sgd (one instance) and bgd (the whole set)
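A mini-batch sketch on the same kind of hypothetical data; the batch size of 16 is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(2)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.normal(0, 0.5, (100, 1))
X_b = np.c_[np.ones((100, 1)), X]

theta = rng.standard_normal((2, 1))
eta = 0.05
batch_size = 16         # hypothetical mini-batch size
n_epochs = 200

for epoch in range(n_epochs):
    idx = rng.permutation(len(X_b))         # shuffle each epoch
    for start in range(0, len(X_b), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X_b[batch], y[batch]
        # average the gradient over just this small random batch
        gradients = 2 / len(xb) * xb.T @ (xb @ theta - yb)
        theta -= eta * gradients

print(theta.ravel())  # close to [4, 3]
```

Averaging over 16 instances gives a much less noisy update than pure SGD, while each step still avoids touching the whole training set.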
what is polynomial regression?
when the data is too complex for a straight line, you can add powers of each feature as new features, then train linear regression on this extended feature set
what is good about polynomial regression?
it can capture relationships between features (via interaction terms), and it can be used when a straight line won't fit the data
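A scikit-learn sketch, assuming hypothetical quadratic data y ≈ 0.5x² + x + 2: `PolynomialFeatures` adds x² as a new feature, then plain `LinearRegression` fits it:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = 6 * rng.random((100, 1)) - 3
y = 0.5 * X**2 + X + 2 + rng.normal(0, 0.3, (100, 1))  # quadratic data

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)        # columns: [x, x^2]
model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)  # roughly [2] and [[1, 0.5]]
```

With more than one input feature, `PolynomialFeatures` also generates interaction terms (e.g., x₁·x₂), which is how this approach finds relationships between features.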
what are some regularized linear regression models?
ridge, lasso, elastic net
what is ridge regression?
a regularized version of linear regression that forces the learning algorithm to fit the data, but also keep the model weights as small as possible
what happens if ridge's α = 0?
it is linear regression
what happens if ridge's α is very large?
all weights end up very close to zero, resulting in a flat line
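Both extremes of α can be seen in a short scikit-learn sketch on hypothetical data (scaling first, as the next card recommends):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
X = 2 * rng.random((50, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, 50)

# alpha=0 → plain linear regression; huge alpha → weight shrinks toward 0
coefs = {}
for alpha in (0, 100_000):
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    model.fit(X, y)
    coefs[alpha] = model.named_steps["ridge"].coef_[0]
print(coefs)  # alpha=0 gives a sizeable weight, alpha=100_000 nearly zero
```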
what is an important step before using ridge regression?
scale the data (e.g., with a StandardScaler); ridge regression is sensitive to the scale of the input features
what is lasso regression?
a regularized version of linear regression that eliminates weights of least important features and automatically performs feature selection and outputs a sparse model
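The sparsity can be seen directly in a scikit-learn sketch; in this hypothetical dataset only features 0 and 2 actually matter, and the choice of `alpha=0.1` is illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 5))
# only features 0 and 2 actually influence the target; the rest are noise
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(0, 0.1, 100)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(lasso.coef_)  # weights of the unimportant features driven to exactly 0
```

The zeroed-out weights are what make the model sparse: the surviving nonzero coefficients amount to automatic feature selection.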