Regression Model and Terminology Flashcards
(25 cards)
What is Linear Regression?
- fitting a straight line through data points and using that line to predict outputs for new inputs
- e.g. predicting the price of a house based on its size
- can have one or more input variables
What is the data called that is used to train the model?
Training Set
What is the input variable notation?
x
= input variable
- or feature
What is the output variable notation?
y
= output or target variable
What is the notation for the number of training examples?
m
What does (x,y) mean?
a single training example
What does (x⁽i⁾, y⁽i⁾) mean? (i) is a superscript index
the i-th training example
What is the flow of training a model and then working with it?
- Train the model/learning algorithm using the training set
- The learning algorithm produces a function/hypothesis f that takes an input x and produces an estimate ŷ
What is the notation for the prediction of y?
ŷ -> y-hat
Estimated value of y
How can we represent f for a linear regression?
f_w,b(x) = wx + b
w and b are subscripts of f
Can be shortened to f(x)
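A minimal sketch of this model function in Python (the name predict and the example numbers are illustrative assumptions, not from the course):
    # f_w,b(x) = w*x + b for a single input feature x
    def predict(x, w, b):
        return w * x + b

    # example: size 2.0 (in 1000 sqft) with w = 200, b = 100 gives ŷ = 500
    y_hat = predict(2.0, 200.0, 100.0)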
How can linear regression with one variable be also called?
Univariate linear regression
What does a Cost function tell us?
- how well a model is doing
- enables us to modify the model to do better
- takes the prediction ŷ and compares it to y: ŷ - y
- this is called the error
What are w and b called in f_wb(x)? What can they be used for?
- called parameters of the model
- are the variables adjusted during training to improve the model
- also called weights / coefficients
What roles do w and b have in f_wb(x)?
- b is also called the y-intercept because it describes where the line crosses the y-axis
- w describes the slope of the line
How does the cost function look?
- takes the prediction ŷ and compares it to y: ŷ - y
- this is called the error
- J(w,b) = 1/(2m) * (Sum from i = 1 to m of (ŷ⁽i⁾ - y⁽i⁾)²)
- m = number of training examples
- also called the squared error cost function
- most commonly used function for regression and especially linear regression
How can you rewrite the cost function for regression?
J(w,b) = 1/(2m) * (Sum from i = 1 to m of (ŷ⁽i⁾ - y⁽i⁾)²)
J(w,b) = 1/(2m) * (Sum from i = 1 to m of (f_w,b(x⁽i⁾) - y⁽i⁾)²)
i.e. substituting the prediction function for ŷ
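A possible Python sketch of this cost function, assuming x and y are lists holding the m training examples (the name compute_cost is made up for illustration):
    def compute_cost(x, y, w, b):
        # J(w,b) = 1/(2m) * Sum over i of (f_w,b(x⁽i⁾) - y⁽i⁾)²
        m = len(x)
        total = 0.0
        for i in range(m):
            f_wb = w * x[i] + b            # prediction ŷ⁽i⁾
            total += (f_wb - y[i]) ** 2    # squared error for example i
        return total / (2 * m)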
What is our goal with the cost function?
- find w and b where the cost function is as small as possible
- i.e. minimize J(w,b) over w and b
How can you simplify the cost function in order to solve for the optimal parameter values?
set b = 0
f_w(x) = wx
The goal becomes: minimize J(w)
J(w) = 1/(2m) * (Sum from i = 1 to m of (f_w(x⁽i⁾) - y⁽i⁾)²)
What can you do in order to find the optimum for the cost function?
Plot the values of the cost function J(w) for different values of w and look for the w where J(w) is smallest
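One way to sketch this in Python, with b fixed to 0 and made-up toy data; it simply evaluates J(w) for a range of candidate w values and keeps the best one:
    # toy data on the line y = 2x, so J(w) should be smallest near w = 2
    x_train = [1.0, 2.0, 3.0]
    y_train = [2.0, 4.0, 6.0]

    def cost_for_w(w):
        m = len(x_train)
        return sum((w * x_train[i] - y_train[i]) ** 2 for i in range(m)) / (2 * m)

    candidates = [i / 10 for i in range(0, 41)]    # w from 0.0 to 4.0 in steps of 0.1
    best_w = min(candidates, key=cost_for_w)       # 2.0 for this data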
What is a contour plot and how does it help in the context of Cost functions?
- a contour plot shows the value of the cost function for varying w and b in the w-b plane, with lines connecting points of equal cost
- the center of the innermost contour marks the minimum of the cost function
- this gives the optimal values of w and b that minimize the cost function
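A hedged numpy/matplotlib sketch of such a contour plot; the data and the grid ranges are invented for illustration:
    import numpy as np
    import matplotlib.pyplot as plt

    # toy data roughly on the line y = 2x + 1
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([3.0, 5.0, 7.0])

    # grid of candidate (w, b) pairs
    W, B = np.meshgrid(np.linspace(0, 4, 100), np.linspace(-2, 4, 100))

    # evaluate J(w,b) on the whole grid
    m = len(x)
    J = sum((W * x[i] + B - y[i]) ** 2 for i in range(m)) / (2 * m)

    plt.contour(W, B, J, levels=30)    # rings of equal cost, minimum near (w=2, b=1)
    plt.xlabel("w")
    plt.ylabel("b")
    plt.show()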
What is Gradient Descent and where is it used?
- algorithm to find the values of w and b that minimize J(w,b), the squared error cost function
- used in machine and deep learning
- can be used to minimize any function not just cost functions
How do you approach Gradient Descent?
- Start with some w, b
- keep changing w, b to reduce J(w,b)
- until we settle at or near a minimum
- with some cost functions there can be more than one minimum
How does Gradient Descent work?
- from the current parameter values (w, b, …), it determines the direction of steepest descent
- it takes a step in that direction and repeats this until no direction of descent can be found
- it therefore leads to a local minimum; which one depends on where you start
- both w and b are updated simultaneously
What is the Gradient descent algorithm?
w = w - Alpha * (d/dw)J(w,b)
w is assigned a new value
Alpha -> learning rate, usually a small positive number, controls how big a step is taken downhill
(d/dw)J(w,b) -> partial derivative of the cost function J with respect to w, determines in which direction the step is taken
Also
b = b - Alpha * (d/db)J(w,b)
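A minimal gradient descent sketch in Python; the function name, toy data, learning rate, and iteration count are illustrative choices, not prescribed values:
    def gradient_descent(x, y, w, b, alpha, num_iters):
        m = len(x)
        for _ in range(num_iters):
            # partial derivatives of J(w,b) with respect to w and b
            dj_dw = sum((w * x[i] + b - y[i]) * x[i] for i in range(m)) / m
            dj_db = sum((w * x[i] + b - y[i]) for i in range(m)) / m
            # simultaneous update: both derivatives were computed with the old w and b
            w = w - alpha * dj_dw
            b = b - alpha * dj_db
        return w, b

    # toy data roughly on y = 2x + 1; should converge to w ≈ 2, b ≈ 1
    w, b = gradient_descent([1.0, 2.0, 3.0], [3.0, 5.0, 7.0],
                            w=0.0, b=0.0, alpha=0.1, num_iters=1000)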