Chapter 6: ML Linear Regression Flashcards

1
Q

What is linear regression?

A
  • It is a supervised machine learning algorithm
  • that finds the best-fit straight line between the independent and dependent variables (i.e. it finds the linear relationship between the independent and dependent variables)
2
Q

There is only one optimal line that best fits data in linear regression. Is the statement true or false?

A

True. The squared-error cost function is convex, so (for a non-degenerate data set) there is a single line that minimises it.

3
Q

How to calculate error function?

A

For N data points,

E = (1/2N) × Σ (y − ŷ)², summed over all N data points

i.e. take the sum of the squared errors between each data point's y value and ŷ (the predicted y value on the regression line), then multiply by 1/2N.

  • The error function is the average sum of squared errors (with an extra factor of ½ for convenience; see the next cards)
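Written as code, the error function on this card might look like the following sketch (the function and variable names are illustrative, not from the cards):

```python
def error(w0, w1, xs, ys):
    """Average sum of squared errors, scaled by 1/(2N)."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = w0 + w1 * x          # predicted y value on the regression line
        total += (y - y_hat) ** 2    # squared error for this data point
    return total / (2 * n)           # divide by 2N, not just N
```

For the two points used in the later cards, (3, 5) and (8, 10), the line y = 2 + x passes through both, so error(2, 1, [3, 8], [5, 10]) is 0.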
4
Q

For the error function, why do we need to take the average of the squared-error sum (i.e. why do we need to divide by N)?

A

It is to remove the dependency on the data set size. The error magnitude is thereby independent of N.

5
Q

For the error function, since it is the average squared-error sum, why do we need to divide by 2 in the denominator (i.e. divide by 2N instead of just N)?

A

It is to make the calculation neater when finding the derivative: the 2 in the denominator cancels the 2 brought down from the squared term.

6
Q

How do we calculate the new values of W0 and W1 (the parameters)?

A

W0’ = W0 - L(∂E/∂W0) ; W1’ = W1 - L(∂E/∂W1), where L is the learning rate (the 1/N factor is already inside the partial derivatives because of the 1/2N in E)

7
Q

What is the learning rate, L?

A

It is a hyperparameter that controls how much the values of W0 and W1 change at each update step

8
Q

What happens if the learning rate is too large?

A

The algorithm may not converge (and sum of squared error may increase)

9
Q

What happens if the learning rate is too small?

A

The gradient descent function will take a very long time to converge

10
Q

What are the steps in gradient descent function? [4]

A
  1. Start with random values of (W0, W1) and define a learning rate (e.g. L = 0.000001)
  2. Calculate the partial derivatives of the error function with respect to W0 and W1.
  3. Update the current values of W0, W1:

W0’ = W0 - L(∂E/∂W0) ; W1’ = W1 - L(∂E/∂W1)

  4. Repeat steps 2-3 till convergence
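The four steps above can be sketched as follows, assuming the simple model ŷ = w0 + w1·x (the function name, zero starting values, and the default learning rate and iteration count are illustrative choices, not from the cards):

```python
def gradient_descent(xs, ys, lr=0.01, iterations=5000):
    n = len(xs)
    w0, w1 = 0.0, 0.0                # step 1: starting values; lr is the learning rate
    for _ in range(iterations):      # step 4: repeat steps 2-3 till termination
        # step 2: partial derivatives of E = (1/2N) * sum((w0 + w1*x - y)^2)
        d_w0 = sum(w0 + w1 * x - y for x, y in zip(xs, ys)) / n
        d_w1 = sum((w0 + w1 * x - y) * x for x, y in zip(xs, ys)) / n
        # step 3: update the current values of w0, w1
        w0 -= lr * d_w0
        w1 -= lr * d_w1
    return w0, w1
```

Fitting the two points used in the later cards, (3, 5) and (8, 10), this converges towards W0 = 2, W1 = 1. Here convergence is implemented as a fixed iteration count, one of the two termination options the cards mention.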
11
Q

How does the gradient descent function know when to stop?

A

It stops when it converges, i.e. when further updates no longer meaningfully change the parameters or reduce the error.

12
Q

How can we define the convergence of the gradient descent function?

A

We can set a convergence condition to be met (e.g. the change in error between iterations falls below a threshold) OR run the algorithm for a fixed number of iterations

13
Q

Argmin, by definition, tries to find the lowest cost value of the objective/error function. Is the statement true or false?

A

False. It tries to find the INPUT VALUES that minimise the cost of the error function (the lowest cost value itself is what min returns)

14
Q

Gradient descent algorithm is used to find the least optimal parameters and eliminate them. Is the statement true or false?

A

False. It is used to find the most optimal parameters.

15
Q

When learning rate is large, then the gradient descent algorithm takes a long time to converge because the weights are updated with smaller values. Is the statement true or false?

A

False. When the learning rate is large, the gradient descent algorithm may overshoot the optimal solution and oscillate around it, or it may even diverge and fail to converge.

This is because larger learning rates cause larger updates to the weights in each iteration

16
Q

Order the steps for the gradient descent algorithm as shown below:

A) Updating of w0 and w1 values

B) Iteration of the previous two steps till convergence or termination

C) Setting a random w0 and w1, and a learning rate

D) Finding the partial derivatives with respect to w0 and w1

A

C, D, A, B

17
Q

There is no optimal way to attain a good L value (i.e., it requires trial and error to choose a nominal value of L). Is the statement true or false? Note: L is your learning rate.
A

True. There is no formula for the best L; a good value is found by trial and error, trying several values and checking which converges well.

18
Q

What is the overall flow for linear regression? [6 steps]

A

1) Get Dataset,

2) Plotting of dataset,

3) Error function,

4) Gradient Descent algorithm,

5) Creation of Linear Regression model and

6) Prediction.

19
Q

It is possible to use linear regression when there is no dataset. Is the statement true or false?

A

False. Linear regression is a supervised learning algorithm that requires a labeled dataset, i.e., a dataset that contains both the input features and the corresponding target variable values.

20
Q

The key difference between single variable and multivariable linear regression is the number of independent variables. Is the statement true or false?
A

True. Single-variable linear regression has one independent variable, while multivariable linear regression has two or more.

21
Q

Using linear regression, there are two known values of x and y, namely (3,5) and (8,10). What are the values of W0 and W1?

A

W0 : 2
W1 : 1
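With only two points the line fits them exactly, so W1 (the slope) and W0 (the intercept) can be computed directly; a sketch of the arithmetic (variable names are illustrative):

```python
# Two known (x, y) points on the line
x1, y1 = 3, 5
x2, y2 = 8, 10

w1 = (y2 - y1) / (x2 - x1)   # slope: (10 - 5) / (8 - 3) = 1
w0 = y1 - w1 * x1            # intercept: 5 - 1 * 3 = 2
```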

22
Q

Given that w0,w1=[2,10], find the value of y when x=3.
A

y = W0 + W1·x = 2 + 10(3) = 32

23
Q

Given that w0,w1,w2,w3= [8,10,-2,4], find for y when x1=3, x2=5, x3=10.
A

y = W0 + W1·x1 + W2·x2 + W3·x3 = 8 + 10(3) + (-2)(5) + 4(10) = 68

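The single- and multi-variable prediction cards above both reduce to an intercept plus a weighted sum of the inputs; a small illustrative sketch (the function name is an assumption):

```python
def predict(weights, xs):
    """weights = [w0, w1, ..., wk]; xs = [x1, ..., xk]."""
    # intercept w0 plus the dot product of the remaining weights with the inputs
    return weights[0] + sum(w * x for w, x in zip(weights[1:], xs))
```

For example, predict([2, 10], [3]) gives 32 and predict([8, 10, -2, 4], [3, 5, 10]) gives 68, matching the cards above.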