Chapter 6: ML Linear Regression Flashcards

1
Q

What is linear regression?

A
  • It is a supervised machine learning algorithm
  • that finds the best-fit straight line between the independent and dependent variables (i.e. it finds the linear relationship between the independent and dependent variables)
2
Q

There is only one optimal line that best fits data in linear regression. Is the statement true or false?

A

True. The squared-error cost function is convex, so (for a non-degenerate data set) there is a single line that minimises it.

3
Q

How to calculate error function?

A

For N data points,

E = (1/2N) × Σ (y − ŷ)², summed over all N data points

i.e. take the sum of the squared errors between each data point's y value and ŷ (the predicted y value on the regression line), then multiply by 1/2N.

  • The error function is the average sum of squared errors (with an extra factor of ½ for convenience; see the next cards)
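Written as code, the error function on this card might look like the following sketch (the function and variable names are illustrative, not from the cards):

```python
def error(w0, w1, xs, ys):
    """Average sum of squared errors, scaled by 1/(2N)."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = w0 + w1 * x          # predicted y value on the regression line
        total += (y - y_hat) ** 2    # squared error for this data point
    return total / (2 * n)           # divide by 2N, not just N
```

For the two points used in the later cards, (3, 5) and (8, 10), the line y = 2 + x passes through both, so error(2, 1, [3, 8], [5, 10]) is 0.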
4
Q

For the error function, why do we need to take the average of the squared-error sum (i.e. why do we need to divide by N)?

A

It is to remove the dependency on the data set size. The error magnitude is thereby independent of N.

5
Q

For the error function, since it is the average squared-error sum, why do we need to divide by 2 in the denominator (i.e. divide by 2N instead of just N)?

A

It is to make the calculation neater when finding the derivative: the 2 in the denominator cancels the 2 brought down from the squared term.

6
Q

How do we calculate the new values of W0 and W1 (the parameters)?

A

W0’ = W0 - L(∂E/∂W0) ; W1’ = W1 - L(∂E/∂W1), where L is the learning rate (the 1/N factor is already inside the partial derivatives because of the 1/2N in E)

7
Q

What is the learning rate, L?

A

It is a hyperparameter that controls how much the values of W0 and W1 change at each update step

8
Q

What happens if the learning rate is too large?

A

The algorithm may not converge (and sum of squared error may increase)

9
Q

What happens if the learning rate is too small?

A

The gradient descent function will take a very long time to converge

10
Q

What are the steps in gradient descent function? [4]

A
  1. Start with random values of (W0, W1) and define a learning rate (e.g. L = 0.000001)
  2. Calculate the partial derivatives of the error function with respect to W0 and W1.
  3. Update the current values of W0, W1:

W0’ = W0 - L(∂E/∂W0) ; W1’ = W1 - L(∂E/∂W1)

  4. Repeat steps 2-3 till convergence
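The four steps above can be sketched as follows, assuming the simple model ŷ = w0 + w1·x (the function name, zero starting values, and the default learning rate and iteration count are illustrative choices, not from the cards):

```python
def gradient_descent(xs, ys, lr=0.01, iterations=5000):
    n = len(xs)
    w0, w1 = 0.0, 0.0                # step 1: starting values; lr is the learning rate
    for _ in range(iterations):      # step 4: repeat steps 2-3 till termination
        # step 2: partial derivatives of E = (1/2N) * sum((w0 + w1*x - y)^2)
        d_w0 = sum(w0 + w1 * x - y for x, y in zip(xs, ys)) / n
        d_w1 = sum((w0 + w1 * x - y) * x for x, y in zip(xs, ys)) / n
        # step 3: update the current values of w0, w1
        w0 -= lr * d_w0
        w1 -= lr * d_w1
    return w0, w1
```

Fitting the two points used in the later cards, (3, 5) and (8, 10), this converges towards W0 = 2, W1 = 1. Here convergence is implemented as a fixed iteration count, one of the two termination options the cards mention.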
11
Q

How does the gradient descent function know when to stop?

A

It stops when it converges, i.e. when further updates no longer meaningfully change the parameters or reduce the error.

12
Q

How can we define the convergence of the gradient descent function?

A

We can set a convergence condition to be met (e.g. the change in error between iterations falls below a threshold) OR run the algorithm for a fixed number of iterations

13
Q

Argmin, by definition, tries to find the lowest cost value of the objective/error function. Is the statement true or false?

A

False. It tries to find the INPUT VALUES that minimise the cost of the error function (the lowest cost value itself is what min returns)

14
Q

Gradient descent algorithm is used to find the least optimal parameters and eliminate them. Is the statement true or false?

A

False. It is used to find the most optimal parameters.

15
Q

When learning rate is large, then the gradient descent algorithm takes a long time to converge because the weights are updated with smaller values. Is the statement true or false?

A

False. When the learning rate is large, the gradient descent algorithm may overshoot the optimal solution and oscillate around it, or it may even diverge and fail to converge.

This is because larger learning rates cause larger updates to the weights in each iteration

16
Q

Order the steps for the gradient descent algorithm as shown below:

A) Updating of w0 and w1 values

B) Iteration of the previous two steps till convergence or termination

C) Setting a random w0 and w1, and a learning rate

D) Finding the partial derivatives with respect to w0 and w1

A

C, D, A, B

17
Q

There is no optimal way to attain a good L value (i.e., it requires trial and error to choose a nominal value of L). Is the statement true or false? Note: L is your learning rate.
A

True. There is no formula for the best L; a good value is found by trial and error, trying several values and checking which converges well.

18
Q

What is the overall flow for linear regression? [6 steps]

A

1) Get Dataset,

2) Plotting of dataset,

3) Error function,

4) Gradient Descent algorithm,

5) Creation of Linear Regression model and

6) Prediction.

19
Q

It is possible to use linear regression when there is no dataset. Is the statement true or false?

A

False. Linear regression is a supervised learning algorithm that requires a labeled dataset, i.e., a dataset that contains both the input features and the corresponding target variable values.

20
Q

The key difference between single variable and multivariable linear regression is the number of independent variables. Is the statement true or false?
A

True. Single-variable linear regression has one independent variable, while multivariable linear regression has two or more.

21
Q

Using linear regression, there are two known values of x and y, namely (3,5) and (8,10). What are the values of W0 and W1?

A

W0 : 2
W1 : 1
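With only two points the line fits them exactly, so W1 (the slope) and W0 (the intercept) can be computed directly; a sketch of the arithmetic (variable names are illustrative):

```python
# Two known (x, y) points on the line
x1, y1 = 3, 5
x2, y2 = 8, 10

w1 = (y2 - y1) / (x2 - x1)   # slope: (10 - 5) / (8 - 3) = 1
w0 = y1 - w1 * x1            # intercept: 5 - 1 * 3 = 2
```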

22
Q

Given that w0,w1=[2,10], find the value of y when x=3.
A

y = W0 + W1·x = 2 + 10(3) = 32

23
Q

Given that w0,w1,w2,w3= [8,10,-2,4], find for y when x1=3, x2=5, x3=10.
A

y = W0 + W1·x1 + W2·x2 + W3·x3 = 8 + 10(3) + (-2)(5) + 4(10) = 68

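The single- and multi-variable prediction cards above both reduce to an intercept plus a weighted sum of the inputs; a small illustrative sketch (the function name is an assumption):

```python
def predict(weights, xs):
    """weights = [w0, w1, ..., wk]; xs = [x1, ..., xk]."""
    # intercept w0 plus the dot product of the remaining weights with the inputs
    return weights[0] + sum(w * x for w, x in zip(weights[1:], xs))
```

For example, predict([2, 10], [3]) gives 32 and predict([8, 10, -2, 4], [3, 5, 10]) gives 68, matching the cards above.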