Final Test Flashcards

1
Q

What does Entropy mean?

A

It’s the degree of disorder
How random it is

2
Q

Entropy’s formula

A

Entropy(S) = -(p+)log2(p+) - (p-)log2(p-)

Example: Entropy([6+,2-])
= -(6/8)log2(6/8) - (2/8)log2(2/8)
= 0.8113
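The calculation above can be sketched in Python (the helper function and its name are illustrative, not part of the deck):

```python
from math import log2

def entropy(pos, neg):
    """Shannon entropy of a two-class set [pos+, neg-], in bits."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # treat 0 * log2(0) as 0
            p = count / total
            result -= p * log2(p)
    return result

print(round(entropy(6, 2), 4))  # 0.8113, matching the card
```

The same helper also reproduces the next two cards: a pure set like [0+,4-] gives entropy 0, and an evenly mixed set like [4+,4-] gives entropy 1.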

3
Q

Entropy([0+,4-]) with log2

A

0

4
Q

Entropy([4+,4-]) with log2

A

1

5
Q

What are the main concepts of Backpropagation?

A

It optimizes the Weights and Biases of a Neural Network

It starts from the last parameter (closest to the output) and works its way backward

6
Q

What’s the Chain Rule?

A

dSize/dFat = (dSize/dHeight) x (dHeight/dFat)
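A quick numeric check of the chain rule, using hypothetical linear relationships (height = 2 x fat and size = 3 x height, so dSize/dFat should come out to 3 x 2 = 6):

```python
def height(fat):
    return 2 * fat

def size(h):
    return 3 * h

fat, eps = 5.0, 1e-6  # point of evaluation and finite-difference step

# estimate each derivative with a finite difference
d_size_d_height = (size(height(fat) + eps) - size(height(fat))) / eps
d_height_d_fat = (height(fat + eps) - height(fat)) / eps
d_size_d_fat = (size(height(fat + eps)) - size(height(fat))) / eps

# chain rule: dSize/dFat == (dSize/dHeight) x (dHeight/dFat)
print(round(d_size_d_fat, 3), round(d_size_d_height * d_height_d_fat, 3))
```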

7
Q

Sigmoid function’s formula

A

1 / (1 + e^(-x))
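As a minimal Python sketch:

```python
from math import exp

def sigmoid(x):
    """Sigmoid: 1 / (1 + e^(-x)); squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-x))

print(sigmoid(0))  # 0.5
```

Large positive inputs approach 1 and large negative inputs approach 0, which is why it is used to squash activations.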

8
Q

What is the use of Gradient Descent?

A

Calculates better parameters for prediction.
It takes bigger steps when far from the best value, and smaller steps as the parameter gets close.

It finds the minimum value by taking steps from an initial guess until it reaches the best value
=> The best value is where the derivative = 0

9
Q

What is the meaning of SSR?

A

Sum of Squared Residuals
1.1² + 0.4² + (-1.3)² = …

10
Q

What are the steps of Gradient Descent?

A

When making predictions:
1. Choose the Loss function (e.g. SSR)
2. Calculate the SSR (how far off we are)
3. Take the derivative of the SSR
4. Pick a random value for the intercept
5. Calculate the derivative at that intercept
6. Calculate the Step Size
7. Calculate the New Intercept
8. Use the new intercept and repeat steps 5 to 7
9. Stop when the Step Size is close to 0
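The steps above can be sketched in Python. The data points, learning rate, and fixed slope of 0.64 below are illustrative assumptions, not values from the deck:

```python
# Fit only the intercept of y = intercept + 0.64 * x by gradient descent on SSR.
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]  # (x, observed) pairs, hypothetical
slope = 0.64          # held fixed; only the intercept is learned
learning_rate = 0.1

intercept = 0.0       # step 4: initial guess
for _ in range(1000):
    # step 5: derivative of SSR w.r.t. the intercept, summed over all points
    d_ssr = sum(-2 * (obs - (intercept + slope * x)) for x, obs in data)
    step_size = d_ssr * learning_rate      # step 6
    intercept = intercept - step_size      # step 7
    if abs(step_size) < 1e-9:              # step 9: stop near zero
        break

print(round(intercept, 4))
```

The loop settles where the derivative is (near) zero, i.e. the intercept minimizing the SSR for this data.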

11
Q

SSR’s formula

A

(observed1 - predicted1)² +
(observed2 - predicted2)² + …
e.g. (1.4 - (intercept + 0.64 x 0.5))²

observed = real value

predicted = value on the fitted line
= (intercept + 0.64 x (x value))
= (intercept + 0.64 x 0.5)
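A minimal sketch of the formula, assuming the slope of 0.64 used in the card; the helper name is illustrative:

```python
def ssr(data, intercept, slope=0.64):
    """Sum of Squared Residuals: sum of (observed - predicted)^2."""
    return sum((observed - (intercept + slope * x)) ** 2 for x, observed in data)

# One point from the card (x = 0.5, observed = 1.4) with an intercept guess of 0:
print(round(ssr([(0.5, 1.4)], intercept=0), 4))  # (1.4 - 0.32)^2 = 1.1664
```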

12
Q

Derivative of SSR with respect to the intercept

A

= the sum of the derivatives of each squared term
=> Use the CHAIN RULE
For one data point:
= -2 x (1.4 - (intercept + 0.64 x 0.5))
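The analytic derivative from the card can be checked against a finite-difference approximation (the intercept value 0.3 below is an arbitrary test point):

```python
# One data point: x = 0.5, observed = 1.4, slope fixed at 0.64
def ssr(intercept):
    return (1.4 - (intercept + 0.64 * 0.5)) ** 2

def d_ssr(intercept):
    # chain rule: d/du (u^2) = 2u, then d(inner)/d(intercept) = -1
    return -2 * (1.4 - (intercept + 0.64 * 0.5))

b, h = 0.3, 1e-6
numeric = (ssr(b + h) - ssr(b - h)) / (2 * h)  # central difference
print(round(d_ssr(b), 6), round(numeric, 6))   # the two should agree
```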

13
Q

Example of a Loss function

A

SSR

14
Q

What is the intercept in Gradient Descent?

A

It’s the y value where the line crosses the y-axis (the value of y when x = 0)

15
Q

How to calculate the Step Size in Gradient Descent?

A

Step Size = Slope x Learning rate

16
Q

How to calculate the New Intercept in Gradient Descent?

A

New Intercept = Old Intercept - Step Size

17
Q

What happens when we do gradient descent for 2 variables?

A

We calculate derivatives for the Intercept and the Slope at the same time:
this gives a 3D graph with SSR, Intercept & Slope as the axes
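A sketch with both parameters updated on each step (the data points and learning rate are hypothetical):

```python
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]  # (x, observed) pairs
intercept, slope = 0.0, 1.0
learning_rate = 0.01

for _ in range(10000):
    # partial derivative of the SSR with respect to each parameter
    d_intercept = sum(-2 * (obs - (intercept + slope * x)) for x, obs in data)
    d_slope = sum(-2 * x * (obs - (intercept + slope * x)) for x, obs in data)
    # both parameters take a step at the same time
    intercept -= learning_rate * d_intercept
    slope -= learning_rate * d_slope

print(round(intercept, 3), round(slope, 3))
```

Both parameters walk down the 3D SSR surface together and settle at the least-squares fit for this data.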

18
Q

How does backpropagation find the value of the last bias?

A

Goal : Get the least Loss by using Gradient Descent
1. Need to calculate (d SSR / d bias3)
Because SSR = sum(observed - predicted)²
and predicted = green squiggle = blue + orange + b3
2. Use the chain rule
3. (d SSR / d bias3) = (d SSR / d Predicted) x (d Predicted / d b3)
4. = sum( -2 x (observed - predicted) ) x 1
5. = (d SSR / d bias3) [Original goal]
6. This corresponds to the slope
7. Calculate the step size… [gradient descent]
8. Repeat until we find the bias3 value with the smallest SSR
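The steps above can be sketched as a single gradient-descent update of the last bias. The observed values, the fixed "blue + orange" hidden-layer outputs, and the learning rate are all hypothetical placeholders:

```python
observed = [1.0, 0.0, 1.0]           # target values per sample (hypothetical)
blue_plus_orange = [0.8, 0.3, 0.6]   # hidden-layer contribution per sample
b3 = 0.0                             # the last bias we want to learn
learning_rate = 0.1

# predicted = blue + orange + b3 (the "green squiggle" at each sample)
predicted = [bo + b3 for bo in blue_plus_orange]

# step 4: d SSR / d b3 = sum( -2 * (observed - predicted) ) * 1  (chain rule)
d_ssr_d_b3 = sum(-2 * (o - p) for o, p in zip(observed, predicted))

step_size = d_ssr_d_b3 * learning_rate  # step 7 of the card
b3 = b3 - step_size
print(round(b3, 2))  # 0.06
```

In practice this update is repeated (step 8) until the step size is near zero, exactly as in the single-parameter gradient-descent cards.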