Terminology Week 2 Flashcards

1
Q

What are hyperparameters?

A

Hyperparameters are the configuration settings used to tune how a model is trained, such as the learning rate or batch size.
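
A minimal sketch of what that can look like in practice; the names and values below are common conventions chosen for illustration, not fixed by the card:

```python
# Illustrative hyperparameters: each value is a knob set before training,
# not a quantity learned from the data.
hyperparameters = {
    "learning_rate": 0.01,  # size of each gradient step
    "batch_size": 32,       # number of examples per gradient computation
    "epochs": 10,           # number of passes over the training set
}
```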

2
Q

Describe gradient descent.

A

It is a strategy for minimizing loss gradually.

We repeatedly take small steps in the direction that reduces the loss. We call these gradient steps (though they are really negative gradient steps).
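
A minimal sketch of those negative gradient steps, assuming a toy convex loss L(w) = (w - 3)**2 and an illustrative learning rate:

```python
# Gradient descent on L(w) = (w - 3)**2, whose derivative is 2 * (w - 3).
w = 0.0              # initial guess for the parameter
learning_rate = 0.1  # illustrative step size

for step in range(25):
    gradient = 2 * (w - 3)         # slope of the loss at the current w
    w -= learning_rate * gradient  # step AGAINST the gradient

print(w)  # close to 3.0, the minimum of the loss
```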

3
Q

What is a gradient?

A

The gradient is the slope of a curve at a given point. It can be positive, negative, or zero (zero at the bottom of a convex curve).

It is also called the derivative of the loss function.

The gradient of a function of two variables is a two-dimensional vector that tells you in which direction to move for the maximum increase in height. Thus, the negative of the gradient moves you in the direction of maximum decrease in height. In other words, the negative gradient vector points into the valley.

In machine learning, gradients are used in gradient descent. We often have a loss function of many variables that we are trying to minimize, and we do this by following the negative of the gradient of the function.
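
A sketch of that two-dimensional gradient, using an assumed toy function f(x, y) = x**2 + y**2 and a finite-difference estimate:

```python
# Estimate the gradient of f(x, y) = x**2 + y**2 numerically.
# The true gradient is (2x, 2y); the negative gradient points toward
# the minimum at (0, 0), i.e. "into the valley".
def f(x, y):
    return x**2 + y**2

def gradient(x, y, eps=1e-6):
    df_dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    df_dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return df_dx, df_dy

gx, gy = gradient(2.0, 1.0)
print(gx, gy)    # approximately 4.0, 2.0
print(-gx, -gy)  # the negative gradient: direction of steepest descent
```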

4
Q

When does weight initialisation matter (in gradient descent)?

A

For non-convex problems it matters!
Think of an egg crate: there is more than one minimum, so the result depends strongly on the initial values (see the sketch below).

For convex problems, the weights can start anywhere (say, all 0s). Convex: think of a bowl shape, with just one minimum.
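
A sketch of that egg-crate effect, using an assumed non-convex function f(w) = w**4 - 4*w**2 + w with two minima; different starting weights settle into different minima:

```python
# Gradient descent on f(w) = w**4 - 4*w**2 + w, which has two local minima.
def grad(w):
    return 4 * w**3 - 8 * w + 1  # derivative of f

def descend(w, lr=0.01, steps=200):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

print(descend(-2.0))  # settles near the left minimum (about -1.47)
print(descend(+2.0))  # settles near the right minimum (about +1.35)
```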

5
Q

Describe stochastic gradient descent.

A

Stochastic gradient descent (SGD) calculates the loss and gradient with one example at a time. The example is chosen at random.
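
A minimal SGD sketch for linear regression with squared loss; the data and learning rate are made up for illustration:

```python
import random

examples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # (feature x, label y)
w, b, lr = 0.0, 0.0, 0.05

for _ in range(1000):
    x, y = random.choice(examples)  # stochastic: one random example per step
    error = (w * x + b) - y         # prediction minus label
    w -= lr * 2 * error * x         # gradient of error**2 w.r.t. w
    b -= lr * 2 * error             # gradient of error**2 w.r.t. b

print(w, b)  # roughly w = 2, b = 0 (the data lie near y = 2x)
```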

6
Q

Describe Mini-Batch Gradient Descent.

A

Mini-batch gradient descent computes each update on a batch of roughly 10-1000 examples.

The loss gradients are averaged over the batch.
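
A sketch of that batch-averaged update; the batch size and synthetic data are illustrative:

```python
import random

data = [(float(x), 2.0 * x + 0.5) for x in range(10)]  # points on y = 2x + 0.5
w, b, lr, batch_size = 0.0, 0.0, 0.01, 4

for _ in range(2000):
    batch = random.sample(data, batch_size)
    # per-example gradients of the squared loss, averaged over the batch
    gw = sum(2 * ((w * x + b) - y) * x for x, y in batch) / batch_size
    gb = sum(2 * ((w * x + b) - y) for x, y in batch) / batch_size
    w -= lr * gw
    b -= lr * gb

print(w, b)  # approaches w = 2.0, b = 0.5
```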

7
Q

Describe gradient descent step by step with your words.

A

1. Choose a set of parameters (w and b for linear regression). This means we have a new model.
2. Calculate the prediction y' from the given parameters for each example feature x (we apply the model).
3. Calculate the loss (e.g., with the square loss function), i.e., how far my prediction y' is from the real value (the example's label y).
4. Do this for all examples and calculate the mean square loss.
5. Calculate the gradient of the loss function, which gives the direction in which to change the parameters to reduce the loss (it is 0 when you have found a minimum).
6. Take a small step in the direction of the negative gradient (scaled by the learning rate) and repeat from step 2 until the loss converges.
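
A sketch that mirrors these steps for linear regression with square loss, using all examples per iteration; the data, learning rate, and iteration count are illustrative:

```python
examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # (x, y); true line: y = 2x + 1
w, b, lr = 0.0, 0.0, 0.05  # step 1: choose initial parameters
n = len(examples)

for _ in range(2000):
    # steps 2-4: apply the model and average the square loss gradients
    gw = sum(2 * ((w * x + b) - y) * x for x, y in examples) / n
    gb = sum(2 * ((w * x + b) - y) for x, y in examples) / n
    # steps 5-6: step against the gradient
    w -= lr * gw
    b -= lr * gb

print(w, b)  # approaches w = 2, b = 1
```
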
8
Q

What can you do to find out the direction in which the parameters (weights) should be changed?

A

Calculate the (negative) gradient of the loss function.

9
Q

How is a machine learning model trained?

A

A machine learning model is trained by starting with an initial guess for the weights and bias and iteratively adjusting those guesses until arriving at the weights and bias with the lowest possible loss.

10
Q

What does it mean when a model is convergent?

A

Changes to the parameters/model result in only very small changes in the loss.

Usually, you iterate until the overall loss stops changing or at least changes extremely slowly. When that happens, we say that the model has converged.
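
A sketch of one such convergence test, stopping when the loss change between iterations falls below a tolerance; the loss, tolerance, and step size are assumed for illustration:

```python
def loss(w):
    return (w - 3) ** 2  # toy convex loss

w, lr, tol = 10.0, 0.1, 1e-9
prev = loss(w)

for step in range(10_000):
    w -= lr * 2 * (w - 3)  # gradient step
    cur = loss(w)
    if abs(prev - cur) < tol:  # loss has (nearly) stopped changing
        print(f"converged after {step + 1} steps, w = {w:.4f}")
        break
    prev = cur
```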

11
Q

What is a batch (in gradient descent)?

A

In gradient descent, a batch is the total number of examples you use to calculate the gradient in a single iteration.

12
Q

When is redundancy in example data useful?

A

Some redundancy in the example data can be useful to smooth out noisy gradients.
