Week 6 Flashcards

(36 cards)

1
Q

Overfitting

A

When fitting the observed data (the data seen so far) well no longer indicates a small out-of-sample error.

2
Q

Deterministic noise

A

The part of the target function that lies outside the best approximation to it: even the best hypothesis cannot capture this part.

3
Q

Stochastic noise

A

Random noise that cannot be modeled.

4
Q

State two differences between deterministic and stochastic noise

A

1) If we generate the same data again, the deterministic noise stays the same, but the stochastic noise differs.
2) Different models capture different parts of the target function, so deterministic noise depends on the learning model you use.

5
Q

The variance of the stochastic noise is captured by the variable…

A

σ²

6
Q

What is the cause of overfitting?

A

Noise (stochastic and/or deterministic)

7
Q

Name two cures for overfitting:

A

1) Regularization
2) Validation

8
Q

Regularization

A

Attempts to minimize E_out by working through the equation
E_out(h) = E_in(h) + overfit penalty,
concentrating on the overfit-penalty term.
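
As a concrete illustration, a minimal sketch of one common regularizer, weight decay (ridge regression); the squared penalty λ·wᵀw and the name ridge_fit are my assumptions, not necessarily this course's exact method:

import numpy as np

def ridge_fit(X, y, lam):
    # Minimize E_in(w) + (lam/N) * w.T w; closed form:
    # w = (X.T X + lam*I)^(-1) X.T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

Setting lam = 0 recovers ordinary least squares; a larger lam shrinks the weights, trading a slightly larger E_in for a smaller overfit penalty.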

9
Q

Validation

A

Estimates the out-of-sample error directly

10
Q

validation set

A

A subset of the data that is not used in training.

11
Q

When is a set no longer a test set?

A

When it affects the learning process in any way.

12
Q

How is the validation set created?

A

The data set D is divided into a training set of size N − K and a validation set of size K. The algorithm learns a final hypothesis using only the training set; the validation error is then calculated on the validation set.

13
Q

What is the rule of thumb for determining K in validation?

A

K = N/5
Use 80% for training and 20% for validation.
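
A minimal sketch of the split from the previous two cards (numpy assumed; the helper name split_train_val is mine):

import numpy as np

def split_train_val(X, y, seed=0):
    # Rule of thumb: K = N/5, i.e. 80% train / 20% validation.
    N = len(y)
    K = N // 5
    idx = np.random.default_rng(seed).permutation(N)
    return X[idx[K:]], y[idx[K:]], X[idx[:K]], y[idx[:K]]

The final hypothesis is trained on the N − K training points; the K held-out points are used once, to compute the validation error.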

14
Q

Cross validation estimate

A

The average of the errors that each g_n makes on its own validation set.
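
A sketch for the leave-one-out case (K = 1), assuming a least-squares learner; e_n is the squared error of g_n on the one point left out of its training set:

import numpy as np

def loo_cv_estimate(X, y):
    # E_cv = (1/N) * sum_n e_n, with g_n trained on all data except point n.
    N = len(y)
    errors = []
    for n in range(N):
        mask = np.arange(N) != n                   # leave point n out
        w, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        errors.append((X[n] @ w - y[n]) ** 2)      # e_n on the held-out point
    return np.mean(errors)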

15
Q

What does H_θ denote?

A

The polynomials of degree d̃

16
Q

What is θ(x)? (z)

A

The x-vector (e.g. (1, x)ᵀ) extended with x², …, x^d̃
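
A minimal sketch of this transform for a scalar input (the function name is mine):

import numpy as np

def theta_transform(x, d):
    # z = theta(x) = (1, x, x^2, ..., x^d)
    return np.array([x ** k for k in range(d + 1)])

print(theta_transform(2.0, 3))  # [1. 2. 4. 8.]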

17
Q

When are we overfitting?

A

When the algorithm tries to learn from the noise instead of from the pattern.

18
Q

What is the cause of overfitting in an example without noise?

A

Deterministic noise: even with the best hypothesis we cannot perfectly approximate the target function.

19
Q

How can you express E_out(g^(D)) in three terms?

A

var + bias + σ²

20
Q

What is the var term when computing E_out(g^(D)) in three parts?

A

var = E_{D,x}[ (g^(D)(x) − ḡ(x))² ], where ḡ is the average hypothesis

21
Q

What is the bias term when computing E_out(g^(D)) in three parts?

A

bias = E_x[ (ḡ(x) − f(x))² ]

22
Q

What is the σ² term when computing E_out(g^(D)) in three parts?

A

σ² = E_{ε,x}[ ε(x)² ]
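
Cards 19–22 can be checked by simulation. A sketch under assumed specifics (target f(x) = sin(πx), noise level σ = 0.1, a line as the learning model): train on many fresh data sets, average the hypotheses to get ḡ, then estimate var and bias on a test grid.

import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)          # assumed target function
sigma, N, trials = 0.1, 10, 2000

x_test = np.linspace(-1, 1, 200)
preds = np.empty((trials, x_test.size))
for t in range(trials):
    x = rng.uniform(-1, 1, N)
    y = f(x) + sigma * rng.normal(size=N)                   # y = f(x) + epsilon
    preds[t] = np.polyval(np.polyfit(x, y, deg=1), x_test)  # g^(D) on the grid

g_bar = preds.mean(axis=0)                   # the average hypothesis
var = ((preds - g_bar) ** 2).mean()          # E_{D,x}[(g^(D)(x) - g_bar(x))^2]
bias = ((g_bar - f(x_test)) ** 2).mean()     # E_x[(g_bar(x) - f(x))^2]
print(var + bias + sigma ** 2)               # approximates E_out(g^(D))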

23
Q

How do you compute the expected in-sample error in linear regression with noise?

A

σ² (1 − (d+1)/N)

24
Q

How do you compute the expected out-of-sample error in linear regression with noise?

A

σ² (1 + (d+1)/N)

25
Q

How do you compute the expected generalization error in linear regression with noise?

A

2σ² (d+1)/N

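The formulas in cards 23–25 can be verified numerically; a sketch under assumed specifics (d = 3, N = 50, σ = 1, Gaussian inputs, a linear target), where fresh data from the same distribution approximates the out-of-sample error:

import numpy as np

rng = np.random.default_rng(1)
d, N, sigma, trials = 3, 50, 1.0, 4000
w_true = rng.normal(size=d + 1)
e_in_sum = e_out_sum = 0.0

for _ in range(trials):
    X = np.column_stack([np.ones(N), rng.normal(size=(N, d))])
    y = X @ w_true + sigma * rng.normal(size=N)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    e_in_sum += np.mean((X @ w - y) ** 2)
    X2 = np.column_stack([np.ones(N), rng.normal(size=(N, d))])  # fresh data
    y2 = X2 @ w_true + sigma * rng.normal(size=N)
    e_out_sum += np.mean((X2 @ w - y2) ** 2)

print(e_in_sum / trials, sigma**2 * (1 - (d + 1) / N))   # both close to 0.92
print(e_out_sum / trials, sigma**2 * (1 + (d + 1) / N))  # both close to 1.08
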
26
Q

What is H0, essentially?

A

The set of all hypotheses of the form h(x) = b

27
Q

What is H1, essentially?

A

The set of all hypotheses of the form h(x) = ax + b

28
Q

Difference between linear classification and linear regression:

A

Classification = binary (or ternary, …) output; regression = real-valued output

29
Q

Difference between logistic regression and linear regression:

A

Logistic regression = a real number between 0 and 1; linear regression = any real number

30
Q

What does linear regression use to measure the distance between h(x) and f(x)?

A

The mean squared error (MSE)

31
Q

What is the formula for the mean squared error?

A

E_in = (1/N) Σ_{i=1}^{N} (h(x_i) − y_i)²

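In code this is a one-liner (a sketch; h is any hypothesis function applied row-wise):

import numpy as np

def e_in(h, X, y):
    # E_in(h) = (1/N) * sum_i (h(x_i) - y_i)^2
    return np.mean((h(X) - y) ** 2)
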
32
Q

What is h(x) in 1) linear regression, 2) the perceptron, and 3) logistic regression?

A

1) h(x) = s, 2) h(x) = sign(s), 3) h(x) = θ(s), where s = wᵀx

33
Q

Give the logistic regression algorithm (2 steps):

A

For every time step t: 1) compute the gradient ∇E_in(w(t)); 2) update the weights with fixed learning rate η: w(t+1) = w(t) − η ∇E_in(w(t))

34
Q

Stochastic gradient descent

A

Does not use all examples to compute E_in; it uses the error on a single data sample (or a few) per update.

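A sketch of the two-step algorithm from card 33, assuming the cross-entropy error E_in(w) = (1/N) Σ ln(1 + exp(−y_n wᵀx_n)) with labels y in {−1, +1}:

import numpy as np

def logistic_gd(X, y, eta=0.1, steps=1000):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # 1) compute the gradient of E_in at w(t)
        grad = -np.mean((y / (1 + np.exp(y * (X @ w))))[:, None] * X, axis=0)
        # 2) update the weights with fixed learning rate eta
        w = w - eta * grad
    return w

For stochastic gradient descent (card 34), replace the full-batch mean in step 1 with the gradient on a single randomly chosen example (or a small batch) per update.
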
35
Q

How do you represent the XOR function using ANDs and ORs?

A

f(x) = (NOT h1 AND h2) OR (h1 AND NOT h2): +1 if exactly one of h1, h2 equals +1

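A quick truth-table check of this decomposition (True standing in for +1):

def xor_f(h1, h2):
    # f = (NOT h1 AND h2) OR (h1 AND NOT h2)
    return (not h1 and h2) or (h1 and not h2)

for h1 in (False, True):
    for h2 in (False, True):
        print(h1, h2, xor_f(h1, h2))   # True exactly when h1 != h2
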
36
Q

What does adding more nodes per hidden layer do to the approximation and generalization of an MLP?

A

Approximation improves; generalization gets worse.