3) The supervised learning problem Flashcards

(49 cards)

1
Q

In supervised learning, what are the input space, label space, and concept function

A

The input space X is the set of all possible inputs, the label space Y is the set of all possible labels, and the concept function c : X → Y is the unknown function that assigns to each input its true label.
2
Q

What is a loss function in supervised learning, and what is its purpose

A

A loss function L : Y × Y → [0, ∞) measures the discrepancy between a predicted label and the true label. Its purpose is to quantify how costly a prediction error is, so that hypotheses can be compared and optimised.
3
Q

What is the hypothesis space in supervised learning

A

The hypothesis space H is a set of functions h:X→Y that serve as potential candidates for the concept function c

4
Q

How is the generalisation error of a hypothesis h defined, and what does it represent

A

The generalisation error of h is the expected loss over the data distribution, E(h) = E[L(h(X), Y)]. It represents how well h is expected to predict on unseen data.
5
Q

What is the goal of the supervised learning task, and how is it formulated as an optimisation problem

A

The goal is to find a hypothesis whose generalisation error is as small as possible, formulated as the optimisation problem: minimise E(h) over h ∈ H.
6
Q

What is training data

A

Training data refers to pairs (x_1, y_1), …, (x_N, y_N) ∈ X × Y, which are used to learn an approximation of the concept function c
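As a sketch, training data can be represented as a list of (x, y) pairs, and the empirical error of a hypothesis is the average loss over these pairs; the loss and hypothesis below are illustrative choices, not fixed by the definition:

```python
# Minimal sketch: training data as (x, y) pairs and the empirical error
# (average loss over the training set).

def empirical_error(hypothesis, data, loss):
    """Average loss of `hypothesis` over training pairs (x_n, y_n)."""
    return sum(loss(hypothesis(x), y) for x, y in data) / len(data)

# Quadratic loss and a simple hypothesis h(x) = 2x as illustrative examples.
quadratic = lambda y_pred, y: (y_pred - y) ** 2
h = lambda x: 2 * x

data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5)]
print(empirical_error(h, data, quadratic))
```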

7
Q

How is the empirical error of a hypothesis h defined using training data

A

The empirical error is the average loss over the training data: Ê(h) = (1/N) Σ_{n=1}^{N} L(h(x_n), y_n).
8
Q

What is the empirical learning problem, and what is the goal of training in supervised learning

A

Since the generalisation error cannot be computed directly (the underlying distribution is unknown), we minimise the empirical error instead: find h ∈ H minimising Ê(h). The goal of training is to (approximately) solve this optimisation problem.
9
Q

What is the data-generating distribution setting in supervised learning, and how does it generalise the concept setting

A

Instead of a deterministic concept c, we assume the pairs (x, y) are drawn from a joint probability distribution on X × Y. This generalises the concept setting, which corresponds to the special case y = c(x) with probability one.
10
Q

What is the difference between regression and classification problems in supervised learning

A

In regression the label space Y is continuous (e.g. Y = R), so we predict numerical values; in classification Y is a finite set of classes, so we predict discrete labels.
11
Q

Describe examples of regression and classification problems

A

Regression -
* House price prediction
* Temperature prediction
* Stock price forecasting
Classification -
* Spam email detection
* Disease diagnosis (disease present or not)
* Image recognition (e.g. digits)

12
Q

What is a parametric model in the context of hypothesis space

A
  • A parametric model is one that is fully described by a fixed, finite number of parameters.
  • The hypothesis space is the set of all functions the model can represent; each function is determined by a specific choice of parameters.
  • Think of it like: hypothesis space = all possible models we can get by plugging different values into our parameterised formula

Formally - H = {h(·, w) : w ∈ W}
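As a sketch, the family of affine functions on R parameterised by w = (a, b) (an illustrative choice) forms such a hypothesis space: each parameter vector picks out one function from the family.

```python
# Sketch of a parametric hypothesis space: each parameter vector w = (a, b)
# selects one function h(x, w) = a*x + b from the family.

def h(x, w):
    a, b = w
    return a * x + b

# Different parameter choices give different hypotheses from the same space H.
h1 = lambda x: h(x, (1.0, 0.0))   # identity function
h2 = lambda x: h(x, (2.0, -1.0))  # another member of H
print(h1(3.0), h2(3.0))
```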

13
Q

How is the zero-one loss (0-1 loss) defined

A

The zero-one loss is L(y, y′) = 0 if y = y′ and L(y, y′) = 1 if y ≠ y′, i.e. it counts a prediction as either correct or wrong.
14
Q

Describe an example of a linear classifier defined in a binary classification problem with Y = {−1, 1} and X = R^M

A

h(x) = sign(w·x + b) for a weight vector w ∈ R^M and bias b ∈ R: the classifier outputs +1 or −1 depending on the sign of the affine function w·x + b.
15
Q

How does a linear classifier use a hyperplane to classify points in a binary classification problem

A

The set {x ∈ R^M : w·x + b = 0} is a hyperplane that splits R^M into two half-spaces. Points with w·x + b > 0 are labelled +1 and points with w·x + b < 0 are labelled −1.
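A linear classifier of this form can be sketched as follows; the weights and bias are illustrative choices:

```python
# Sketch: classify x in R^M by which side of the hyperplane w·x + b = 0 it lies on.

def linear_classifier(x, w, b):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

w, b = [1.0, -1.0], 0.0                      # hyperplane x1 = x2 (illustrative)
print(linear_classifier([2.0, 1.0], w, b))   # x1 > x2, so score > 0: +1
print(linear_classifier([1.0, 2.0], w, b))   # x1 < x2, so score < 0: -1
```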
16
Q

What is quadratic loss

A

The quadratic (squared) loss is L(y, y′) = (y − y′)², commonly used in regression.
17
Q

What is the space of square-integrable functions

A

The space L² of functions f : X → R with finite second moment, E[f(X)²] < ∞. It is the natural hypothesis space for regression with quadratic loss.
18
Q

In regression with quadratic loss, how can we minimise the generalisation error

A

Over all square-integrable hypotheses, the generalisation error E[(h(X) − Y)²] is minimised by the conditional expectation h*(x) = E[Y | X = x], the regression function.
19
Q

Describe the proof that the conditional expectation is the best predictor under squared loss

A

Write h(X) − Y = (h(X) − h*(X)) + (h*(X) − Y), where h*(x) = E[Y | X = x], and expand the square. Conditioning on X, the cross term vanishes, so E[(h(X) − Y)²] = E[(h(X) − h*(X))²] + E[(h*(X) − Y)²] ≥ E[(h*(X) − Y)²], with equality iff h = h* almost surely.
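The decomposition at the heart of this proof can be written out explicitly; with h*(x) = E[Y | X = x]:

```latex
\begin{align*}
\mathbb{E}\big[(h(X)-Y)^2\big]
 &= \mathbb{E}\big[\big((h(X)-h^*(X)) + (h^*(X)-Y)\big)^2\big] \\
 &= \mathbb{E}\big[(h(X)-h^*(X))^2\big]
   + 2\,\mathbb{E}\big[(h(X)-h^*(X))(h^*(X)-Y)\big]
   + \mathbb{E}\big[(h^*(X)-Y)^2\big].
\end{align*}
% Conditioning on X, the cross term vanishes:
% E[(h(X)-h^*(X))(h^*(X)-Y) | X] = (h(X)-h^*(X)) (h^*(X) - E[Y|X]) = 0,
% hence E[(h(X)-Y)^2] >= E[(h^*(X)-Y)^2], with equality iff h = h^* a.s.
```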
20
Q

How is linear regression formulated using quadratic loss and dictionary functions

21
Q

What are some common choices for dictionary functions in linear regression

22
Q

How are the generalisation error and empirical error expressed in linear regression, and what type of optimisation problem does this lead to
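With a dictionary (φ_1, …, φ_K), minimising the empirical quadratic error reduces to a linear least-squares problem in the coefficients. A sketch using NumPy, with a polynomial dictionary as an illustrative choice:

```python
import numpy as np

# Sketch: h(x) = sum_k w_k * phi_k(x) with dictionary phi_k(x) = x**k.
# Minimising the empirical quadratic error is a linear least-squares problem.

dictionary = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x  # data generated by an affine concept (illustrative)

Phi = np.column_stack([phi(x) for phi in dictionary])  # design matrix
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)            # minimise ||Phi w - y||^2
print(w)  # close to [1, 2, 0]: the affine concept is recovered
```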

23
Q

How is the logistic function defined in a probabilistic classification setting

24
Q

What is logistic regression, and how does it model classification problems
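Logistic regression passes a linear score through the logistic function to model P(Y = 1 | X = x); a minimal sketch (the weights are illustrative):

```python
import math

def logistic(t):
    """Logistic (sigmoid) function sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def predict_proba(x, w, b):
    """Model P(Y = 1 | X = x) = sigma(w·x + b)."""
    return logistic(sum(wi * xi for wi, xi in zip(w, x)) + b)

print(logistic(0.0))  # 0.5: maximal uncertainty at score 0
print(predict_proba([1.0, 2.0], [0.5, -0.5], 0.0))
```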

25
Q

What is the Rademacher distribution

26
Q

What is the space G of hypothesis losses, and how is the empirical Rademacher complexity of G defined
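The empirical Rademacher complexity E_σ[sup_{h∈H} (1/N) Σ_n σ_n h(x_n)] can be estimated by Monte Carlo for a small finite hypothesis class; the class and sample below are illustrative:

```python
import random

def empirical_rademacher(hypotheses, xs, trials=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ sup_h (1/N) sum_n sigma_n h(x_n) ]."""
    rng = random.Random(seed)
    n = len(xs)
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice([-1, 1]) for _ in range(n)]  # Rademacher signs
        total += max(sum(s * h(x) for s, x in zip(sigma, xs)) / n
                     for h in hypotheses)
    return total / trials

# Illustrative finite class: two constant classifiers and a threshold at 0.
H = [lambda x: 1, lambda x: -1, lambda x: 1 if x > 0 else -1]
print(empirical_rademacher(H, [-1.0, -0.5, 0.5, 1.0]))
```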
27
Q

What is the (non-empirical) Rademacher complexity of G, and what does it intuitively represent

28
Q

What is the relationship between the Rademacher complexities of the hypothesis space H and the associated loss space G in binary classification with 0-1 loss

29
Q

Describe the proof that the empirical Rademacher complexities satisfy Rad(G) = (1/2)·Rad(H)
30
Q

What is a generalisation bound for functions in G involving Rademacher complexity

31
Q

What is the growth function Π_H(N) of a hypothesis space H, and what does it represent

32
Q

What is an upper bound for the expected supremum over a finite set A ⊆ X ⊆ R^M involving Rademacher variables

33
Q

What is an upper bound for the Rademacher complexity of a hypothesis space H in binary classification

34
Q

What is a generalisation bound for a hypothesis h ∈ H in binary classification with 0-1 loss

35
Q

What does it mean for a hypothesis space H to shatter a dataset of size N

36
Q

What is the VC dimension of a hypothesis space H
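Shattering can be checked directly for a small class: H shatters a set if every ±1 labelling of it is realised by some h ∈ H. A sketch with 1-D threshold classifiers (an illustrative class, which shatters one point but not two):

```python
from itertools import product

def shatters(hypotheses, points):
    """True iff every +/-1 labelling of `points` is realised by some h."""
    realised = {tuple(h(x) for x in points) for h in hypotheses}
    return all(lab in realised for lab in product([-1, 1], repeat=len(points)))

# Threshold classifiers on R: h_t(x) = 1 if x >= t else -1 (illustrative).
thresholds = [lambda x, t=t: 1 if x >= t else -1 for t in [-2, 0, 2]]

print(shatters(thresholds, [1.0]))        # True: a single point is shattered
print(shatters(thresholds, [0.5, 1.5]))   # False: labelling (+1, -1) impossible
```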
37
Q

What is an upper bound for the growth function Π_H(N) in terms of the VC dimension

38
Q

What is a generalisation bound for a hypothesis h ∈ H in terms of the VC dimension

39
Q

What does it mean for a hyperplane H to separate two sets A and B, and how does this relate to their convex hulls

40
Q

What does Radon's Theorem state about the partition of a set A ⊆ R^M with #A = M + 2

41
Q

What is the hypothesis space of linear classifiers (hypothesis class)

42
Q

What is the VC dimension of the hypothesis space H^cl,M of linear classifiers in R^M
43
Q

What is the Bayes hypothesis

44
Q

What is the optimal hypothesis in a binary classification problem under the zero-one loss

45
Q

Describe the proof of the optimal hypothesis under the zero-one loss

46
Q

What is the Bayes error, and why is it called irreducible

47
Q

How can we split the generalisation error into three terms
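With h* the Bayes hypothesis, h_H the best hypothesis in H, and ĥ the trained hypothesis, the standard three-term split takes the form:

```latex
\mathcal{E}(\hat h)
  = \underbrace{\mathcal{E}(h^*)}_{\text{Bayes (irreducible) error}}
  + \underbrace{\mathcal{E}(h_{\mathcal{H}}) - \mathcal{E}(h^*)}_{\text{approximation error}}
  + \underbrace{\mathcal{E}(\hat h) - \mathcal{E}(h_{\mathcal{H}})}_{\text{estimation error}}
```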
48
Q

What is the approximation-estimation trade-off in machine learning model selection

49
Q

How do we balance approximation and estimation errors when selecting a hypothesis space H