3) The supervised learning problem Flashcards

(49 cards)

1
Q

In supervised learning, what are the input space, label space, and concept function

A

The input space X is the set of all possible inputs, the label space Y is the set of all possible labels, and the concept function c : X → Y is the unknown function that assigns to each input its true label.
2
Q

What is a loss function in supervised learning, and what is its purpose

A

A loss function L : Y × Y → [0, ∞) measures the discrepancy between a predicted label and the true label. Its purpose is to quantify how costly a prediction error is, so that hypotheses can be compared and optimised.
3
Q

What is the hypothesis space in supervised learning

A

The hypothesis space H is a set of functions h:X→Y that serve as potential candidates for the concept function c

4
Q

How is the generalisation error of a hypothesis h defined, and what does it represent

A

The generalisation error of h is the expected loss over the data distribution, E(h) = E[L(h(X), Y)]. It represents how well h is expected to predict on unseen data.
5
Q

What is the goal of the supervised learning task, and how is it formulated as an optimisation problem

A

The goal is to find a hypothesis whose generalisation error is as small as possible, formulated as the optimisation problem: minimise E(h) over h ∈ H.
6
Q

What is training data

A

Training data refers to pairs (x_1, y_1), …, (x_N, y_N) ∈ X × Y, which are used to learn an approximation of the concept function c
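As a sketch, training data can be represented as a list of (x, y) pairs, and the empirical error of a hypothesis is the average loss over these pairs; the loss and hypothesis below are illustrative choices, not fixed by the definition:

```python
# Minimal sketch: training data as (x, y) pairs and the empirical error
# (average loss over the training set).

def empirical_error(hypothesis, data, loss):
    """Average loss of `hypothesis` over training pairs (x_n, y_n)."""
    return sum(loss(hypothesis(x), y) for x, y in data) / len(data)

# Quadratic loss and a simple hypothesis h(x) = 2x as illustrative examples.
quadratic = lambda y_pred, y: (y_pred - y) ** 2
h = lambda x: 2 * x

data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5)]
print(empirical_error(h, data, quadratic))
```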

7
Q

How is the empirical error of a hypothesis h defined using training data

A

The empirical error is the average loss over the training data: Ê(h) = (1/N) Σ_{n=1}^{N} L(h(x_n), y_n).
8
Q

What is the empirical learning problem, and what is the goal of training in supervised learning

A

Since the generalisation error cannot be computed directly (the underlying distribution is unknown), we minimise the empirical error instead: find h ∈ H minimising Ê(h). The goal of training is to (approximately) solve this optimisation problem.
9
Q

What is the data-generating distribution setting in supervised learning, and how does it generalise the concept setting

A

Instead of a deterministic concept c, we assume the pairs (x, y) are drawn from a joint probability distribution on X × Y. This generalises the concept setting, which corresponds to the special case y = c(x) with probability one.
10
Q

What is the difference between regression and classification problems in supervised learning

A

In regression the label space Y is continuous (e.g. Y = R), so we predict numerical values; in classification Y is a finite set of classes, so we predict discrete labels.
11
Q

Describe examples of regression and classification problems

A

Regression -
* House price prediction
* Temperature prediction
* Stock price forecasting
Classification -
* Spam email detection
* Disease diagnosis (disease present or not)
* Image recognition (e.g. digits)

12
Q

What is a parametric model in the context of hypothesis space

A
  • A parametric model is one that is fully described by a fixed, finite number of parameters.
  • The hypothesis space is the set of all functions the model can represent; each function is determined by a specific choice of parameters.
  • Think of it like: hypothesis space = all possible models we can get by plugging different values into our parameterised formula

Formally - H = {h(·, w) : w ∈ W}
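As a sketch, the family of affine functions on R parameterised by w = (a, b) (an illustrative choice) forms such a hypothesis space: each parameter vector picks out one function from the family.

```python
# Sketch of a parametric hypothesis space: each parameter vector w = (a, b)
# selects one function h(x, w) = a*x + b from the family.

def h(x, w):
    a, b = w
    return a * x + b

# Different parameter choices give different hypotheses from the same space H.
h1 = lambda x: h(x, (1.0, 0.0))   # identity function
h2 = lambda x: h(x, (2.0, -1.0))  # another member of H
print(h1(3.0), h2(3.0))
```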

13
Q

How is the zero-one loss (0-1 loss) defined

A

The zero-one loss is L(y, y′) = 0 if y = y′ and L(y, y′) = 1 if y ≠ y′, i.e. it counts a prediction as either correct or wrong.
14
Q

Describe an example of a linear classifier defined in a binary classification problem with Y = {−1, 1} and X = R^M

A

h(x) = sign(w·x + b) for a weight vector w ∈ R^M and bias b ∈ R: the classifier outputs +1 or −1 depending on the sign of the affine function w·x + b.
15
Q

How does a linear classifier use a hyperplane to classify points in a binary classification problem

A

The set {x ∈ R^M : w·x + b = 0} is a hyperplane that splits R^M into two half-spaces. Points with w·x + b > 0 are labelled +1 and points with w·x + b < 0 are labelled −1.
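A linear classifier of this form can be sketched as follows; the weights and bias are illustrative choices:

```python
# Sketch: classify x in R^M by which side of the hyperplane w·x + b = 0 it lies on.

def linear_classifier(x, w, b):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

w, b = [1.0, -1.0], 0.0                      # hyperplane x1 = x2 (illustrative)
print(linear_classifier([2.0, 1.0], w, b))   # x1 > x2, so score > 0: +1
print(linear_classifier([1.0, 2.0], w, b))   # x1 < x2, so score < 0: -1
```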
16
Q

What is quadratic loss

A

The quadratic (squared) loss is L(y, y′) = (y − y′)², commonly used in regression.
17
Q

What is the space of square-integrable functions

A

The space L² of functions f : X → R with finite second moment, E[f(X)²] < ∞. It is the natural hypothesis space for regression with quadratic loss.
18
Q

In regression with quadratic loss, how can we minimise the generalisation error

A

Over all square-integrable hypotheses, the generalisation error E[(h(X) − Y)²] is minimised by the conditional expectation h*(x) = E[Y | X = x], the regression function.
19
Q

Describe the proof that the conditional expectation is the best predictor under squared loss

A

Write h(X) − Y = (h(X) − h*(X)) + (h*(X) − Y), where h*(x) = E[Y | X = x], and expand the square. Conditioning on X, the cross term vanishes, so E[(h(X) − Y)²] = E[(h(X) − h*(X))²] + E[(h*(X) − Y)²] ≥ E[(h*(X) − Y)²], with equality iff h = h* almost surely.
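The decomposition at the heart of this proof can be written out explicitly; with h*(x) = E[Y | X = x]:

```latex
\begin{align*}
\mathbb{E}\big[(h(X)-Y)^2\big]
 &= \mathbb{E}\big[\big((h(X)-h^*(X)) + (h^*(X)-Y)\big)^2\big] \\
 &= \mathbb{E}\big[(h(X)-h^*(X))^2\big]
   + 2\,\mathbb{E}\big[(h(X)-h^*(X))(h^*(X)-Y)\big]
   + \mathbb{E}\big[(h^*(X)-Y)^2\big].
\end{align*}
% Conditioning on X, the cross term vanishes:
% E[(h(X)-h^*(X))(h^*(X)-Y) | X] = (h(X)-h^*(X)) (h^*(X) - E[Y|X]) = 0,
% hence E[(h(X)-Y)^2] >= E[(h^*(X)-Y)^2], with equality iff h = h^* a.s.
```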
20
Q

How is linear regression formulated using quadratic loss and dictionary functions

21
Q

What are some common choices for dictionary functions in linear regression

22
Q

How are the generalisation error and empirical error expressed in linear regression, and what type of optimisation problem does this lead to
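With a dictionary (φ_1, …, φ_K), minimising the empirical quadratic error reduces to a linear least-squares problem in the coefficients. A sketch using NumPy, with a polynomial dictionary as an illustrative choice:

```python
import numpy as np

# Sketch: h(x) = sum_k w_k * phi_k(x) with dictionary phi_k(x) = x**k.
# Minimising the empirical quadratic error is a linear least-squares problem.

dictionary = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x  # data generated by an affine concept (illustrative)

Phi = np.column_stack([phi(x) for phi in dictionary])  # design matrix
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)            # minimise ||Phi w - y||^2
print(w)  # close to [1, 2, 0]: the affine concept is recovered
```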

23
Q

How is the logistic function defined in a probabilistic classification setting

24
Q

What is logistic regression, and how does it model classification problems
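Logistic regression passes a linear score through the logistic function to model P(Y = 1 | X = x); a minimal sketch (the weights are illustrative):

```python
import math

def logistic(t):
    """Logistic (sigmoid) function sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def predict_proba(x, w, b):
    """Model P(Y = 1 | X = x) = sigma(w·x + b)."""
    return logistic(sum(wi * xi for wi, xi in zip(w, x)) + b)

print(logistic(0.0))  # 0.5: maximal uncertainty at score 0
print(predict_proba([1.0, 2.0], [0.5, -0.5], 0.0))
```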

25
Q

What is the Rademacher distribution

26
Q

What is the space G of hypothesis losses, and how is the empirical Rademacher complexity of G defined
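The empirical Rademacher complexity E_σ[sup_{h∈H} (1/N) Σ_n σ_n h(x_n)] can be estimated by Monte Carlo for a small finite hypothesis class; the class and sample below are illustrative:

```python
import random

def empirical_rademacher(hypotheses, xs, trials=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ sup_h (1/N) sum_n sigma_n h(x_n) ]."""
    rng = random.Random(seed)
    n = len(xs)
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice([-1, 1]) for _ in range(n)]  # Rademacher signs
        total += max(sum(s * h(x) for s, x in zip(sigma, xs)) / n
                     for h in hypotheses)
    return total / trials

# Illustrative finite class: two constant classifiers and a threshold at 0.
H = [lambda x: 1, lambda x: -1, lambda x: 1 if x > 0 else -1]
print(empirical_rademacher(H, [-1.0, -0.5, 0.5, 1.0]))
```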
27
Q

What is the (non-empirical) Rademacher complexity of G, and what does it intuitively represent

28
Q

What is the relationship between the Rademacher complexities of the hypothesis space H and the associated loss space G in binary classification with 0-1 loss

29
Q

Describe the proof that the empirical Rademacher complexities satisfy Rad(G) = (1/2)·Rad(H)
30
Q

What is a generalisation bound for functions in G involving Rademacher complexity

31
Q

What is the growth function Π_H(N) of a hypothesis space H, and what does it represent

32
Q

What is an upper bound for the expected supremum over a finite set A ⊆ X ⊆ R^M involving Rademacher variables

33
Q

What is an upper bound for the Rademacher complexity of a hypothesis space H in binary classification

34
Q

What is a generalisation bound for a hypothesis h ∈ H in binary classification with 0-1 loss

35
Q

What does it mean for a hypothesis space H to shatter a dataset of size N

36
Q

What is the VC dimension of a hypothesis space H
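Shattering can be checked directly for a small class: H shatters a set if every ±1 labelling of it is realised by some h ∈ H. A sketch with 1-D threshold classifiers (an illustrative class, which shatters one point but not two):

```python
from itertools import product

def shatters(hypotheses, points):
    """True iff every +/-1 labelling of `points` is realised by some h."""
    realised = {tuple(h(x) for x in points) for h in hypotheses}
    return all(lab in realised for lab in product([-1, 1], repeat=len(points)))

# Threshold classifiers on R: h_t(x) = 1 if x >= t else -1 (illustrative).
thresholds = [lambda x, t=t: 1 if x >= t else -1 for t in [-2, 0, 2]]

print(shatters(thresholds, [1.0]))        # True: a single point is shattered
print(shatters(thresholds, [0.5, 1.5]))   # False: labelling (+1, -1) impossible
```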
37
Q

What is an upper bound for the growth function Π_H(N) in terms of the VC dimension

38
Q

What is a generalisation bound for a hypothesis h ∈ H in terms of the VC dimension

39
Q

What does it mean for a hyperplane H to separate two sets A and B, and how does this relate to their convex hulls

40
Q

What does Radon's Theorem state about the partition of a set A ⊆ R^M with #A = M + 2

41
Q

What is the hypothesis space of linear classifiers (hypothesis class)

42
Q

What is the VC dimension of the hypothesis space H^cl,M of linear classifiers in R^M
43
Q

What is the Bayes hypothesis

44
Q

What is the optimal hypothesis in a binary classification problem under the zero-one loss

45
Q

Describe the proof of the optimal hypothesis under the zero-one loss

46
Q

What is the Bayes error, and why is it called irreducible

47
Q

How can we split the generalisation error into three terms
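With h* the Bayes hypothesis, h_H the best hypothesis in H, and ĥ the trained hypothesis, the standard three-term split takes the form:

```latex
\mathcal{E}(\hat h)
  = \underbrace{\mathcal{E}(h^*)}_{\text{Bayes (irreducible) error}}
  + \underbrace{\mathcal{E}(h_{\mathcal{H}}) - \mathcal{E}(h^*)}_{\text{approximation error}}
  + \underbrace{\mathcal{E}(\hat h) - \mathcal{E}(h_{\mathcal{H}})}_{\text{estimation error}}
```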
48
Q

What is the approximation-estimation trade-off in machine learning model selection

49
Q

How do we balance approximation and estimation errors when selecting a hypothesis space H