Learning Model Flashcards

1
Q

Introduce the supervised learning problem (6 key points).

A

Supervised learning problem:

  • Domain set X: the set of all possible objects to make predictions about. A domain point x in X is called an instance and is usually a vector of features in R^n.
  • Label set Y: the set of all possible labels.
  • Training set S: a sequence of m labeled points S = ((x1, y1), ..., (xm, ym)).
  • Learner’s output h : X –> Y: the prediction rule (hypothesis).
  • Data-generation model: instances are generated by some probability distribution D over X and labeled according to a function f : X –> Y (both unknown).
  • Measure of success: the error of a classifier h is the probability that it does not predict the correct label on a random data point generated by D.

The loss L_D,f(h) is the probability of randomly drawing an example x from D for which h(x) ≠ f(x): L_D,f(h) = P_{x~D}[h(x) ≠ f(x)]
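As a sketch of the last two points, the true error can be computed exactly when the domain is finite and D is known. The distribution, labeling function, and hypothesis below are invented for the example:

```python
# Toy illustration (hypothetical D, f, h) of the true error
# L_D,f(h) = P_{x~D}[h(x) != f(x)] on a finite domain.
X = ["a", "b", "c", "d"]
D = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}  # probability distribution over X
f = {"a": 1, "b": 0, "c": 1, "d": 0}          # labeling function (unknown in practice)
h = {"a": 1, "b": 1, "c": 1, "d": 0}          # some prediction rule

# Sum the probability mass of the points where h disagrees with f.
error = sum(D[x] for x in X if h[x] != f[x])
print(error)  # 0.3 (h is wrong only on "b")
```

In practice D and f are unknown, which is exactly why the error must be estimated from a sample.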

2
Q

Discuss one approach to pursuing the supervised problem’s main objective, in terms of Generalization Error (= Expected Risk) and Training Error (= Empirical Risk).

A

One approach to solving a supervised learning problem is empirical risk minimization (ERM). Given:

  • Domain set X: the set of all possible objects to make predictions about, where a domain point x in X is called an instance and is usually a vector of features in R^n
  • Label set Y: the set of all possible labels
  • Training set S: composed of m labeled points

we need to define a hypothesis class H, which contains the possible prediction rules h : X –> Y, and a loss function l : H x Z –> R+, where Z = X x Y, which, given a hypothesis h in H and an example (x, y), measures how much we lose by predicting h(x) instead of the correct value y.

The goal is then to find a hypothesis h* in H which minimizes the generalization error L_D,f(h*),

but since the probability distribution D and the labeling function f are unknown, we minimize the training error (empirical risk) instead: L_S(h) = (number of wrong predictions on S) / (number of training samples m)
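A minimal ERM sketch over a finite hypothesis class, assuming 1-D threshold classifiers; the thresholds and toy training data are invented for illustration:

```python
# ERM sketch: pick the hypothesis in a finite class H with the smallest
# empirical risk L_S(h) on the training set S (toy data, illustrative only).
def empirical_risk(h, S):
    """L_S(h): fraction of training points that h mislabels."""
    return sum(1 for x, y in S if h(x) != y) / len(S)

def erm(H, S):
    """Return an argmin over h in H of the empirical risk on S."""
    return min(H, key=lambda h: empirical_risk(h, S))

# Training set labeled by the (in practice unknown) rule f(x) = 1 iff x >= 2.5.
S = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
# Finite class of threshold classifiers h_t(x) = 1 iff x >= t.
H = [lambda x, t=t: int(x >= t) for t in (0.5, 1.5, 2.5, 3.5)]

best = erm(H, S)
print(empirical_risk(best, S))  # 0.0 (the threshold 2.5 fits S perfectly)
```

Note that ERM only guarantees a small training error; whether that transfers to a small generalization error is exactly what PAC learnability addresses.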

3
Q

Describe the regression task and the classification task and their differences.

A

Answer as in the ERM approach, but add:

In the regression task we define Y = R, so the label is a real number (typically paired with the squared loss).
In the binary classification task we define Y as the set of the two possible labels, e.g. Y = {0, 1} (typically paired with the 0-1 loss).

4
Q

Definition of PAC learnability

A

A hypothesis class H is PAC learnable if there exist a function m_H : (0,1)^2 –> N and a learning algorithm with the following property: for every epsilon, delta in (0,1), for every distribution D over X, and for every labeling function f : X –> {0,1}, if the realizability assumption holds w.r.t. H, D, f, then when running the learning algorithm on m >= m_H(epsilon, delta) i.i.d. samples generated by D and labeled by f, the algorithm returns a hypothesis h such that, with probability >= 1 - delta (over the choice of the samples): L_D,f(h) <= epsilon

epsilon is the accuracy parameter: how far the returned hypothesis can be from the optimal one
delta is the confidence parameter: how likely the algorithm is to meet that accuracy requirement

5
Q

Sample complexity for PAC learnability of finite hypothesis classes

A

The sample complexity indicates how many examples are required to guarantee a probably approximately correct (PAC) solution.
Every finite hypothesis class is PAC learnable (e.g. via the ERM algorithm) with sample complexity m_H(epsilon, delta) <= ceil(log(|H| / delta) / epsilon)
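The bound is easy to evaluate numerically; a small sketch, where the numbers plugged in are arbitrary and log is the natural logarithm:

```python
import math

def pac_sample_bound(H_size, epsilon, delta):
    """Samples sufficient for PAC learning a finite class:
    ceil(log(|H| / delta) / epsilon)."""
    return math.ceil(math.log(H_size / delta) / epsilon)

# e.g. |H| = 1000 hypotheses, accuracy epsilon = 0.1, confidence delta = 0.05
print(pac_sample_bound(1000, 0.1, 0.05))  # 100
```

Note the bound grows only logarithmically in |H| and 1/delta, but linearly in 1/epsilon.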

6
Q

Definition of Agnostic PAC learnability for general loss functions

A

H is agnostic PAC learnable w.r.t. a set Z and a loss function l : H x Z –> R+ if there exist a function m_H : (0,1)^2 –> N and a learning algorithm such that for every epsilon, delta in (0,1) and for every distribution D over Z, when running the learning algorithm on m >= m_H(epsilon, delta) i.i.d. examples generated by D, the algorithm returns a hypothesis h such that, with probability >= 1 - delta (over the choice of the m training examples):

L_D(h) <= min_{h' in H} L_D(h') + epsilon

where L_D(h) = E_{z~D}[l(h, z)]

7
Q

Definitions of general loss function, true and empirical error.
Common loss functions
Prove that the expectation (over the distribution of the training set) of Ls(h) is equal to Ld(h).

A

General loss function: any function l : H x Z –> R+, where H is any hypothesis set and Z is some domain.

Risk function (true error): the expected loss of a hypothesis h in H w.r.t. a distribution D over Z: L_D(h) = E_{z~D}[l(h, z)]

Empirical risk (training error): the average loss over a given sample S = (z1, ..., zm) in Z^m: L_S(h) = (1/m) * Sum_{i=1}^m l(h, z_i)

Common loss functions:

  • 0-1 loss: Z = X x Y. Used in binary or multiclass classification.

l_0-1(h, (x, y)) = 0 if h(x) = y, 1 if h(x) ≠ y

  • Squared loss: Z = X x Y. Used in regression.

l_sq(h, (x, y)) = (h(x) - y)^2

E_S[L_S(h)] = E_S[(1/m) Sum_i l(h, z_i)] = (1/m) Sum_i E_{z_i~D}[l(h, z_i)] = (1/m) Sum_i L_D(h) = L_D(h), where the second equality holds by linearity of expectation and the third because each z_i is drawn i.i.d. from D.
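The identity can also be checked numerically with a small Monte Carlo sketch; the distribution, labeling rule, and hypothesis below are invented for the example:

```python
import random

random.seed(0)  # for reproducibility

# Hypothetical setup: x ~ Uniform[0,1], true label f(x) = 1 iff x >= 0.3,
# hypothesis h(x) = 1 iff x >= 0.5. Then L_D(h) = P[0.3 <= x < 0.5] = 0.2.
f = lambda x: int(x >= 0.3)
h = lambda x: int(x >= 0.5)

def L_S(h, S):
    """Empirical 0-1 risk of h on a sample S of (x, y) pairs."""
    return sum(1 for x, y in S if h(x) != y) / len(S)

def draw_sample(m):
    """Draw m i.i.d. labeled examples from the toy distribution."""
    return [(x, f(x)) for x in (random.random() for _ in range(m))]

# Averaging L_S(h) over many independent training sets approximates
# E_S[L_S(h)], which should match L_D(h) = 0.2.
avg = sum(L_S(h, draw_sample(50)) for _ in range(2000)) / 2000
print(round(avg, 2))  # close to 0.2
```

Each individual L_S(h) is a noisy estimate, but its expectation over the draw of S equals L_D(h), which is exactly what the derivation above proves.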
