Logistic regression Flashcards

1
Q

Data-set

A

{(x1,y1),…,(xN,yN)}
generated independently according to
P(y|x) = f(x) if y=+1; 1-f(x) if y=-1

2
Q

Objective

A

To learn the target function
f(x) = P[y = +1 | x]

Note that the target f(x) is a probability, not the binary output y.

3
Q

Definition of the logistic function

A

θ(s) = e^s / (1 + e^s)
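
Two equivalent forms follow directly from this definition and are used in the later cards on the hypothesis and the likelihood:

θ(s) = e^s / (1 + e^s) = 1 / (1 + e^(-s))
1 - θ(s) = 1 / (1 + e^s) = θ(-s)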

4
Q

Hypothesis

A

h(x) = θ(s)
where s is the risk score, modelled as a linear function of the input x:
s = w' x
→ h(x) = θ(w' x)
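
A minimal Python sketch of this hypothesis (the function names logistic and predict_proba are illustrative, not from the card):

import numpy as np

def logistic(s):
    """Logistic function theta(s) = e^s / (1 + e^s) = 1 / (1 + e^(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def predict_proba(w, x):
    """Hypothesis h(x) = theta(w' x): the estimated P[y = +1 | x]."""
    return logistic(np.dot(w, x))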

5
Q

Performance index

A
Maximum likelihood inference:
maximize wrt w: Π(n=1,N) Pw(yn|xn)
where Pw(yn|xn) = θ(yn w' xn)

Reformulation as a minimization problem:
minimize wrt w: -(1/N) * sum(n=1,N) ln( Pw(yn|xn) )

In general, this is a non-linear minimization problem.
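
Worked step (not stated on the card, but it follows from the earlier definitions): since h(x) = θ(w' x) models P[y = +1 | x] and 1 - θ(s) = θ(-s),

P(y=+1|x) = θ(w' x)   and   P(y=-1|x) = 1 - θ(w' x) = θ(-w' x)
→ both cases collapse into Pw(y|x) = θ(y w' x)

Maximizing the product Π(n=1,N) Pw(yn|xn) is equivalent to maximizing its logarithm, hence to minimizing -(1/N) * sum(n=1,N) ln( Pw(yn|xn) ).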

6
Q

In-sample error in logistic regression

A

Ein(w) = (1/N) * sum(n=1,N) ln( 1 + e^(-yn w' xn) )
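
A minimal Python sketch of this error (the vectorized form and the use of np.logaddexp for numerical stability are implementation choices, not part of the card):

import numpy as np

def in_sample_error(w, X, y):
    """Ein(w) = (1/N) * sum ln(1 + e^(-y_n w' x_n)).

    X: (N, d) array of inputs, y: (N,) array of labels in {-1, +1}.
    np.logaddexp(0, z) computes ln(1 + e^z) in a numerically stable way.
    """
    margins = y * (X @ w)                  # y_n * w' x_n for every n
    return np.mean(np.logaddexp(0.0, -margins))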

7
Q

Gradient descent algorithm

A
  1. initialize the weights at t=0 as w^(0)
  2. for t = 0, 1, 2, … do
    - compute the gradient of the in-sample error using all the data:
      g^(t) = ∇Ein(w^(t))
    - set the direction v^(t) = -g^(t)
    - update the weights as
      w^(t+1) = w^(t) + η*v^(t)
    - iterate until a stopping criterion is met
  3. return the final weights, which approximately minimize Ein(w) (a minimal code sketch follows)
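
A minimal sketch of these steps in Python, assuming the standard gradient ∇Ein(w) = -(1/N) * sum(n=1,N) yn xn θ(-yn w' xn) for the in-sample error above; the stopping thresholds and function names are illustrative:

import numpy as np

def logistic(s):
    return 1.0 / (1.0 + np.exp(-s))

def gradient_descent(X, y, eta=0.1, max_iters=10_000, grad_tol=1e-6):
    """Batch gradient descent for logistic regression.

    X: (N, d) inputs, y: (N,) labels in {-1, +1}.
    Stops on an iteration cap or a small gradient norm (criteria 1 and 2
    of the next card).
    """
    N, d = X.shape
    w = np.zeros(d)                              # step 1: w^(0)
    for _ in range(max_iters):                   # step 2
        margins = y * (X @ w)
        # gradient of Ein computed over all the data
        grad = -(X * (y * logistic(-margins))[:, None]).mean(axis=0)
        if np.linalg.norm(grad) < grad_tol:      # stop when the gradient is small
            break
        w = w - eta * grad                       # move along v = -g
    return w                                     # step 3: final weights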
8
Q

Stopping criteria

A

1) upper bound on number of iterations
2) threshold on the norm of the gradient ||g(t)||
3) combination of 1 and 2
4) combination of the previous plus a threshold on Ein
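
A sketch of how criterion 4 could be expressed in code (combining the tests with a logical OR, and the threshold names, are illustrative choices):

def should_stop(t, grad_norm, ein, max_iters=10_000, grad_tol=1e-6, ein_tol=1e-3):
    """Stop on an iteration cap, a small gradient norm, or a small in-sample error."""
    return t >= max_iters or grad_norm < grad_tol or ein < ein_tol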

9
Q

Stochastic gradient descent

A

The gradient descent algorithm uses all the N data points to compute ∇Ein.

The stochastic GD algorithm, instead:
- picks a data point (xn,yn) at random
  ein(w) = ln( 1 + e^(-yn w' xn) )
  ∇ein(w) = …
- update rule: w^(t+1) = w^(t) - η*∇ein(w^(t))
  → the minimization proceeds on average in the right direction, with some fluctuations, but with the advantage of requiring far fewer computations per update
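
A minimal SGD sketch in Python; the per-example gradient ∇ein(w) = -yn xn θ(-yn w' xn) follows from ein(w) above, while the epoch loop and the random permutation are implementation choices, not from the card:

import numpy as np

def logistic(s):
    return 1.0 / (1.0 + np.exp(-s))

def sgd(X, y, eta=0.01, epochs=100, seed=0):
    """Stochastic gradient descent: one randomly chosen example per update."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for n in rng.permutation(N):                 # visit the points in random order
            margin = y[n] * (X[n] @ w)
            grad = -y[n] * X[n] * logistic(-margin)  # gradient on (x_n, y_n) only
            w = w - eta * grad
    return w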
10
Q

Selection of the parameter η in the gradient descent algorithm

A

η is a design parameter:
- it should be small enough for the first-order Taylor approximation behind the gradient step to remain valid
- it should be large enough to obtain good convergence performance
- a good choice is a variable η:
  η^(t) = η * ||∇Ein(w^(t))||
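
A worked step on why this choice behaves like a fixed learning rate (this assumes the update is written with the normalized direction v^(t) = -g^(t)/||g^(t)||, which is not stated on the card):

w^(t+1) = w^(t) + η^(t) * v^(t)
        = w^(t) - η * ||∇Ein(w^(t))|| * ∇Ein(w^(t)) / ||∇Ein(w^(t))||
        = w^(t) - η * ∇Ein(w^(t))

so the effective step length η^(t) shrinks automatically as the gradient gets smaller near a minimum.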
