empirical risk Flashcards

(67 cards)

1
Q

simple linear regression model

A

H(x) = w0 + w1x

2
Q

slope in simple linear regression model

A

w1

3
Q

intercept in simple linear regression model

A

w0

4
Q

loss function

A

quantifies how bad a prediction is for a single data point

5
Q

if our prediction is close to the actual value

A

we should have low loss

6
Q

if our prediction is far from the actual value

A

we should have high loss

7
Q

error

A

difference between actual and predicted values: yi - H(xi)

8
Q

squared loss function

A

computes (actual - predicted)^2

9
Q

squared loss for the constant model

A

Lsq(yi, h) = (yi - h)^2

10
Q

another term for average squared loss

A

mean squared error

11
Q

best prediction, h*

A

the h that minimizes Rsq(h) = (1/n) Σ(i=1 to n) (yi - h)^2

12
Q

constant model

A

H(x) = h

13
Q

simple linear regression

A

H(x) = w0 + w1x

14
Q

how do we find h* that minimizes Rsq(h)

A

using calculus

15
Q

minimize Rsq(h)

A
  1. take its derivative with respect to h
  2. set it equal to 0
  3. solve for the resulting h*
  4. perform a second derivative test to ensure we found a minimum
16
Q

derivative of Rsq(h)

A

dRsq/dh = (-2/n) Σ(i=1 to n) (yi - h)

17
Q

Mean minimizes…

A

mean squared error

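A quick numerical check of this card, using a small made-up dataset (an assumption, not from the cards): scanning candidate constant predictions h shows mean squared error is minimized at the mean.

```python
import numpy as np

# Hypothetical dataset, made up for illustration.
y = np.array([1.0, 2.0, 4.0, 7.0])

def mse(h, y):
    """Average squared loss: Rsq(h) = (1/n) * sum of (yi - h)^2."""
    return np.mean((y - h) ** 2)

# Scan a fine grid of candidate constant predictions h.
hs = np.linspace(y.min(), y.max(), 10001)
h_star = hs[np.argmin([mse(h, y) for h in hs])]

print(h_star)    # very close to 3.5
print(y.mean())  # 3.5, the mean of the data
```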
18
Q

absolute loss

A

Labs(yi, H(xi)) = |yi - H(xi)|

19
Q

average absolute loss

A

Rabs(h) = (1/n) Σ(i=1 to n) |yi - h|

20
Q

to minimize mean absolute error

A
  1. take its derivative with respect to h
  2. set it equal to 0
  3. solve for the resulting h*
  4. perform a second derivative test to ensure we found a minimum
21
Q

derivative of |yi - h|

A

it is piecewise: -1 when h < yi, +1 when h > yi, and undefined at h = yi

22
Q

derivative of Rabs(h)

A

d/dh [(1/n) Σ(i=1 to n) |yi - h|] = (1/n) [#(h > yi) - #(h < yi)]

23
Q

median minimizes

A

mean absolute error

24
Q

best constant prediction in terms of mean absolute error

A

median
1. when n is odd, answer is unique
2. when n is even, any number between the middle two data points also minimizes mean absolute error
3. when n is even, define the median to be the mean of the middle two data points

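A sketch of this card's claim, using made-up data with one outlier (an assumption for illustration): the median gives a lower mean absolute error than the mean.

```python
import numpy as np

# Hypothetical data with one large outlier.
y = np.array([1.0, 2.0, 3.0, 100.0])

def mae(h, y):
    """Average absolute loss: Rabs(h) = (1/n) * sum of |yi - h|."""
    return np.mean(np.abs(y - h))

med = np.median(y)  # 2.5, the mean of the middle two points (n is even)
avg = np.mean(y)    # 26.5, dragged up by the outlier

print(mae(med, y))  # 25.0
print(mae(avg, y))  # 36.75 -- the mean does worse under absolute loss
```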
25

Q

process for minimizing average loss

A

empirical risk minimization

26

Q

another name for "average loss"

A

empirical risk

27

Q

corresponding empirical risk when using the squared loss function

A

Rsq(h) = (1/n) Σ(i=1 to n) (yi - h)^2

28

Q

if L(yi, h) is any loss function, the corresponding empirical risk is

A

R(h) = (1/n) Σ(i=1 to n) L(yi, h)

29

Q

Modeling recipe

A
  1. choose a model
  2. choose a loss function
  3. minimize average loss to find optimal model parameters

30

Q

empirical risk minimization

A

formal name for the process of minimizing average loss

31

Q

corresponding empirical risk to the squared loss Lsq(yi, h) = (yi - h)^2

A

Rsq(h) = (1/n) Σ(i=1 to n) (yi - h)^2

32

Q

For the mean

A

sum of distances below = sum of distances above

33

Q

Mean is the point where

A

Σ(i=1 to n) (yi - h) = 0

34

Q

Median is the point where

A

#(yi < h) = #(yi > h)

35

Q

Lp loss

A

Lp(yi, h) = |yi - h|^p

36

Q

Corresponding empirical risk to Lp

A

Rp(h) = (1/n) Σ(i=1 to n) |yi - h|^p

37

Q

midrange minimizes

A

L-infinity loss (the limit of Lp loss as p --> infinity)

38

Q

As p --> infinity,

A

the minimizer of mean Lp loss approaches the midrange: the midpoint of the minimum and maximum values in the dataset
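This limit can be watched numerically. A sketch with made-up data (an assumption for illustration): as p grows, the grid minimizer of mean Lp loss moves from the mean toward the midrange.

```python
import numpy as np

# Hypothetical dataset; its midrange is (2 + 11) / 2 = 6.5.
y = np.array([2.0, 3.0, 5.0, 11.0])

def lp_risk(h, y, p):
    """Mean Lp loss: Rp(h) = (1/n) * sum of |yi - h|^p."""
    return np.mean(np.abs(y - h) ** p)

# Grid-search the minimizer for increasing p.
hs = np.linspace(y.min(), y.max(), 20001)
minimizers = {}
for p in [1, 2, 8, 64]:
    minimizers[p] = hs[np.argmin([lp_risk(h, y, p) for h in hs])]
    # p = 2 gives roughly the mean (5.25); p = 64 is already near 6.5
```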
39

Q

The general form of empirical risk for any loss function

A

R(h) = (1/n) Σ(i=1 to n) L(yi, h)

40

Q

input h* that minimizes R(h) is...

A

some measure of the center of the dataset

41

Q

minimum output R(h*) represents

A

some measure of the spread or variation in the dataset

42

Q

Minimum value of Rsq(h)

A

Rsq(h*) = Rsq(Mean(y1, y2, ..., yn)) = (1/n) Σ(i=1 to n) (yi - Mean(y1, y2, ..., yn))^2

43

Q

Variance

A

the minimum value of Rsq(h); it is the mean squared deviation from the mean, measuring the squared distance of each data point from the mean, on average

44

Q

standard deviation

A

square root of the variance
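A sketch confirming the relationship on these cards (data made up for illustration): the minimum value of Rsq is exactly the variance, and its square root is the standard deviation.

```python
import numpy as np

# Hypothetical dataset.
y = np.array([1.0, 2.0, 4.0, 7.0])

# Rsq evaluated at its minimizer, the mean.
rsq_at_mean = np.mean((y - y.mean()) ** 2)

print(rsq_at_mean)  # 5.25
print(np.var(y))    # 5.25 -- the same number: the (population) variance
print(np.std(y))    # sqrt(5.25), the standard deviation
```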
45

Q

empirical risk for absolute loss

A

Rabs(h) = (1/n) Σ(i=1 to n) |yi - h|

46

Q

Rabs(h) is minimized when

A

h* = Median(y1, y2, ..., yn)

47

Q

Minimum value of Rabs(h) is...

A

mean absolute deviation from the median: (1/n) Σ(i=1 to n) |yi - Median(y1, y2, ..., yn)|

48

Q

empirical risk for 0-1 loss

A

R0,1(h) = (1/n) Σ(i=1 to n) [0 if yi = h, 1 if yi ≠ h], the proportion (between 0 and 1) of data points not equal to h

49

Q

R0,1(h) is minimized when

A

h* = Mode(y1, y2, ..., yn)

50

Q

the minimum value of R0,1(h)

A

proportion of data points not equal to the mode
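A sketch of 0-1 loss on made-up categorical data (an assumption for illustration): the mode minimizes the proportion of mismatched predictions.

```python
from collections import Counter

# Hypothetical categorical data.
y = ["cat", "dog", "cat", "bird", "cat"]

def zero_one_risk(h, y):
    """R0,1(h): proportion of data points not equal to h."""
    return sum(yi != h for yi in y) / len(y)

# The mode is the most common value.
mode = Counter(y).most_common(1)[0][0]

print(mode)                    # 'cat'
print(zero_one_risk(mode, y))  # 0.4 -- 2 of the 5 points differ from the mode
# every other prediction has at least this much risk on this data
```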
51

Q

simple linear regression model

A

H(x) = w0 + w1x

52

Q

when using squared loss

A

h* = Mean(y1, y2, ..., yn), and Rsq(h*) = Variance(y1, y2, ..., yn)

53

Q

When using absolute loss

A

h* = Median(y1, y2, ..., yn), and Rabs(h*) = mean absolute deviation from the median

54

Q

R0,1(h) is minimized when

A

h* = Mode(y1, y2, ..., yn), so R0,1(h*) is the proportion of data points not equal to the mode

55

Q

minimum value of R0,1(h) is the

A

proportion of data points not equal to the mode

56

Q

a higher value of R0,1(h*) means

A

less of the data is clustered at the mode

57

Q

hypothesis function

A

H; takes in an x as input and returns a predicted y

58

Q

parameters define

A

the relationship between the input and output of a hypothesis function

59

Q

Since linear hypothesis functions are of the form H(x) = w0 + w1x, we can re-write Rsq

A

Rsq(w0, w1) = (1/n) Σ(i=1 to n) (yi - (w0 + w1xi))^2

60

Q

Minimize mean squared error

A
  1. take partial derivatives with respect to each variable
  2. set all partial derivatives to 0
  3. solve the resulting system of equations
  4. ensure that you've found a minimum, rather than a maximum or saddle point

61

Q

We have a system of two equations and two unknowns (w0 and w1):
(-2/n) Σ(i=1 to n) (yi - (w0 + w1xi)) = 0
(-2/n) Σ(i=1 to n) (yi - (w0 + w1xi)) xi = 0

A

solve for w0 in the first equation; the result is the best intercept w0*. Plug w0* into the second equation and solve for w1*
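Carrying out those two steps gives the familiar closed-form least-squares solution. A sketch with made-up data, cross-checked against numpy's own degree-1 fit:

```python
import numpy as np

# Hypothetical data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# Solving the two normal equations yields:
#   w1* = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
#   w0* = ybar - w1* * xbar
xbar, ybar = x.mean(), y.mean()
w1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
w0 = ybar - w1 * xbar

# Cross-check against numpy's least-squares polynomial fit.
w1_np, w0_np = np.polyfit(x, y, 1)
print(w0, w1)  # 1.3 0.9
```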
62

Q

correlation

A

linear association; a pattern that looks like a line

63

Q

association

A

any pattern

64

Q

correlation coefficient, r

A

measure of the strength of linear association of two variables, x and y; measures how tightly clustered a scatter plot is around a straight line; between -1 and 1

65

Q

correlation coefficient, r, is defined as

A

the average of the product of x and y when both are in standard units

66

Q

units of the slope w1* = r(sigma_y / sigma_x)

A

units of y per unit of x
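The last two cards combine into a short computation. A sketch with made-up data (an assumption for illustration): r is the average product of the standardized coordinates, and r * (sigma_y / sigma_x) reproduces the least-squares slope.

```python
import numpy as np

# Hypothetical data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# Convert both variables to standard units: (value - mean) / std.
xu = (x - x.mean()) / x.std()
yu = (y - y.mean()) / y.std()

# r = average of the product of x and y in standard units.
r = np.mean(xu * yu)

# Slope of the regression line: w1* = r * (sigma_y / sigma_x).
w1 = r * (y.std() / x.std())
print(r)   # 0.9
print(w1)  # 0.9 (sigma_x = sigma_y for this data, so the slope equals r)
```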
67