empirical risk Flashcards

(67 cards)

1
Q

simple linear regression model

A

H(x) = w0 + w1x

2
Q

slope in simple linear regression model

A

w1

3
Q

intercept in simple linear regression model

A

w0

4
Q

loss function

A

quantifies how bad a prediction is for a single data point

5
Q

if our prediction is close to the actual value

A

we should have low loss

6
Q

if our prediction is far from the actual value

A

we should have high loss

7
Q

error

A

difference between actual and predicted values: yi - H(xi)

8
Q

squared loss function

A

computes (actual - predicted)^2

9
Q

squared loss for the constant model

A

Lsq(yi, h) = (yi - h)^2

10
Q

another term for average squared loss

A

mean squared error

11
Q

best prediction, h*

A

the h that minimizes Rsq(h) = (1/n) Σ(i=1 to n) (yi - h)^2

12
Q

constant model

A

H(x) = h

13
Q

simple linear regression

A

H(x) = w0 + w1x

14
Q

how do we find h* that minimizes Rsq(h)

A

using calculus

15
Q

minimize Rsq(h)

A
  1. take its derivative with respect to h
  2. set it equal to 0
  3. solve for the resulting h*
  4. perform a second derivative test to ensure we found a minimum
16
Q

derivative of Rsq(h)

A

dRsq/dh = (-2/n) Σ(i=1 to n) (yi - h)

17
Q

Mean minimizes…

A

mean squared error

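A quick numerical check of this card, using a small made-up dataset (an assumption, not from the cards): scanning candidate constant predictions h shows mean squared error is minimized at the mean.

```python
import numpy as np

# Hypothetical dataset, made up for illustration.
y = np.array([1.0, 2.0, 4.0, 7.0])

def mse(h, y):
    """Average squared loss: Rsq(h) = (1/n) * sum of (yi - h)^2."""
    return np.mean((y - h) ** 2)

# Scan a fine grid of candidate constant predictions h.
hs = np.linspace(y.min(), y.max(), 10001)
h_star = hs[np.argmin([mse(h, y) for h in hs])]

print(h_star)    # very close to 3.5
print(y.mean())  # 3.5, the mean of the data
```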
18
Q

absolute loss

A

Labs(yi, H(xi)) = |yi - H(xi)|

19
Q

average absolute loss

A

Rabs(h) = (1/n) Σ(i=1 to n) |yi - h|

20
Q

to minimize mean absolute error

A
  1. take its derivative with respect to h
  2. set it equal to 0
  3. solve for the resulting h*
  4. perform a second derivative test to ensure we found a minimum
21
Q

derivative of |yi - h|

A

it is piecewise: -1 when h < yi, +1 when h > yi, and undefined at h = yi

22
Q

derivative of Rabs(h)

A

d/dh [(1/n) Σ(i=1 to n) |yi - h|] = (1/n) [#(h > yi) - #(h < yi)]

23
Q

median minimizes

A

mean absolute error

24
Q

best constant prediction in terms of mean absolute error

A

median
1. when n is odd, answer is unique
2. when n is even, any number between the middle two data points also minimizes mean absolute error
3. when n is even, define the median to be the mean of the middle two data points

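A sketch of this card's claim, using made-up data with one outlier (an assumption for illustration): the median gives a lower mean absolute error than the mean.

```python
import numpy as np

# Hypothetical data with one large outlier.
y = np.array([1.0, 2.0, 3.0, 100.0])

def mae(h, y):
    """Average absolute loss: Rabs(h) = (1/n) * sum of |yi - h|."""
    return np.mean(np.abs(y - h))

med = np.median(y)  # 2.5, the mean of the middle two points (n is even)
avg = np.mean(y)    # 26.5, dragged up by the outlier

print(mae(med, y))  # 25.0
print(mae(avg, y))  # 36.75 -- the mean does worse under absolute loss
```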
25

Q

process for minimizing average loss

A

empirical risk minimization

26

Q

another name for "average loss"

A

empirical risk

27

Q

corresponding empirical risk when using the squared loss function

A

Rsq(h) = (1/n) Σ(i=1 to n) (yi - h)^2

28

Q

if L(yi, h) is any loss function, the corresponding empirical risk is

A

R(h) = (1/n) Σ(i=1 to n) L(yi, h)

29

Q

Modeling recipe

A
  1. choose a model
  2. choose a loss function
  3. minimize average loss to find optimal model parameters

30

Q

empirical risk minimization

A

formal name for the process of minimizing average loss

31

Q

corresponding empirical risk to the squared loss Lsq(yi, h) = (yi - h)^2

A

Rsq(h) = (1/n) Σ(i=1 to n) (yi - h)^2

32

Q

For the mean

A

sum of distances below = sum of distances above

33

Q

Mean is the point where

A

Σ(i=1 to n) (yi - h) = 0

34

Q

Median is the point where

A

#(yi < h) = #(yi > h)

35

Q

Lp loss

A

Lp(yi, h) = |yi - h|^p

36

Q

Corresponding empirical risk to Lp

A

Rp(h) = (1/n) Σ(i=1 to n) |yi - h|^p

37

Q

midrange minimizes

A

L-infinity loss (the limit of Lp loss as p --> infinity)

38

Q

As p --> infinity,

A

the minimizer of mean Lp loss approaches the midrange: the midpoint of the minimum and maximum values in the dataset
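This limit can be watched numerically. A sketch with made-up data (an assumption for illustration): as p grows, the grid minimizer of mean Lp loss moves from the mean toward the midrange.

```python
import numpy as np

# Hypothetical dataset; its midrange is (2 + 11) / 2 = 6.5.
y = np.array([2.0, 3.0, 5.0, 11.0])

def lp_risk(h, y, p):
    """Mean Lp loss: Rp(h) = (1/n) * sum of |yi - h|^p."""
    return np.mean(np.abs(y - h) ** p)

# Grid-search the minimizer for increasing p.
hs = np.linspace(y.min(), y.max(), 20001)
minimizers = {}
for p in [1, 2, 8, 64]:
    minimizers[p] = hs[np.argmin([lp_risk(h, y, p) for h in hs])]
    # p = 2 gives roughly the mean (5.25); p = 64 is already near 6.5
```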
39

Q

The general form of empirical risk for any loss function

A

R(h) = (1/n) Σ(i=1 to n) L(yi, h)

40

Q

input h* that minimizes R(h) is...

A

some measure of the center of the dataset

41

Q

minimum output R(h*) represents

A

some measure of the spread or variation in the dataset

42

Q

Minimum value of Rsq(h)

A

Rsq(h*) = Rsq(Mean(y1, y2, ..., yn)) = (1/n) Σ(i=1 to n) (yi - Mean(y1, y2, ..., yn))^2

43

Q

Variance

A

the minimum value of Rsq(h); it is the mean squared deviation from the mean, measuring the squared distance of each data point from the mean, on average

44

Q

standard deviation

A

square root of the variance
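A sketch confirming the relationship on these cards (data made up for illustration): the minimum value of Rsq is exactly the variance, and its square root is the standard deviation.

```python
import numpy as np

# Hypothetical dataset.
y = np.array([1.0, 2.0, 4.0, 7.0])

# Rsq evaluated at its minimizer, the mean.
rsq_at_mean = np.mean((y - y.mean()) ** 2)

print(rsq_at_mean)  # 5.25
print(np.var(y))    # 5.25 -- the same number: the (population) variance
print(np.std(y))    # sqrt(5.25), the standard deviation
```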
45

Q

empirical risk for absolute loss

A

Rabs(h) = (1/n) Σ(i=1 to n) |yi - h|

46

Q

Rabs(h) is minimized when

A

h* = Median(y1, y2, ..., yn)

47

Q

Minimum value of Rabs(h) is...

A

mean absolute deviation from the median: (1/n) Σ(i=1 to n) |yi - Median(y1, y2, ..., yn)|

48

Q

empirical risk for 0-1 loss

A

R0,1(h) = (1/n) Σ(i=1 to n) [0 if yi = h, 1 if yi ≠ h], the proportion (between 0 and 1) of data points not equal to h

49

Q

R0,1(h) is minimized when

A

h* = Mode(y1, y2, ..., yn)

50

Q

the minimum value of R0,1(h)

A

proportion of data points not equal to the mode
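A sketch of 0-1 loss on made-up categorical data (an assumption for illustration): the mode minimizes the proportion of mismatched predictions.

```python
from collections import Counter

# Hypothetical categorical data.
y = ["cat", "dog", "cat", "bird", "cat"]

def zero_one_risk(h, y):
    """R0,1(h): proportion of data points not equal to h."""
    return sum(yi != h for yi in y) / len(y)

# The mode is the most common value.
mode = Counter(y).most_common(1)[0][0]

print(mode)                    # 'cat'
print(zero_one_risk(mode, y))  # 0.4 -- 2 of the 5 points differ from the mode
# every other prediction has at least this much risk on this data
```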
51

Q

simple linear regression model

A

H(x) = w0 + w1x

52

Q

when using squared loss

A

h* = Mean(y1, y2, ..., yn), and Rsq(h*) = Variance(y1, y2, ..., yn)

53

Q

When using absolute loss

A

h* = Median(y1, y2, ..., yn), and Rabs(h*) = mean absolute deviation from the median

54

Q

R0,1(h) is minimized when

A

h* = Mode(y1, y2, ..., yn), so R0,1(h*) is the proportion of data points not equal to the mode

55

Q

minimum value of R0,1(h) is the

A

proportion of data points not equal to the mode

56

Q

a higher value of R0,1(h*) means

A

less of the data is clustered at the mode

57

Q

hypothesis function

A

H; takes in an x as input and returns a predicted y

58

Q

parameters define

A

the relationship between the input and output of a hypothesis function

59

Q

Since linear hypothesis functions are of the form H(x) = w0 + w1x, we can re-write Rsq

A

Rsq(w0, w1) = (1/n) Σ(i=1 to n) (yi - (w0 + w1xi))^2

60

Q

Minimize mean squared error

A
  1. take partial derivatives with respect to each variable
  2. set all partial derivatives to 0
  3. solve the resulting system of equations
  4. ensure that you've found a minimum, rather than a maximum or saddle point

61

Q

We have a system of two equations and two unknowns (w0 and w1):
(-2/n) Σ(i=1 to n) (yi - (w0 + w1xi)) = 0
(-2/n) Σ(i=1 to n) (yi - (w0 + w1xi)) xi = 0

A

solve for w0 in the first equation; the result is the best intercept w0*. Plug w0* into the second equation and solve for w1*
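Carrying out those two steps gives the familiar closed-form least-squares solution. A sketch with made-up data, cross-checked against numpy's own degree-1 fit:

```python
import numpy as np

# Hypothetical data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# Solving the two normal equations yields:
#   w1* = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
#   w0* = ybar - w1* * xbar
xbar, ybar = x.mean(), y.mean()
w1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
w0 = ybar - w1 * xbar

# Cross-check against numpy's least-squares polynomial fit.
w1_np, w0_np = np.polyfit(x, y, 1)
print(w0, w1)  # 1.3 0.9
```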
62

Q

correlation

A

linear association; a pattern that looks like a line

63

Q

association

A

any pattern

64

Q

correlation coefficient, r

A

measure of the strength of linear association of two variables, x and y; measures how tightly clustered a scatter plot is around a straight line; between -1 and 1

65

Q

correlation coefficient, r, is defined as

A

the average of the product of x and y when both are in standard units

66

Q

units of the slope w1* = r(sigma_y / sigma_x)

A

units of y per unit of x
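The last two cards combine into a short computation. A sketch with made-up data (an assumption for illustration): r is the average product of the standardized coordinates, and r * (sigma_y / sigma_x) reproduces the least-squares slope.

```python
import numpy as np

# Hypothetical data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# Convert both variables to standard units: (value - mean) / std.
xu = (x - x.mean()) / x.std()
yu = (y - y.mean()) / y.std()

# r = average of the product of x and y in standard units.
r = np.mean(xu * yu)

# Slope of the regression line: w1* = r * (sigma_y / sigma_x).
w1 = r * (y.std() / x.std())
print(r)   # 0.9
print(w1)  # 0.9 (sigma_x = sigma_y for this data, so the slope equals r)
```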
67