GLM 1 - Simple linear regression Flashcards

1
Q

In a linear relationship, how is the value of an outcome variable Y approximated?

A

Y ≈ β0 + β1X.

Y = the dependent variable
β0 = the intercept
β1 = the slope coefficient of X
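
For example (with purely illustrative numbers): if β0 = 1 and β1 = 2, then a case with X = 3 is approximated by Y ≈ 1 + 2 × 3 = 7.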

2
Q

What is the intercept, β0 (often labelled the constant)?

A

The expected mean value of Y when X = 0.

3
Q

What is β1?

A

The slope: how Y changes per unit increase in X.

When X is increased by one unit, Y increases by β1.

4
Q

What is the terminology of a linear regression?

A
  1. We say that Y is regressed on X.
  2. We are expressing Y in terms of X.
  3. The dependent variable, Y, depends on X.
  4. The independent variable, X, doesn't depend on anything.
5
Q

How are the coefficients or parameters β0 and β1 estimated?

A

Using the available data:
(x1, y1), (x2, y2), …, (xn, yn), a sample of n data points.
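
A minimal sketch of this estimation in Python, using the standard closed-form least-squares formulas; the data values below are invented purely for illustration:

```python
import numpy as np

# Toy data: n = 5 cases (x_i, y_i); the values are invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates:
#   beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
#   beta0_hat = y_bar - beta1_hat * x_bar
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

print(beta0_hat, beta1_hat)  # estimated intercept and slope
```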

6
Q

How are the estimates of parameters written?

A

The estimates of the parameters are written with a circumflex or hat: ^

We then write our linear equation with these estimated coefficients: y^i = β^0 + β^1 xi

On the left-hand side, only the dependent variable carries a hat.

The independent variable (xi) does not have a hat, as it is treated as fixed.

7
Q

β0 and β1 are independent of each other

True or false

A

True

8
Q

What does the circumflex allow us to differentiate between?

A

True value and estimated value

9
Q

What happens if we add a value to β0?

A

This would only affect y^, but not β1xi; β0 can change independently of β1.

10
Q

What is y^i?

A

The predictions, or predicted values, of the outcomes yi, given the independent variables, the xi's.

11
Q

What are the differences between the predicted values, the y^i's, and the observed values, the yi's?

A

The residuals:
e^i := yi − y^i.

That is, these are the values that remain after we have removed the
predictions from the observations.
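
A small self-contained sketch of the residual calculation in Python, reusing the invented toy data from the earlier block:

```python
import numpy as np

# Invented toy data, as in the earlier estimation sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares estimates (closed form, as before)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_hat = beta0_hat + beta1_hat * x  # predicted values, y^i
residuals = y - y_hat              # residuals, e^i := yi - y^i
print(residuals)                   # for least squares these sum to ~0
```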

12
Q

Why are the residuals, e^i ’s, also equipped with a hat?

A

Because these are also estimated values.

13
Q

Why are the black error bars vertical, and not perpendicular to the line in blue?

A

Because residuals correspond to an addition to the value of y^: each residual is measured in the y direction, as the vertical difference between an observed yi and its fitted value.

14
Q

How can the optimal value of the parameters, β0 and β1 be found?

A

By considering the sum of the squares of the residuals:

RSS := (e^1)² + (e^2)² + … + (e^n)²

15
Q

Why do we square residuals?

A

Residuals are defined as a subtraction of the predicted values from the observed values; we can rewrite the RSS in the following fashion: RSS = (y1 − y^1)² + … + (yn − y^n)². Some residuals may be negative and some may be positive, so we square them to ensure each one makes a positive contribution to the RSS.

16
Q

What is the goal when choosing the optimal values of β0 and β1?

A

To minimise the distance of the fitted line from all the data points.

17
Q

What is the RSS a function of, and why?

A

β0 and β1, because all the residuals depend upon the values of β0 and β1. Thus, we may write the RSS as depending on these quantities:

RSS(β^0, β^1) = (e^1)² + (e^2)² + … + (e^n)²

18
Q

The value taken by the RSS can therefore be minimized for some values of β^0 and β^1.

How do we write this?

A

(β^0, β^1) := argmin RSS(β0, β1)

argmin RSS means the argument that minimizes the RSS.

The hats on the right-hand side of the RSS have been suppressed.
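
To make "minimizing the RSS over (β0, β1)" concrete, here is a hedged Python sketch that treats the RSS as an ordinary function of the two parameters and approximates the argmin by brute-force grid search over the two-dimensional landscape (same invented toy data; in practice one would use the closed form or a numerical optimizer):

```python
import numpy as np

# Invented toy data, as in the earlier sketches
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def rss(beta0, beta1):
    # RSS(beta0, beta1): sum of squared residuals for these parameter values
    return np.sum((y - (beta0 + beta1 * x)) ** 2)

# Brute-force search over a 2D grid of candidate (beta0, beta1) values
beta0_grid = np.linspace(-2.0, 2.0, 201)
beta1_grid = np.linspace(0.0, 4.0, 201)
best = min(
    ((b0, b1) for b0 in beta0_grid for b1 in beta1_grid),
    key=lambda pair: rss(*pair),
)
print(best, rss(*best))  # approximate argmin and the minimized RSS
```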

19
Q

The RSS is a function of the parameters β0 and β1 therefore…

A

it can take a range of values across a two-dimensional landscape

20
Q

How can we assess the accuracy/goodness of fit of our model?

A

Using the previously minimized value of the RSS

21
Q

What is one way of quantifying the accuracy of the model?

A

Compare the RSS with the total sum of squares, TSS (which can be viewed as the RSS of the null model, since the null model is the model with only the y-intercept).
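
A hedged Python sketch of this comparison, computing R2 = 1 − RSS/TSS on the same invented toy data:

```python
import numpy as np

# Invented toy data, as in the earlier sketches
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fitted model (closed-form least squares, as before)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

rss = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)  # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)  # TSS: the RSS of the intercept-only null model
r_squared = 1 - rss / tss
print(r_squared)
```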

22
Q

What is R2 also known as?

A

Coefficient of determination

23
Q

What does R2 measure?

A

Proportion of variance in the dependent variable explained by the independent variable.

24
Q

For simple regression, the R2 can be shown to be equivalent to what?

A

The correlation of the IV with the DV. That is, R2 and the square of Cor(Y, X) are equal: R2 = Cor(Y, X)².
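
A quick hedged check of this identity in Python; np.corrcoef returns the correlation matrix, whose off-diagonal entry is Cor(X, Y):

```python
import numpy as np

# Invented toy data, as in the earlier sketches
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cor_xy = np.corrcoef(x, y)[0, 1]  # sample correlation Cor(X, Y)
print(cor_xy ** 2)  # matches the R2 computed as 1 - RSS/TSS above
```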

25
Q

What is a random variable?

A

A function from a sample space Ω to the real numbers, R, such that
X : Ω → R.

Uppercase X denotes the random variable.

26
Q

For every point in the sample space, ω ∈ Ω, the random variable X may (or may not) take what?

A

A different value, such that we have:
X(ω) = x.

We call x the realization of X (the random variable) at ω (the point in the sample space).

27
Q

What does the probability of obtaining x count?

A

The number of ω's producing x, written as

P[X = x] := P[{ω ∈ Ω : X(ω) = x}].

28
Q

What is the random variable for the toss of a fair coin, with head, H, and tail, T?

A

X : {H, T} → {0, 1},

with X(H) = 0 and X(T) = 1, producing the probabilities P[X = 0] = 1/2 and P[X = 1] = 1/2.

In other words, X has {H, T} as its sample space, and X assigns 0 to heads and 1 to tails.
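
A tiny illustrative Python sketch of this random variable, representing the sample space Ω = {H, T}, the map X, and the fair-coin probability measure explicitly:

```python
# Sample space and random variable for a fair coin toss
omega = ["H", "T"]        # sample space: heads, tails
X = {"H": 0, "T": 1}      # X assigns 0 to heads and 1 to tails
P = {"H": 0.5, "T": 0.5}  # fair coin: probability mass 1/2 on each outcome

def prob_X_equals(value):
    # P[X = x] sums the mass of the omegas whose realization is x
    return sum(P[w] for w in omega if X[w] == value)

print(prob_X_equals(0), prob_X_equals(1))  # 0.5 0.5
```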

29
Q

If we have a single-faced coin (one that always lands heads), Y : {H, T} → {0, 1}, such that Y(H) = 0 and Y(T) = 1, what are the probabilities?

A

P[Y = 0] = 1, and P[Y = 1] = 0

The measure P is used to give probability mass to each element in Ω.

30
Q

What is the discrete expectation?

A

For a discrete random variable Y, the expectation E[Y] is the sum of the y's (the realizations: all the possible values taken by Y over the sample space), with each value weighted by the probability of obtaining it. A discrete random variable takes a finite number of values. This is almost identical to the arithmetic mean.

31
Q

What is the arithmetic mean?

A

A special case of the expectation in which the probabilities of the y's are uniform across all possible values of y (each equal to 1/n).
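
A minimal sketch of the discrete expectation and its arithmetic-mean special case in Python; the fair six-sided die is just an illustrative choice:

```python
# Discrete expectation: E[Y] = sum over realizations y of y * P[Y = y]
values = [1, 2, 3, 4, 5, 6]   # possible values of Y (a fair die, for example)
probs = [1 / 6] * 6           # uniform probabilities, 1/n each

expectation = sum(y * p for y, p in zip(values, probs))
arithmetic_mean = sum(values) / len(values)  # the uniform-weights special case

print(expectation, arithmetic_mean)  # both 3.5: they agree when p(y) = 1/n
```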

32
Q

In simple regression, what are we given?

A

Two sequences of data points.

Each pair of observations is a case, (yi, xi), with i = 1, …, n.

33
Q

What are the deterministic and stochastic parts of a statistical model?

A

yi = β0 + β1xi + εi

β0 + β1xi = the deterministic part

εi = the stochastic part
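
A hedged Python sketch simulating data from this model, with arbitrarily chosen "true" parameter values, to separate the deterministic and stochastic parts:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible

# Arbitrarily chosen 'true' parameters, purely for illustration
beta0, beta1, sigma = 1.0, 2.0, 0.5

x = np.linspace(0.0, 5.0, 20)
deterministic = beta0 + beta1 * x              # beta0 + beta1 * x_i
epsilon = rng.normal(0.0, sigma, size=x.size)  # stochastic part: eps_i
y = deterministic + epsilon                    # y_i = beta0 + beta1*x_i + eps_i
print(y[:5])
```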

34
Q

What is one difference between regression and correlation?

A

In regression, one of the variables is treated as the outcome variable or dependent variable, generally denoted by the yi's.

We will then use the other variables for predicting that outcome. As a result, the other variables are referred to as predictors, or independent variables, and are denoted by the xi's, or features in the machine learning literature.

35
Q

What is the deterministic part of a univariate simple linear regression made up of?

A
  1. The mean, expressed as a conditional expectation:

E[Y | X = xi] = β0 + β1xi

  2. The variance function, expressed as a conditional variance operator:

Var[Y | X = xi] = σ², ∀ i = 1, …, n.

The (unknown) parameters in this model are therefore (β0, β1, σ²).

36
Q

What are the unknown parameters in the deterministic part of a simple linear regression?

A
  1. β0 is the y-intercept of E[Y | X], when X = 0. Thus, we have
    E[Y | X = 0] = β0.
  2. β1 is the rate of change of E[Y | X], such that
    E[Y | X = x + 1] − E[Y | X = x] = β1.
  3. σ² is the (conditional) variance of Y, given X. It is strictly positive,
    σ² > 0.

37
Q

What does the stochastic part of a simple linear regression consist of?

A

Random noise. In general, the observables or observed data, denoted by the yi's, differ from the expected values of Y, given xi, such that

yi = E[Y | X = xi] + εi, i = 1, …, n,

where the εi's are the statistical errors, collectively referred to as additive noise.

The εi's are defined as the difference between the observables and the conditional expected values; that is,

εi = yi − E[Y | X = xi].

Geometrically, the errors correspond to the vertical distances between each yi and its conditional expectation. Note that the error terms are not observable, since they depend on the unknown parameters (β0, β1).