normal linear models Flashcards

(18 cards)

1
Q

the line of best fit

A

The line of best fit is the line that lies closest, overall, to all of the points.

For any candidate line we can calculate the sum of squared residuals, which measures how close the points are to that line.

The line of best fit is the line that minimises the sum of squared residuals.
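
As a small illustration of comparing candidate lines by their sum of squared residuals, here is a minimal sketch. The data points and the two candidate lines are made up for the example, and the function name `sum_squared_residuals` is hypothetical.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between each point and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data lying close to the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# A line close to the points gives a small sum of squared residuals ...
ssr_close = sum_squared_residuals(xs, ys, intercept=1.0, slope=2.0)
# ... and a line far from the points gives a much larger one.
ssr_far = sum_squared_residuals(xs, ys, intercept=0.0, slope=1.0)

print(ssr_close, ssr_far)
```

The line with the smaller value is the better fit of the two; the line of best fit is the one with the smallest value of all possible lines.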

2
Q

equation of a line

A

The equation of a line relating two variables X and Y is
Y = β0 + β1 X,

where β0 is the intercept and β1 is the slope (Y = intercept term + slope term multiplied by X).

3
Q

what does the slope term tell us

A

The slope term tells us how much the value of the Y variable changes as X increases by 1 unit.

4
Q

what does the intercept tell us

A

The intercept tells us the value of Y when X is zero, which is where the line crosses the Y axis.

5
Q

residuals

A

Residuals (represented by ϵi) are the vertical distances between each point and the line.

6
Q

What is the distance between yi and β0 + β1 xi denoted by?

A

ϵi.

These are known as residuals.

7
Q

so how do we find the line of best fit?

A

We find the line of best fit by choosing the values of β0 and β1 that minimise the sum of squared residuals, ∑_{i=1}^{n} ϵ_i^2.
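
For a single predictor, the minimising values can be computed directly. The sketch below uses the standard closed-form least-squares formulas, which are not stated on the cards themselves; the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up points lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

Because these points lie exactly on a line, every residual is zero and the fit recovers that line's intercept and slope.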

8
Q

Simple linear regression as a statistical model

A

We have some outcome variable (also known as the dependent variable, measurement variable, etc) and a single predictor variable (also known as the independent variable, explanatory variable, etc).

Rather than just saying that simple linear regression is finding a line that best fits a sample of points from these two variables, we say that it is a statistical model describing the general relationship between the predictor and the outcome variable and we are fitting that model to our data.

9
Q

normal linear model

A

• For every value of the predictor variable, the outcome variable is normally distributed.

• As the value of the predictor variable changes, the mean of the distribution of the outcome variable changes linearly. In other words, linear = changes by a proportional amount.

10
Q

mean of outcome =

A

mean of outcome = linear function of predictor,

mean of outcome = β0 + β1 × predictor (intercept term + slope term × predictor)

11
Q

what is a normal distribution

A

The normal, or Gaussian, distribution is a probability distribution over a continuous random variable. It has two parameters: the mean, usually denoted by μ, and the variance, usually denoted by σ². We denote a normally distributed random variable X with mean μ and variance σ² by X ∼ N(μ, σ²).

12
Q

what does μ mean

A

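μ (mu) is the location parameter of the normal distribution. It is the mean and, because the normal distribution is symmetric with a single peak, also the median and the mode.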

13
Q

what does σ mean

A

σ (sigma) is the scale parameter of the normal distribution: the standard deviation.

Sigma tells us the width of the normal distribution, so the larger the value of sigma, the wider that normal distribution is.

14
Q

The area under any range of values of the normal distribution can be worked out using formulas.

A

o Around 2/3 of the area under the normal distribution is within 1 standard deviation above/below the mean.
o Around 95% of the area under the normal distribution is within 2 standard deviations above/below the mean.
o Around 99% of the area under the normal distribution is within 2.5 standard deviations above/below the mean.
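
These areas can be checked numerically. A minimal sketch, assuming the standard identity that the area within k standard deviations of the mean equals erf(k / √2); the function name `area_within` is hypothetical.

```python
import math

def area_within(k):
    """Area under a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

print(area_within(1))    # about 0.683, roughly 2/3
print(area_within(2))    # about 0.954, roughly 95%
print(area_within(2.5))  # about 0.988, roughly 99%
```

The result does not depend on μ or σ, since the distance from the mean is measured in units of standard deviations.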

15
Q

What are linear functions?

A

• If Y is a linear function of X, then whenever X changes by a certain amount, Y changes by a constant proportion of that amount.
• For one dependent and one independent variable, the linear equation is
Y = β0 + β1 X

• For example, if β0 = 1 and β1 = 2, then if X = 10,
Y = 1 + 2 × 10 = 21
• If we increase X by 1 to X = 11, we have
Y = 1 + 2 × 11 = 23
• So Y has increased by 2, which is β1 × 1.
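
The same arithmetic as a short sketch, using the card's values β0 = 1 and β1 = 2; the function name `linear` is made up for the example.

```python
def linear(x, beta0=1, beta1=2):
    """Y = beta0 + beta1 * X, with the example values from the card as defaults."""
    return beta0 + beta1 * x

y_at_10 = linear(10)  # 1 + 2 * 10 = 21
y_at_11 = linear(11)  # 1 + 2 * 11 = 23

# A 1-unit increase in X changes Y by beta1, wherever we start from.
change_from_10 = linear(11) - linear(10)
change_from_100 = linear(101) - linear(100)
print(y_at_10, y_at_11, change_from_10, change_from_100)
```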

16
Q

Linear functions with multiple independent variables?

A

If Y is a dependent variable and we have more than one independent variable, e.g. two independent variables X1 and X2, and Y is a linear function of them, then as any one of the independent variables changes by a certain amount (with the others held fixed), Y changes by a constant proportion of that amount.
e.g. if we change X1 by a certain amount, then Y changes by a constant proportion of that amount.
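
A minimal sketch with two independent variables, assuming the usual form Y = β0 + β1 X1 + β2 X2. The coefficient values (1, 2 and -3) and the function name `linear2` are made up for illustration.

```python
def linear2(x1, x2, beta0=1, beta1=2, beta2=-3):
    """Y = beta0 + beta1 * X1 + beta2 * X2 with hypothetical coefficients."""
    return beta0 + beta1 * x1 + beta2 * x2

# Increase X1 by 1 while holding X2 fixed: Y changes by beta1.
change_x1 = linear2(6, 4) - linear2(5, 4)
# Increase X2 by 1 while holding X1 fixed: Y changes by beta2.
change_x2 = linear2(5, 5) - linear2(5, 4)
print(change_x1, change_x2)
```

Each independent variable has its own slope term, and that slope gives the change in Y per unit change in that variable alone.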

17
Q

Simple linear regression: Model

A
  • We have n observations, and each one is indexed by i ∈ {1, 2, …, n} (i is the observation number, so observation 1, 2, …).
  • The outcome variable for observation i is yi.
  • The predictor variable for observation i is xi.
  • Then the formula for the normal linear model is as follows: for all i ∈ {1, 2, …, n},

yi ∼ N(μi, σ²) (the outcome variable is normally distributed (N) with mean μi and variance σ², which is the standard deviation squared)

μi = β0 + β1 xi (that mean is a linear function of the predictor variable)
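
One way to get a feel for this model is to simulate data from it. The sketch below is an illustration only: the parameter values, the sample size, the range of the predictor and the fixed seed are all assumptions made for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

beta0, beta1, sigma = 1.0, 2.0, 1.5  # hypothetical parameter values
n = 5000                             # hypothetical number of observations

# Predictor values (drawn uniformly here purely for illustration)
xs = [random.uniform(0, 10) for _ in range(n)]
# The mean of each outcome is a linear function of its predictor value
mus = [beta0 + beta1 * x for x in xs]
# Each outcome is drawn from a normal distribution with that mean;
# random.gauss takes the standard deviation sigma, not the variance
ys = [random.gauss(mu, sigma) for mu in mus]

# Deviations of the outcomes from their means should look like N(0, sigma^2)
errors = [y - mu for y, mu in zip(ys, mus)]
error_mean = statistics.mean(errors)
error_sd = statistics.stdev(errors)
print(error_mean, error_sd)
```

With a sample this large, the deviations should average close to zero and have a standard deviation close to the chosen σ.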

18
Q

terciles

A

Let’s look at the distribution of weight for each tercile of height. The height terciles are a grouping of the height variable into three groups. The first group is from the minimum height to the height at the 33rd percentile. The second group is from the 33rd to the 67th percentile. The third group is from the 67th percentile to the maximum height. Basically, we can see the terciles as groupings of people of short, medium, and tall heights.
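
A minimal sketch of grouping by height tercile and summarising weight within each group. The height and weight values are made up, the helper name `tercile` is hypothetical, and the cut points come from `statistics.quantiles` with its default method, so they may differ slightly from other percentile definitions.

```python
import statistics

# Made-up heights (cm) and weights (kg) for nine people
heights = [150, 155, 160, 165, 170, 175, 180, 185, 190]
weights = [50, 54, 57, 61, 66, 70, 75, 79, 84]

# n=3 returns the two cut points that split the data into three groups
low_cut, high_cut = statistics.quantiles(heights, n=3)

def tercile(height):
    """Label a height as belonging to the short, medium or tall tercile."""
    if height <= low_cut:
        return "short"
    if height <= high_cut:
        return "medium"
    return "tall"

# Collect the weights that fall in each height tercile
groups = {"short": [], "medium": [], "tall": []}
for h, w in zip(heights, weights):
    groups[tercile(h)].append(w)

mean_weight = {name: statistics.mean(ws) for name, ws in groups.items()}
print(mean_weight)
```

In this made-up data the mean weight rises from the short to the tall tercile, which is the kind of pattern a normal linear model of weight on height would describe.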