normal linear models Flashcards

Question 1

Q

the line of best fit

Answer

A

The line of best fit is the line that lies closest, overall, to the points.

For any line we can calculate the sum of squared residuals, which measures how close the points are to that line.

The line of best fit is the line that minimises the sum of squared residuals.
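To make the idea concrete, here is a minimal Python sketch that computes the sum of squared residuals for two candidate lines. The data points and the two candidate lines are hypothetical values chosen only for illustration, and `sum_squared_residuals` is a helper name introduced here, not something from the course material.

```python
# Hypothetical data: five (x, y) points chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]


def sum_squared_residuals(b0, b1, xs, ys):
    """Return the sum over all points of (y - (b0 + b1 * x)) squared."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Candidate line 1: intercept 1, slope 2 (runs close to the points).
ssr_close = sum_squared_residuals(1.0, 2.0, xs, ys)

# Candidate line 2: intercept 0, slope 1 (runs well below the points).
ssr_far = sum_squared_residuals(0.0, 1.0, xs, ys)

print(ssr_close, ssr_far)
```

The first line gives a sum of squared residuals of roughly 0.1 and the second roughly 89.5, so the first line fits these points far better. The line of best fit is whichever line makes this quantity as small as possible.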

Question 2

Q

equation of a line

Answer

A

The equation of a line relating two variables X and Y is
Y = β0 + β1 X,

where β0 is the intercept and β1 is the slope (Y = intercept term + slope term multiplied by X).

Question 3

Q

what does the slope term tell us

Answer

A

The slope term tells us how much the value of the Y variable changes as X increases by 1 unit.

Question 4

Q

what does the intercept tell us

Answer

A

The intercept tells us the value of Y when X is zero, i.e. where the line crosses the Y axis.

Question 5

Q

residuals

Answer

A

Residuals (represented by ϵi) are the vertical distances between each point and the line.

Question 6

Q

What is the distance between yi and β0 + β1 xi denoted by?

Answer

A

ϵi.

These are known as residuals.

Question 7

Q

so how do we find the line of best fit?

Answer

A

Overall, we find the line of best fit by minimizing the sum of squared residuals, $\sum_{i=1}^{n} \epsilon_i^2$.
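As an illustration, the minimizing intercept and slope for a single predictor can be computed directly with the standard least-squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). The sketch below assumes a small hypothetical data set, and `fit_line` and `ssr` are helper names introduced here for the example.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


def ssr(b0, b1, xs, ys):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

b0_hat, b1_hat = fit_line(xs, ys)
print(b0_hat, b1_hat, ssr(b0_hat, b1_hat, xs, ys))
```

For these points the fitted intercept is about 1.09 and the fitted slope about 1.97. Nudging either value away from the fitted one increases the sum of squared residuals.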

Question 8

Q

Simple linear regression as a statistical model

Answer

A

We have some outcome variable (also known as the dependent variable, measurement variable, etc) and a single predictor variable (also known as the independent variable, explanatory variable, etc).

Rather than just saying that simple linear regression is finding a line that best fits a sample of points from these two variables, we say that it is a statistical model describing the general relationship between the predictor and the outcome variable and we are fitting that model to our data.

Question 9

Q

normal linear model

Answer

A

• For every value of the predictor variable, the outcome variable is normally distributed.
• As the value of the predictor variable changes, the mean of the distribution of the outcome variable changes linearly. In other words, linear = changes by a constant proportional amount.

Question 10

Q

mean of outcome =

Answer

A

mean of outcome = linear function of predictor,

mean of outcome = β0 + β1 × predictor (intercept term + slope term × predictor)

Question 11

Q

what is a normal distribution

Answer

A

The normal, or Gaussian, distribution is a probability distribution over a continuous random variable. It has two parameters: the mean, usually denoted by μ, and the variance, usually denoted by σ². We denote a normally distributed random variable X with mean μ and variance σ² by X ∼ N(μ, σ²).

Question 12

Q

what does μ mean

Answer

A

μ (pronounced "mu") is the location parameter. It is the mean of the normal distribution, which is also its median and its mode.

Question 13

Q

what does σ mean

Answer

A

σ (sigma) is the scale parameter: the standard deviation.

Sigma tells us the width of the normal distribution: the larger the value of sigma, the wider the distribution.

Question 14

Q

The area under the normal distribution over any range of values can be worked out using formulas.

Answer

A

• Around 2/3 (about 68%) of the area under the normal distribution is within 1 standard deviation above/below the mean.
• Around 95% of the area under the normal distribution is within 2 standard deviations above/below the mean.
• Around 99% of the area under the normal distribution is within 2.5 standard deviations above/below the mean.
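These rules of thumb can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `area_within` is a helper name introduced here. Because the proportions are the same for every normal distribution, the standard normal (mean 0, standard deviation 1) is used.

```python
from statistics import NormalDist

# Standard normal; the proportions below hold for any mean and
# standard deviation.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 2.5):
    print(k, round(area_within(k), 4))
```

The three areas come out at roughly 0.683, 0.954 and 0.988, which matches the "around 2/3, 95% and 99%" rules.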

Question 15

Q

What are linear functions?

Answer

A

• If Y is a linear function of X, then whenever X changes by a certain amount, Y changes by a constant proportion of that amount.
• For one dependent and one independent variable, the linear equation is
Y = β0 + β1 X

• For example, if β0 = 1 and β1 = 2, then if X = 10,
Y = 1 + 2 × 10 = 21
• If we increase X by 1 to X = 11, we have
Y = 1 + 2 × 11 = 23
• So a 1-unit increase in X increases Y by β1 = 2.

Question 16

Q

Linear functions with multiple independent variables?

Answer

A

If Y is a dependent variable and we have more than one independent variable, e.g. two independent variables X1 and X2, then Y is a linear function of them if, as any one of the independent variables changes by a certain amount (with the others held fixed), Y changes by a constant proportion of that amount.
E.g. if we change X1 by a certain amount, then Y changes by a constant proportion of that amount.
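A small Python sketch may help here. It assumes the usual two-predictor form Y = β0 + β1 X1 + β2 X2, with hypothetical coefficient values β0 = 1, β1 = 2 and β2 = -0.5 chosen only for illustration; `linear2` is a name introduced for this example.

```python
def linear2(x1, x2, b0=1.0, b1=2.0, b2=-0.5):
    """Linear function of two independent variables (hypothetical coefficients)."""
    return b0 + b1 * x1 + b2 * x2


# Increase X1 by 1 unit while holding X2 fixed at 0 ...
change_when_x2_is_0 = linear2(11, 0) - linear2(10, 0)

# ... and again while holding X2 fixed at 8.
change_when_x2_is_8 = linear2(11, 8) - linear2(10, 8)

print(change_when_x2_is_0, change_when_x2_is_8)
```

In both cases Y changes by 2, the coefficient on X1, whatever value X2 is held at.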

Question 17

Q

Simple linear regression: Model

Answer

A

We have n observations, each indexed by i ∈ {1, 2, …, n} (i identifies the observation: observation 1, observation 2, and so on).
The outcome variable for observation i is yi.
The predictor variable for observation i is xi.
The normal linear model is then as follows: for all i ∈ {1, 2, …, n},

yi ∼ N(μi, σ²) (the outcome variable is normally distributed with mean μi and variance σ², where σ is the standard deviation),

μi = β0 + β1 xi (that mean is a linear function of the predictor variable).
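One way to get a feel for this model is to simulate from it. The sketch below is a minimal illustration using Python's standard `random` module; the parameter values (β0 = 1, β1 = 2, σ = 0.5), the predictor values and the function name `simulate_outcomes` are all hypothetical choices made for the example.

```python
import random


def simulate_outcomes(xs, b0, b1, sigma, seed=0):
    """Draw one outcome per predictor value from N(b0 + b1 * x, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    outcomes = []
    for x in xs:
        mu = b0 + b1 * x                     # mean is a linear function of x
        outcomes.append(rng.gauss(mu, sigma))  # normal draw around that mean
    return outcomes


# Hypothetical predictor values and parameters.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = simulate_outcomes(xs, b0=1.0, b1=2.0, sigma=0.5)

for x, y in zip(xs, ys):
    print(x, round(y, 2))
```

Each simulated outcome scatters around its own mean β0 + β1 xi, and σ controls how wide that scatter is. Setting `sigma=0.0` returns exactly the means.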

Question 18

Q

terciles

Answer

A

Let’s look at the distribution of weight for each tercile of height. The height terciles are the grouping of the height variable into three groups. The first group runs from the minimum height to the height at the 33rd percentile. The second group runs from the 33rd to the 67th percentile. The third group runs from the 67th percentile to the maximum height. Basically, we can see the terciles as groupings of people of short, medium, and tall heights.
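The grouping can be sketched in Python with `statistics.quantiles` (standard library, Python 3.8 or later), which returns the two cut points that split the data into three equal-sized groups. The height and weight values below are made up for illustration, and the group labels and the `height_tercile` helper are names introduced here.

```python
import statistics

# Hypothetical heights (cm) and weights (kg), paired by position.
heights = [150, 155, 160, 165, 170, 175, 180, 185, 190]
weights = [52, 58, 60, 66, 70, 74, 80, 85, 92]

# Two cut points dividing the heights into three equal-sized groups.
low_cut, high_cut = statistics.quantiles(heights, n=3)


def height_tercile(height):
    """Label a height as belonging to the short, medium or tall tercile."""
    if height <= low_cut:
        return "short"
    if height <= high_cut:
        return "medium"
    return "tall"


# Collect the weights that fall in each height tercile.
weights_by_tercile = {"short": [], "medium": [], "tall": []}
for h, w in zip(heights, weights):
    weights_by_tercile[height_tercile(h)].append(w)

for label, group in weights_by_tercile.items():
    print(label, group, statistics.mean(group))
```

With these made-up numbers each tercile holds three people, and the mean weight rises from the short group to the tall group, which is the kind of pattern a normal linear model of weight on height would describe.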

(18 cards)