L21 Part 2 - Single regression (chapter 8 part 1) Flashcards
What is linear regression?
Models the relationship between a scalar dependent variable y and one or more explanatory variables x
↪ outcome = model prediction + error
- One explanatory variable → single linear regression
- The relationship is modelled using linear predictor functions whose unknown parameters are estimated from the data
What is the formula for linear regression?
Picture 1 - expresses how our model predicts the outcome (how much is accurate prediction vs. error?)
Y - outcome variable
Bs - parameters; they represent the quantities we’re interested in estimating
- B0 - intercept, baseline level that we are predicting with
- B1 - regression coefficient for our single predictor variable and it quantifies how strong the association is between our predictor and outcome variable
↪ we multiply this with our predictor variable X and this product gives us the model prediction
↪ to calculate B1, we use correlation between the two variables, so the higher the correlation the stronger the predictive value the predictor has
- when we add hats to the bs, they are sample-based estimates of the population parameters
E - errors in the prediction of our sample model (residuals)
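As a minimal sketch of the formula outcome = model prediction + error, with made-up values for b0, b1 and one observation (none of these numbers come from the slides):

```python
# Hypothetical single-regression prediction: y = b0 + b1*x + e
b0 = 2.0   # intercept: baseline prediction when the predictor is 0
b1 = 0.5   # regression coefficient: strength of the association
x = 10.0   # predictor value for one case
y = 7.5    # observed outcome for that case

y_hat = b0 + b1 * x      # model prediction (y with a hat)
residual = y - y_hat     # error for this case: observed minus predicted
print(y_hat, residual)   # 7.0 0.5
```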
Assumptions for linear regression
picture 14 - general procedure of fitting a regression model but it shows what to do with each assumption
- Continuous variables
- linearity
- independent errors of the observations
- Sensitivity (outliers)
- Homoscedasticity (equivalent to equal variances in anova)
- Normality (model residuals are normally distributed; visualised with QQ plots)
What is linearity?
For this assumption to hold, the predictors must have a linear relation to the outcome variable
- checked through: correlations, matrix scatterplot with predictors and outcome variable
What is sensitivity?
Potential influence of outliers
We look at outliers through:
- Extreme residuals
- Cook’s distance
- Check Q-Q plots, residual plots, casewise diagnostics (Cook’s distance)
What is the difference between unstandardized residuals and standardized residuals?
Residuals represent the error present in the model (small residual = model fits the sample data well)
Unstandardized residuals - raw differences between predicted and observed values of the outcome variable
↪ measured in the same units as the outcome variable which makes it difficult to generalize
Standardized residuals - residuals converted to z-scores and so are expressed in SD units (mean 0, sd 1)
- With standardized residuals we can assess which data points are outside of the general pattern of the data set
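A quick sketch of the conversion, using made-up raw residuals (the values and the |z| > 2 cutoff are illustrative assumptions, not from the slides):

```python
import statistics

# Made-up raw residuals, in the same units as the outcome variable
residuals = [1.2, -0.8, 0.3, -2.9, 0.5, 1.7]

# Standardize: subtract the mean, divide by the SD -> z-scores (mean 0, SD 1)
mean = statistics.fmean(residuals)
sd = statistics.stdev(residuals)
z_resid = [(r - mean) / sd for r in residuals]

# Cases with |z| above ~2 fall outside the general pattern of the data
flagged = [r for r, z in zip(residuals, z_resid) if abs(z) > 2]
```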
Diagnostic statistics
What is leverage?
It gauges the influence of the observed value of the outcome variable over the predicted values
- The average leverage is (k+1)/n, where k is the number of predictors in the model and n is the number of cases
- Can vary from 0 (no influence) to 1 (the case has complete influence over predictions)
- If no cases exert undue influence over the model, all leverage values should be close to the average value of (k+1)/n
- Values greater than twice the average should be investigated
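The twice-the-average rule can be sketched like this (k, n and the leverage values are hypothetical):

```python
# Hypothetical check of leverage values against the twice-the-average rule
k, n = 3, 100                  # 3 predictors, 100 cases (made-up)
avg_leverage = (k + 1) / n     # average leverage = (k+1)/n = 0.04
threshold = 2 * avg_leverage   # investigate anything above 0.08

leverages = [0.03, 0.05, 0.09, 0.02]   # hypothetical per-case leverage values
to_investigate = [h for h in leverages if h > threshold]
print(to_investigate)   # [0.09]
```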
What is cook’s distance?
A measure of the overall influence of a case on the model
↪ computed for every observation separately; it assesses to what extent our results would change with vs. without that observation (if removing a participant noticeably changes the results, that case has a high Cook’s distance and is a likely outlier)
↪ Cook’s distance should be < 1 for this assumption to be met
↪ combines the point’s leverage and its residual; a point with high leverage and a high residual has a large Cook’s distance, so a strong influence on the fitted values of the model
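One common formulation combines the squared (studentized) residual with leverage; the function and the example values below are an illustrative sketch, not the slides’ computation:

```python
def cooks_distance(std_resid, leverage, k):
    """One common formulation of Cook's distance: combines a case's
    (studentized) residual with its leverage; k = number of predictors."""
    return (std_resid ** 2 / (k + 1)) * (leverage / (1 - leverage))

# High residual AND high leverage -> large Cook's distance (investigate if > 1)
d_influential = cooks_distance(3.0, 0.5, 1)   # hypothetical extreme case -> 4.5
d_typical = cooks_distance(0.5, 0.04, 1)      # hypothetical ordinary case, well below 1
```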
How does outlier affect our correlation?
Picture 2
The correlation is higher when the outlier is removed because it’s not a follower of the general pattern of the data
What do we do when there is an outlier?
We must always follow up on the outlier and investigate why it occurred; we should never just remove it from our data
- see outliers as a source of information, not as an annoyance
What is homoscedasticity? How do we assess it?
Variance of residuals should be equal (equally distributed) across all expected values → no systematic errors
- Assess by looking at scatterplot of standardized residuals: predicted values vs. residuals → roughly round shape is needed (spread out points equally, no pattern in the errors)
- After the analysis is complete because it’s based on the residuals
- picture 2
What is cross-validation? Why do we use it?
Assessing the accuracy of a model across different samples
- To generalise, model must be able to accurately predict the same outcome variable from the same set of predictors in a different group of people
- If the model is applied to a different sample and there is a severe drop in its predictive power, then the model does not generalise
How large should our sample be?
It depends on the size of effects that we’re trying to detect and how much power we want to detect these effects
Why does it matter what size of the sample we have in terms of R?
The estimate of R that we get from regression is dependent on the number of predictors, k, and the sample size, N
- the expected R for random data (which should be 0) is k/(N−1), so with small sample sizes random data can appear to show a strong effect
- E.g. 6 predictors, 21 cases of data; R = 6/(21-1) = 0.3
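The rule of thumb as a one-liner, reproducing the example above and showing how a larger sample shrinks the spurious R:

```python
def expected_r_random(k, n):
    # Expected R for purely random data (should ideally be 0):
    # grows with the number of predictors k, shrinks with sample size n
    return k / (n - 1)

print(expected_r_random(6, 21))    # 6 predictors, 21 cases -> 0.3
print(expected_r_random(6, 201))   # same predictors, ten times the cases -> 0.03
```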
What can we do as a first step in the analysis of our data?
Create a scatterplot to see whether the data are somehow associated and in which direction
- The strength of the correlation is decided later; this is just to get an idea of the data
What is the regression line/least squares line?
A line that is as close as possible to all of our data points (picture 15 - the orange line is the least squares line; the blue line is the null model, i.e. what we predict for y without using any predictors)
- calculated with the regression formula using the regression weights (bs) that minimize the sum of squares
- it’s the optimal fit to our data points using the regression weights
- the distances between the data points and the orange line (what we predict based on our regression formula) are the model error (unexplained variance)
How do we get to the model sums of squares?
We can turn the model error sum of squares into a proportion of the total variance (e.g. 7.68/10.24 = 0.75) and then take the complement (e.g. 1 − (7.68/10.24) = 0.25): this represents the proportion of explained variance
- the trick is that if we take the square root of that, we get the correlation between our two variables (√0.25 = 0.5) → this works because we have a single predictor; once we increase the number of predictors, it’s no longer the same
Look at picture 15 to see the numbers
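The arithmetic above, step by step (the 7.68 and 10.24 are the numbers quoted from picture 15):

```python
import math

ss_error = 7.68    # model error sum of squares (from picture 15)
ss_total = 10.24   # total sum of squares (from picture 15)

prop_unexplained = ss_error / ss_total   # 0.75: proportion of unexplained variance
r_squared = 1 - prop_unexplained         # 0.25: proportion of explained variance
r = math.sqrt(r_squared)                 # 0.5: the correlation (single predictor only)
```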
How do we calculate b1?
Picture 3
It’s based on the correlation between the two variables
- it’s a type of standardised statistic, but it’s not bound between −1 and 1
- quantifies how strong the association is between our two variables (no association - b1 would be 0)
B1 is the slope coefficient - determines the slope of the line (positive - increasing slope, negative - decreasing slope, close to 0 - no association)
How do we calculate b0?
Picture 4
The average of our outcome variable minus b1 multiplied by the average of our predictor variable
It determines at what point we cross the Y axis, i.e. the prediction when the predictor is 0
- if our predictor is 0, the model predicts b0 for the outcome (with no predictor at all, we would just predict the average value of y)
What is the interpretation of the slope coefficient?
Represents the change in the outcome associated with a unit change in the predictor
- If we increase our predictor by 1 unit, we predict the outcome variable to increase by b1 units
Picture 5 and picture 6
Now that we have b0 and b1, what can we do?
We can quantify the prediction for the outcome variable based on the value of x (airplay in picture 7) and then see how close our predictions are to our observed data (y with a hat vs. y)
- Our regression line tells us what our model predicts (the same thing that group means told us in anova, but now we have a continuous variable so it’s a regression line)
Now that we have our predictions what is the next step?
We can look at the error/residuals, which is the difference between the model predictions and observed values
- and now we can test for homoscedasticity
How do we test for homoscedasticity?
Using a scatterplot to check whether there is an association between the size of our model error and the size of the prediction (predictions vs. residuals)
- we want 0 correlation - no systematic error going on
- picture 8
How do we compare our observed data to our predicted ones?
We create a scatter plot (picture 16) and calculate the correlation, i.e. fit of the model (!in multiple regression the slope in the scatter plot is not equal to the correlation of the model fit!)
- the stronger the correlation, the better our model did