Week 12 Flashcards

1
Q

Regression is?

A
  • More fiddly than other methods
  • Has more assumptions
2
Q

Why use Linear Regression?

A
  • Not looking at differences
  • Looking at relationships
  • Regression goes further than correlation - allows us to make predictions
  • Produces a model that allows for sophisticated exploration of relationships between variables
3
Q

In Second Year Stats

A
  • Looked at relationships - Correlation
  • Differences between groups and within groups
  • Used t-tests and ANOVAs
  • Variation in the Dependent Variable
4
Q

Correlation

A

Allows us to estimate the direction and strength of a linear relationship

5
Q

Why use Linear Regression?

A
  • How well will a set of variables predict an outcome?
  • Which variable in a set of variables is the best predictor of an outcome?
  • Does a particular predictor variable predict an outcome if another variable is controlled for?
6
Q

Predictor Variable

A

The same as the Independent Variable in Regression

7
Q

Outcome Variable

A

The same as the Dependent Variable in Regression

8
Q

What is a Model?

A
  • An approximation to the actual data
  • A simple summary of the data
  • Makes the data easier to interpret and communicate
  • Allows us to make predictions from the data
9
Q

What is a Regression Model

A

Mathematically describes the linear relationship (see the sketch below)
  • Y = b(X) + C
  • Y = Predicted values of the DV
  • b = The slope of the line
  • X = Scores on the Predictor (IV)
  • C = The Intercept
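The lecture demonstrates this in SPSS; as a supplementary illustration, here is a minimal Python sketch of fitting Y = b(X) + C. The attendance/grades numbers are made up:

```python
import numpy as np
from scipy import stats

# Hypothetical data: classes attended (X, the IV) and grades (Y, the DV)
attendance = np.array([2, 5, 6, 8, 10, 12, 14, 15])
grades = np.array([40, 48, 55, 60, 62, 70, 75, 78])

# Fit Y = b(X) + C by ordinary least squares
fit = stats.linregress(attendance, grades)
print(f"b (slope)     = {fit.slope:.3f}")
print(f"C (intercept) = {fit.intercept:.3f}")

# Predicted values of the DV for each score on the predictor
predicted = fit.slope * attendance + fit.intercept
```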

10
Q

The Intercept

A
  • Point where the function crosses the y-axis.
  • Sometimes the Regression model only becomes significant when we remove the intercept, and the regression equation reduces to
  • Y = b(X) + error
11
Q

Standardized beta (β)

A
  • Compares the strength of the effect of each IV on the DV
  • The higher the absolute value of the beta coefficient, the stronger the effect
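A small sketch (made-up data, not from the lecture) showing that the standardised beta is just the unstandardised slope rescaled by the standard deviations of the two variables, i.e. the slope you would get after z-scoring both:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=100)               # predictor (IV)
y = 2.0 * x + rng.normal(size=100)     # outcome (DV)

b = stats.linregress(x, y).slope           # unstandardised slope
beta = b * x.std(ddof=1) / y.std(ddof=1)   # standardised beta

# Equivalent: the slope of the regression on z-scored variables
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
print(beta, stats.linregress(zx, zy).slope)  # the two values match
```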
12
Q

How Does Regression Work?

A
  • The DV is modelled as a linear combination of other variables
  • Those variables don't always have to be continuous - can have a combination of variable types
  • Need to find the Line of Best Fit
13
Q

Line of Best Fit

A
  • Many lines could be produced by the Regression Formula
  • How do we know which line is best?
  • The best line minimises the difference between the observed values and the values predicted by the line (see the sketch below)
  • This difference is called error
  • In regression, errors are also called residuals
    * Y = b(X) + C + error
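A sketch of what "minimising error" means, using the closed-form least-squares solution on made-up data; any other line gives a larger sum of squared residuals:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least squares: b = cov(X, Y) / var(X), C = mean(Y) - b*mean(X)
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
C = y.mean() - b * x.mean()

residuals = y - (b * x + C)       # the error term in Y = b(X) + C + error
sse = np.sum(residuals ** 2)      # the quantity the line of best fit minimises
print(f"b = {b:.3f}, C = {C:.3f}, SSE = {sse:.3f}")

# Nudging the slope away from the optimum always increases the error
sse_other = np.sum((y - ((b + 0.1) * x + C)) ** 2)
assert sse_other > sse
```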
14
Q

N (Cases): k (Predictors) Ratio

A
  • An assumption about sample size
  • Need a certain number of participants to trust the validity of the results
  • A Simple Linear Regression assumption
  • Expressed as the ratio of the number of Cases (N) to the number of Predictors (k)
  • The more Predictors we have, the more cases we need for the study
15
Q

Checking Linearity

A
  • Checking for Linearity requires scatterplots
  • Need scatterplots between the DV and each IV
  • Looking for evidence of non-linearity
16
Q

Check for Normality

A
  • Kolmogorov-Smirnov/Shapiro-Wilk: p > .05
  • Skewness & Kurtosis: if the z score is within ±1.96, the distribution is normal
  • Histogram follows a bell curve.
  • Detrended Q-Q Plots: equal numbers of dots above and below the line.
  • Normal Q-Q Plots: normal if the dots hug the line.
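The lecture runs these checks in SPSS; an equivalent sketch in Python with scipy (note the skewness/kurtosis standard errors below are the large-sample approximations sqrt(6/n) and sqrt(24/n), whereas SPSS reports exact values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.normal(loc=50, scale=10, size=80)  # made-up data

# Shapiro-Wilk: p > .05 is consistent with normality
W, p = stats.shapiro(scores)
print(f"Shapiro-Wilk: W = {W:.3f}, p = {p:.3f}")

# z scores for skewness and kurtosis: normal if within +/-1.96
n = len(scores)
z_skew = stats.skew(scores) / np.sqrt(6 / n)
z_kurt = stats.kurtosis(scores) / np.sqrt(24 / n)
print(f"z(skewness) = {z_skew:.2f}, z(kurtosis) = {z_kurt:.2f}")
```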
17
Q

Check for Univariate Outliers

A
  • Identified on Box & Whisker Plots
  • Dots indicate outliers
  • Asterisk indicates extreme cases
  • Number tells you which case is the issue
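Box plots conventionally flag cases beyond 1.5 x IQR from the quartiles as outliers (dots) and beyond 3 x IQR as extreme cases (asterisks); a quick sketch of that rule on made-up data (SPSS's exact fences may be drawn slightly differently):

```python
import numpy as np

data = np.array([12, 14, 15, 15, 16, 17, 18, 19, 20, 45])  # 45 looks suspect

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# 1.5 x IQR fence: outliers (dots); 3 x IQR fence: extreme cases (asterisks)
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
extremes = data[(data < q1 - 3.0 * iqr) | (data > q3 + 3.0 * iqr)]
print("Outliers:", outliers)
print("Extreme cases:", extremes)
```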
18
Q

Reason Univariate Outliers are Problematic

A
  • Regression Analysis gives the formula for a straight line
  • A data point that stands outside other data points can change the slope of your straight line
  • This makes the line a poor predictor of the value of other data points
19
Q

How to deal with Outliers

A
  1. Check if Outlier is a data entry error and fix it
  2. Check if the outlier is from a different population - justifies removing their data
  3. Separate outliers and run different analysis
  4. Run Analysis with and without outliers and report both models
  5. Winsorization - Change values so they’re not Outliers anymore
  6. Use transformations or Bootstrapping
20
Q

Winsorization

A
  • Change the score of a low outlier to the value of the 5th percentile
  • Change the score of a high outlier to the value of the 95th percentile
  • Slightly problematic because it changes the data
  • But it retains the relative extremeness without removing the outlier's data
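A minimal sketch using scipy's winsorize, which clamps the tails in exactly this 5th/95th percentile fashion; the data are made up, with one planted outlier at each end:

```python
import numpy as np
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(7)
scores = np.append(rng.normal(55, 5, size=18), [3.0, 98.0])  # planted outliers

# Clamp the bottom and top 5% of scores to the 5th/95th percentile values
adjusted = np.asarray(winsorize(scores, limits=(0.05, 0.05)))

print(scores.min(), scores.max())       # 3.0 and 98.0 before winsorizing
print(adjusted.min(), adjusted.max())   # the outliers are pulled in
```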
21
Q

Bootstrapping

A
  • An alternative to transformations for dealing with outliers
  • Creates new samples by resampling (with replacement) from your own sample
  • Does this repeatedly, often thousands of times
  • This builds an empirical sampling distribution for the statistic
  • Extreme values have much less influence on the resulting estimates
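A bare-bones sketch of the resampling idea on made-up data; the statistic here is the mean, but the same loop works for a regression slope:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = np.array([4, 5, 5, 6, 6, 7, 7, 8, 9, 25])  # contains an outlier

# Resample the observed data with replacement, many times, and build
# an empirical sampling distribution of the statistic
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(5000)
])

# 95% bootstrap confidence interval (percentile method)
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```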
22
Q

Homoscedasticity

A
  • Means "same scatter" or "same variance"
  • The variance of the residuals is equal across all predicted scores on the Outcome Variable
23
Q

Check for Normality, Linearity and Homoscedasticity

A
  • We need the residuals to behave in a certain way
  • Residuals are the differences between the observed scores on the outcome variable and the predicted scores
  • SPSS generates a histogram and Q-Q plots of the residuals
24
Q

Dealing with Heteroscedasticity

A
  • Check the residuals graph in SPSS
  • Check the plot for any patterns
  • If the dots are scattered randomly, we are all good (see the sketch below)
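The lecture checks this on SPSS's residuals-vs-predicted plot; an equivalent sketch with matplotlib on made-up (homoscedastic) data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
y = 3 * x + rng.normal(0, 2, 200)   # constant error variance

fit = stats.linregress(x, y)
predicted = fit.slope * x + fit.intercept
residuals = y - predicted

# A random cloud with no funnel or curve suggests the assumption holds
plt.scatter(predicted, residuals, s=10)
plt.axhline(0, color="grey")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()
```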
25
Q

If Regression Assumptions are violated

A
  • Check the Normality of the predictors - fixing this can make heteroscedasticity disappear
  • Use a transformation on the Outcome Variable
  • Consider a different method such as Weighted Least Squares Regression
  • Use some form of Non-Linear Regression
26
Q

Null Hypothesis for Regression

A
  • The slope of the Regression line will be equal to 0
  • β = 0
27
Q

Alternative Hypothesis

A
  • The slope of the Regression line will not be 0
  • β ≠ 0
28
Q

Running Linear Regression

A
  1. Analyse
  2. Regression
  3. Linear
  4. Move your DV into the Dependent box
  5. Move your IV into the Independent box
  6. OK
29
Q

Linear Regression - *R* value

A
  • The same as Pearson's Correlation (*r*) in simple regression
  • Tells us the strength and direction of the relationship
30
Q

Linear Regression - *R* Square Value

A
  • Tells us the amount of variance in the DV explained by the IV
  • The proportion of variance that can be explained by the variable
  • e.g. 23% of the variability in grades explained by attendance in this example
  • Known to overestimate the explained variance
31
Q

Linear Regression - *R* Square Adjusted

A
  • A version of *R* squared adjusted to be slightly smaller
  • Corrects the bias of overestimated explained variance
  • Useful as a Goodness of Fit statistic
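The adjustment is the standard shrinkage formula 1 - (1 - R²)(n - 1)/(n - k - 1); a quick sketch on made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
attendance = rng.uniform(0, 20, 30)
grades = 1.5 * attendance + rng.normal(0, 8, 30)

r = stats.linregress(attendance, grades).rvalue
n, k = len(grades), 1      # number of cases and number of predictors

r2 = r ** 2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # shrinks R² to correct the bias
print(f"R² = {r2:.3f}, adjusted R² = {adj_r2:.3f}")
```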
32
Q

Goodness of Fit Statistic

A
  • Determines how well sample data fit a distribution from a normal population
  • Determines whether a sample is skewed or normal in the actual population
33
Q

Regression ANOVA

A
  • Reported with the df, *F* value and *p* value
  • Compares the error of the line of best fit with the error of the baseline model (slope of 0)
  • The ANOVA is significant if the model is "better" than the baseline
34
Q

Unstandardised Coefficient

A
  • The slope of the Regression Equation
  • The amount of change in the Dependent Variable for each unit change in the Independent Variable
  • This is the *B* coefficient in SPSS output
  • e.g. each unit of attendance is associated with a 1.88 unit increase in grades
35
Q

Coefficient t-tests

A
  • Check whether the IV is a significant predictor of the DV
  • Become more relevant when we start adding more predictors
36
Q

Standardised Coefficients

A
  • A measure of effect size
  • Useful for Multiple Regression
  • Important when we have more than one Predictor
  • Predictors are often measured on different scales
  • e.g. IQ points, classes attended, additional study time
37
Q

Dealing with Multiple Predictors

A
  • Most commonly found in Research Projects
  • Allows us to predict the outcome variable from more than one predictor
  • Answers how well a combination of predictors predicts the outcome (see the sketch below)
  • Y = b1(X1) + b2(X2) + C + error
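A sketch of a two-predictor model in Python with statsmodels (the lecture uses SPSS; variable names and data here are made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 100
x1 = rng.normal(size=n)                        # predictor 1
x2 = rng.normal(size=n)                        # predictor 2
y = 1.8 * x1 + 0.9 * x2 + rng.normal(size=n)   # outcome

# Y = b1(X1) + b2(X2) + C + error
X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.params)                            # C, b1, b2
print(model.rsquared, model.rsquared_adj)      # R² and adjusted R²
```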
38
Q

Univariate Outliers

A

An outlier on a single variable
39
Q

Multivariate Outlier

A

An outlier on a combination of variables
40
Q

Assumptions with Regression

A
  • Normality
  • Univariate Outliers
  • Multivariate Outliers
  • Multicollinearity
  • Normality, Linearity & Homoscedasticity of residuals
41
Q

Multicollinearity

A
  • Two or more IVs highly correlated in a regression
  • One IV can be predicted from another IV in the regression model
42
Q

How to check for Multivariate Outliers

A

Mahalanobis Distance
43
Q

Mahalanobis Distance

A
  • The largest value should not be greater than the critical χ² value for df = k at α = .001
  • Where k = the number of predictors
  • Found in the same SPSS table as Cook's Distance
  • For simplicity, use a table of critical χ² values (or compute the cut-off directly, as in the sketch below)
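Rather than a lookup table, the same cut-off can be computed directly; a sketch with made-up predictor data for k = 4 (which reproduces the 18.467 critical value used on a later card):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))    # 100 cases, k = 4 predictors (made up)

# Mahalanobis distance of each case from the centroid of the predictors
mu = X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Critical chi-square for df = k at alpha = .001
critical = chi2.ppf(1 - 0.001, df=4)             # 18.467
print(f"max distance = {d2.max():.3f} vs critical = {critical:.3f}")
print("multivariate outliers:", np.where(d2 > critical)[0])
```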
44
Q

Cook's Distance

A
  • Tells you if there are cases that influence the regression line
  • Found in the same table as Mahalanobis Distance
  • Rule of thumb: if Cook's **D is > 1** you have influential cases
  • Dealt with in the same way as Univariate Outliers
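A sketch of flagging influential cases with statsmodels, with one deliberately planted high-leverage case:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = np.append(rng.uniform(0, 10, 49), 20.0)   # last case has high leverage
y = 2 * x + rng.normal(0, 1, 50)
y[-1] = 0.0                                   # ...and sits far from the line

model = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d = model.get_influence().cooks_distance[0]

# Rule of thumb from the card: D > 1 flags an influential case
print(f"max Cook's D = {cooks_d.max():.2f}")
print("cases with D > 1:", np.where(cooks_d > 1)[0])
```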
45
Q

Check for Multicollinearity

A
  • Pearson's correlations between IVs
  • If *r* > .85 then there is multicollinearity
  • **Tolerance:** values < .1 are multicollinear; < .2 warrant a closer look
  • **VIF:** values > 10 are clearly multicollinear; > 5 warrant a closer look
  • If you find a problem, remove the offending variable
  • If two IVs are that closely related, they are basically measuring the same thing - treat them as one variable
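A sketch of computing VIF and Tolerance with statsmodels; x2 is built as a near-copy of x1 so the collinearity shows up clearly:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i, name in enumerate(["x1", "x2", "x3"], start=1):  # skip the constant
    vif = variance_inflation_factor(X, i)
    print(f"{name}: VIF = {vif:7.1f}, Tolerance = {1 / vif:.3f}")
```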
46
Q

Check for Multivariate Outliers

A
  • Use the Residuals Statistics table
  • First, find the critical χ² for a model with 4 predictors: χ² = 18.467
  • Check the Mahalanobis Distance maximum (13.803 here)
  • 13.803 < 18.467, therefore there are no multivariate outliers
  • Cook's D is < 1, so there are no influential cases
47
Q

Interpreting Multiple Regression

A
  • Use the Variables Entered/Removed table - tells you how many predictors are in the model (4)
  • Then the Model Summary table
  • *R* is not just Pearson's *r* anymore
  • It is the correlation between the actual scores and the predictions from the regression equation
  • *R* square = the proportion of variance in the DV accounted for by the combined predictors
  • Again, *R* square Adjusted is a corrected version of *R* square that accounts for the positive bias
48
Q

Interpreting Multiple Regression ANOVA

A
  • Now tests the combination of predictors
  • e.g. the combination is a significant predictor of GHQ
  • The table has the df, the *F* value, and the *p* value
49
Q

Interpreting Multiple Regression Coefficients

A
  • The unstandardised coefficient is the slope of the regression
  • Each unit increase in one of the IVs is associated with a *b* unit increase in GHQ
  • All other IVs are kept constant
  • Beta values = standardised regression coefficients
  • Allow direct comparison of regression coefficients
  • Displayed in units of standard deviation
50
Q

Interpreting Multiple Regression Standardised Coefficients

A
  • The t-values and p-values test the significance of the unique contribution of each predictor
  • These change depending on the predictors included in the model
51
Q

Multiple Regression Tolerance & VIF

A
  • **Tolerance:** values < .1 are multicollinear; < .2 warrant closer inspection
  • **VIF:** values > 10 are clearly multicollinear; > 5 warrant closer inspection
52
Q

Remove Non-Significant Predictors

A
  • A predictor that is not explaining anything makes the model worse
  • Removing it changes the numbers slightly
  • Keep only significant predictors in the model
53
Q

Applied look at the Regression Equation

A
  • The general form of the regression is: Y = b1(X1) + b2(X2) + b3(X3) + C + error
  • Substituting in our variables: GHQ = b1(neuroticism) + b2(state-anxiety) + b3(trait-anxiety) + C + error
  • With the fitted coefficients (see the sketch below): GHQ = .555(neuroticism) + .318(state-anxiety) + .471(trait-anxiety) + 13.552 + error
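Plugging the card's fitted coefficients into the equation to score a new case; the helper function name and the participant's predictor values are made up for illustration:

```python
# Hypothetical helper: scores a participant with the coefficients above
def predict_ghq(neuroticism, state_anxiety, trait_anxiety):
    return (0.555 * neuroticism
            + 0.318 * state_anxiety
            + 0.471 * trait_anxiety
            + 13.552)  # the intercept C

print(predict_ghq(neuroticism=12, state_anxiety=20, trait_anxiety=18))
```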
54
Q

What is the value for *R*?

A
  • The correlation between the DV and the IVs
  • A value greater than 0.4 is taken for further analysis
55
Q

What does *R* tell us?

A

The strength & direction of the relationship
56
Q

What does the value of *R*² Adjusted tell the researcher?

A

Tells you the percentage of variation explained by only the independent variables that actually affect the dependent variable