Week 4: Multiple Regression Flashcards

Question

Like HR, forced entry MR relies on

Answer 1

good theoretical reasons for including the chosen predictors,

Answer 2

makes no decision about the order in which variables are entered.

Answer 3

this method is the only appropriate method for theory testing because stepwise techniques are influenced by random variation in the data and so rarely give replicable results if the model is retested.

Answer 4

Analyze --> Regression --> Linear. Put the outcome in the Dependent box and the predictors (IVs, x) in the Independent(s) box. Select the statistics you need in the Statistics dialog (e.g., collinearity diagnostics to check the multicollinearity assumption) and click OK. You can also click Plots to check the assumptions of homoscedasticity and linearity.

Answer 5

This option is for obtaining collinearity statistics such as the VIF and tolerance, i.e., for checking the assumption of no multicollinearity

Answer 6

strong correlation between two or more predictors in a regression model.

Answer 7

simple regression requires only one predictor.

Answer 8

e.g., two predictors are perfectly correlated (they have a correlation coefficient of 1)

Answer 9

to obtain unique estimates of the regression coefficients because there are an infinite number of combinations of coefficients that would work equally well.

Answer 10

real-life data

Answer 11

interchangeable

Answer 12

unavoidable

Answer 13

* Untrustworthy bs * Limits the size of R * Importance of predictors

Answer 14

Multicollinearity between predictors makes it difficult to assess the individual importance of a predictor. If the predictors are highly correlated, and each accounts for similar variance in the outcome, then how can we know which of the two variables is important? Quite simply we can’t tell which variable is important – the model could include either one, interchangeably.

Answer 15

a correlation matrix of all of the predictor variables and see if any correlate very highly (by very highly I mean correlations of above .80 or .90)

Answer 16

variance inflation factor (VIF) and tolerance

Answer 17

predictor has a strong linear relationship with the other predictor(s).

Answer 18

potential problem of multicollinearity

Answer 19

look at your variables and decide whether all of them need to go in the model. If two predictors are highly correlated (i.e., measuring the same thing), decide whether it is important to include both or whether to take one out and simplify the regression model

Answer 20

reciprocal of the VIF (1/VIF)
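As a rough illustration of how tolerance and the VIF relate, the sketch below handles only the simplest case of exactly two predictors, where the R squared from regressing one predictor on the other is just their squared correlation. The function names and the toy data are made up for this example.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_and_tolerance(x1, x2):
    """VIF and tolerance for a model with exactly two predictors.

    Assumption: with two predictors, the R squared of one predictor
    regressed on the other equals their squared correlation.
    Perfect collinearity (r = 1 or -1) makes the VIF undefined
    (division by zero), which mirrors the 'no unique estimates' problem.
    """
    r_squared = pearson_r(x1, x2) ** 2
    vif = 1 / (1 - r_squared)
    tolerance = 1 / vif  # tolerance is the reciprocal of the VIF
    return vif, tolerance


# Toy (hypothetical) predictor scores
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif, tolerance = vif_and_tolerance(x1, x2)
print(round(vif, 2), round(tolerance, 2))
```

With more than two predictors the same idea applies, but each predictor has to be regressed on all of the others to get its R squared.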

Answer 21

issue with multicollinearity

Answer 22

ZRESID on the Y axis and ZPRED on the X axis: a plot of standardized residuals against standardized predicted values, used to assess homoscedasticity

Answer 23

(the standardized predicted values of the dependent variable based on the model). These values are standardized forms of the values predicted by the model.

Answer 24

(the standardized residuals, or errors). These values are the standardized differences between the observed data and the values that the model predicts.

Answer 25

* basic means and also a table of correlations between variables. * This is a first opportunity to determine whether there are high correlations between predictors, otherwise known as multicollinearity

Answer 26

variance in terms of R squared, and more importantly how R squared changes between models and whether those changes are significant.

Answer 27

measure of how much of the variability in the outcome is accounted for by the predictors

Answer 28

fit in the general population

Answer 29

assumption of independent errors is tenable (values less than 1 or greater than 3 should raise alarm bells); the closer the value is to 2, the better (assumption met)

Answer 30

F-tests for each model

Answer 31

significantly better at predicting the outcome than using the mean as a 'best guess'

Answer 32

improvement in prediction that results from fitting the model, relative to the inaccuracy that still exists in the model

Answer 33

improvement in prediction resulting from fitting a regression line to the data rather than using the mean as an estimate of the outcome

Answer 34

total difference between the model and the observed data

Answer 35

number of predictors (e.g., 1 for first model, 3 for second)

Answer 36

Number of observations (N) minus the number of coefficients in the regression model (e.g., M1 has 2 coefficients: one for the predictor and one for the constant; M2 has 4: one for each of the 3 predictors and one for the constant)

Answer 37

calculated for each term (SSM, SSR) by dividing the SS by the df. T

Answer 38

F-ratio is calculated by dividing the average improvement in prediction by the model (MSM) by the average difference between the model and the observed data (MSR)
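The sums of squares, degrees of freedom, mean squares and F-ratio described on these cards can be sketched for a one-predictor model as below. This is a minimal illustration using invented data and hypothetical function names, not SPSS output.

```python
def simple_regression(x, y):
    """Least-squares intercept (b0) and slope (b1) for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def f_ratio(y, y_hat, n_predictors):
    """ANOVA-style summary for a least-squares model with an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((obs - mean_y) ** 2 for obs in y)              # SST
    ss_resid = sum((obs - p) ** 2 for obs, p in zip(y, y_hat))    # SSR
    ss_model = ss_total - ss_resid                                # SSM
    df_model = n_predictors                # number of predictors
    df_resid = n - n_predictors - 1        # N minus number of coefficients
    ms_model = ss_model / df_model         # MSM
    ms_resid = ss_resid / df_resid         # MSR
    return {"SS_T": ss_total, "SS_R": ss_resid, "SS_M": ss_model,
            "MS_M": ms_model, "MS_R": ms_resid,
            "F": ms_model / ms_resid, "R2": ss_model / ss_total}


# Invented toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = simple_regression(x, y)
y_hat = [b0 + b1 * xi for xi in x]
result = f_ratio(y, y_hat, n_predictors=1)
print(result)
```

Turning F into a p-value needs the F distribution, which is left out here to keep the sketch dependency-free.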

Answer 39

greater than 1, and SPSS calculates the exact probability (p-value) of obtaining that value of F by chance

Answer 40

there is a positive relationship between the predictor and the outcome,

Answer 41

represents a negative relationship between predictor and outcome variable?

Answer 42

All three indicate positive relationships: as advertising budget increases, record sales (the outcome) increase; as plays on the radio increase, so do record sales; and as the attractiveness of the band increases, so do record sales

Answer 43

predictor affects the outcome if the effects of all other predictors are held constant:

Answer 44

(b = 0.085): This value indicates that as advertising budget (x) increases by one unit, record sales (outcome, y) increase by 0.085 units. This interpretation is true only if the effects of attractiveness of the band and airplay are held constant.

Answer 45

not dependent on the units of measurements of variables

Answer 46

the number of standard deviations that the outcome will change as a result of one standard deviation change in the predictor.

Answer 47

a better insight into the ‘importance’ of a predictor in the model

Answer 48

both variables have a comparable degree of importance in the model

Answer 49

advertising budget increases by one standard deviation (£485,655), record sales increase by 0.511 standard deviations. This interpretation is true only if the effects of attractiveness of the band and airplay are held constant
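One common way to see where a standardized beta comes from is to rescale the unstandardized b by the ratio of the predictor's standard deviation to the outcome's. The sketch below assumes that relationship; the function name and the numbers are hypothetical and are not taken from the record sales output.

```python
def standardized_beta(b, sd_predictor, sd_outcome):
    """Convert an unstandardized coefficient b into a standardized beta.

    A one-SD change in the predictor changes the outcome by
    b * sd_predictor raw units, i.e. (b * sd_predictor) / sd_outcome SDs.
    """
    return b * sd_predictor / sd_outcome


# Hypothetical values for illustration only
beta = standardized_beta(b=2.0, sd_predictor=3.0, sd_outcome=12.0)
print(beta)
```

Because the result is expressed in standard deviation units, betas from predictors measured on different scales can be compared directly.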

Answer 50

in 95% of these samples, these boundaries contain the true value of b

Answer 51

true (population) value of b

Answer 52

value of b in this sample is close to the true value of b in the population

Answer 53

in some samples the predictor has a negative relationship to the outcome whereas in others it has a positive relationship

Answer 54

the two best predictors (advertising and airplay) have very tight confidence intervals, indicating that the estimates for the current model are likely to be representative of the true population values. The interval for attractiveness is wider (but still does not cross zero), indicating that the parameter for this variable is less representative, but nevertheless significant.

Answer 55

Pearson's correlation coefficients

Answer 56

represent the relationships between each predictor and the outcome variable, controlling for the effects of the other two predictors.

Answer 57

represent the relationship between each predictor and the outcome, controlling for the effect that the other two variables have on the outcome, i.e., the unique relationship each predictor has with the outcome

Answer 58

variance of outcome explained by predictors divided by total (A+C)/(A+B+C+E)

Answer 59

unique variance in outcome (ignore all other predictors) explained by predictor divided by variance in outcome not explained by all other predictors: A/(A+E)

Answer 60

unique variance in outcome explained by predictor divided by total variance in outcome: A/(A+B+C+E)
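The three area ratios on these cards can be checked with a few lines of arithmetic. The sketch assumes the usual Venn-diagram reading (A = variance unique to the predictor of interest, B = variance unique to the other predictor, C = variance shared by both predictors, E = unexplained variance); that labelling is an assumption, as are the function name and the example areas.

```python
def variance_ratios(a, b, c, e):
    """Squared zero-order, partial and semi-partial (part) correlations
    for one predictor, expressed as ratios of Venn-diagram areas.

    Assumed labelling: a = unique to this predictor, b = unique to the
    other predictor, c = shared by both predictors, e = unexplained.
    """
    total = a + b + c + e
    return {
        "zero_order_sq": (a + c) / total,  # (A+C)/(A+B+C+E)
        "partial_sq": a / (a + e),         # A/(A+E)
        "semi_partial_sq": a / total,      # A/(A+B+C+E)
    }


# Hypothetical areas that sum to 1
ratios = variance_ratios(a=0.2, b=0.1, c=0.1, e=0.6)
print(ratios)
```

The partial ratio is never smaller than the semi-partial ratio, because its denominator leaves out the variance already claimed by the other predictor.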

Answer 61

entered into the model.

Answer 62

may be biased

Answer 63

serious problem.

Answer 64

a potential problem

Answer 65

For our current model the VIF values are all well below 10 and the tolerance statistics all well above 0.2; therefore, we can safely conclude that there is no collinearity within our data.

Answer 66

a summary of the residual statistics, to be examined for extreme cases, to see whether individual scores (cases) influence the modelling of the data too much

Answer 67

less than -2 or greater than 2 (we expect about 5% of our cases to do that and 95% to have standardized residuals within about +/- 2)

Answer 68

10 cases (5% of 200)

Answer 69

* 99% of cases should lie within ±2.5, so we expect 1% of cases to lie outside these limits * From the cases listed, it is clear that two cases (1%) lie outside the limits (cases 164 and 179; case 164 has a residual of 3 and should be investigated further), which conforms to what we would expect of an accurate model
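The rules of thumb for standardized residuals (about 5% beyond ±2, about 1% beyond ±2.5, anything beyond ±3 worth investigating) can be turned into a small check. The function name and the residuals below are invented for illustration.

```python
def residual_check(std_residuals):
    """Compare observed counts of large standardized residuals with the
    counts expected under the rules of thumb (5% beyond 2, 1% beyond 2.5)."""
    n = len(std_residuals)
    return {
        "n": n,
        "beyond_2": sum(1 for z in std_residuals if abs(z) > 2),
        "expected_beyond_2": 0.05 * n,
        "beyond_2_5": sum(1 for z in std_residuals if abs(z) > 2.5),
        "expected_beyond_2_5": 0.01 * n,
        # positions (0-based) of cases that deserve a closer look
        "beyond_3_cases": [i for i, z in enumerate(std_residuals) if abs(z) > 3],
    }


# Invented standardized residuals for ten cases
toy = [0.1, -2.3, 1.9, 2.6, -0.4, 3.2, 0.0, -1.0, 0.5, 1.2]
summary = residual_check(toy)
print(summary)
```

For a sample of 200 cases the expected counts come out at 10 cases beyond ±2 and 2 cases beyond ±2.5, matching the figures on the cards.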

Answer 70

broken the assumptions of the regression

Answer 71

investigate and potentially remove them because they are ‘outliers’

Answer 72

* Continuous outcome variable and continuous or dichotomous predictor variables * Independence = all values of the outcome variable should come from a different participant * Non-zero variance: predictors should have some variation in value, i.e., variance ≠ 0 * No outliers * No perfect or high collinearity * Histogram to check for normality of errors * Scatterplot of ZRESID against ZPRED to check for linearity and homoscedasticity = looking for random scatter * Independent errors (Durbin-Watson)

Answer 73

undue influence on a predictor’s b coefficient

Answer 74

partial plots as well

Answer 75

the partial plot shows the strong positive relationship to album sales. There are no obvious outliers and the cloud of dots is evenly spaced out around the line, indicating homoscedasticity.

Answer 76

the plot again shows a positive relationship to album sales, but the dots show funnelling. There are no obvious outliers on this plot, but the funnel-shaped cloud indicates a violation of the assumption of homoscedasticity.

Answer 77

you cannot generalize your findings beyond your sample

Answer 78

transforming the raw data – but this won’t necessarily affect the residuals!

Answer 79

logistic regression instead

Answer 80

37.4% of the variance in productivity scores was accounted for by 3 predictor variables

Answer 81

if we assumed no relation between the predictor variables and the outcome variable (a flat regression line: no association between these variables)

Answer 82

holidays had a standardized beta coefficient of 0.031, whereas cake had a much higher standardized beta coefficient of 0.499, which tells us that the amount of cake given out is a much better predictor of productivity than the amount of holidays taken. For pay we have a beta coefficient of 0.323, which tells us that pay was also a pretty good predictor of productivity in the model, but slightly less so than cake.

Answer 83

- The p-value for holidays is 0.891, which is not significant - The p-value for cake is 0.032, which is significant - The p-value for pay is 0.012, which is significant

Answer 84

All VIF values are below 10 here, showing that we are unlikely to have a problem with multicollinearity, so we do not need to worry about it for these data

Answer 85

another predictor - block 2 of 2

Answer 86

baseline not M1

Answer 87

change statistics

Answer 88

M2 explains an extra 7.5% of the variance, which is significant
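Whether a change in R squared between two nested models is significant is judged with an F-change statistic. A minimal sketch of the standard formula follows; the function name and the R squared values in the example are hypothetical, not the values behind this card.

```python
def f_change(r2_old, r2_new, n, k_new, k_added):
    """F statistic for the change in R squared between nested models.

    r2_old, r2_new : R squared of the smaller and the larger model
    n              : number of cases
    k_new          : number of predictors in the larger model
    k_added        : number of predictors added in the new block
    Degrees of freedom are (k_added, n - k_new - 1).
    """
    numerator = (r2_new - r2_old) / k_added
    denominator = (1 - r2_new) / (n - k_new - 1)
    return numerator / denominator


# Hypothetical example: one extra predictor lifts R squared from .30 to .40
f = f_change(r2_old=0.30, r2_new=0.40, n=100, k_new=3, k_added=1)
print(round(f, 2))
```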

Answer 89

values would vary across different samples, and these standard errors are used to determine whether or not the b-value differs significantly from zero

Answer 90

significantly different from 0. I

Answer 91

slope of the regression line is significantly different from horizontal, but in multiple regression, it is not so easy to visualize what the value tells us.

Answer 92

predictor is making a significant contribution to the model

Answer 93

predictor is making a significant contribution to the model.

Answer 94

contribution of that predictor.

Answer 95

For this model, the advertising budget (t(196) = 12.26, p < .001), the amount of radio play prior to release (t(196) = 12.12, p < .001) and attractiveness of the band (t(196) = 4.55, p < .001) are all significant predictors of record sales. From the magnitude of the t-statistics we can see that the advertising budget and radio play had a similar impact, whereas the attractiveness of the band had less impact.
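The degrees of freedom reported with each t above follow from N minus the number of predictors minus one, and each t is the coefficient divided by its standard error. A small sketch, with hypothetical function names and a made-up b and standard error:

```python
def residual_df(n_cases, n_predictors):
    """Degrees of freedom for the t-tests of the coefficients: N - p - 1."""
    return n_cases - n_predictors - 1


def t_statistic(b, standard_error):
    """t for a coefficient: the estimate divided by its standard error."""
    return b / standard_error


# 200 cases and 3 predictors give the df reported as t(196)
df = residual_df(n_cases=200, n_predictors=3)

# Made-up coefficient and standard error, for illustration only
t = t_statistic(b=0.5, standard_error=0.1)
print(df, t)
```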

Answer 96

one DV (usually denoted as Y) and a series of other variables (known as IVs)

Answer 97

we are talking about a variable that can take an infinite number of real values within a given interval, such as height or age

Answer 98

variable that can only hold two distinct values like male and female

Answer 99

line of best fit in MR

Answer 100

one or two outliers then could be okay

Answer 101

are over 3 SD from the mean

Answer 102

-3 and 3 SD

Answer 103

Weight, Activity, and the interaction between them are statistically significant

Answer 104

Homoscedasticity: similar variance of residuals (errors) across the variable continuum, i.e., the model is equally accurate throughout. Heteroscedasticity: variance of residuals (errors) differs across the variable continuum, i.e., the model is not equally accurate throughout.

Answer 105

your distribution

Answer 106

normally distributed errors/residuals

Answer 107

* 0 = errors between pairs of observations are positively correlated * 2 = independent errors * 4 = errors between pairs of observations are negatively correlated
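The 0 / 2 / 4 reading of the Durbin-Watson statistic follows from how it is computed: the sum of squared differences between successive residuals, divided by the sum of squared residuals. A minimal sketch with invented residuals and a hypothetical function name:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their observed order."""
    squared_diffs = sum((residuals[i] - residuals[i - 1]) ** 2
                        for i in range(1, len(residuals)))
    squared_resids = sum(e ** 2 for e in residuals)
    return squared_diffs / squared_resids


# Residuals that never change sign pattern: adjacent errors move together
d_positive = durbin_watson([1, 1, 1, 1])

# Residuals that flip sign every time: adjacent errors move in opposition
d_negative = durbin_watson([1, -1, 1, -1])

print(d_positive, d_negative)
```

The first series gives a value at the bottom of the scale and the second a value above 2, in line with the positive and negative correlation readings on the card.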

Answer 108

1.5 and 2.5

Answer 109

‘generalizes’ to the entire population.

Answer 110

for small N and where results are to be generalized use the adjusted R2
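For reference, the adjustment commonly reported alongside R squared penalizes it for the number of predictors relative to the sample size. The sketch below assumes that standard formula; the function name and inputs are hypothetical.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R squared: 1 - (1 - R2) * (N - 1) / (N - p - 1)."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Hypothetical small-sample example: the adjustment is substantial
small = adjusted_r_squared(r_squared=0.5, n_cases=11, n_predictors=2)

# Same R squared with a much larger sample: the adjustment almost vanishes
large = adjusted_r_squared(r_squared=0.5, n_cases=1001, n_predictors=2)
print(round(small, 3), round(large, 3))
```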

Answer 111

1. Standard: To assess impact of all predictor variables simultaneously 2. Hierarchical: To test predictor variables in a specific order based on hypotheses derived from theory 3. Stepwise: If the goal is accurate statistical prediction from a large number of predictor variables – computer driven

Answer 112

* Tells us that OCD interpretation of intrusions would not have a significant impact on the model's ability to predict social anxiety. The beta value of interpretation of intrusions is very small, indicating a small influence on the outcome variable. Beta is the degree of change in the outcome variable for every 1 unit of change in the predictor variable.

Answer 113

When predictor variables correlate very highly with each other

Answer 114

Normality of residuals

Answer 115

The t-statistic is equal to the regression coefficient divided by its standard deviation

Answer 116

The residual error in the prediction of fear scores when both gender and fantasy proneness are included as predictors in the model.

Answer 117

The improvement in the prediction of depression by fitting the model

Answer 118

Regression assumptions that have been met

Answer 119

Somewhere between −3.369 and −0.517

Answer 120

Stress from research

Answer 121

As stress from teaching increases by one unit, burnout decreases by 0.36 of a unit.

Answer 122

No, because the errors show heteroscedasticity.

Answer 123

Note that you expect 1% of cases to lie outside this area so in a large sample, if you have one or two, that could be ok

Answer 124

A record company boss was interested in predicting album sales from advertising. Data: 200 different album releases. Outcome variable: sales (CDs and downloads) in the week after release. Predictor variables: the amount (in £s) spent promoting the album before release, and the number of plays on the radio.

Answer 125

observed values of the outcome, and the values predicted by the model.

Answer 126

Difference between no predictors and model 1 (a). Difference between model 1 (a) and model 2 (b). Our model 2 is significantly better at predicting the value of the outcome variable than the null model and model 1 (F(2, 197) = 167.2, p < .001) and explains 66% of the variance in our data (R2 = .66).

Answer 127

y = 0.09x1 + 3.59x2 + 41.12. For every £1,000 increase in advertising budget there is an increase of 87 record sales (B = 0.09, t = 11.99, p < .001). For every additional play on Radio 1 per week there is an increase of 3,589 record sales (B = 3.59, t = 12.51, p < .001).
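Plugging values into the equation above gives a predicted score. The sketch assumes, from the wording of the card, that x1 is advertising budget in thousands of pounds, x2 is the number of plays per week, and the outcome is in thousands of sales; the function name and the example inputs are hypothetical.

```python
def predict_sales(advertising_thousands, plays_per_week):
    """Predicted sales (assumed to be in thousands) from the rounded
    coefficients on the card: y = 0.09*x1 + 3.59*x2 + 41.12."""
    return 0.09 * advertising_thousands + 3.59 * plays_per_week + 41.12


# With both predictors at zero the prediction is just the constant
baseline = predict_sales(0, 0)

# Hypothetical album: 100 (thousand pounds) of advertising, 10 plays per week
example = predict_sales(100, 10)
print(baseline, round(example, 2))
```

Because the coefficients are rounded to two decimals, predictions will differ slightly from those based on the unrounded values (0.087 and 3.589) implied by the card.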

Answer 128

* R squared = 0.09 * F statistic = 22.54 * p < 0.001

Answer 129

D - data points show a random pattern

Answer 130

A --> The R square change in step 2 was .020,

Answer 131

A fashion student was interested in factors that predicted the salaries of catwalk models. He collected data from 231 models. For each model he asked how much they earned per day (salary), their age (age), and how many years they had worked as a model (years_modelling). The student wanted to know if the number of years spent modelling predicted the models' salary after the models' age was taken into account.

Answer 132

Somewhere between 3.369 and 0.517

(171 cards)