Flashcards in Multiple regression and logistic regression Deck (16)

1

## Describe multiple regression

###
With a single explanatory variable we would carry out a simple linear regression.

With several explanatory variables we use a more general form of regression model that allows more than one explanatory variable at a time
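
The idea can be sketched with ordinary least squares in Python (NumPy); the data, coefficients, and variable names below are invented purely for illustration.

```python
import numpy as np

# Synthetic data for illustration: two explanatory variables.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])

# Solve for the coefficients that minimise the residual sum of squares.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # close to the true values [2.0, 1.5, -0.5]
```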

2

## What are the assumptions for multiple regression?

###
residuals are normally distributed

residuals have mean 0

residuals have constant variance

observations are independent

error-free measurement of predictors
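
A rough numerical check of two of these assumptions (mean-zero residuals, roughly constant variance) might look like this; the data is synthetic and the split-by-half spread comparison is just a crude diagnostic, not a formal test.

```python
import numpy as np

# Synthetic data from a model that satisfies the assumptions.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, size=n)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Residuals from a least-squares fit with an intercept average to ~0.
print(round(residuals.mean(), 6))

# Compare residual spread in the lower and upper halves of x
# (a crude check for constant variance).
low = residuals[x < 5].std()
high = residuals[x >= 5].std()
print(round(low, 2), round(high, 2))
```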

3

## What does the regression line indicate?

###
The regression line indicates the nature of the relationship between the explanatory variables and the response

4

## What does the coefficient for each variable represent?

### it gives the expected change in the response for a one-unit increase in that variable, assuming the other variables are held constant

5

## What can the coefficients' standard errors be used for?

### to create confidence intervals for the coefficients
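
This can be sketched as follows (synthetic data; the 1.96 normal quantile is an approximation, and a t quantile would be more exact for small samples).

```python
import numpy as np

# Synthetic data for illustration.
rng = np.random.default_rng(2)
n = 150
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Residual variance estimate with n - p degrees of freedom.
p = X.shape[1]
sigma2 = residuals @ residuals / (n - p)

# Standard errors: square roots of the diagonal of sigma^2 * (X'X)^-1.
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Approximate 95% confidence interval for each coefficient.
lower = beta - 1.96 * se
upper = beta + 1.96 * se
for b, lo, hi in zip(beta, lower, upper):
    print(f"{b:.3f} [{lo:.3f}, {hi:.3f}]")
```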

6

## What is the H0 in multiple regression?

### that each coefficient = 0, i.e. the corresponding variable has no effect on the response

7

## Why is the adjusted R squared used instead of the raw value in multiple regression?

### The normal R squared will increase whenever another variable is added to the model regardless of whether the variable has any predictive ability

8

## Describe adjusted R squared

###
the most commonly used selection criterion

it adjusts the estimated proportion of variance explained by the model to account for the number of variables included
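
The contrast with raw R squared can be sketched like this: adding a pure-noise predictor can never lower raw R squared, while the adjusted value only rises if the variable pulls its weight (synthetic data for illustration).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.normal(size=n)
noise_var = rng.normal(size=n)           # no real predictive ability
y = 1.0 + 2.0 * x + rng.normal(size=n)

def fit_r2(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    p = X.shape[1] - 1                   # predictors, excluding intercept
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)
    return r2, adj

X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, noise_var])
r2_small, adj_small = fit_r2(X1, y)
r2_big, adj_big = fit_r2(X2, y)
print(r2_small, adj_small)
print(r2_big, adj_big)  # r2_big >= r2_small even though noise_var is useless
```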

9

## What are the other selection criteria used in multiple regression?

###
predicted R squared

Mallows' Cp

Principle of parsimony

10

## Describe the predicted R-squared

###
based on cross-validation

takes overfitting into account

11

## Describe Mallows' Cp

### penalises having too many variables; the model with the minimum Cp is considered the most effective
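
A sketch of the statistic, using the form Cp = SSE_p / s^2 - n + 2p, where s^2 is the residual mean square from the full model and p counts parameters including the intercept (synthetic data; the full model's Cp equals p by construction).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)                  # irrelevant variable
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def sse(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

full = np.column_stack([np.ones(n), x1, x2, x3])
s2 = sse(full, y) / (n - full.shape[1])  # full-model residual mean square

cps = {}
for name, cols in [("x1", [x1]), ("x1+x2", [x1, x2]), ("x1+x2+x3", [x1, x2, x3])]:
    X = np.column_stack([np.ones(n)] + cols)
    p = X.shape[1]
    cps[name] = sse(X, y) / s2 - n + 2 * p
    print(name, round(cps[name], 2))
```

Omitting the genuinely useful x2 inflates Cp sharply, while the model with only the useful variables scores lowest.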

12

## Describe the principle of parsimony

### keep it simple - prefer the model with fewer variables when the fits are comparable

13

## What are model searching methods?

###
Stepwise regression - usually based on forwards selection or backwards elimination

Best subsets regression - considers all potential models
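
Best subsets can be sketched by fitting every combination of candidate variables and ranking them, here by adjusted R squared (synthetic data; variable names are made up for illustration).

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n = 100
names = ["x1", "x2", "x3"]
cols = {name: rng.normal(size=n) for name in names}
y = 0.5 + 2.0 * cols["x1"] - 1.0 * cols["x2"] + rng.normal(size=n)  # x3 is noise

def adj_r2(X, y):
    """Adjusted R^2 of an OLS fit of y on X (X includes the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    r2 = 1 - (r @ r) / ((y - y.mean()) ** 2).sum()
    p = X.shape[1] - 1
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)

# Fit every non-empty subset of the candidate variables.
results = {}
for k in range(1, len(names) + 1):
    for subset in combinations(names, k):
        X = np.column_stack([np.ones(n)] + [cols[v] for v in subset])
        results[subset] = adj_r2(X, y)

best = max(results, key=results.get)
print(best, round(results[best], 3))
```

This exhaustive search is only feasible for a modest number of candidates, which is why stepwise methods exist.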

14

## What are the potential pitfalls of multiple regression?

###
overfitting - caused by a large number of variables relative to a small sample size

Typically the sample size needs to be at least 10 times the number of variables considered

The final number of variables in the model should not be more than the square root of the sample size

Collinearity - when explanatory variables are highly correlated with each other. It is usually best to remove one of them before fitting the regression
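
A simple way to spot collinearity is the correlation matrix of the explanatory variables; in this synthetic sketch, x2 is deliberately constructed to be almost a copy of x1.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly identical to x1
x3 = rng.normal(size=n)

# Pairwise correlations between the explanatory variables.
corr = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print(np.round(corr, 2))
# A pairwise correlation near +/-1 (here between x1 and x2) flags a
# pair from which one variable would usually be dropped before fitting.
```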

15

## Describe logistic regression

###
A logistic regression model is appropriate when we are interested in modelling a binary response or dependent variable

This variable can be related to one or more risk factors or covariates, which may be either categorical or continuous
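
A minimal sketch of fitting such a model, using plain gradient ascent on the log-likelihood rather than any particular library routine; the data, learning rate, and iteration count are illustrative choices.

```python
import numpy as np

# Synthetic binary response generated from a known logistic model.
rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
true_beta = np.array([-0.5, 2.0])         # intercept and slope
X = np.column_stack([np.ones(n), x])
prob = 1 / (1 + np.exp(-(X @ true_beta)))
y = (rng.uniform(size=n) < prob).astype(float)

# Gradient ascent on the logistic log-likelihood.
beta = np.zeros(2)
for _ in range(5000):
    p = 1 / (1 + np.exp(-(X @ beta)))     # fitted probabilities
    grad = X.T @ (y - p)                  # gradient of the log-likelihood
    beta += 0.5 * grad / n
print(beta)  # should be close to the true [-0.5, 2.0]
```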
