Chapter 18: GLM Flashcards

1
Q

List assumptions of the classical linear model

A

response variable modelled as a linear combination of explanatory variables
error terms are normally distributed
error terms have constant variance
error terms are independent

2
Q

Describe drawbacks of the classical linear model

A

model assumes the errors are normally distributed with constant variance, which may not be appropriate
model adds together the effects of different explanatory variables, but in reality effects are often not additive
may become long-winded with more than 2 explanatory variables

3
Q

When do intrinsic and extrinsic aliasing occur?

A

Intrinsic aliasing occurs:
because of dependencies inherent in the definition of the explanatory variables
this is dealt with by modelling software

Extrinsic aliasing occurs:
when two or more explanatory variables contain levels that are perfectly correlated
“Near aliasing” occurs when this correlation is almost, but not quite perfect

4
Q

Why use a GLM over one-way analysis?

A

one-way analysis ignores correlation and interaction effects:

for example, the effect of smoker status on claim amount may be higher for males than for females
for example, the effect of smoker status on claim amount may be higher at older ages than at younger ages
as a result, the one-way analysis may underestimate the effect of smoker status on claim amount when considering older ages

a GLM appropriately accounts for correlations and interactions:

by simultaneously modelling the effects of explanatory variables on the response variable
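
A worked example in Python, with made-up numbers, of how correlation between explanatory variables distorts one-way analysis. The data are constructed (an assumption for illustration) so that smokers claim exactly 1.5 times as much as non-smokers within each age band, but most smokers sit in the older, more expensive band.

```python
# Hypothetical portfolio cells: (age_band, smoker) -> (exposure, mean claim).
# Within each age band smokers claim exactly 1.5x non-smokers, but smokers
# are concentrated in the older (more expensive) age band.
cells = {
    ("young", False): (80, 100.0),
    ("young", True): (20, 150.0),
    ("old", False): (20, 200.0),
    ("old", True): (80, 300.0),
}


def one_way_mean(smoker):
    """Exposure-weighted mean claim for one smoker status, ignoring age."""
    total_exposure = sum(e for (age, s), (e, m) in cells.items() if s == smoker)
    total_claims = sum(e * m for (age, s), (e, m) in cells.items() if s == smoker)
    return total_claims / total_exposure


def within_band_relativity(age_band):
    """Smoker / non-smoker mean claim within a single age band."""
    return cells[(age_band, True)][1] / cells[(age_band, False)][1]


one_way = one_way_mean(True) / one_way_mean(False)
print(one_way)                          # 2.25
print(within_band_relativity("young"))  # 1.5
print(within_band_relativity("old"))    # 1.5
```

The one-way smoker relativity of 2.25 absorbs part of the age effect. Because these cell means are exactly multiplicative (age factor 2, smoker factor 1.5), a log-link GLM including both age band and smoker status would be expected to recover the 1.5 smoker factor.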

5
Q

Why use a GLM over classical linear model?

A

model not limited to the normal distribution:
can take any distribution from the exponential family, for example Poisson or gamma

model not limited to additive effects of explanatory variables:
a link function relates the mean of the response variable to a linear combination of the explanatory variables, so multiplicative effects can be modelled (for example with a log link)

variance of the response variable need not be constant; it is a function of its mean and often increases with the mean:
for example Poisson, where the variance equals the mean
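
A minimal Python sketch of why a log link gives multiplicative effects: the linear predictor adds coefficients, and the exponential turns that sum into a product of factors. The coefficients (a base mean of 100, a male factor of 1.2, a smoker factor of 1.5) are hypothetical.

```python
import math

# Hypothetical log-link coefficients: intercept, male, smoker.
beta_0 = math.log(100.0)
beta_male = math.log(1.2)
beta_smoker = math.log(1.5)


def mean_claim(male, smoker):
    """Inverse of the log link: mu = exp(linear predictor)."""
    eta = beta_0 + beta_male * male + beta_smoker * smoker
    return math.exp(eta)


# Additive on the linear-predictor scale, multiplicative on the mean scale:
# exp(b0 + b1 + b2) = exp(b0) * exp(b1) * exp(b2)
print(mean_claim(1, 1))    # approximately 180
print(100.0 * 1.2 * 1.5)   # approximately 180 (base x male factor x smoker factor)
```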

6
Q

Define the total deviance and the scaled deviance

A

total deviance:

deviance is a measure of the distance between the observed value (Y_i) and the fitted value (μ_i)
with allowance for weights w_i, with higher importance assigned to errors where the variance should be small
the sum of each observation's contribution to the deviance, d(Y_i, μ_i), is the total deviance for a model
D, total deviance = Σ_{i=1}^{n} d(Y_i, μ_i)

scaled deviance:

total deviance divided by the scale parameter φ
D*, scaled deviance = D/φ
this standardises the deviance so that it can be used when comparing different models
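
A minimal Python sketch for the Poisson case, assuming the standard Poisson unit deviance d(y, μ) = 2w[y ln(y/μ) - (y - μ)], with the convention y ln(y/μ) = 0 when y = 0. For the Poisson distribution φ = 1, so D* = D; the `phi` argument is there only to show the scaling. The claim counts and fitted means are hypothetical.

```python
import math


def poisson_unit_deviance(y, mu, w=1.0):
    """Contribution d(Y_i, mu_i) of one observation, with prior weight w."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * w * (term - (y - mu))


def total_deviance(ys, mus, weights=None):
    """D = sum of each observation's contribution d(Y_i, mu_i)."""
    if weights is None:
        weights = [1.0] * len(ys)
    return sum(poisson_unit_deviance(y, mu, w) for y, mu, w in zip(ys, mus, weights))


def scaled_deviance(ys, mus, phi, weights=None):
    """D* = D / phi."""
    return total_deviance(ys, mus, weights) / phi


# Hypothetical claim counts and fitted means.
observed = [0, 1, 3, 2]
fitted = [0.5, 1.2, 2.4, 1.9]
print(total_deviance(observed, fitted))
print(scaled_deviance(observed, fitted, phi=1.0))  # equal for Poisson (phi = 1)
```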

7
Q

List 3 goodness of fit tests

A

chi-squared statistic:
used when comparing nested models and where the scale parameter is known
test statistic = D_1* - D_2*, where D* is the scaled deviance and model 1 is nested within model 2
which approximately follows the chi-squared distribution with degrees of freedom = df_1 - df_2
degrees of freedom is the number of observations less number of parameters

F-statistic:
used when comparing nested models and where the scale parameter is unknown
test statistic = [D_1 - D_2] / [(df_1 - df_2)*(D_2/df_2)]
which approximately follows the F-distribution with degrees of freedom (df_1 - df_2, df_2)

AIC:
can be used when models are not necessarily nested
AIC = -2 * log-likelihood +2 * number of parameters
the lower the AIC, the better the model
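
A sketch of the three statistics as plain Python functions, applied to hypothetical deviances and log-likelihoods. It stops at computing the statistics; p-values would come from chi-squared and F tables or a statistics library, which are not shown here.

```python
def chi_squared_statistic(d1, d2, phi):
    """D_1* - D_2* with D* = D / phi; model 1 nested in model 2, phi known."""
    return d1 / phi - d2 / phi


def f_statistic(d1, df1, d2, df2):
    """[D_1 - D_2] / [(df_1 - df_2) * (D_2 / df_2)]; phi unknown."""
    return (d1 - d2) / ((df1 - df2) * (d2 / df2))


def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical fits: 100 observations, model 1 has 5 parameters, model 2 has 8.
n = 100
d1, df1 = 120.0, n - 5
d2, df2 = 100.0, n - 8
print(chi_squared_statistic(d1, d2, phi=1.2))  # compare with chi-squared on 3 df
print(f_statistic(d1, df1, d2, df2))           # compare with F on (3, 92) df
print(aic(-250.0, 5), aic(-245.0, 8))          # 510.0 506.0
```

Here model 2 has the lower AIC (506 against 510), so on that criterion it would be preferred.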

8
Q

Explain 5 ways to test the appropriateness of a model

A

hat matrix:
H such that y_hat = H * y
The diagonal entries h_ii are called leverages; each measures the influence that observed value y_i has on its own fitted value y_hat_i

deviance residuals:
these measure the distance between the observed and fitted values; large deviations may indicate that the distributional assumptions are being violated

standardised Pearson residuals:
these measure the distance between the observed and fitted values, scaled by the variance of the fitted value and adjusted for the leverage of the observation

Cook’s distance:
an alternative to the leverages (diagonal entries of the hat matrix) for identifying influential observations, combining the residual with the leverage; a Cook’s distance > 1 may be cause for concern

residual plot:
residuals are plotted against the fitted values; they should be symmetrical about the horizontal axis, with an average residual of zero
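
For the special case of a normal linear model with one explanatory variable, the leverages and Cook's distances have closed forms, which this Python sketch uses with hypothetical data. GLM software computes weighted versions of the same quantities, so treat this as an illustration of the ideas rather than how a GLM package does it.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def leverages(x):
    """Diagonal of the hat matrix for simple linear regression:
    h_ii = 1/n + (x_i - x_bar)^2 / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


def cooks_distances(x, y, p=2):
    """D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2, with p fitted parameters."""
    b0, b1 = simple_ols(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (len(x) - p)
    h = leverages(x)
    return [e * e / (p * s2) * hi / (1 - hi) ** 2 for e, hi in zip(residuals, h)]


# Hypothetical data: the last point is far out in x and off the trend.
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]
print([round(h, 3) for h in leverages(x)])
print([round(d, 2) for d in cooks_distances(x, y)])  # only the last exceeds 1
```

The leverages sum to 2, the number of fitted parameters (the trace of the hat matrix).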

9
Q

List examples of GLMs

A

gamma model:
may be a good model for claim amounts, log link

Poisson model:
may be a good model for claim frequency, log link

logistic regression model:
may be a good model for binary outcome, logit link
consider odds ratios, with p-values to assess significance
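
A minimal Python sketch of the link functions listed above and of how an odds ratio is read from a logistic regression coefficient. The coefficients (-2.0 and 0.7) are hypothetical, and significance testing is not shown.

```python
import math

# Inverse link functions for the example models above.
INVERSE_LINKS = {
    "log": math.exp,                                    # gamma and Poisson models
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logistic regression
}


def logit(p):
    """Logit link: log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def odds_ratio(coefficient):
    """In logistic regression, exp(coefficient) is the odds ratio for a
    one-unit change in that explanatory variable."""
    return math.exp(coefficient)


# Hypothetical fitted logistic model: log-odds = -2.0 + 0.7 * smoker.
beta_0, beta_smoker = -2.0, 0.7
p_non_smoker = INVERSE_LINKS["logit"](beta_0)
p_smoker = INVERSE_LINKS["logit"](beta_0 + beta_smoker)
print(odds_ratio(beta_smoker))              # about 2.01
print(odds(p_smoker) / odds(p_non_smoker))  # same odds ratio
```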

10
Q

What can a GLM be used to model?

A

for example, cost plpm (cost per life per month)

11
Q

List key items to mention when suggesting a model

A

specify explanatory variables, response variable
specify model, link function
consider interactions
consider the significance of coefficients (p-values, 95% confidence intervals)

12
Q

Outline the advantages of the Tweedie distribution for modelling PMI claims

A

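The Tweedie distribution is a special member of the exponential family that has a point mass (large spike) at zero and corresponds to the compound distribution of a Poisson claim number process and a gamma claim size distribution.

This suits PMI claims data, where many lives make no claim in a period: total claim cost per life can be modelled directly in a single GLM, rather than fitting separate claim frequency and claim size models.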
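
A simulation sketch of the compound Poisson-gamma structure described above, with hypothetical parameters. It uses only the Python standard library (`random.Random.gammavariate`, plus a simple Poisson sampler because `random` provides none) and checks that the share of lives with zero cost is close to exp(-λ), the size of the point mass at zero.

```python
import math
import random


def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def compound_poisson_gamma(lam, shape, scale, rng):
    """Total claim cost for one life: a Poisson number of gamma-sized claims.
    Returns exactly 0.0 when there are no claims (the point mass at zero)."""
    n_claims = poisson_sample(lam, rng)
    return sum((rng.gammavariate(shape, scale) for _ in range(n_claims)), 0.0)


# Hypothetical PMI parameters: claim frequency 0.2 per life per year,
# gamma claim size with shape 2 and scale 500 (mean claim 1000).
rng = random.Random(2024)
costs = [compound_poisson_gamma(0.2, 2.0, 500.0, rng) for _ in range(20_000)]
share_zero = sum(c == 0.0 for c in costs) / len(costs)
print(round(share_zero, 3), round(math.exp(-0.2), 3))  # both near 0.819
print(round(sum(costs) / len(costs), 1))               # near 0.2 * 1000 = 200
```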

13
Q

List two properties of the exponential family of distributions

A

Distribution completely specified in terms of its mean and variance

Variance of the response is a function of its mean
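
The second property can be made concrete with the standard variance functions V(μ), where Var(Y) = φV(μ). A minimal Python sketch covering three familiar members; the dictionary and function names are illustrative only.

```python
# Variance functions V(mu) for common exponential-family members; the
# variance of the response is Var(Y) = phi * V(mu), with phi the scale parameter.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,     # constant variance (classical linear model)
    "poisson": lambda mu: mu,     # phi = 1, so Var(Y) = mu
    "gamma": lambda mu: mu ** 2,  # variance grows with the square of the mean
}


def response_variance(family, mu, phi=1.0):
    """Var(Y) = phi * V(mu)."""
    return phi * VARIANCE_FUNCTIONS[family](mu)


for family in VARIANCE_FUNCTIONS:
    print(family, [response_variance(family, mu) for mu in (1.0, 2.0, 4.0)])
# normal [1.0, 1.0, 1.0]
# poisson [1.0, 2.0, 4.0]
# gamma [1.0, 4.0, 16.0]
```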
