GLM Flashcards

(37 cards)

1
Q

Model Specifications

A

Target Variable
Predictors
Link Function
Error Distribution
Weights

2
Q

Why log continuous variables when using log link?

A

Otherwise a positive coefficient implies an exponential effect on the target; logging the predictor gives a power (multiplicative) relationship instead
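
A quick worked comparison under a log link (generic notation, not from the card):

\mu = e^{\beta_0 + \beta_1 x} \quad \text{(unlogged predictor: exponential in } x\text{)}
\mu = e^{\beta_0 + \beta_1 \ln x} = e^{\beta_0}\, x^{\beta_1} \quad \text{(logged predictor: power curve in } x\text{)}

Logging aligns the predictor's scale with the log link, so its effect scales multiplicatively rather than exponentially.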

3
Q

When to use weights

A

If a row in the dataset represents an average over multiple data points, weight it by the number of observations it represents
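
A minimal sketch of passing such weights, assuming statsmodels and hypothetical columns avg_severity, n_claims, driver_age:

  import numpy as np
  import pandas as pd
  import statsmodels.api as sm

  # Made-up rows: each is the average severity over n_claims claims
  rng = np.random.default_rng(0)
  df = pd.DataFrame({
      "driver_age": rng.integers(18, 80, 200),
      "n_claims": rng.integers(1, 10, 200),
  })
  df["avg_severity"] = rng.gamma(2.0, 500.0, 200)

  X = sm.add_constant(df[["driver_age"]])
  model = sm.GLM(
      df["avg_severity"], X,
      family=sm.families.Gamma(link=sm.families.links.Log()),
      var_weights=df["n_claims"],   # each row counts as n_claims observations
  )
  print(model.fit().summary())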

4
Q

When to use offsets

A
  1. Variable modeled elsewhere (territory model, deductible, limits)
  2. Want to change only some variables in rating plan
  3. Target variable varies directly with exposure (e.g. modeling claim counts)
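
A minimal sketch of case 3, an exposure offset in a claim-count model, assuming statsmodels and hypothetical columns claim_count, exposure, driver_age:

  import numpy as np
  import pandas as pd
  import statsmodels.api as sm

  # Made-up policy-level data
  rng = np.random.default_rng(0)
  df = pd.DataFrame({
      "exposure": rng.uniform(0.1, 1.0, 500),
      "driver_age": rng.integers(18, 80, 500),
  })
  df["claim_count"] = rng.poisson(0.1 * df["exposure"])

  X = sm.add_constant(df[["driver_age"]])
  model = sm.GLM(
      df["claim_count"], X,
      family=sm.families.Poisson(),      # log link is the default
      offset=np.log(df["exposure"]),     # on the linear-predictor scale; coefficient fixed at 1
  )
  result = model.fit()
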
5
Q

Correlated Vars (How Identify and Adjust)

A

Identify with two-way correlation table

Can adjust with principal component analysis
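
A minimal sketch of both steps on made-up data, assuming numpy and scikit-learn:

  import numpy as np
  from sklearn.decomposition import PCA

  # Two highly correlated hypothetical predictors
  rng = np.random.default_rng(0)
  x1 = rng.normal(size=500)
  x2 = 0.95 * x1 + rng.normal(scale=0.1, size=500)
  X = np.column_stack([x1, x2])

  print(np.corrcoef(x1, x2)[0, 1])      # the two-way correlation check

  pca = PCA()
  components = pca.fit_transform(X)     # uncorrelated components to use as predictors
  print(pca.explained_variance_ratio_)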

6
Q

Multicollinearity

A

When two or more predictors are strongly predictive of a third

Two-way correlation tables may not reveal it

Identify with Variance Inflation Factor
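
A minimal sketch of a VIF check on made-up data, assuming statsmodels (a VIF above roughly 10 is a common warning level):

  import numpy as np
  import pandas as pd
  from statsmodels.stats.outliers_influence import variance_inflation_factor
  from statsmodels.tools import add_constant

  rng = np.random.default_rng(0)
  df = pd.DataFrame({
      "x1": rng.normal(size=200),
      "x2": rng.normal(size=200),
  })
  # x3 is nearly a linear combination of x1 and x2
  df["x3"] = 0.6 * df["x1"] + 0.4 * df["x2"] + rng.normal(scale=0.05, size=200)

  X = add_constant(df)
  for i, col in enumerate(X.columns):
      print(col, variance_inflation_factor(X.values, i))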

7
Q

Aliasing

A

Perfect correlation
Must remove one

8
Q

Effects of highly correlated variables

A

Unstable Model (large std errs)
Unreasonable coefficients
Model may not converge

9
Q

GLM Limitations

A

Assign full credibility to data
Alt: GLMMs or Elastic Net GLMs

Assume randomness of outcomes is uncorrelated
(Untrue with multiple years of data or cat risk)

10
Q

Freq/Sev Models vs PP

A

Freq/sev models are more stable

PP models can overfit if a variable affects only frequency or only severity

Tweedie dist assumes freq/sev move in same direction

11
Q

Target Variable Considerations

A

Split coverages, perils
Capping
Remove CATs and model separately
Trend/Develop
On-Level Premium

12
Q

Predictor Selection Criteria

A

Significance
Cost of collecting
IT constraints
Regulatory requirements

13
Q

Partial Residual Plots

A

Detect non-linearity

14
Q

Correcting Non-Linearity

A
  1. Binning
  2. Polynomial terms
  3. Piecewise
  4. Splines
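
A minimal sketch of options 2 and 4 on made-up severity data, assuming the statsmodels formula interface (patsy's I() and cr() terms):

  import numpy as np
  import pandas as pd
  import statsmodels.api as sm
  import statsmodels.formula.api as smf

  # Hypothetical severity data with a U-shaped age effect
  rng = np.random.default_rng(0)
  df = pd.DataFrame({"age": rng.uniform(18, 80, 2000)})
  mu = np.exp(6 + 0.001 * (df["age"] - 45) ** 2)
  df["sev"] = rng.gamma(shape=2.0, scale=mu / 2.0)

  # Option 2: polynomial term
  poly = smf.glm("sev ~ age + I(age ** 2)", data=df,
                 family=sm.families.Gamma(sm.families.links.Log())).fit()
  # Option 4: natural cubic spline
  spline = smf.glm("sev ~ cr(age, df=4)", data=df,
                   family=sm.families.Gamma(sm.families.links.Log())).fit()
  print(poly.aic, spline.aic)
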
15
Q

Loglikelihood

A

How well model explains data

Con: Requires identical dataset

16
Q

Deviance

A

How far model is from saturated model

Want unscaled to compare different models

Cons:
-Need to assume same error dist. in models
-Always decreases when adding more params, so it rewards overfitting

17
Q

F-Test

A

Only for nested models

18
Q

AIC and BIC

A

Can compare any models

BIC's penalty grows with ln(n), so it tends to over-penalize added parameters on large datasets
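
For reference, the standard definitions (loglikelihood \ell, parameter count p, observation count n):

\text{AIC} = -2\ell + 2p \qquad \text{BIC} = -2\ell + p \ln n

Since \ln n > 2 once n > e^2 \approx 7.4, BIC's per-parameter penalty exceeds AIC's for any realistically sized insurance dataset.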

19
Q

Working Residual Plots

A

vs Linear Predictor (look for systemic over/under prediction)

vs One Predictor (look for non-linearity)

vs Weight (look for homoscedasticity)
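
A minimal sketch of the first plot on made-up frequency data, assuming statsmodels and matplotlib; residuals are binned before plotting (see card 37):

  import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  import statsmodels.api as sm

  rng = np.random.default_rng(0)
  df = pd.DataFrame({"x": rng.normal(size=2000)})
  df["y"] = rng.poisson(np.exp(-2 + 0.3 * df["x"]))

  X = sm.add_constant(df[["x"]])
  res = sm.GLM(df["y"], X, family=sm.families.Poisson()).fit()

  lin_pred = X.values @ np.asarray(res.params)              # linear predictor
  plot_df = pd.DataFrame({"lp": lin_pred, "wr": res.resid_working})
  binned = plot_df.groupby(pd.qcut(plot_df["lp"], 10), observed=True).mean()

  plt.scatter(binned["lp"], binned["wr"])                   # flat around zero = no systematic bias
  plt.axhline(0, color="grey")
  plt.show()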

20
Q

3 Ways to Assess Model Stability

A
  1. Cook’s Distance (influential records)
  2. Cross-Validation (consistent param estimates; less practical in insurance, where manual model intervention is often needed)
  3. Bootstrapping (consistent param estimates)
21
Q

Quantile Plot

A

How well model differentiates between best and worst risks

Bucket into quintiles by model prediction, then plot average actual vs predicted for each

Want:
Actual close to predicted
Actual increasing monotonically
Large lift (spread) between the first and last quintiles of actual
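
A minimal sketch of the bucketing on made-up holdout data, assuming pandas:

  import numpy as np
  import pandas as pd

  rng = np.random.default_rng(0)
  df = pd.DataFrame({"predicted": rng.gamma(2.0, 100.0, 5000)})
  df["actual"] = df["predicted"] * rng.lognormal(0.0, 0.5, 5000)

  df["quintile"] = pd.qcut(df["predicted"], 5, labels=range(1, 6))
  lift = df.groupby("quintile", observed=True)[["predicted", "actual"]].mean()
  print(lift)
  print("lift:", lift["actual"].iloc[-1] / lift["actual"].iloc[0])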

22
Q

Double Lift Chart

A

Compares two models directly

Bucket into quintiles by the ratio of Model A to Model B predictions, then plot actual, A, and B

Want the model that's closer to actual

Harder to interpret, since it focuses on the records where the models disagree most

23
Q

Loss Ratio Chart

A

Whether model better at segmenting than current

Sort by Predicted LR. Plot actual LR

Want more spread

Only tells whether the model segments well, not whether its predictions are accurate

24
Q

Gini Index and Lorenz Curve

A

Sort by model prediction, then plot cumulative % of exposures (x-axis) vs cumulative % of losses (y-axis)

Lorenz curve formed by these points

Gini Index = 2 * area between curve and y=x
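
A minimal sketch of the calculation, assuming numpy and hypothetical exposure/loss/prediction vectors:

  import numpy as np

  def gini_index(exposure, loss, prediction):
      # Sort risks from lowest to highest prediction, then build the Lorenz curve
      order = np.argsort(prediction)
      exposure = np.asarray(exposure, dtype=float)[order]
      loss = np.asarray(loss, dtype=float)[order]
      cum_ee = np.concatenate([[0.0], np.cumsum(exposure) / exposure.sum()])
      cum_loss = np.concatenate([[0.0], np.cumsum(loss) / loss.sum()])
      # Trapezoidal area under the Lorenz curve; the area under y = x is 0.5
      area_under = np.sum((cum_loss[1:] + cum_loss[:-1]) / 2 * np.diff(cum_ee))
      return 2 * (0.5 - area_under)

  rng = np.random.default_rng(0)
  pred = rng.gamma(2.0, 100.0, 1000)
  loss = pred * rng.lognormal(0.0, 1.0, 1000)
  print(gini_index(np.ones(1000), loss, pred))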

25
Q

ROC Curve

A

Used for logistic models
Plots sensitivity vs 1 - specificity
Higher AUROC is better (no predictive power = the diagonal y = x, i.e. AUROC of 0.5)

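A minimal sketch on made-up scores, assuming scikit-learn:

  import numpy as np
  from sklearn.metrics import roc_auc_score, roc_curve

  rng = np.random.default_rng(0)
  y_true = rng.binomial(1, 0.2, 1000)                       # hypothetical claim / no-claim flag
  y_score = np.clip(0.2 + 0.3 * y_true + rng.normal(0, 0.15, 1000), 0, 1)

  fpr, tpr, _ = roc_curve(y_true, y_score)                  # fpr = 1 - specificity, tpr = sensitivity
  print("AUROC:", roc_auc_score(y_true, y_score))           # 0.5 = no predictive power
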
26
Q

Specificity

A

True Negatives / Total Negatives

27
Q

Sensitivity

A

True Positives / Total Positives

28
Q

Why shouldn't model ILFs or Deds

A

Policy options chosen by insured
May give counterintuitive results
Never charge more for less coverage
Correlation with results but not causation

29
Q

Ensembling

A

Averaging models together
Improves performance when errors uncorrelated

30
Q

GLM Shortcomings

A
  1. Predictions must be based on linear function of predictors
  2. Instability if data thin or highly correlated vars
  3. Full credibility for each predictor coefficient
  4. Assumes randomness uncorrelated
  5. Dispersion param must be constant

31
Q

GLMM

A

Allows credibility in coeff estimates
Fixed and random effects

32
Q

DGLMs

A

Allow the dispersion parameter to vary rather than remain constant

33
Q

GAMs

A

Allow non-linearity without manual intervention, via smooth functions of the predictors

34
Q

MARS Models

A

Allow non-linearity without manual intervention, via piecewise linear basis functions with automatically chosen break points

35
Q

Elastic Net GLMs

A

Allows credibility in coeff estimates
Automatic variable selection

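A minimal sketch, assuming statsmodels' penalized GLM fitting on made-up data (the alpha and L1_wt values are arbitrary):

  import numpy as np
  import pandas as pd
  import statsmodels.api as sm

  rng = np.random.default_rng(0)
  X = pd.DataFrame(rng.normal(size=(5000, 10)), columns=[f"x{i}" for i in range(10)])
  y = rng.poisson(np.exp(-2 + 0.4 * X["x0"] - 0.2 * X["x1"]))   # only x0, x1 matter

  model = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson())
  # L1_wt blends lasso (1.0) and ridge (0.0); alpha controls overall shrinkage
  fit = model.fit_regularized(alpha=0.01, L1_wt=0.5)
  print(np.round(fit.params, 3))        # noise coefficients shrink toward / to zero
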
36
Q

Why center continuous variables

A

Intercept more intuitive since meaningful base case
Makes signs of coefficients more intuitive

37
Q

Working Residual Advantages

A

Retain properties after binning (no pattern and homoscedastic)