GLM Flashcards

(37 cards)

1
Q

Model Specifications

A

Target Variable
Predictors
Link Function
Error Distribution
Weights

2
Q

Why log continuous variables when using log link?

A

Otherwise a positive coefficient implies an exponential effect on the target; logging the predictor gives a power (multiplicative) relationship instead
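
A quick worked comparison under a log link (generic notation, not from the card):

\mu = e^{\beta_0 + \beta_1 x} \quad \text{(unlogged predictor: exponential in } x\text{)}
\mu = e^{\beta_0 + \beta_1 \ln x} = e^{\beta_0}\, x^{\beta_1} \quad \text{(logged predictor: power curve in } x\text{)}

Logging aligns the predictor's scale with the log link, so its effect scales multiplicatively rather than exponentially.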

3
Q

When to use weights

A

If a row in the dataset represents an average over multiple data points, weight it by the number of observations it represents
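
A minimal sketch of passing such weights, assuming statsmodels and hypothetical columns avg_severity, n_claims, driver_age:

  import numpy as np
  import pandas as pd
  import statsmodels.api as sm

  # Made-up rows: each is the average severity over n_claims claims
  rng = np.random.default_rng(0)
  df = pd.DataFrame({
      "driver_age": rng.integers(18, 80, 200),
      "n_claims": rng.integers(1, 10, 200),
  })
  df["avg_severity"] = rng.gamma(2.0, 500.0, 200)

  X = sm.add_constant(df[["driver_age"]])
  model = sm.GLM(
      df["avg_severity"], X,
      family=sm.families.Gamma(link=sm.families.links.Log()),
      var_weights=df["n_claims"],   # each row counts as n_claims observations
  )
  print(model.fit().summary())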

4
Q

When to use offsets

A
  1. Variable modeled elsewhere (territory model, deductible, limits)
  2. Want to change only some variables in rating plan
  3. Target variable varies directly with exposure (e.g. modeling claim counts)
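
A minimal sketch of case 3, an exposure offset in a claim-count model, assuming statsmodels and hypothetical columns claim_count, exposure, driver_age:

  import numpy as np
  import pandas as pd
  import statsmodels.api as sm

  # Made-up policy-level data
  rng = np.random.default_rng(0)
  df = pd.DataFrame({
      "exposure": rng.uniform(0.1, 1.0, 500),
      "driver_age": rng.integers(18, 80, 500),
  })
  df["claim_count"] = rng.poisson(0.1 * df["exposure"])

  X = sm.add_constant(df[["driver_age"]])
  model = sm.GLM(
      df["claim_count"], X,
      family=sm.families.Poisson(),      # log link is the default
      offset=np.log(df["exposure"]),     # on the linear-predictor scale; coefficient fixed at 1
  )
  result = model.fit()
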
5
Q

Correlated Vars (How Identify and Adjust)

A

Identify with two-way correlation table

Can adjust with principal component analysis
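
A minimal sketch of both steps on made-up data, assuming numpy and scikit-learn:

  import numpy as np
  from sklearn.decomposition import PCA

  # Two highly correlated hypothetical predictors
  rng = np.random.default_rng(0)
  x1 = rng.normal(size=500)
  x2 = 0.95 * x1 + rng.normal(scale=0.1, size=500)
  X = np.column_stack([x1, x2])

  print(np.corrcoef(x1, x2)[0, 1])      # the two-way correlation check

  pca = PCA()
  components = pca.fit_transform(X)     # uncorrelated components to use as predictors
  print(pca.explained_variance_ratio_)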

6
Q

Multicollinearity

A

When two or more predictors are strongly predictive of a third

Two-way correlation tables may not reveal it

Identify with Variance Inflation Factor
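
A minimal sketch of a VIF check on made-up data, assuming statsmodels (a VIF above roughly 10 is a common warning level):

  import numpy as np
  import pandas as pd
  from statsmodels.stats.outliers_influence import variance_inflation_factor
  from statsmodels.tools import add_constant

  rng = np.random.default_rng(0)
  df = pd.DataFrame({
      "x1": rng.normal(size=200),
      "x2": rng.normal(size=200),
  })
  # x3 is nearly a linear combination of x1 and x2
  df["x3"] = 0.6 * df["x1"] + 0.4 * df["x2"] + rng.normal(scale=0.05, size=200)

  X = add_constant(df)
  for i, col in enumerate(X.columns):
      print(col, variance_inflation_factor(X.values, i))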

7
Q

Aliasing

A

Perfect correlation
Must remove one

8
Q

Effects of highly correlated variables

A

Unstable Model (large std errs)
Unreasonable coefficients
Model may not converge

9
Q

GLM Limitations

A

Assign full credibility to data
Alt: GLMMs or Elastic Net GLMs

Assume randomness of outcomes is uncorrelated
(Untrue with multiple years of data or cat risk)

10
Q

Freq/Sev Models vs PP

A

Freq/sev models are more stable

PP models can overfit if a variable affects only frequency or only severity

Tweedie dist assumes freq/sev move in same direction

11
Q

Target Variable Considerations

A

Split coverages, perils
Capping
Remove CATs and model separately
Trend/Develop
On-Level Premium

12
Q

Predictor Selection Criteria

A

Significance
Cost of collecting
IT constraints
Regulatory requirements

13
Q

Partial Residual Plots

A

Detect non-linearity

14
Q

Correcting Non-Linearity

A
  1. Binning
  2. Polynomial terms
  3. Piecewise
  4. Splines
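
A minimal sketch of options 2 and 4 on made-up severity data, assuming the statsmodels formula interface (patsy's I() and cr() terms):

  import numpy as np
  import pandas as pd
  import statsmodels.api as sm
  import statsmodels.formula.api as smf

  # Hypothetical severity data with a U-shaped age effect
  rng = np.random.default_rng(0)
  df = pd.DataFrame({"age": rng.uniform(18, 80, 2000)})
  mu = np.exp(6 + 0.001 * (df["age"] - 45) ** 2)
  df["sev"] = rng.gamma(shape=2.0, scale=mu / 2.0)

  # Option 2: polynomial term
  poly = smf.glm("sev ~ age + I(age ** 2)", data=df,
                 family=sm.families.Gamma(sm.families.links.Log())).fit()
  # Option 4: natural cubic spline
  spline = smf.glm("sev ~ cr(age, df=4)", data=df,
                   family=sm.families.Gamma(sm.families.links.Log())).fit()
  print(poly.aic, spline.aic)
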
15
Q

Loglikelihood

A

How well model explains data

Con: Requires identical dataset

16
Q

Deviance

A

How far model is from saturated model

Want unscaled to compare different models

Cons:
-Need to assume same error dist. in models
-Always decreases when adding more params, so it rewards overfitting

17
Q

F-Test

A

Only for nested models

18
Q

AIC and BIC

A

Can compare any models

BIC's penalty grows with ln(n), so it tends to over-penalize added parameters on large datasets
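
For reference, the standard definitions (loglikelihood \ell, parameter count p, observation count n):

\text{AIC} = -2\ell + 2p \qquad \text{BIC} = -2\ell + p \ln n

Since \ln n > 2 once n > e^2 \approx 7.4, BIC's per-parameter penalty exceeds AIC's for any realistically sized insurance dataset.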

19
Q

Working Residual Plots

A

vs Linear Predictor (look for systemic over/under prediction)

vs One Predictor (look for non-linearity)

vs Weight (look for homoscedasticity)
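
A minimal sketch of the first plot on made-up frequency data, assuming statsmodels and matplotlib; residuals are binned before plotting (see card 37):

  import numpy as np
  import pandas as pd
  import matplotlib.pyplot as plt
  import statsmodels.api as sm

  rng = np.random.default_rng(0)
  df = pd.DataFrame({"x": rng.normal(size=2000)})
  df["y"] = rng.poisson(np.exp(-2 + 0.3 * df["x"]))

  X = sm.add_constant(df[["x"]])
  res = sm.GLM(df["y"], X, family=sm.families.Poisson()).fit()

  lin_pred = X.values @ np.asarray(res.params)              # linear predictor
  plot_df = pd.DataFrame({"lp": lin_pred, "wr": res.resid_working})
  binned = plot_df.groupby(pd.qcut(plot_df["lp"], 10), observed=True).mean()

  plt.scatter(binned["lp"], binned["wr"])                   # flat around zero = no systematic bias
  plt.axhline(0, color="grey")
  plt.show()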

20
Q

3 Ways to Assess Model Stability

A
  1. Cook’s Distance (influential records)
  2. Cross-Validation (consistent param estimates; less practical in insurance, where manual model intervention is often needed)
  3. Bootstrapping (consistent param estimates)
21
Q

Quantile Plot

A

How well model differentiates between best and worst risks

Bucket into quintiles by model prediction, then plot average actual vs predicted for each

Want:
Actual close to predicted
Actual increasing monotonically
Large lift (spread) between the first and last quintiles of actual
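
A minimal sketch of the bucketing on made-up holdout data, assuming pandas:

  import numpy as np
  import pandas as pd

  rng = np.random.default_rng(0)
  df = pd.DataFrame({"predicted": rng.gamma(2.0, 100.0, 5000)})
  df["actual"] = df["predicted"] * rng.lognormal(0.0, 0.5, 5000)

  df["quintile"] = pd.qcut(df["predicted"], 5, labels=range(1, 6))
  lift = df.groupby("quintile", observed=True)[["predicted", "actual"]].mean()
  print(lift)
  print("lift:", lift["actual"].iloc[-1] / lift["actual"].iloc[0])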

22
Q

Double Lift Chart

A

Compares two models directly

Bucket into quintiles by the ratio of Model A to Model B predictions, then plot actual, A, and B

Want the model that's closer to actual

Harder to interpret, since it focuses on the records where the models disagree most

23
Q

Loss Ratio Chart

A

Whether model better at segmenting than current

Sort by Predicted LR. Plot actual LR

Want more spread

Only tells whether the model segments well, not whether its predictions are accurate

24
Q

Gini Index and Lorenz Curve

A

Sort by model prediction, then plot cumulative % of exposures (x-axis) vs cumulative % of losses (y-axis)

Lorenz curve formed by these points

Gini Index = 2 * area between curve and y=x
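
A minimal sketch of the calculation, assuming numpy and hypothetical exposure/loss/prediction vectors:

  import numpy as np

  def gini_index(exposure, loss, prediction):
      # Sort risks from lowest to highest prediction, then build the Lorenz curve
      order = np.argsort(prediction)
      exposure = np.asarray(exposure, dtype=float)[order]
      loss = np.asarray(loss, dtype=float)[order]
      cum_ee = np.concatenate([[0.0], np.cumsum(exposure) / exposure.sum()])
      cum_loss = np.concatenate([[0.0], np.cumsum(loss) / loss.sum()])
      # Trapezoidal area under the Lorenz curve; the area under y = x is 0.5
      area_under = np.sum((cum_loss[1:] + cum_loss[:-1]) / 2 * np.diff(cum_ee))
      return 2 * (0.5 - area_under)

  rng = np.random.default_rng(0)
  pred = rng.gamma(2.0, 100.0, 1000)
  loss = pred * rng.lognormal(0.0, 1.0, 1000)
  print(gini_index(np.ones(1000), loss, pred))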

25
Q

ROC Curve

A

Used for logistic models
Plots sensitivity vs 1 - specificity
Higher AUROC is better (no predictive power = the diagonal y = x, i.e. AUROC of 0.5)

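A minimal sketch on made-up scores, assuming scikit-learn:

  import numpy as np
  from sklearn.metrics import roc_auc_score, roc_curve

  rng = np.random.default_rng(0)
  y_true = rng.binomial(1, 0.2, 1000)                       # hypothetical claim / no-claim flag
  y_score = np.clip(0.2 + 0.3 * y_true + rng.normal(0, 0.15, 1000), 0, 1)

  fpr, tpr, _ = roc_curve(y_true, y_score)                  # fpr = 1 - specificity, tpr = sensitivity
  print("AUROC:", roc_auc_score(y_true, y_score))           # 0.5 = no predictive power
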
26
Q

Specificity

A

True Negatives / Total Negatives

27
Q

Sensitivity

A

True Positives / Total Positives

28
Q

Why shouldn't model ILFs or Deds

A

Policy options chosen by insured
May give counterintuitive results
Never charge more for less coverage
Correlation with results but not causation

29
Q

Ensembling

A

Averaging models together
Improves performance when errors uncorrelated

30
Q

GLM Shortcomings

A
  1. Predictions must be based on linear function of predictors
  2. Instability if data thin or highly correlated vars
  3. Full credibility for each predictor coefficient
  4. Assumes randomness uncorrelated
  5. Dispersion param must be constant

31
Q

GLMM

A

Allows credibility in coeff estimates
Fixed and random effects

32
Q

DGLMs

A

Allow the dispersion parameter to vary rather than remain constant

33
Q

GAMs

A

Allow non-linearity without manual intervention, via smooth functions of the predictors

34
Q

MARS Models

A

Allow non-linearity without manual intervention, via piecewise linear basis functions with automatically chosen break points

35
Q

Elastic Net GLMs

A

Allows credibility in coeff estimates
Automatic variable selection

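A minimal sketch, assuming statsmodels' penalized GLM fitting on made-up data (the alpha and L1_wt values are arbitrary):

  import numpy as np
  import pandas as pd
  import statsmodels.api as sm

  rng = np.random.default_rng(0)
  X = pd.DataFrame(rng.normal(size=(5000, 10)), columns=[f"x{i}" for i in range(10)])
  y = rng.poisson(np.exp(-2 + 0.4 * X["x0"] - 0.2 * X["x1"]))   # only x0, x1 matter

  model = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson())
  # L1_wt blends lasso (1.0) and ridge (0.0); alpha controls overall shrinkage
  fit = model.fit_regularized(alpha=0.01, L1_wt=0.5)
  print(np.round(fit.params, 3))        # noise coefficients shrink toward / to zero
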
36
Q

Why center continuous variables

A

Intercept more intuitive since meaningful base case
Makes signs of coefficients more intuitive

37
Q

Working Residual Advantages

A

Retain properties after binning (no pattern and homoscedastic)