GLM Flashcards

1
Q

Describe the 2 components of a GLM

A
  1. Random component
    Captures portion of variation driven by causes other than predictors in model (including pure randomness)
  2. Systematic component
    Portion of variation in a model that can be explained using predictor variables

Our goal in modelling with GLM is to shift as much of the variability as possible away from random component into systematic component.

2
Q

Identify the variance function for the following distributions:
1. Normal
2. Poisson
3. Gamma
4. Inv Gaussian
5. NB
6. Binomial
7. Tweedie

A
  1. 1
  2. u
  3. u^2
  4. u^3
  5. u(1+ku)
  6. u(1-u)
  7. u^p
3
Q

Define the 2 most common link functions

A
  1. Log: g(u) = ln(u)
    Used for rating
  2. Logit: g(u) = ln(u/(1-u))
    Used for binary target (0,1)
4
Q

List 3 advantages of log link function

A

The log link function transforms the linear predictor into a multiplicative structure: u = exp(b0 + b1x1 +…)

Which has 3 advantages:
1. Simple and practical to implement
2. Avoids negative premiums that could arise under an additive structure
3. Impact of risk characteristics is more intuitive (see the worked sketch below)
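
A minimal numeric sketch of the multiplicative structure, using hypothetical coefficients (b0, b1, b2 and the variables below are made up for illustration):

```python
import numpy as np

# Hypothetical fitted coefficients of a log-link GLM: u = exp(b0 + b1*x1 + b2*x2)
b0, b1, b2 = 5.0, 0.25, -0.10   # intercept, age-group indicator, territory indicator
x1, x2 = 1, 1                   # insured sits in the non-base age group and territory

mu = np.exp(b0 + b1 * x1 + b2 * x2)          # additive on the log scale ...

base_rate = np.exp(b0)                       # ... multiplicative on the original scale
rel_age, rel_terr = np.exp(b1), np.exp(b2)   # each coefficient exponentiates to a relativity
assert np.isclose(mu, base_rate * rel_age ** x1 * rel_terr ** x2)
print(base_rate, rel_age, rel_terr, mu)
```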

5
Q

List 2 uses of offset terms

A

Allows you to incorporate pre-determined values for certain variables into model so GLM takes them as given.

2 uses:
  1. Deductible factors are often developed outside of the model
    ln(u) = b0 + b1x1 +…+ ln(Rel(1 - LER))
  2. Dependent variable varies directly with a particular measure (e.g. exposures)
    ln(u) = b0 + b1x1 +…+ ln(exposure), e.g. ln(car years)
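
A minimal sketch of both uses on simulated data, assuming a Poisson frequency model fit with statsmodels; the exposure and an externally developed deductible relativity both enter as logged offsets:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x1 = rng.integers(0, 2, n)                                   # a single rating variable
exposure = rng.uniform(0.25, 1.0, n)                         # car-years per record
ded_rel = np.where(rng.integers(0, 2, n) == 1, 0.95, 1.00)   # deductible relativity developed outside the GLM
y = rng.poisson(0.10 * exposure * ded_rel * np.exp(0.3 * x1))  # simulated claim counts

X = sm.add_constant(x1)
offset = np.log(exposure) + np.log(ded_rel)                  # fixed offsets combined on the log scale
fit = sm.GLM(y, X, family=sm.families.Poisson(), offset=offset).fit()
print(fit.params)                                            # coefficients estimated with the offsets taken as given
```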
6
Q

Describe the steps to calculate offset

A
  1. Calculate the unbiased factor (e.g. 1 - LER for deductibles)
  2. Rebase the factor: Rel = Factor(i) / Factor(base)
  3. Offset = g(rebased factor) (ln of the rebased factor under a log link)
  4. Include the fixed offsets before running the GLM so that all estimated coefficients for the other predictors are optimal in their presence
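
A worked sketch of the steps with hypothetical deductible LERs and a log link; the $500 level is assumed to be the base:

```python
import numpy as np

ler = {"250": 0.05, "500": 0.10, "1000": 0.18}               # hypothetical loss elimination ratios

factor = {d: 1 - v for d, v in ler.items()}                  # step 1: factor = 1 - LER
rebased = {d: f / factor["500"] for d, f in factor.items()}  # step 2: rebase to the base level
offset = {d: np.log(r) for d, r in rebased.items()}          # step 3: apply the link function (ln for a log link)
print(offset)                                                # step 4: attach these fixed offsets to each record, then fit the GLM
```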
7
Q

Describe 3 methods to assess variable significance

A
  1. Standard error
    Estimated std dev of the random process underlying the coefficient estimate
    Small value indicates estimate is expected to be relatively close to true value.
    Large value indicates that a wide range of estimates can be arrived at through randomness.
  2. P-value
    Probability of obtaining a coefficient at least as extreme by pure chance (assuming H0 is true).
    H0: Beta(i) = 0
    H1: Beta(i) different than 0
    Small value indicates we have a small chance of observing the coefficient randomly.
  3. Confidence interval
    Gives a range of possible values for a coefficient that would not be rejected at a given p-threshold
    95% CI would be based on a 5% p-value
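
A sketch of the three measures for a single hypothetical coefficient, using the usual normal approximation to the coefficient's sampling distribution:

```python
from scipy import stats

beta, se = 0.25, 0.10                   # hypothetical coefficient estimate and its standard error

z = beta / se                           # test statistic for H0: Beta(i) = 0
p_value = 2 * stats.norm.sf(abs(z))     # two-sided p-value: chance of a value at least this extreme by randomness
ci_low, ci_high = stats.norm.interval(0.95, loc=beta, scale=se)  # 95% CI, i.e. a 5% p-threshold
print(p_value, (ci_low, ci_high))
```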
8
Q

Describe 2 distributions appropriate for severity modelling and their 5 desired characteristics

A

Gamma and Inverse Gaussian
1. Right-skewed
2. Lower bound at 0
3. Sharp peaked (inv gauss > gamma)
4. Wide tail (inv gauss > gamma)
5. Larger claims have more variance (u^2, u^3)

9
Q

Describe 2 distributions appropriate for frequency modeling

A
  1. Poisson (typically overdispersed Poisson, ODP)
    Dispersion parameter adds flexibility (allows var > mean)
    Poisson and ODP will produce the same coefficients, but model diagnostics will change (understated variance = distorted std errors and p-values)
  2. Negative Binomial
    Poisson whose mean itself follows a gamma distribution
10
Q

Describe 2 characteristics a frequency error distribution should have

A
  1. Non-negative
  2. Multiplicative relationship fits frequency better than additive relationship
11
Q

Describe which distribution is appropriate for pure premium / LR modeling and give 3 reasons/desired characteristics

A

Tweedie:
1. Mass point at zero (many insureds have no claims)
2. Right-skewed
3. Power parameter allows other distributions to be special cases (p=0 for Normal, p=1 for Poisson, p=2 for Gamma, p=3 for Inverse Gaussian)

12
Q

What happens when the power parameter of the Tweedie is between 1 and 2?

A

Compound poisson freq & gamma sev

Smoother curve with no apparent spike

Implicit assumption that freq & sev move in same direction (often not realistic but robust enough)

13
Q

Calculate mean of Tweedie

A

lambda * alpha * theta (lambda = Poisson mean, alpha * theta = mean of the gamma severity)

14
Q

Calculate power parameter of Tweedie

A

p = (a+2) / (a+1)

15
Q

Calculate dispersion parameter of Tweedie

A

lambda^(1-p) * (a*theta)^(2-p) / (2-p)
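
A worked example combining the Tweedie mean, power parameter, and dispersion parameter formulas, with hypothetical Poisson and gamma parameters (lambda = Poisson mean, alpha/theta = gamma shape/scale):

```python
lam, alpha, theta = 0.1, 2.0, 5000.0                         # hypothetical frequency and severity parameters

mean = lam * alpha * theta                                   # Tweedie mean = 1000
p = (alpha + 2) / (alpha + 1)                                # power parameter = 4/3 (between 1 and 2)
phi = lam ** (1 - p) * (alpha * theta) ** (2 - p) / (2 - p)  # dispersion parameter
print(mean, p, phi)
```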

16
Q

Identify 3 ways to determine p parameter of Tweedie

A
  1. Using model-fitting software (can slow down model)
  2. Optimization of metric (e.g. log-likelihood)
  3. Judgmental selection (often practical choice as p tends to have small impact on model estimates)
17
Q

Describe which distribution is appropriate for probability modeling

A

Binomial

Use mean as modelled prob of event occurring

Use the logistic function (inverse of the logit link):
u = 1/(1+exp(-x))

Odds = u/(1-u)
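
A minimal sketch of the logistic transform and the implied odds, assuming a hypothetical value of the linear predictor:

```python
import numpy as np

x = 0.8                             # hypothetical linear predictor b0 + b1*x1 + ...
u = 1 / (1 + np.exp(-x))            # modelled probability of the event
odds = u / (1 - u)                  # odds of the event
assert np.isclose(np.log(odds), x)  # the logit link: ln(odds) recovers the linear predictor
print(u, odds)
```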

18
Q

It is good practice to log continuous variables before using in model.

Explain why and give 2 exceptions.

A

Aligns the scale of the predictors with the scale of the entity they are predicting. Allows flexibility in fitting different curve shapes.

2 exceptions:
1. Using a year variable to pick up trend effects
2. If the variable contains values of 0 (since ln(0) is undefined)

19
Q

Why do we prefer choosing level with most observations as base level

A

Otherwise, there will be wide CIs around the coefficient estimates (although the same predicted relativities)

20
Q

Discuss how high correlation between 2 predictor variables can impact GLM

A

Main benefit of GLM over univariate analysis is being able to handle exposure correlation.

However, GLM run into problems when predictor variables are very highly correlated. This can result in unstable model, erratic coefficients and high standard errors.

21
Q

Describe 2 options to deal with very high correlation in GLM

A
  1. Remove all highly correlated variables except one
    This eliminates high correlation in model, but also potentially loses some unique info contained in eliminated variables.
  2. Use dimensionality-reduction techniques such as principal components analysis or factor analysis to create a smaller set of new variables from the correlated variables, and use that subset in the GLM.
    Downside is the additional time required to do that extra analysis.
22
Q

Describe multicollinearity, its potential impacts and how to detect

A

Occurs when there is a near-perfect linear dependency among 3 or more predictor variables.

When exists, the model may become unstable with erratic coefficients and may not converge to a solution

One way to detect it is the variance inflation factor (VIF), which measures how much the squared standard error of a predictor's coefficient is increased by the presence of collinearity with other predictors.

VIF of 10 or more is considered high and would indicate to look into collinearity structure to determine how to best adjust model.

23
Q

Describe aliasing

A

Aliasing occurs when there is a perfect linear dependency among predictor variables (ex: when missing data are excluded)

The GLM will not converge (no unique solution) or if it does, coefficients will make no sense.

Most GLM software will detect this and automatically remove one of the variables.

24
Q

Identify 2 important limitations of GLM

A
  1. Give full credibility to data
    Estimated coefficients are not cred-wtd to recognize low volumes of data or high volatility.
  2. Assume the randomness of outcomes is uncorrelated
    This is an issue in 2 cases:
    a. Using dataset with several renewals of same policy since likely to have correlated outcomes
    b. when data can be affected by weather: likely to cause similar outcomes to risks in same area
    Some extensions of GLM (GLMM or GEE) can help account for such correlation in data
25
Q

List the 9 steps to build a model

Hint: Obeying Simple Directions Elicits Fully Outstanding Very Powerful Model Results

A
  1. setting goals and Objectives
  2. communicate with key Stakeholders
  3. collect & process Data
  4. conduct Exploratory data analysis
  5. specify model Form
  6. evaluate model Output
  7. Validate model
  8. translate model results into Product
  9. Maintain & Rebuild model
26
Q

Discuss 2 considerations/potential issues in matching policy and claims

A
  1. Matching claims to specific vehicles/drivers or coverages
  2. Are there timing differences between datasets? How often is each updated? Timing diff can cause record matching problems
  3. Is there a unique key to merge data (ex: policy number). Potential for orphaned claims or duplicating claims if multiple policy records.
  4. Level of aggregation before merging, time dimension (CY vs PY), policy level vs claimant level, location level or per risk level
27
Q

Discuss 2 considerations in modifying (cleaning) data prior to modeling

A
  1. check for duplicate records and remove them prior to aggregation
  2. check categorical fields against documentation (new codes, errors)
  3. check reasonability of numerical fields (negative premium, outliers)
  4. Decide how to handle errors and missing values (discard or replace with average values)
  5. Convert continuous variables to categorical (binning)
28
Q

Discuss possible data adjustments prior to modeling

A
  1. Cap large losses and remove cats
  2. Develop losses
  3. On-level premiums
  4. Trend exposures and losses
  5. Use time variable in model to control these effects (not as good as other adjustments), e.g.: group ages by range
29
Q

Why don’t we train and test on same dataset

A

Would be inappropriate since it would give a biased (overly optimistic) view of model performance

More variables will always make the model fit the training data better, but it may not fit other datasets better (overfitting), since the model begins treating random noise in the data as part of the systematic component

We want to pick up as much signal as possible with minimal noise

30
Q

Describe 3 model testing strategies

A
  1. Train & test
    Split data into 1 training set and 1 testing set (usually 60/40 or 70/30)
    Can split randomly or on time basis
    Adv of time: weather events not in both datasets so results are not overly optimistic
  2. Train, validate & test
    Split data into 3 sets. Validation set can be used to refine model and tweak before test set. (40/30/30)
  3. Cross validation
    Most common is k-fold.
    Pick number k and split data into k groups
    For each fold, train model using k-1 folds and test model using kth fold.
    Tends to be superior since more data is used in both training and testing, but extremely time-consuming
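
A minimal k-fold sketch in plain numpy; the fit/score step is left as a comment since it depends on the GLM and metric being used:

```python
import numpy as np

def k_fold_indices(n_records, k=5, seed=0):
    """Shuffle the record indices and split them into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n_records)
    return np.array_split(idx, k)

folds = k_fold_indices(n_records=100, k=5)
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # fit the GLM on train_idx, score it on test_idx, then average the k test scores
    print(f"fold {i}: train={len(train_idx)} records, test={len(test_idx)} records")
```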
31
Q

Identify 4 advantages of modeling freq and sev separately over pure premium

A
  1. Gain more insight and intuition about impact of each predictor variable
  2. Each is more stable (variable that only impacts freq will look less significant in pure premium model)
  3. PP can lead to overfitting if predictor variables only impact freq or sev but not both since randomness of other may be considered signal effect
  4. Tweedie distribution assumes both freq and sev move in the same direction, which may not be true
32
Q

Identify 2 disadvantages of modeling freq and sev separately

A
  1. Requires data to be available
  2. Takes more time to build 2 models
33
Q

Identify 4 considerations in variable selection

A
  1. Significance: we want to be confident effect of var is result of true relationship between predictor and target and not due to noise in data
  2. Cost-effectiveness of collecting data for variable
  3. Conformity with actuarial standards of practice and regulatory requirements
  4. Whether the quotation system can include the variable
34
Q

How can we calculate partial residuals

A

ri = (yi - ui) * g'(ui) + beta(j) * xij

if log link (g'(u) = 1/u):
ri = (yi - ui)/ui + beta(j) * xij

Then they can be plotted against xj, and the line y = beta(j) * xj can be drawn to see how well the line matches the residual points
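
A sketch of the log-link partial residual calculation for one predictor xj, with made-up fitted values and coefficient:

```python
import numpy as np

y = np.array([120.0, 80.0, 300.0, 150.0])    # actual target values
mu = np.array([100.0, 90.0, 250.0, 160.0])   # GLM fitted means
xj = np.array([1.0, 2.0, 4.0, 3.0])          # the predictor being examined
beta_j = 0.2                                 # its fitted coefficient

r = (y - mu) / mu + beta_j * xj              # partial residual under a log link
# plot r against xj and overlay the line y = beta_j * xj; a systematic bow away
# from the line suggests the predictor needs a transformation
print(r)
```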

35
Q

If systematic deviation of residuals from line is observed, what do we do?

A

We will want to transform the predictor variable in one of 4 ways:

  1. Binning the variable: turning it into a categorical variable with separate bins
    Removes need to constrain y to any particular shape, but increases df, which can result in inconsistent or impractical patterns
  2. Add polynomial terms (ex: x^2, x^3)
    Loss of interpretability without a graph, and can behave erratically at edges of data.
  3. Add piecewise linear functions (ex: hinge function; see the sketch after this list)
    Allows fit of wide range of non-linear patterns and is easy to interpret, but breakpoints must be chosen manually and df increases.
  4. Add natural cubic splines
    Combines piecewise functions with polynomials, which better fits data with a smooth curve, but a graph is needed to interpret.
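
A minimal sketch of a hinge term with a manually chosen breakpoint (the variable, breakpoint, and values below are made up):

```python
import numpy as np

def hinge(x, breakpoint):
    """max(0, x - breakpoint): zero below the breakpoint, linear above it."""
    return np.maximum(0.0, x - breakpoint)

age = np.array([18.0, 25.0, 40.0, 60.0, 75.0])
# include both the original variable and the hinge term as predictors so the
# fitted slope is allowed to change at the chosen breakpoint (here 60)
design_cols = np.column_stack([age, hinge(age, 60.0)])
print(design_cols)
```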
36
Q

When do we want to use interaction variables

A

When the effect of one predictor on the target varies with the level of another predictor (ex: gender affected losses only below a certain age)

37
Q

Identify 3 advantages of centering at base level

A
  1. Other coefficients are easier to interpret, particularly true if interaction terms exist
  2. Intercept becomes more intuitive as avg frequency at base level
  3. Avoids counter-intuitive signs of coefficients when interaction
  4. Lower p-values for variable significance (tighter CI)
38
Q

Describe 2 measures used in diagnostic tests for overall model fit

A
  1. Log-likelihood
    Log of product of likelihood for all observations (sum of log-likelihood)
  2. Scaled deviance
    D* = 2(llsaturated - llmodel)
    Unscaled deviance: D = dispersion parameter x D*

    GLMs are fit by maximizing ll so D is minimized
39
Q

Describe the 2 conditions for validity of ll & D comparisons

A
  1. Same dataset is used to fit the 2 models
  2. Assumed distribution and dispersion param same for the 2 models
40
Q

Describe 2 options to compare candidate models using ll & D

A
  1. F-Test (nested models)
    F = (Ds - Db) / (# added parameters * phi_B)
    Ds is unscaled deviance of smaller model
    Db is unscaled deviance of bigger model
    phi_B is dispersion parameter estimate of bigger model
    F(test) > F(table, dfnum = # added parameters, dfdenom = n - pb), where n = number of records and pb = number of parameters in bigger model, means we prefer bigger model

For non-nested models, use penalized measures of fit:
2. AIC = -2*ll + 2p
3. BIC = -2*ll + p*ln(n)
Lower is better (p = number of parameters, n = number of records)
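
A worked sketch with hypothetical deviances, dispersion, and log-likelihood for a smaller model nested in a bigger one:

```python
import math
from scipy import stats

n = 10_000                            # number of records
p_small, p_big = 5, 8                 # parameters in the smaller and bigger models
D_small, D_big = 12_500.0, 12_320.0   # hypothetical unscaled deviances
phi_big = 1.2                         # hypothetical dispersion estimate of the bigger model
added = p_big - p_small               # number of added parameters

F = (D_small - D_big) / (added * phi_big)
F_crit = stats.f.ppf(0.95, dfn=added, dfd=n - p_big)
print(F, F_crit, "prefer bigger model" if F > F_crit else "keep smaller model")

# penalized measures for non-nested comparisons (lower is better)
ll_big = -6_160.0                     # hypothetical log-likelihood of the bigger model
aic = -2 * ll_big + 2 * p_big
bic = -2 * ll_big + p_big * math.log(n)
print(aic, bic)
```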

41
Q

Describe 3 measures of deviation of actual from predicted

A
  1. Raw residual (ex: yi - ui)
  2. Deviance residuals
    di = sign(yi - ui) * [2*phi*(ln f(yi; ui = yi) - ln f(yi; ui))]^0.5
    (i.e. take the negative root if yi < ui)
    Residual adjusted for shape of GLM
    Should follow normal distribution with no predictable pattern
    Homoscedasticity: normally distributed with constant variance
  3. Working residuals
    wri = (yi - ui) * g'(ui)
    if log link: wri = (yi - ui)/ui
    if logit: wri = (yi - ui) / (ui(1 - ui))
    Critical to bin residuals before analysis
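
A sketch of deviance and working residuals for a Poisson/log-link model with made-up data; phi is assumed to be 1 and the y*ln(y/u) term is taken as 0 when y = 0:

```python
import numpy as np

y = np.array([0.0, 1.0, 2.0, 5.0])     # observed counts
mu = np.array([0.4, 1.2, 1.8, 3.0])    # fitted means

# Poisson deviance residuals: sign(y - mu) * sqrt(2 * [y*ln(y/mu) - (y - mu)])
with np.errstate(divide="ignore", invalid="ignore"):
    unit_dev = np.where(y > 0, y * np.log(y / mu), 0.0) - (y - mu)
dev_resid = np.sign(y - mu) * np.sqrt(2 * unit_dev)

work_resid = (y - mu) / mu             # working residuals under a log link
print(dev_resid, work_resid)
```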
42
Q

Identify 3 options to measure model stability

A
  1. Cook’s distance
    Higher indicates higher level of influence.
    Records with highest value should be given additional scrutiny as to whether they should be included.
  2. Cross-validation
    Compare in-sample parameter estimates across model runs. Model should produce similar results when run on separate subsets of initial dataset.
  3. Bootstrapping
    Create a new dataset with same number of records by randomly sampling with replacement from original dataset.
    Model can then be refit on the different datasets to get statistics like the mean and variance of each parameter estimate.
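
A sketch of the bootstrapping idea; the refit step is a placeholder (here just a sample mean) standing in for refitting the GLM and recording a parameter estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n_records, n_boot = 500, 200
data = rng.gamma(shape=2.0, scale=100.0, size=n_records)     # stand-in for a modeling dataset

estimates = []
for _ in range(n_boot):
    sample = rng.choice(data, size=n_records, replace=True)  # resample with replacement, same record count
    estimates.append(sample.mean())                          # placeholder: refit the GLM, store a parameter estimate

# spread of the estimate across bootstrapped datasets is a stability diagnostic
print(np.mean(estimates), np.var(estimates))
```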
43
Q

State 2 reasons why model refinement techniques may not be appropriate for model selection

A
  1. Some of the models may be proprietary
    Info on data & detailed form need to be available to evaluate model
  2. Final decision is often business call
    Those deciding may know nothing about predictive modeling and actuarial science
44
Q

Briefly explain scoring

A

Scoring is the process of using models to make predictions from individual records
It can be used in model selection
Should always score on holdout sample
Then we can use techniques for model selection

45
Q

Describe 2 techniques for model selection

A
  1. Plot actual vs predicted
    The closer the points are to the line y=x, the better the prediction
  2. Lift-based measures:
    Model lift = economic value of model/ability to prevent adverse selection
    Attempts to visually demonstrate model’s ability to charge each insured an actuarially sound rate
    Requires 2 or more competing models
46
Q

List 4 lift-based measures for model selection

A
  1. Simple Quantile Plots
  2. Double Lift Charts
  3. Loss Ratio Charts
  4. Gini Index
47
Q

Briefly explain the Simple Quantile Plots measure for model selection.

A

For each model:
a. Sort holdout dataset based on model’s predicted loss cost
b. Bucket data into quantiles having equal exposures
c. Calculate average predicted loss cost & avg actual loss cost for each bucket and plot them on graph.

Winning model should be chosen based on 3 criteria:
1. Predictive accuracy
Actual loss cost should be consistently close to predicted loss cost in each quantile
2. Monotonicity
Actual PP should consistently increase across the quantiles
3. Vertical distance of actuals between 1st and last quantile
Indicates how well the model distinguishes between best and worst risks.

48
Q

Briefly explain the double lift chart measure for model selection

A

Compares 2 models on the same graph: sort holdout data by the ratio of the 2 models' predicted loss costs, bucket into quantiles, and plot the average actual loss cost and each model's average predicted loss cost per quantile

Winning model is the one that best matches actual in each quantile

49
Q

Briefly explain the Loss Ratio Charts measure for model selection

A

Generally easier to understand since LR is the most commonly-used metric in insurance profitability

The greater the vertical distance between the lowest and highest LRs, the better the model does at identifying segmentation opportunities not present in the current plan.

We want the LRs to not be equal and to increase monotonically.

50
Q

Briefly explain the Gini Index measure for model selection

A

Quantifies ability to identify best and worst risks

Sort holdout data by predicted loss cost, then plot cumulative % of exposures vs cumulative % of actual losses (the Lorenz curve)

Gini index = 2 * area between the Lorenz curve and the line of equality

Higher value = better (see the sketch below)
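
A sketch of the Gini calculation on a small hypothetical holdout sample; exposures are assumed equal across records for simplicity:

```python
import numpy as np

predicted = np.array([50.0, 300.0, 120.0, 80.0, 500.0])   # model's predicted loss costs
actual = np.array([0.0, 450.0, 100.0, 60.0, 700.0])       # actual losses on the same records
exposure = np.ones_like(actual)                           # one unit of exposure per record (assumption)

order = np.argsort(predicted)                             # sort from best to worst predicted risk
cum_expo = np.concatenate([[0.0], np.cumsum(exposure[order]) / exposure.sum()])
cum_loss = np.concatenate([[0.0], np.cumsum(actual[order]) / actual.sum()])   # Lorenz curve

area_under_lorenz = np.sum((cum_loss[1:] + cum_loss[:-1]) / 2 * np.diff(cum_expo))  # trapezoid rule
gini = 2 * (0.5 - area_under_lorenz)                      # 2 x area between line of equality and Lorenz curve
print(gini)
```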

51
Q

How will changing the discrimination threshold impact the number of TP, FP, FN and TN in a logistic model

A

Decreasing discrimination threshold = more true positives and more false positives since more will be investigated

52
Q

Define sensitivity

A

ratio of true positives to total event occurrences

also called true positive rate or hit rate

53
Q

Define specificity

A

ratio of true negatives to total event non-occurrences

false positive rate = 1 - specificity

54
Q

Describe ROC curve

A

All possible combinations of sensitivity and 1-specificity for different discrimination thresholds

Helps determine a target threshold (lower for large risks since we want to spend more time investigating them)

AUROC is area under ROC curve
The higher the AUROC, the better
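
A sketch of sensitivity and 1 - specificity at a few thresholds, with made-up event flags and model scores; sweeping the threshold over all values traces the ROC curve:

```python
import numpy as np

actual = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # 1 = event occurred
score = np.array([0.9, 0.4, 0.7, 0.3, 0.2, 0.6, 0.8, 0.1])    # model's predicted probabilities

for threshold in (0.25, 0.50, 0.75):
    pred = (score >= threshold).astype(int)
    tp = np.sum((pred == 1) & (actual == 1))
    fp = np.sum((pred == 1) & (actual == 0))
    sensitivity = tp / actual.sum()                           # true positive rate (hit rate)
    fpr = fp / (len(actual) - actual.sum())                   # 1 - specificity
    print(threshold, sensitivity, fpr)
# plotting the (fpr, sensitivity) pairs gives the ROC curve; the area under it (AUROC)
# summarizes performance -- higher is better
```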

55
Q

List 3 purposes of model documentation

A
  1. Opportunity to check your work for errors and improve communication skills
  2. Transfer knowledge to others that maintain or rebuild model
  3. Comply with stakeholder demands (ASOP41)
56
Q

Identify 4 items to include in model documentation

A
  1. Everything needed to reproduce model (from source data to model output)
  2. All assumptions and justifications for all decisions
  3. All data issues encountered and resolution
  4. Any reliance on external models or external stakeholders
  5. Model performance, structure and shortcomings
  6. Compliance with ASOP41 or local actuarial standards on communication
57
Q

Why should coverage-related variables be priced outside of GLM and included in offset terms

A

Examples: deductibles, limits, covered perils

Can give counter-intuitive results in GLM such as indicating lower rate for more coverage.

Could be due to correlation with other variables outside of model, including possible selection effects (insured self-selecting higher limits since they know they are higher risks)

58
Q

Describe how territories can be priced in conjunction with GLMs

A

Challenging due to their large number and aggregating them may cause you to lose important information.

Techniques like spatial smoothing can be included in GLM as offset terms.

Territory model should also be offset for rest of classification plan = iterative process until each model converges to acceptable range

59
Q

Discuss how ensemble models can improve performance of single model

A

Instead of choosing a single model from 2 or more, models can be combined into an ensemble of models (ex: averaging their predictions balances out individual model errors)

Only works well when model errors are as uncorrelated as possible, which happens when models are built by different people with little or no sharing.

60
Q

Define intrinsic aliasing

A

Aliasing arising from dependencies inherent in the definition of the covariates, most commonly when one indicator covariate is created for every level of a categorical variable (the indicators sum to 1, duplicating the intercept); resolved by dropping a base level

61
Q

Provide 2 arguments against the inclusion of deductible as predictor in GLM analysis.

A
  1. Coverage variables in GLMs can give counter-intuitive results, such as indicating a lower rate for more coverage.
  2. Charging rates for coverage options that reflect anything other than pure loss elimination could lead to changes in insured behaviour, which means indicated rates based on past experience will no longer be appropriate for new policies.
62
Q

A variable with second-order polynomial adds how many degrees of freedom?

A

2