A.2. Generalized Linear Models for Insurance Rating Flashcards
(49 cards)
GLM random component
Each yi is assumed to be independent and to come from the exponential family of distributions with mean µi and variance Var(yi) = φV(µi)/ωi
- φ is called the dispersion parameter and is a constant used to scale the variance.
- V(µ) is called the variance function and is given for a selected distribution type. It describes the relationship between the variance and mean. Note that the same distribution type (e.g., Poisson) must be assumed for all observations.
- ωi are known as weights and assign a weight to each observation i.
GLM systematic component
g(µi) = β0 + β1xi1 + β2xi2 + · · · + βpxip + offset
• The right hand side is known as the linear predictor.
• The offset term is optional and allows you to manually
specify the estimates for certain variables (usually based on other analyses).
• The x predictor variables can be binary (as for levels
of categorical variables) or continuous, or even
transformations or combinations of other variables.
• g(µ) is called the link function, and allows for
transformations of the linear predictor.
• β0 is called the intercept term, and the other β’s are called the coefficients of the model.
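The systematic component can be sketched numerically. A minimal example (all numbers hypothetical), assuming a log link so that µ = e^(linear predictor):

```python
import numpy as np

# Sketch of scoring a fitted GLM with a log link (hypothetical coefficients).
# g(mu) = ln(mu) = b0 + b1*x1 + b2*x2 + offset, so mu = exp(linear predictor).
beta0 = 0.5                      # intercept
beta = np.array([0.2, -0.1])     # coefficients for x1 and x2
X = np.array([[1.0, 3.0],        # two observations of (x1, x2)
              [2.0, 1.0]])
offset = np.array([0.0, 0.3])    # e.g. a term fixed manually from other analyses

eta = beta0 + X @ beta + offset  # the linear predictor
mu = np.exp(eta)                 # inverse of the log link gives predicted means
print(mu)
```

With a log link, each coefficient acts multiplicatively on the predicted mean, which is why log links pair naturally with multiplicative rating plans.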
Advantages of multiplicative rating plans
- Simple and practical to implement.
- They guarantee positive premiums (not true for additive terms).
- Impact of risk characteristics is more intuitive.
Variance Functions for exponential family distributions
Distribution        Variance Function
Normal              V(µ) = 1
Poisson             V(µ) = µ
Gamma               V(µ) = µ^2
Inverse Gaussian    V(µ) = µ^3
Negative Binomial   V(µ) = µ(1 + κµ)
Binomial            V(µ) = µ(1 − µ)
Tweedie             V(µ) = µ^p
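The table above can be captured as a small lookup; a minimal sketch, where the κ and p defaults are placeholders and `glm_variance` applies the card's Var(yi) = φV(µi)/ωi formula:

```python
# Variance function V(mu) for each exponential-family distribution in the table.
# kappa (Negative Binomial) and p (Tweedie) are the extra parameters noted there.
variance_functions = {
    "normal":           lambda mu: 1.0,
    "poisson":          lambda mu: mu,
    "gamma":            lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
    "neg_binomial":     lambda mu, kappa=1.0: mu * (1 + kappa * mu),
    "binomial":         lambda mu: mu * (1 - mu),
    "tweedie":          lambda mu, p=1.5: mu ** p,
}

def glm_variance(dist, mu, phi=1.0, omega=1.0, **kwargs):
    """Var(y_i) = phi * V(mu_i) / omega_i for the chosen distribution."""
    return phi * variance_functions[dist](mu, **kwargs) / omega
```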
Choices for Severity distributions
In insurance data, claim severity distributions tend to be
right-skewed and have a lower bound at 0. Both the Gamma and Inverse Gaussian distributions exhibit these properties, and as such are common choices for modeling severity. The Gamma distribution is the most commonly used, but the Inverse Gaussian has a sharper peak and wider tail, so it is more appropriate for more skewed severity distributions.
Choices for Frequency distributions
Claim frequency is most often modeled using a Poisson
distribution. The GLM implementation of Poisson allows
for the distribution to be continuous instead of discrete.
Technically, the overdispersed Poisson is recommended, which allows φ to be different from 1, and thus allows the variance to be greater than the mean (instead of being equal to it as with the typical Poisson).
Another choice for frequency modeling is the Negative Binomial distribution, which is really just a Poisson distribution whose parameter itself has a Gamma distribution. With the Negative Binomial, φ is restricted to 1, but its variance function contains a dispersion parameter κ that allows the variance to exceed the mean.
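One common way to check for overdispersion is the Pearson chi-square estimate of φ. A sketch with hypothetical observed counts and fitted means (for Poisson, V(µ) = µ):

```python
import numpy as np

# Estimating the dispersion parameter phi via the Pearson chi-square statistic.
# y and mu below are hypothetical observed counts and fitted means.
y = np.array([0.0, 2.0, 1.0, 4.0, 0.0, 3.0])   # observed claim counts
mu = np.array([1.0, 1.5, 1.2, 2.0, 0.8, 1.5])  # fitted means from the GLM
n_params = 2                                   # parameters estimated by the model

# For Poisson, V(mu) = mu, so each term is the squared Pearson residual.
pearson_chi2 = np.sum((y - mu) ** 2 / mu)
phi_hat = pearson_chi2 / (len(y) - n_params)
print(phi_hat)
```

A value of phi_hat above 1 suggests the variance exceeds the mean, supporting an overdispersed Poisson over the standard Poisson.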
Relationship between Poisson, Gamma, and Tweedie parameters
• Poisson has parameter λ, which equals both its mean and its variance.
• Gamma has mean αθ and variance αθ^2, and thus coefficient of variation 1/√α.
• Tweedie has mean µ = λ × (αθ) and variance φµ^p.
• p = (α + 2)/(α + 1), so it depends entirely on the Gamma coefficient of variation.
• The Tweedie dispersion parameter is φ = [λ^(1−p) × (αθ)^(2−p)] / (2 − p).
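These relationships can be checked numerically: a compound Poisson-Gamma sum has mean λ·(αθ) and variance λ·E[X²] = λα(α + 1)θ², which should match the Tweedie form φµ^p. A sketch with hypothetical parameter values:

```python
# Numeric check of the Tweedie relationships (hypothetical parameters).
lam, alpha, theta = 2.0, 3.0, 0.5   # Poisson frequency; Gamma shape and scale

mu = lam * (alpha * theta)                            # Tweedie mean
p = (alpha + 2) / (alpha + 1)                         # Tweedie power parameter
phi = lam ** (1 - p) * (alpha * theta) ** (2 - p) / (2 - p)

tweedie_var = phi * mu ** p                           # variance in Tweedie form
compound_var = lam * alpha * (alpha + 1) * theta ** 2 # lambda * E[X^2] directly
print(tweedie_var, compound_var)
```

The two variance expressions agree, confirming that the Tweedie parameters (µ, p, φ) are just a reparameterization of the underlying (λ, α, θ).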
Logit and Logistic Functions
Logit: g(µ) = ln[µ/(1 − µ)]. The ratio µ/(1 − µ) is known as the odds (e.g., a thousand to one).
Logistic function (inverse of logit): 1/(1 + e^(−x)).
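A quick sketch of both functions, confirming they are inverses and illustrating the "thousand to one" odds example:

```python
import math

def logit(mu):
    """Log-odds: maps a probability in (0, 1) to the whole real line."""
    return math.log(mu / (1 - mu))

def logistic(x):
    """Inverse of the logit: maps the linear predictor back into (0, 1)."""
    return 1 / (1 + math.exp(-x))

# Odds of a thousand to one correspond to mu = 1000/1001.
print(logit(1000 / 1001))
print(logistic(logit(0.3)))
```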
Why continuous predictor variables should usually be logged and exceptions
Continuous variables should usually be logged when a log link function is used to allow GLMs flexibility in fitting
different curve shapes to the data (other than just exponential growth).
Exceptions to the general rule of logging a continuous predictor variable exist, such as using a year variable to pick up trend effects. Also, if the variable contains values of 0, an adjustment such as adding 1 to all observations must first be made, since ln(0) is undefined.
Impact of choosing a level with fewer observations as the base level of a categorical variable
This will still result in the same predicted relativities for that variable (re-based to the chosen base level), but there will be wider confidence intervals around the estimated coefficients.
Matrix form of a GLM
g(µ) = Xβ, where µ is the vector of µi values, β is the vector of β parameters, and X is called the design matrix.
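Building the design matrix also illustrates how a categorical variable enters the model: one 0/1 indicator column per non-base level, with the base level absorbed into the intercept. A sketch with hypothetical territory levels:

```python
import numpy as np

# Design matrix X for one categorical variable (hypothetical levels),
# with level "A" chosen as the base level.
levels = ["A", "B", "C"]
base = "A"
observations = ["B", "A", "C", "B"]

non_base = [lvl for lvl in levels if lvl != base]
# An intercept column of 1s plus a 0/1 indicator column per non-base level.
X = np.array([[1.0] + [1.0 if obs == lvl else 0.0 for lvl in non_base]
              for obs in observations])
print(X)
```

Base-level observations get all-zero indicators, so their prediction comes from the intercept alone; every other level's coefficient is measured relative to the base.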
Degrees of freedom for a model
The degrees of freedom of a model is the number of
parameters that need to be estimated for the model.
GLM outputs for each predicted coefficient
Standard error
p-value: the estimated probability that a coefficient at least as far from 0 (in absolute value) as the estimated β would arise by pure chance if the true coefficient were 0
Confidence interval
How number of observations and dispersion parameter impact p-values
p-values (and standard errors and confidence intervals) will be smaller with larger datasets that have more observations. They will also be smaller with smaller values of φ.
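As a back-of-envelope sketch of this scaling: for an intercept-only model the standard error of the estimated mean behaves like √(φ/n), so it shrinks as n grows and as φ shrinks (this simplified formula is an illustration, not the general GLM standard-error computation):

```python
import math

def approx_se(phi, n):
    """Rough standard error of an estimated mean: sqrt(phi / n)."""
    return math.sqrt(phi / n)

# Quadrupling the data halves the standard error; smaller phi also shrinks it.
print(approx_se(2.0, 100), approx_se(2.0, 400), approx_se(0.5, 100))
```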
Problem and options for GLMs with highly correlated
variables
This can result in an unstable model with erratic coefficients that have high standard errors. Two options for dealing with very high correlation include:
- Removing all highly correlated variables except one. This eliminates the high correlation in the model, but it also potentially loses some unique information contained in the eliminated variables.
- Use dimensionality-reduction techniques such as principal components analysis or factor analysis to create a new subset of variables from the correlated variables, and use this subset of variables in the GLM. The downside is the additional time required to do this extra analysis.
Define multicollinearity and give a way to detect it
Multicollinearity occurs when there is a near-perfect linear dependency among 3 or more predictor variables. For example, suppose x1 + x2 ≈ x3. This is more difficult to detect since both x1 and x2 may not be individually highly correlated with x3. When multicollinearity is present in a model, the model may become unstable with erratic coefficients, and it may not converge to a solution. One way to detect multicollinearity is to use the variance inflation factor
(VIF) statistic, which is given for each predictor variable, and measures the impact on the squared standard error for that variable due to collinearity with other predictor variables by seeing how well other predictor variables can predict the variable in question. VIF values of 10 or greater are considered high.
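The VIF can be computed by hand: regress each predictor on the others and take 1/(1 − R²). A sketch with hypothetical data engineered so that x3 ≈ x1 + x2, the multicollinearity case described above:

```python
import numpy as np

def vif(X, j):
    """VIF of column j of X: 1 / (1 - R^2) from regressing it on the rest."""
    n = X.shape[0]
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(n), others])     # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit
    resid = y - A @ coef
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

# Hypothetical predictors with a near-perfect linear dependency x3 ~ x1 + x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
x3 = x1 + x2 + np.array([0.01, -0.01, 0.02, -0.02, 0.01, -0.01])
X = np.column_stack([x1, x2, x3])
print(vif(X, 2))  # far above the rule-of-thumb threshold of 10
```

Note that x3's pairwise correlation with x1 or x2 alone need not look alarming; the VIF catches the joint dependency.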
Define aliasing and how GLM software deals with it
When there is a perfect linear dependency among predictor variables, those variables are aliased. The GLM will not converge in this case, but most GLM software will detect this and automatically remove one of those variables from the model.
2 important limitations of GLMs
- GLMs give full credibility: The estimated coefficients are not credibility-weighted to recognize low volumes of data or high volatility. This concern can be partially addressed by looking at p-values or standard errors.
- GLMs assume that the randomness of outcomes is uncorrelated: Two examples of violations of this are:
• Using a dataset with several renewals of the same policy, since the same insured over different renewals is likely to have correlated outcomes.
• When the data can be affected by weather, since the same weather events are likely to cause similar outcomes for risks in the same areas.
Components of model-building process
- Setting goals and objectives
- Communication with key stakeholders
- Collecting and processing the data
- Conducting exploratory data analysis
- Specifying the form of the model
- Evaluating the model output
- Validating the model
- Translating the model results into a product
- Maintaining and rebuilding the model
Considerations in merging policy and claim data
• Matching claims to specific vehicles/drivers (for auto) or specific coverages.
• Are there timing differences between the datasets? How often is each updated? Timing differences can cause record matching problems.
• Is there a unique key to merge the data (e.g., policy
number)? There is the potential for orphaned claims if there is no matching policy record, or duplicating claims if there are multiple policy records.
• Level of aggregation before merging? Time dimension (e.g., CY)? Policy level versus claimant/coverage level? For commercial, location level or policy level?
• Are there fields not needed? Are there fields desired that are not present?
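The orphaned-claims check above can be done with a merge indicator. A sketch with hypothetical policy and claim tables, merging on policy number:

```python
import pandas as pd

# Hypothetical policy and claim data merged on a policy-number key.
policies = pd.DataFrame({"policy_no": [101, 102, 103],
                         "premium":   [500.0, 750.0, 600.0]})
claims = pd.DataFrame({"policy_no": [101, 101, 999],
                       "loss":      [1000.0, 250.0, 400.0]})

merged = claims.merge(policies, on="policy_no", how="left", indicator=True)
# "left_only" rows are orphaned claims with no matching policy record.
orphans = merged[merged["_merge"] == "left_only"]
print(orphans)
```

The same indicator (`right_only` after a full outer merge) would surface policies with no claims, and duplicate policy records would show up as an inflated claim row count after the merge.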
Considerations in Modifying the Data
- Check for duplicate records and remove them
- Check categorical field values against documentation (i.e., are there code values not in the documentation, and are these new codes or errors?)
• Check reasonability of numerical fields (e.g., negative
premiums, significant outliers)
• Decide how to handle errors and missing values (e.g., how much time to investigate, anything systematic about these records such as a specific location, maybe discard these records or replace the bad values with average values or an error flag)
• Convert continuous variables into categorical (called
binning)? Group levels in categorical variables? Combine or separate variables?
Other possible data adjustments before modeling
- Capping large losses
- Removing catastrophe (cat) losses or giving them less weight
- Developing losses
- On-leveling premiums for LR models
- Trending exposures and losses
Purpose of using a separate dataset for testing
After we build a model on a set of data, it would be inappropriate to test the model on that same data, since that would give us biased results of the model's performance. Adding more variables will always make the model fit the training data better, but it may not fit other datasets better, because the model implicitly begins treating the random noise in the training data as part of the systematic signal. We want to pick up as much signal as possible with minimal noise. As such, before we build our model, we will want to split the data into at least 2 parts: the training set and the test (aka holdout) set.
List 3 Model Testing Strategies
• Train and test: Split the data into a single training set and a single test set, either randomly or on a time basis. The advantage of splitting by time is that a random split would place the same weather events in both datasets, which can result in over-optimistic validation results.
- Train, validate, and test: Split data into 3 - a training set, a validation set, and a test set. The validation set can be used to refine the model and make tweaks, but the test set should still be left until the model is final.
- Cross-validation: This is less common in insurance since variables are often hand-picked. There are different ways to do cross-validation, but the most common is called k-fold cross-validation.
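The index bookkeeping behind k-fold cross-validation can be sketched as follows (no shuffling here; in practice the observations would usually be randomized first):

```python
import numpy as np

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds of n rows."""
    folds = np.array_split(np.arange(n), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Each of the 10 observations serves as test data in exactly one fold.
for train_idx, test_idx in k_fold_indices(10, 5):
    print(len(train_idx), len(test_idx))
```

The model is refit k times, each time on the training indices, and performance is averaged over the k held-out folds.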