Exam 8 TIA Flashcards

(35 cards)

1
Q

Considerations for which risk characteristics to use

A
  1. Relationship of Risk Characteristics & Expected Outcomes
  2. Causality
  3. Objectivity
  4. Practicality
  5. Applicable Law
  6. Industry Practices
  7. Business Practices
2
Q

Considerations in establishing risk classes

A
  1. Intended Use
  2. Actuarial Considerations - Homogeneity, Credibility, and practicality
  3. Other Considerations - Law, industry, and business practices
  4. Reasonableness of results
3
Q

Advantages of multiplicative rating plans

A

Simple and practical, guarantees positive premiums, intuitive impact of risk characteristics.

4
Q

Choices for Severity distributions

A

Gamma and Inverse Gaussian are common choices. Gamma is the most commonly used; Inverse Gaussian suits more highly skewed severity distributions.

5
Q

Choices for Frequency distributions

A

Poisson (possibly overdispersed with φ ≠ 1), or Negative Binomial (Poisson with Gamma mixing, uses κ).

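The Poisson-with-Gamma-mixing idea can be illustrated with a small simulation. This is a hedged sketch, not from the source: the `shape`/`scale` parameters and sample size are hypothetical, and a hand-rolled Poisson sampler (Knuth's multiplication method) stands in since Python's stdlib has none.

```python
import math
import random

# Hedged sketch (parameters are hypothetical): a Negative Binomial count
# can be simulated as a Poisson whose mean is itself Gamma-distributed
# ("Poisson with Gamma mixing"), giving variance greater than the mean.
random.seed(0)

def poisson(lam):
    # Knuth's multiplication method for one Poisson draw
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

shape, scale = 2.0, 1.5   # Gamma mixing parameters; mean count = 3.0
draws = [poisson(random.gammavariate(shape, scale)) for _ in range(20000)]

mean_n = sum(draws) / len(draws)
var_n = sum((n - mean_n) ** 2 for n in draws) / len(draws)
# A pure Poisson would have var_n close to mean_n; mixing inflates var_n.
```

The overdispersion (variance above the mean) is exactly what distinguishes the Negative Binomial from a plain Poisson.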
6
Q

Degrees of freedom

A

Number of parameters to be estimated.

7
Q

GLM outputs for each predicted coefficient

A
  1. Standard error
  2. p-value: the estimated probability that a coefficient at least as far from 0 as the estimated β would arise by pure chance if the true value were 0
  3. Confidence interval
8
Q

Impact of more observations and φ=dispersion parameter on p-values

A

p-values (and standard errors and confidence intervals) will
be smaller with larger datasets that have more observations.
They will also be smaller with smaller values of φ.

9
Q

Define multicollinearity and give a way to detect it

A

Near-perfect linear dependency among three or more predictors. It can be detected with the variance inflation factor (VIF); a VIF of 10 or more indicates severe multicollinearity.

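A small sketch of the VIF idea, under the assumption of exactly two predictors (where each predictor's VIF reduces to 1 / (1 − r²) for their sample correlation r); the simulated data and the 0.2 noise level are illustrative.

```python
import math
import random

# Hedged sketch: two nearly dependent predictors produce a VIF far
# above the common warning threshold of 10.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(1000)]
x2 = [a + random.gauss(0, 0.2) for a in x1]   # nearly dependent on x1

def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

r = corr(x1, x2)
vif = 1 / (1 - r ** 2)   # well above the VIF >= 10 threshold here
```

With more predictors, the same quantity comes from regressing each predictor on all the others: VIF = 1 / (1 − R²).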
10
Q

Define aliasing and how GLM software deals with it

A

A perfect linear dependency among predictors. GLM software typically handles it by dropping one of the aliased variables (setting its coefficient to 0).

11
Q

2 limitations of GLMs

A
  1. GLMs give full credibility: The estimated coefficients are
    not credibility-weighted to recognize low volumes of data
    or high volatility. This concern can be partially addressed
    by looking at p-values or standard errors.
  2. GLMs assume that the randomness of outcomes is
    uncorrelated. Two examples of violations of this are:
    * Using a dataset with several renewals of the same policy,
    since the same insured over different renewals is likely to
    have correlated outcomes.
    * When the data can be affected by weather, since the same
    weather events are likely to cause similar outcomes for
    risks in the same areas.
12
Q

Steps of model-building process

A

1. Setting goals and objectives
2. Communication with key stakeholders
3. Collecting and processing the data
4. Conducting exploratory data analysis
5. Specifying the form of the model
6. Evaluating the model output
7. Validating the model
8. Translating the model results into a product
9. Maintaining and rebuilding the model

13
Q

Considerations in merging policy and claim data

A

Match claims to the correct vehicles/coverages, address timing differences between the policy and claim databases, ensure unique matching keys, and consider the level of aggregation (policy year vs. calendar year).

14
Q

Considerations in Modifying the Data

A
  • Check for duplicate records and remove them
  • Check categorical field values against documentation (i.e.,
    are there code values not in the documentation, and are
    these new codes or errors?)
  • Check reasonability of numerical fields (e.g., negative
    premiums, significant outliers)
  • Decide how to handle errors and missing values (e.g., how
    much time to investigate, anything systematic about these
    records such as a specific location, maybe discard these
    records or replace the bad values with average values or an
    error flag)
  • Convert continuous variables into categorical (called
    binning)? Group levels in categorical variables? Combine
    or separate variables?
15
Q

Other possible data adjustments before modeling

A
  • Capping large losses
  • Removing cats or giving them less weight
  • Developing losses
  • On-leveling premiums for LR models
  • Trending exposures and losses
16
Q

Purpose of using a separate dataset for testing

A

Avoid overfitting by testing model on different data to measure true predictive performance.

17
Q

List 3 Model Testing Strategies

A
  1. Train/test split
  2. Train/validate/test
  3. Cross-validation (e.g., k-fold)
18
Q

Steps for k-fold cross-validation

A
  1. Split data into k folds
  2. Train on k−1 folds, test on 1
  3. Repeat
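The steps above can be sketched in Python. This is a toy illustration: "fitting a model" is stood in for by predicting the mean of the training targets, and all data are simulated.

```python
import random

# Hedged sketch of k-fold cross-validation with k = 5.
random.seed(1)
data = [random.gauss(10, 2) for _ in range(100)]   # toy target values
k = 5

# 1. Split the shuffled record indices into k folds
idx = list(range(len(data)))
random.shuffle(idx)
folds = [idx[i::k] for i in range(k)]

# 2. Train on k-1 folds, test on the held-out fold; 3. repeat per fold
errors = []
for i in range(k):
    test_idx = set(folds[i])
    train = [data[j] for j in range(len(data)) if j not in test_idx]
    mean_hat = sum(train) / len(train)   # the stand-in "fitted model"
    mse = sum((data[j] - mean_hat) ** 2 for j in test_idx) / len(test_idx)
    errors.append(mse)

cv_error = sum(errors) / k   # average out-of-fold error
```

Every record is used for testing exactly once, so the averaged error reflects out-of-sample performance.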
19
Q

Combine frequency and severity into pure premium

A

Multiply relativities (if both log-linked), or add linear predictors.

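The equivalence between the two combination methods can be shown in a few lines; the coefficient values below are hypothetical.

```python
import math

# Hedged sketch: when both the frequency and severity GLMs use a log
# link, multiplying the two relativities equals adding the two linear
# predictors and exponentiating.
beta_freq = 0.18   # hypothetical frequency coefficient for one level
beta_sev = 0.07    # hypothetical severity coefficient for the same level

rel_freq = math.exp(beta_freq)
rel_sev = math.exp(beta_sev)

rel_pp_multiply = rel_freq * rel_sev          # multiply relativities
rel_pp_add = math.exp(beta_freq + beta_sev)   # add linear predictors
```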
20
Q

2 disadvantages of modeling freq/sev separately

A

Takes more time, requires detailed data.

21
Q

Advantages of modeling freq/sev separately

A
  • Gaining more insight and intuition about the impact of each
    predictor variable.
  • Each of frequency and severity separately is more stable
    (e.g., a variable that only impacts frequency will look less
    significant in a pure premium model).
  • Pure premium modeling can lead to overfitting if a
    predictor variable only impacts frequency or severity but
    not both, since the randomness of the other component may
    be considered a signal effect.
  • The Tweedie distribution in a pure premium model
    assumes both frequency and severity move in the same
    direction, but this may not be true
22
Q

Steps to combine separate models by peril

A
  1. Model each peril
  2. Aggregate expected losses
  3. Build model on all-peril loss costs
23
Q

Criteria for variable inclusion

A

Statistical significance, data cost, legal/applicability, system constraints.

24
Q

Transformations after partial residual graph

A
  • Binning the variable: i.e., turning it into a categorical
    variable with separate “bins”. Downsides include that this
    increases the degrees of freedom of the model, it can result
    in inconsistent and/or impractical patterns, and variation
    within bins is ignored.
  • Adding polynomial terms: i.e., x_j², x_j³, etc. Drawback is loss
    of interpretability without a graph.
  • Add piecewise linear functions: Add hinge functions
    max(0,xj − c) at each break point c. Drawback is break
    points must be manually chosen.
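The hinge-function approach is easy to sketch; the break points below are hypothetical (they must be chosen manually, as the card notes).

```python
# Hedged sketch: hinge functions max(0, x - c) turn a continuous
# predictor into piecewise-linear terms.
def hinge(x, c):
    """Piecewise-linear basis function with a break point at c."""
    return max(0.0, x - c)

breaks = [30, 60]   # manually chosen break points (the stated drawback)

def features(x):
    # Design-matrix row: the raw predictor plus one hinge per break
    # point, so the fitted slope can change at x = 30 and x = 60.
    return [x] + [hinge(x, c) for c in breaks]
```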
25

Q

2 diagnostic test measures

A

Log-likelihood and scaled deviance. Adding more variables to a model always increases the model log-likelihood and reduces D* since there is more freedom to fit the data.
26

Q

Conditions for comparing deviances

A

Same dataset and same assumed distribution.
27

Q

3 ways to assess model stability

A
  • The influence of an individual record on the model can be
    measured using Cook's distance, which can be calculated by
    most GLM software. Records with the highest Cook's distance
    should be given additional scrutiny as to whether they
    belong in the dataset.
  • Cross-validation can be used to assess model stability by
    comparing in-sample parameter estimates across different
    model runs.
  • Bootstrapping can be used to create new datasets with the
    same number of records by randomly sampling with replacement
    from the original dataset. The model can then be refit on
    many different datasets, giving statistics like the mean and
    variance of each parameter estimate.
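The bootstrapping approach can be sketched as follows; the sample mean stands in for a fitted GLM parameter, and all numbers are illustrative.

```python
import random
import statistics

# Hedged sketch: bootstrap a parameter estimate to gauge stability.
random.seed(42)
data = [random.gauss(100, 15) for _ in range(200)]   # toy loss data

B = 500
estimates = []
for _ in range(B):
    # new dataset with the same number of records, sampled with replacement
    boot = [random.choice(data) for _ in data]
    estimates.append(statistics.mean(boot))   # "refit" the model

boot_mean = statistics.mean(estimates)   # stability statistics for the
boot_sd = statistics.stdev(estimates)    # parameter across the refits
```

A parameter whose bootstrap standard deviation is large relative to its estimate is a stability concern.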
28

Q

2 reasons not to use refinement for selection

A

Models may be proprietary, and selections may be business-driven.
29

Q

3 cautions for actual vs. predicted plots

A
  • Use holdout data (to prevent overfitting)
  • It can help to aggregate the data before plotting if the
    dataset is very large (e.g., into 100 buckets based on
    percentiles of predicted values)
  • Taking the log of all values before graphing prevents large
    values from skewing the picture
30

Q

3 criteria to choose a model from a quantile plot

A
  1. Predictive accuracy: the difference between actual and
    predicted in each quantile.
  2. Monotonicity: the actual pure premium should consistently
    increase across quantiles.
  3. Vertical distance of actual loss cost between the first and
    last quantiles: indicates how well the model distinguishes
    between the best and worst risks.
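The quantile-plot construction can be sketched with simulated data (all values below are illustrative): sort records by predicted value, bucket into quantiles, and compare the average actual loss across buckets.

```python
import random

# Hedged sketch: build quantile buckets and check criteria 2 and 3.
random.seed(7)
pred = [random.uniform(50, 500) for _ in range(1000)]   # model output
actual = [p * random.uniform(0.5, 1.5) for p in pred]   # noisy truth

n_q = 5
size = len(pred) // n_q
order = sorted(range(len(pred)), key=lambda i: pred[i])
buckets = [order[i * size:(i + 1) * size] for i in range(n_q)]
avg_actual = [sum(actual[i] for i in b) / len(b) for b in buckets]

# Criterion 2: actual pure premium rises across quantiles
monotonic = all(a < b for a, b in zip(avg_actual, avg_actual[1:]))
# Criterion 3: distance between the first and last quantiles ("lift")
lift = avg_actual[-1] / avg_actual[0]
```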
31

Q

Sensitivity, specificity, false positive rate

A

Sensitivity = true positives / total event occurrences
Specificity = true negatives / total event non-occurrences
False positive rate = 1 − specificity
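The three formulas follow directly from confusion-matrix counts; the counts below are hypothetical.

```python
# Hedged sketch with hypothetical confusion-matrix counts.
tp, fn = 40, 10   # event occurred: predicted yes / predicted no
tn, fp = 80, 20   # event did not occur: predicted no / predicted yes

sensitivity = tp / (tp + fn)            # true positives / occurrences
specificity = tn / (tn + fp)            # true negatives / non-occurrences
false_positive_rate = 1 - specificity   # equivalently fp / (fp + tn)
```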
32

Q

Why coverage-related variables should first be priced outside of GLMs

A

Coverage-related variables (such as deductibles or limits) in GLMs can give counterintuitive results, such as indicating a lower rate for more coverage. This could be due to correlations with other variables outside of the model, including possible selection effects (e.g., insureds self-selecting into higher limits because they know they are higher risk, or underwriters forcing high-risk insureds to take higher deductibles). Charging rates for coverage options that reflect anything other than pure loss elimination could change insured behavior, which means indicated rates based on past experience would no longer be appropriate for new policies. As such, rates for coverage options should be estimated outside of the GLM first and included in the GLM as offset terms.
33

Q

Why price territories outside the GLM

A

Territories are challenging in GLMs since there may be a very large number of them, and aggregating them into a smaller number of groups may lose important information. Techniques like spatial smoothing can be used to price territories, and the territorial rates can then be included in the GLM as offset terms. However, the territory model should also be offset for the rest of the classification plan, so the process should be iterated until each model converges to an acceptable degree.
34

Q

Why ensemble models can offer improved predictions

A

Different models will over-predict and under-predict for different segments of the book, and averaging multiple models helps balance out these errors across segments. However, this only works well when the model errors are as uncorrelated as possible, which generally happens when the models are built separately by different people with little or no sharing of information.
35

Q

Problem and options for GLMs with highly correlated variables

A

High correlation can result in an unstable model with erratic coefficients that have high standard errors. Two options for dealing with it:
  1. Remove all but one of the highly correlated variables. This
    eliminates the high correlation in the model but potentially
    loses unique information contained in the eliminated
    variables.
  2. Use dimensionality-reduction techniques such as principal
    components analysis or factor analysis to create a new
    subset of variables from the correlated ones, and use that
    subset in the GLM. The downside is the additional time
    required for the extra analysis.