Exam 8 TIA Flashcards
(35 cards)
Considerations for which risk characteristics to use
- Relationship of Risk Characteristics & Expected Outcomes
- Causality
- Objectivity
- Practicality
- Applicable Law
- Industry Practices
- Business Practices
Considerations in establishing risk classes
- Intended Use
- Actuarial Considerations - Homogeneity, Credibility, and practicality
- Other Considerations - Law, industry, and business practices
- Reasonableness of results
Advantages of multiplicative rating plans
Simple and practical, guarantees positive premiums, intuitive impact of risk characteristics.
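A minimal sketch of why a multiplicative plan cannot produce a negative premium: the premium is a positive base rate times positive relativities. The base rate, variable names, and factor values below are hypothetical, chosen only for illustration.

```python
from math import prod

# Hypothetical base rate and relativities (illustration only).
BASE_RATE = 500.0
RELATIVITIES = {
    "territory": {"urban": 1.20, "rural": 0.90},
    "age_group": {"young": 1.50, "adult": 1.00},
}

def premium(risk: dict) -> float:
    """Base rate times one relativity per rating variable."""
    factors = [RELATIVITIES[var][level] for var, level in risk.items()]
    return BASE_RATE * prod(factors)

young_urban = premium({"territory": "urban", "age_group": "young"})
adult_rural = premium({"territory": "rural", "age_group": "adult"})
```

Each factor reads directly as a percentage surcharge or discount, which is the "intuitive impact" point on the card.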
Choices for Severity distributions
Gamma and Inverse Gaussian are common. Gamma is most used; Inverse Gaussian suits more skewed distributions.
Choices for Frequency distributions
Poisson (possibly overdispersed with φ ≠ 1), or Negative Binomial (Poisson with Gamma mixing, uses κ).
Degrees of freedom
Number of parameters to be estimated.
GLM outputs for each predicted coefficient
- Standard error
- p-value: the estimated probability, assuming the true β is 0, of getting an estimate at least as far from 0 as the one observed by pure chance
- Confidence interval
Impact of more observations and φ=dispersion parameter on p-values
p-values (and standard errors and confidence intervals) will
be smaller with larger datasets that have more observations.
They will also be smaller with smaller values of φ.
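One way to see both effects, sketched under the standard large-sample approximation for GLM estimates. The symbols X (design matrix) and W (diagonal matrix of working weights) are notation assumed here, not taken from the cards.

```latex
\[
\widehat{\operatorname{Var}}(\hat{\beta}) \approx \phi \,(X^{\top} W X)^{-1},
\qquad
\operatorname{SE}(\hat{\beta}_j) \approx \sqrt{\phi \,\bigl[(X^{\top} W X)^{-1}\bigr]_{jj}}
\]
```

Each added observation adds a term to X^T W X, so its inverse shrinks and the standard error falls roughly like sqrt(φ / n). For a given estimate, a smaller standard error means a larger test statistic and so a smaller p-value.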
Define multicollinearity and give a way to detect it
A near-perfect linear dependency among 3+ predictors. Detect it with the variance inflation factor (VIF); a VIF of 10 or more is considered high.
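A minimal sketch of the VIF calculation, assuming the usual definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. The data and helper names are hypothetical; x1 and x2 are nearly uncorrelated, but x3 is almost exactly x1 + x2, which pairwise correlations alone could understate.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(y, xs):
    """R^2 of a least squares fit of y on the columns in xs plus an intercept."""
    rows = [[1.0] + list(vals) for vals in zip(*xs)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(predictors, j):
    """Variance inflation factor of predictor j given the other predictors."""
    others = [col for i, col in enumerate(predictors) if i != j]
    return 1.0 / (1.0 - r_squared(predictors[j], others))

# Hypothetical data: x3 is x1 + x2 plus a little noise.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [5, 1, 7, 3, 8, 2, 6, 4]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
x3 = [a + b + e for a, b, e in zip(x1, x2, noise)]

vif_x3 = vif([x1, x2, x3], 2)       # far above the threshold of 10
vif_x1_without_x3 = vif([x1, x2], 0)  # close to 1 once x3 is left out
```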
Define aliasing and how GLM software deals with it
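A perfect linear dependency among predictors, so the GLM has no unique solution. Software detects it and drops one of the aliased variables from the model.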
2 limitations of GLMs
- GLMs give full credibility: The estimated coefficients are
not credibility-weighted to recognize low volumes of data
or high volatility. This concern can be partially addressed
by looking at p-values or standard errors.
- GLMs assume that the randomness of outcomes is
uncorrelated: Two examples of violations of this are:
* Using a dataset with several renewals of the same policy,
since the same insured over different renewals is likely to
have correlated outcomes.
* When the data can be affected by weather, the same
weather events are likely to cause similar outcomes to
risks in the same areas
Steps of model-building process
1. Setting goals and objectives
2. Communication with key stakeholders
3. Collecting and processing the data
4. Conducting exploratory data analysis
5. Specifying the form of the model
6. Evaluating the model output
7. Validating the model
8. Translating the model results into a product
9. Maintaining and rebuilding the model
Considerations in merging policy and claim data
Match claims to vehicles/coverages, address timing differences, ensure unique keys, consider aggregation level (PY vs CY).
Considerations in Modifying the Data
- Check for duplicate records and remove them
- Check categorical field values against documentation (i.e.,
are there code values not in the documentation, and are
these new codes or errors?)
- Check reasonability of numerical fields (e.g., negative
premiums, significant outliers)
- Decide how to handle errors and missing values (e.g., how
much time to investigate, anything systematic about these
records such as a specific location, maybe discard these
records or replace the bad values with average values or an
error flag)
- Convert continuous variables into categorical (called
binning)? Group levels in categorical variables? Combine
or separate variables?
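The first few checks can be sketched in plain Python. The field names, documented code list, and records below are hypothetical; the sketch only flags problems and leaves the decision on how to handle them to the modeler.

```python
# Hypothetical documented codes and records (illustration only).
DOCUMENTED_TERRITORIES = {"A", "B", "C"}
records = [
    {"policy_id": 1, "territory": "A", "premium": 1200.0},
    {"policy_id": 1, "territory": "A", "premium": 1200.0},  # exact duplicate
    {"policy_id": 2, "territory": "Z", "premium": 950.0},   # undocumented code
    {"policy_id": 3, "territory": "B", "premium": -40.0},   # negative premium
    {"policy_id": 4, "territory": "C", "premium": None},    # missing value
]

def clean(rows):
    """Drop exact duplicate records and attach a list of issues to the rest."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key in seen:
            continue  # duplicate record: keep only the first copy
        seen.add(key)
        issues = []
        if row["territory"] not in DOCUMENTED_TERRITORIES:
            issues.append("undocumented territory code")
        if row["premium"] is None:
            issues.append("missing premium")
        elif row["premium"] < 0:
            issues.append("negative premium")
        kept.append({**row, "issues": issues})
    return kept

cleaned = clean(records)
flagged_ids = [r["policy_id"] for r in cleaned if r["issues"]]
```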
Other possible data adjustments before modeling
- Capping large losses
- Removing cats or giving them less weight
- Developing losses
- On-leveling premiums for LR models
- Trending exposures and losses
Purpose of using a separate dataset for testing
Avoid overfitting by testing model on different data to measure true predictive performance.
List 3 Model Testing Strategies
- Train/test split
- Train/validate/test
- Cross-validation (e.g., k-fold)
Steps for k-fold cross-validation
- Split data into k folds
- Train on k−1 folds, test on the remaining fold
- Repeat k times so each fold serves as the test set once, then combine the k test results
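The splitting step can be sketched as below. The function name is hypothetical, and the assignment of records to folds is a simple deterministic one; in practice records would usually be assigned at random (or by time period), and a model would be fit and scored inside the loop.

```python
def k_fold_indices(n_records, k):
    """Yield (train_idx, test_idx) pairs; each record is tested exactly once."""
    # Record i goes to fold i mod k (deterministic, for illustration only).
    folds = [list(range(i, n_records, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(10, 5))
for train_idx, test_idx in splits:
    # Fit on train_idx and measure performance on test_idx here.
    pass
```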
Combine frequency and severity into pure premium
With log links in both models, multiply the frequency and severity relativities, which is equivalent to adding the coefficients on the log scale.
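A small check of that equivalence, assuming both models use a log link and share the same rating levels. The coefficient values and level names are hypothetical.

```python
from math import exp, isclose

# Hypothetical log-link coefficients from separate frequency and severity models.
freq_coef = {"intercept": -2.30, "young_driver": 0.40, "urban": 0.25}
sev_coef = {"intercept": 8.00, "young_driver": 0.10, "urban": -0.05}

# Option 1: add coefficients on the log scale, then exponentiate.
pp_from_sum = {name: exp(freq_coef[name] + sev_coef[name]) for name in freq_coef}

# Option 2: exponentiate each model's coefficient and multiply the relativities.
pp_from_product = {name: exp(freq_coef[name]) * exp(sev_coef[name]) for name in freq_coef}

same = all(isclose(pp_from_sum[n], pp_from_product[n]) for n in freq_coef)
```

The intercept entry is the base pure premium (base frequency times base severity); the other entries are pure premium relativities.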
2 disadvantages of modeling freq/sev separately
Takes more time, requires detailed data.
Advantages of modeling freq/sev separately
- Gaining more insight and intuition about the impact of each
predictor variable.
- Each of frequency and severity separately is more stable
(e.g., a variable that only impacts frequency will look less
significant in a pure premium model).
- Pure premium modeling can lead to overfitting if a
predictor variable only impacts frequency or severity but
not both, since the randomness of the other component may
be considered a signal effect.
- The Tweedie distribution in a pure premium model
assumes both frequency and severity move in the same
direction, but this may not be true
Steps to combine separate models by peril
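- Model each peril separately and predict expected losses by peril for each record
- Add the peril predictions to get an all-peril loss cost for each record
- Build a model with the all-peril loss cost as the target, using the union of the peril models' predictors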
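The aggregation step is just a record-by-record sum of the peril predictions. The peril names and predicted values below are hypothetical; fitting the final all-peril model is not shown.

```python
# Hypothetical expected losses by peril for three records (illustration only).
peril_predictions = {
    "fire": [120.0, 80.0, 200.0],
    "wind": [60.0, 150.0, 30.0],
    "theft": [20.0, 10.0, 45.0],
}

# Sum across perils for each record to get the all-peril loss cost,
# which then serves as the target of the final model.
all_peril_loss_cost = [sum(vals) for vals in zip(*peril_predictions.values())]
```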
Criteria for variable inclusion
Statistical significance, data cost, legal/applicability, system constraints.
Transformations after partial residual graph
- Binning the variable: i.e., turning it into a categorical
variable with separate “bins”. Downsides include that this
increases the degrees of freedom of the model, it can result
in inconsistent and/or impractical patterns, and variation
within bins is ignored.
- Adding polynomial terms: i.e., x_j^2, x_j^3, etc. Drawback is loss
of interpretability without a graph.
- Add piecewise linear functions: Add hinge functions
max(0, x_j − c) at each break point c. Drawback is break
points must be manually chosen.
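A minimal sketch of the hinge-function transformation. The function names and the break points (25 and 65, as for a driver-age variable) are hypothetical and picked by hand, which is the drawback the card mentions.

```python
def hinge(x: float, c: float) -> float:
    """Hinge function max(0, x - c): zero up to the break point, linear after it."""
    return max(0.0, x - c)

def piecewise_linear_features(x, break_points):
    """Return x itself followed by one hinge term per break point."""
    return [x] + [hinge(x, c) for c in break_points]

BREAK_POINTS = [25.0, 65.0]  # hypothetical break points

features_20 = piecewise_linear_features(20.0, BREAK_POINTS)
features_30 = piecewise_linear_features(30.0, BREAK_POINTS)
features_70 = piecewise_linear_features(70.0, BREAK_POINTS)
```

Each hinge term gets its own coefficient in the GLM, and that coefficient is the change in slope at its break point, so the fitted curve stays continuous.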