Exam 1 Flashcards

(77 cards)

1
Q

Why do we do predictive modeling?

A

To anticipate what may happen in the future so we can price our premiums correctly and set aside enough reserves to pay claims

1
Q

Input Variables

A

The predictors of the model

2
Q

Output Variables

A

The responses of the model

3
Q

ϵ

A

Random error term which is independent of X and has mean zero

4
Q

Reducible Error

A

The portion of the error that can potentially be reduced by improving the accuracy of our estimate of f, e.g., by using the most appropriate statistical learning technique

5
Q

Irreducible Error

A

Y is also a function of ϵ, which by definition cannot be predicted using X. No matter how well we estimate f, we cannot reduce the error introduced by ϵ, and this error is greater than zero

6
Q

Prediction

A

The setting where one is not concerned with the exact form of the estimate of f, provided that it yields accurate predictions for Y

7
Q

Inference

A

If one wants to understand the relationship between x and y (how y changes as a function of x)

8
Q

Relationship between prediction and inference

A

Linear models may allow for simple and interpretable inference but may not yield very accurate predictions, whereas non-linear approaches can provide accurate predictions at the expense of being less interpretable

9
Q

Flexible Models

A

A model that can fit many different possible functional forms, following the data it is trained on closely; this flexibility can produce accurate predictions but risks overfitting

10
Q

Overfitting

A

When the model is too specific to the training data, capturing its noise as well as its underlying patterns, so it generalizes poorly to new data

11
Q

Underfitting

A

The model is way too simplistic to capture the underlying patterns in the data

12
Q

Training Data

A

Observations used to train or teach our method how to estimate f. Apply a statistical learning method to this first

13
Q

Test Data

A

A separate set of data used to see how well our estimate of f behaves on observations it was not trained on ("holdout data")

14
Q

Parametric Methods

A

Make an assumption about the functional form or shape of f, then use a procedure that fits or trains the model on the training data. This simplifies the estimation problem, but the chosen form generally will not match the true unknown form of f.

15
Q

Non-parametric Methods

A

No explicit assumptions about the functional form of f, so we seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly. This has the potential to accurately fit a wide range of possible shapes for f, but a very large number of observations is required here for accuracy

16
Q

Supervised Models

A

Each observation of the predictors has an associated response measurement of y. We want to fit a model that relates response to the predictors, with the aim of accurately predicting the response for future observations

17
Q

Unsupervised Models

A

For each observation we observe a vector of measurements of x but no associated y response. This is more for understanding relationships between variables or observations

18
Q

Regression

A

Problems with a quantitative response

19
Q

Classification

A

Problems with a qualitative (categorical) response

20
Q

Mean Squared Error (MSE)

A

Most commonly used measure to evaluate method performance. Will be small if the predicted responses are close to the true responses. Want to choose the method that gives the lowest test MSE as opposed to lowest training MSE

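The MSE card can be made concrete with a short sketch, shown here in Python (the course uses R; the data values and the trivial mean-based "model" below are invented for illustration). Training MSE is computed on the data the model was fit to, test MSE on held-out data:

```python
# Sketch: training vs. test MSE for a trivial model that always predicts
# the training mean. All data values are made up for illustration.

def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical split of one dataset into training and test portions
train_y = [2.0, 3.0, 4.0, 5.0]
test_y = [2.5, 3.5, 6.0]

# A very simple "model": always predict the training mean
prediction = sum(train_y) / len(train_y)  # 3.5

train_mse = mse(train_y, [prediction] * len(train_y))
test_mse = mse(test_y, [prediction] * len(test_y))

print(train_mse)  # 1.25
print(test_mse)   # ~2.4167 — we judge methods by this, not the training MSE
```

As the card says, the method we pick is the one with the lowest test MSE, not the lowest training MSE.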
21
Q

MSE and Overfitting Relationship

A

Training MSE decreases as model flexibility increases, but test MSE is generally at least as large as training MSE; a very small training MSE combined with a much larger test MSE is the signature of overfitting

22
Q

Variance

A

The amount by which f would change if we estimated it using a different training data set. More flexible methods have higher variance

23
Q

Bias

A

The error introduced by approximating a complicated real-world relationship with an overly simple model. More flexible methods tend to have lower bias

24
Bias-Variance Trade-Off
Selecting a method that achieves both low variance and low bias. Increasing flexibility lowers bias but raises variance, so we look for the level of flexibility that minimizes the total expected test error
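The trade-off can be seen by simulation. This is a minimal Python sketch (the true function, noise level, and the two contrasting estimators are all invented for illustration): a rigid model (predict the training mean) and a flexible one (1-nearest-neighbour) are refit on many training sets, and the variance and squared bias of their predictions at one point are compared:

```python
import random

# Sketch: bias-variance trade-off by simulation. All settings are invented.
random.seed(0)

def true_f(x):
    return x * x

def draw_training_set(n=20):
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [true_f(x) + random.gauss(0, 0.1) for x in xs]
    return xs, ys

x0 = 0.9  # the point at which the two estimators are compared

rigid_preds, flexible_preds = [], []
for _ in range(500):
    xs, ys = draw_training_set()
    # Rigid model: predict the training mean (high bias, low variance)
    rigid_preds.append(sum(ys) / len(ys))
    # Flexible model: 1-nearest-neighbour (low bias, high variance)
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    flexible_preds.append(ys[nearest])

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def sq_bias(vals):
    m = sum(vals) / len(vals)
    return (m - true_f(x0)) ** 2

# The flexible method has higher variance; the rigid one has higher bias
print(variance(rigid_preds) < variance(flexible_preds))  # True
print(sq_bias(rigid_preds) > sq_bias(flexible_preds))    # True
```

Neither extreme wins on total error by default, which is why intermediate flexibility is usually best.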
25
How is predictive modeling used in ratemaking?
Actuaries look at historical data on past claims and policyholder demographics, then project to the future period using industry benchmarks, informed assumptions, and trends and patterns from the historical data. They then build a model that takes into account factors such as age, smoker class, sex, etc. to predict whether a claim will be filed, e.g., whether the policyholder dies. If a policyholder has a history of smoking, they carry higher risk, so their rates might be higher.
26
How is predictive modeling used in reserving?
We would look at historical data on policyholders' age, risk class, gender, health status, etc., then create a model that takes these factors into consideration. If there are a lot of older policyholders with histories of chronic smoking, for example, the likelihood of death is higher, so more claims may need to be paid in the future and higher reserves should be set aside
27
How is predictive modeling used in planning?
A life insurance company might have a goal of gaining more customers between the ages of 20 and 30. It might create models to assess the conditions and reasons why people in that age range buy less life insurance. Based on this, it might adjust its existing products or create new ones tailored to those conditions and reasons
28
Difference between writing/running code in RStudio's Console vs. Source Pane
In the console pane, code is executed immediately after you type it and cannot be edited or saved, so it is best for short, simple commands. In the source pane, multiple lines of code can be written as a script (for example, to produce graphs), and that code can be edited and saved.
29
Difference between a dataframe and a matrix?
A matrix is a two-dimensional data structure containing elements of the same type while dataframes are also two-dimensional structures but contain elements of different types
30
Describe how the flexibility of a model is related to overfitting and underfitting
Flexibility of a model refers to the model's ability to fit many different possible functional forms. When a model is too flexible, it can overfit: it captures the noise in the training data along with the signal, which makes it harder to generalize because the noise obscures the underlying patterns. A model with too little flexibility is too simple and underfits, because it cannot capture the underlying patterns even in the training data.
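An extreme case makes the overfitting side vivid. In this Python sketch (all data values invented; the true relationship is roughly y = 2x plus noise), a maximally flexible "model" that simply memorizes the training data gets zero training error but loses badly to a simple line on test data:

```python
# Sketch: a maximally flexible model (memorizer) vs. a simple line.
# Invented data; the underlying relationship is approximately y = 2x.

train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)]
test = [(1.4, 2.8), (2.6, 5.3), (3.6, 7.1)]

def memorizer(x):
    # Overfitted model: return the memorized y of the nearest training x
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def line(x):
    # Simple, rigid model: y = 2x
    return 2 * x

def mse(data, model):
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

print(mse(train, memorizer))                      # 0.0: perfect on training
print(mse(test, memorizer) > mse(test, line))     # True: fails to generalize
```

The memorizer's zero training MSE is exactly the "too specific to the data" failure the overfitting card describes.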
31
Difference between prediction and inference
Prediction is trying to use the data model to make the most accurate predictions of outcomes based on input data while inference is trying to understand the relationship between the variables in the model, so finding the underlying patterns to see how the change of one variable affects the other
32
Difference between training data and test data
Training data is the part of the dataset used to create the model. From this data, the model will reflect the underlying patterns and relationships. The test data is the other part of the dataset, used afterwards to assess whether the model works well. The model works well when it produces accurate results, generalizes, and does not overfit or underfit on the test data.
33
Difference between supervised and unsupervised learning models
Supervised learning models are when each input value has a corresponding output value, helping to accurately predict the response for future observations based on these patterns of responses to certain predictors. Unsupervised learning models are more focused on identifying the underlying patterns of the data without a correct output to aim for, which is why the outputs aren't labeled.
34
Difference between regression and classification models
Regression models are models that have a quantitative response (numerical outcomes) while classification models are models with a qualitative response (categorical outcomes).
35
Difference between independent and dependent variables
Independent variables are the predictors or inputs because they are manipulated to explore the effect on other variables while dependent variables are the response or output variables because they are the variables that change based on the independent variables
36
Difference between reducible and irreducible error
Reducible error is the portion of the total error that can be minimized as the model is improved, improving the accuracy of predictions. Irreducible error, on the other hand, is the portion of the total error that cannot be minimized, since it comes from the noise inherent in the data.
37
Difference between bias and variance
Bias and variance are both sources of reducible error. Bias refers to an oversimplified model, which can lead to underfitting. Variance means the model is overly sensitive to small changes in the training data, capturing noise along with signal, which can lead to overfitting.
38
Simple Linear Regression
Assumes an approximately linear relationship between the independent and dependent variables and uses this assumption to find the line that best fits the data
39
Method of Least Squares
Produces the best-fit estimates of the betas by choosing the line that comes as close to all of the data points as possible (minimizing the residual sum of squares)
40
Residual
The difference between the model's estimate ŷᵢ for a given xᵢ and the actual observed yᵢ from the data
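The least-squares cards can be tied together with the standard closed-form estimates for simple linear regression, sketched here in Python (the data values are invented; the formulas are the usual ones, with the slope as the ratio of the sample covariance to the sample variance of x):

```python
# Sketch: method of least squares for simple linear regression,
# via the closed-form estimates. Data values are invented.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# beta1_hat = sum((xi - x̄)(yi - ȳ)) / sum((xi - x̄)²)
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
        sum((xi - x_bar) ** 2 for xi in x)
beta0 = y_bar - beta1 * x_bar  # the fitted line passes through (x̄, ȳ)

# Residuals: e_i = y_i - ŷ_i
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]

print(round(beta1, 3), round(beta0, 3))  # 1.97 0.09
# With an intercept, least-squares residuals sum to (numerically) zero,
# positives and negatives cancelling — which is why we square them.
print(abs(sum(residuals)) < 1e-9)  # True
```

The last line illustrates the residual-and-bias card: the signed residuals cancel, so we minimize their squares instead.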
41
Residual and bias relationship
If our model is unbiased, some residuals are positive and some are negative. Since we only care about finding the betas that make the magnitudes of the residuals as small as possible, we can square the residuals (so positives and negatives don't cancel) and get the same effect
42
Standard Error
Used to assess how close our estimated betas are likely to be to the true betas
43
Residual Standard Error
Estimates the standard deviation of ϵ, i.e., how much the responses deviate from the unknown "true" regression line; it measures the lack of fit of the model to the data. A small RSE suggests the model fits the data well
44
P-value
If the p-value is less than 0.05, we reject the null hypothesis and conclude that a relationship exists. If it is greater than 0.05, we fail to reject the null hypothesis: the apparent relationship could have happened by random chance
45
3 statistics to measure the quality of fit
After determining if there is a relationship or not, we look at the R^2, Residual Standard Error, and F-statistic to measure the quality of fit
46
R^2 Statistic
Same idea as the RSE but measures on a proportional basis rather than by units. So the proportion of variance in Y that can be explained using X. Takes on a value between 0 and 1
47
Total Sum of Squares (TSS)
Total amount of variability inherent in the response before the regression model is performed
48
RSS
Measures the amount of variability left unexplained after performing the regression
49
TSS - RSS
Measures the amount of variability in the response that is explained (or removed) by building the model
50
What does an R^2 close to 1 indicate?
Large proportion of the variability in the response is explained by the regression
51
What does R^2 close to 0 indicate?
The regression does not explain much of the variability in the response. Either the linear model is incorrect, or the error variance is high, or both
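The TSS/RSS/R² cards fit together in one computation, sketched here in Python (the response values and the fitted predictions are invented for illustration):

```python
# Sketch: computing TSS, RSS, and R² for a fitted model. Invented data.

y = [2.0, 4.0, 6.0, 8.1]
y_hat = [2.1, 3.9, 6.2, 7.9]  # predictions from some already-fitted model

y_bar = sum(y) / len(y)

# TSS: total variability in the response before the regression
tss = sum((yi - y_bar) ** 2 for yi in y)
# RSS: variability left unexplained after the regression
rss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))

# R² = (TSS - RSS) / TSS = proportion of variability explained
r_squared = 1 - rss / tss

print(round(tss, 4), round(rss, 4), round(r_squared, 4))
```

Here R² comes out close to 1, matching the card above: almost all the variability in the response is explained by the fit.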
52
Requirements for a linear model
- Linear relationship between the independent variables and the response
- Constant variance of the errors
- Normally distributed errors
- Independent observations
53
What is the correlation of X and Y used to measure?
The strength of the relationship between X and Y
54
Multiple Linear Regression
Models the response as a function of several predictors; the coefficient of each variable is interpreted as its effect while holding the rest of the variables constant. The predictors are truly independent if we can solve for one variable, freeze the rest, and repeat the process
55
F-Statistic
Test used in multiple linear regression to check whether at least one independent variable has a significant relationship with the response variable
56
What does it mean when the F-statistic is less than 1?
We fail to reject the null hypothesis; there is no evidence that any predictor is related to the response
57
What does it mean when the F-statistic is greater than 1?
We reject the null hypothesis and at least one predictor variable is significantly related to the response variable
58
F-statistic relationship with dataset (n)
When the dataset is big, an F-statistic only slightly greater than 1 might still provide evidence against the null. If the dataset is small, a much larger F-statistic is needed to reject the null hypothesis
59
Why do we still need the F-statistic when we already have individual p-values?
With many predictors, some individual p-values will appear significant just by chance; the F-statistic assesses the model as a whole, so we can feel comfortable about the overall fit along with the associated p-values
60
Forward Selection
Start with the null model, fit p simple linear regressions, and add to the null model the variable that results in the lowest RSS; then repeat with two-variable models, and so on
61
Backward Selection
We start with all the variables and remove the variable with the largest p-value and stop when all the remaining variables have p-values below 0.05
62
Mixed Selection
A combination of forward and backward selection where we add and remove variables until all the variables have a sufficiently low p-value.
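The first step of forward selection can be sketched in a few lines of Python (the response, the candidate predictors, and their values are all invented; `simple_fit_rss` reuses the closed-form simple-regression fit):

```python
# Sketch: forward selection's first step — fit p simple linear regressions
# and add the candidate predictor with the lowest RSS. Invented data.

def simple_fit_rss(x, y):
    """RSS of the least-squares simple linear regression of y on x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
         sum((xi - xb) ** 2 for xi in x)
    b0 = yb - b1 * xb
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

y = [1.0, 2.0, 3.0, 4.0, 5.0]
predictors = {
    "x1": [1.0, 2.1, 2.9, 4.2, 4.8],  # nearly linear in y
    "x2": [5.0, 1.0, 4.0, 2.0, 3.0],  # essentially unrelated to y
}

# Starting from the null model, add the variable with the lowest RSS
best = min(predictors, key=lambda name: simple_fit_rss(predictors[name], y))
print(best)  # "x1"
```

Forward selection would then refit with "x1" included and repeat the search over the remaining candidates; backward selection runs the same idea in reverse using p-values.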
63
Predictors with only two variables
A qualitative predictor with two levels can be encoded as a dummy variable that takes on two possible numerical values (e.g., 0 and 1)
64
Additive Assumption
The association between an independent variable and the response does not depend on the values of the other predictors. We can create a new variable involving both predictors (an interaction term) and examine its significance to test whether the predictors really act independently of one another
65
Modeling for Additive Assumption Data
When the additive/linear assumption does not hold, rely on a polynomial regression to fit a better curve
66
Outliers
Data points whose standardized residuals are larger than 2 in absolute value
67
High-Leverage Points
Measure of how far away the independent variable values of an observation are from other observations in the model
68
Difference between outliers and high-leverage points
Outliers are data points far from the model, with large residuals, while high-leverage points do not necessarily have large residuals but have predictor values far from the center of the data
69
Collinearity
The situation in which two or more predictor variables are closely related to one another. This can make it difficult to separate out the individual effects of collinear variables on the response
70
Regularization of models
Constraining the coefficient estimates or equivalently shrinking them towards zero
71
Ridge Regression
When all the variables are kept but the betas are shrunk toward 0 by the shrinkage penalty. As lambda increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias
72
Lasso
When we don't want to include all predictors: some of the coefficient estimates will be exactly zero, removing the corresponding independent variables from the model
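The contrast between the two penalties shows up cleanly in the one-predictor, no-intercept special case, sketched here in Python (the closed forms below hold only for this special case, and the data values are invented):

```python
# Sketch: ridge vs. lasso shrinkage for a single predictor, no intercept.
# Closed forms are for this special case only; data values are invented.

x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.3, 2.8, 4.1]

sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

def ridge_beta(lam):
    # Ridge: the penalty inflates the denominator, shrinking beta toward 0
    # but never to exactly 0 for finite lambda.
    return sxy / (sxx + lam)

def lasso_beta(lam):
    # Lasso: soft-thresholding; a large enough lambda sets beta exactly to 0,
    # dropping the predictor from the model.
    b = sxy / sxx
    shrink = lam / (2 * sxx)
    return max(b - shrink, 0.0) if b > 0 else min(b + shrink, 0.0)

print([round(ridge_beta(l), 4) for l in (0, 10, 100)])  # steadily smaller
print(lasso_beta(0.5), lasso_beta(1000.0))              # second one is 0.0
```

This is the cards' distinction in miniature: ridge keeps every variable with smaller betas, while the lasso can zero a coefficient out entirely.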
73
Regression splines
Divide the range of X into K regions; a polynomial is fit in each region and constrained so the pieces join smoothly at the region boundaries (knots). More knots lead to an extremely flexible fit
74
Smoothing Splines
Minimizes the RSS criterion subject to a smoothness penalty. A tuning parameter controls a penalty on the second derivative, smoothing the spline and making it less wiggly
75
Local Regression
Fits the model in a smooth, overlapping way by computing the fit at each target point using only nearby training observations
76
Generalized Additive Models
A framework that extends the standard linear model by allowing a separate (possibly non-linear) function for each predictor while keeping the model additive