Exam 1 Flashcards

(77 cards)

1
Q

Why do we do predictive modeling?

A

To anticipate what may happen in the future so we can price our premiums correctly and set aside enough reserves to pay claims

1
Q

Input Variables

A

The predictors of the model

2
Q

Output Variables

A

The responses of the model

3
Q

ϵ

A

Random error term which is independent of X and has mean zero

4
Q

Reducible Error

A

The portion of the error that can potentially be reduced by improving the accuracy of our estimate of f, e.g., by using the most appropriate statistical learning technique

5
Q

Irreducible Error

A

Y is also a function of ϵ, which by definition cannot be predicted using X. No matter how well we estimate f, we cannot reduce the error introduced by ϵ, and this error is greater than zero

6
Q

Prediction

A

The setting where one is not concerned with the exact form of the estimate of f, provided that it yields accurate predictions for Y

7
Q

Inference

A

If one wants to understand the relationship between x and y (how y changes as a function of x)

8
Q

Relationship between prediction and inference

A

Linear models may allow for simple and interpretable inference but may not yield very accurate predictions, whereas non-linear approaches can provide accurate predictions at the expense of being less interpretable

9
Q

Flexible Models

A

A model that can fit many different possible functional forms, following the data it is trained on closely; this flexibility can produce accurate predictions but risks overfitting

10
Q

Overfitting

A

When the model is too specific to the training data, capturing its noise as well as its underlying patterns, so it generalizes poorly to new data

11
Q

Underfitting

A

The model is way too simplistic to capture the underlying patterns in the data

12
Q

Training Data

A

Observations used to train or teach our method how to estimate f. Apply a statistical learning method to this first

13
Q

Test Data

A

A separate set of data used to see how well our estimate of f behaves on observations it was not trained on ("holdout data")

14
Q

Parametric Methods

A

Make an assumption about the functional form or shape of f, then use a procedure that fits or trains the model on the training data. This simplifies the estimation problem, but the chosen form generally will not match the true unknown form of f.

15
Q

Non-parametric Methods

A

No explicit assumptions about the functional form of f, so we seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly. This has the potential to accurately fit a wide range of possible shapes for f, but a very large number of observations is required here for accuracy

16
Q

Supervised Models

A

Each observation of the predictors has an associated response measurement of y. We want to fit a model that relates response to the predictors, with the aim of accurately predicting the response for future observations

17
Q

Unsupervised Models

A

For each observation we observe a vector of measurements of x but no associated y response. This is more for understanding relationships between variables or observations

18
Q

Regression

A

Problems with a quantitative response

19
Q

Classification

A

Problems with a qualitative (categorical) response

20
Q

Mean Squared Error (MSE)

A

Most commonly used measure to evaluate method performance. Will be small if the predicted responses are close to the true responses. Want to choose the method that gives the lowest test MSE as opposed to lowest training MSE

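The MSE card can be made concrete with a short sketch, shown here in Python (the course uses R; the data values and the trivial mean-based "model" below are invented for illustration). Training MSE is computed on the data the model was fit to, test MSE on held-out data:

```python
# Sketch: training vs. test MSE for a trivial model that always predicts
# the training mean. All data values are made up for illustration.

def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical split of one dataset into training and test portions
train_y = [2.0, 3.0, 4.0, 5.0]
test_y = [2.5, 3.5, 6.0]

# A very simple "model": always predict the training mean
prediction = sum(train_y) / len(train_y)  # 3.5

train_mse = mse(train_y, [prediction] * len(train_y))
test_mse = mse(test_y, [prediction] * len(test_y))

print(train_mse)  # 1.25
print(test_mse)   # ~2.4167 — we judge methods by this, not the training MSE
```

As the card says, the method we pick is the one with the lowest test MSE, not the lowest training MSE.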
21
Q

MSE and Overfitting Relationship

A

Training MSE decreases as model flexibility increases, but test MSE is generally at least as large as training MSE; a very small training MSE combined with a much larger test MSE is the signature of overfitting

22
Q

Variance

A

The amount by which f would change if we estimated it using a different training data set. More flexible methods have higher variance

23
Q

Bias

A

The error introduced by approximating a complicated real-world relationship with an overly simple model. More flexible methods tend to have lower bias

24
Bias-Variance Trade-Off
Selecting a method that achieves both low variance and low bias. Increasing flexibility lowers bias but raises variance, so we look for the level of flexibility that minimizes the total expected test error
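The trade-off can be seen by simulation. This is a minimal Python sketch (the true function, noise level, and the two contrasting estimators are all invented for illustration): a rigid model (predict the training mean) and a flexible one (1-nearest-neighbour) are refit on many training sets, and the variance and squared bias of their predictions at one point are compared:

```python
import random

# Sketch: bias-variance trade-off by simulation. All settings are invented.
random.seed(0)

def true_f(x):
    return x * x

def draw_training_set(n=20):
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [true_f(x) + random.gauss(0, 0.1) for x in xs]
    return xs, ys

x0 = 0.9  # the point at which the two estimators are compared

rigid_preds, flexible_preds = [], []
for _ in range(500):
    xs, ys = draw_training_set()
    # Rigid model: predict the training mean (high bias, low variance)
    rigid_preds.append(sum(ys) / len(ys))
    # Flexible model: 1-nearest-neighbour (low bias, high variance)
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    flexible_preds.append(ys[nearest])

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def sq_bias(vals):
    m = sum(vals) / len(vals)
    return (m - true_f(x0)) ** 2

# The flexible method has higher variance; the rigid one has higher bias
print(variance(rigid_preds) < variance(flexible_preds))  # True
print(sq_bias(rigid_preds) > sq_bias(flexible_preds))    # True
```

Neither extreme wins on total error by default, which is why intermediate flexibility is usually best.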
25
How is predictive modeling used in ratemaking?
Actuaries look at historical data on past claims and policyholder demographics, then project to the future period using industry benchmarks, informed assumptions, and trends and patterns from the historical data. They then build a model that takes into account factors such as age, smoker class, sex, etc. to predict whether a claim will be filed, e.g., whether the policyholder dies. If a policyholder has a history of smoking, they carry higher risk, so their rates might be higher.
26
How is predictive modeling used in reserving?
We would look at historical data on policyholders' age, risk class, gender, health status, etc., then create a model that takes these factors into consideration. If there are a lot of older policyholders with histories of chronic smoking, for example, the likelihood of death is higher, so more claims may need to be paid in the future and higher reserves should be set aside
27
How is predictive modeling used in planning?
A life insurance company might have a goal of gaining more customers between the ages of 20 and 30. It might create models to assess the conditions and reasons why people in that age range buy less life insurance. Based on this, it might adjust its existing products or create new ones tailored to those conditions and reasons
28
Difference between writing/running code in RStudio's Console vs. Source Pane
In the console pane, code is executed immediately after you type it and cannot be edited or saved, so it is best for short, simple commands. In the source pane, multiple lines of code can be written as a script (for example, to produce graphs), and that code can be edited and saved.
29
Difference between a dataframe and a matrix?
A matrix is a two-dimensional data structure containing elements of the same type while dataframes are also two-dimensional structures but contain elements of different types
30
Describe how the flexibility of a model is related to overfitting and underfitting
Flexibility of a model refers to the model's ability to fit many different possible functional forms. When a model is too flexible, it can overfit: it captures the noise in the training data along with the signal, which makes it harder to generalize because the noise obscures the underlying patterns. A model with too little flexibility is too simple and underfits, because it cannot capture the underlying patterns even in the training data.
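An extreme case makes the overfitting side vivid. In this Python sketch (all data values invented; the true relationship is roughly y = 2x plus noise), a maximally flexible "model" that simply memorizes the training data gets zero training error but loses badly to a simple line on test data:

```python
# Sketch: a maximally flexible model (memorizer) vs. a simple line.
# Invented data; the underlying relationship is approximately y = 2x.

train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)]
test = [(1.4, 2.8), (2.6, 5.3), (3.6, 7.1)]

def memorizer(x):
    # Overfitted model: return the memorized y of the nearest training x
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def line(x):
    # Simple, rigid model: y = 2x
    return 2 * x

def mse(data, model):
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

print(mse(train, memorizer))                      # 0.0: perfect on training
print(mse(test, memorizer) > mse(test, line))     # True: fails to generalize
```

The memorizer's zero training MSE is exactly the "too specific to the data" failure the overfitting card describes.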
31
Difference between prediction and inference
Prediction is trying to use the data model to make the most accurate predictions of outcomes based on input data while inference is trying to understand the relationship between the variables in the model, so finding the underlying patterns to see how the change of one variable affects the other
32
Difference between training data and test data
Training data is the part of the dataset used to create the model. From this data, the model will reflect the underlying patterns and relationships. The test data is the other part of the dataset, used afterwards to assess whether the model works well. The model works well when it produces accurate results, generalizes, and does not overfit or underfit on the test data.
33
Difference between supervised and unsupervised learning models
Supervised learning models are when each input value has a corresponding output value, helping to accurately predict the response for future observations based on these patterns of responses to certain predictors. Unsupervised learning models are more focused on identifying the underlying patterns of the data without a correct output to aim for, which is why the outputs aren't labeled.
34
Difference between regression and classification models
Regression models are models that have a quantitative response (numerical outcomes) while classification models are models with a qualitative response (categorical outcomes).
35
Difference between independent and dependent variables
Independent variables are the predictors or inputs because they are manipulated to explore the effect on other variables while dependent variables are the response or output variables because they are the variables that change based on the independent variables
36
Difference between reducible and irreducible error
Reducible error is the portion of the total error that can be minimized as the model is improved, improving the accuracy of predictions. Irreducible error, on the other hand, is the portion of the total error that cannot be minimized, since it comes from the noise inherent in the data.
37
Difference between bias and variance
Bias and variance are both sources of reducible error. Bias refers to an oversimplified model, which can lead to underfitting. Variance means the model is overly sensitive to small changes in the training data, capturing noise along with signal, which can lead to overfitting.
38
Simple Linear Regression
Assumes an approximately linear relationship between the independent and dependent variables and uses this assumption to find the line that best fits the data
39
Method of Least Squares
Produces the best-fit estimates of the betas by choosing the line that comes as close to all of the data points as possible (minimizing the residual sum of squares)
40
Residual
The difference between the model's estimate ŷᵢ for a given xᵢ and the actual observed yᵢ from the data
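The least-squares cards can be tied together with the standard closed-form estimates for simple linear regression, sketched here in Python (the data values are invented; the formulas are the usual ones, with the slope as the ratio of the sample covariance to the sample variance of x):

```python
# Sketch: method of least squares for simple linear regression,
# via the closed-form estimates. Data values are invented.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# beta1_hat = sum((xi - x̄)(yi - ȳ)) / sum((xi - x̄)²)
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
        sum((xi - x_bar) ** 2 for xi in x)
beta0 = y_bar - beta1 * x_bar  # the fitted line passes through (x̄, ȳ)

# Residuals: e_i = y_i - ŷ_i
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]

print(round(beta1, 3), round(beta0, 3))  # 1.97 0.09
# With an intercept, least-squares residuals sum to (numerically) zero,
# positives and negatives cancelling — which is why we square them.
print(abs(sum(residuals)) < 1e-9)  # True
```

The last line illustrates the residual-and-bias card: the signed residuals cancel, so we minimize their squares instead.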
41
Residual and bias relationship
If our model is unbiased, some residuals are positive and some are negative. Since we only care about finding the betas that make the magnitudes of the residuals as small as possible, we can square the residuals (so positives and negatives don't cancel) and get the same effect
42
Standard Error
Used to assess how close our estimated betas are likely to be to the true betas
43
Residual Standard Error
Estimates the standard deviation of ϵ, i.e., how much the responses deviate from the unknown "true" regression line; it measures the lack of fit of the model to the data. A small RSE suggests the model fits the data well
44
P-value
If the p-value is less than 0.05, we reject the null hypothesis and conclude that a relationship exists. If it is greater than 0.05, we fail to reject the null hypothesis: the apparent relationship could have happened by random chance
45
3 statistics to measure the quality of fit
After determining if there is a relationship or not, we look at the R^2, Residual Standard Error, and F-statistic to measure the quality of fit
46
R^2 Statistic
Same idea as the RSE but measures on a proportional basis rather than by units. So the proportion of variance in Y that can be explained using X. Takes on a value between 0 and 1
47
Total Sum of Squares (TSS)
Total amount of variability inherent in the response before the regression model is performed
48
RSS
Measures the amount of variability left unexplained after performing the regression
49
TSS - RSS
Measures the amount of variability in the response that is explained (or removed) by building the model
50
What does an R^2 close to 1 indicate?
Large proportion of the variability in the response is explained by the regression
51
What does R^2 close to 0 indicate?
The regression does not explain much of the variability in the response. Either the linear model is incorrect, or the error variance is high, or both
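The TSS/RSS/R² cards fit together in one computation, sketched here in Python (the response values and the fitted predictions are invented for illustration):

```python
# Sketch: computing TSS, RSS, and R² for a fitted model. Invented data.

y = [2.0, 4.0, 6.0, 8.1]
y_hat = [2.1, 3.9, 6.2, 7.9]  # predictions from some already-fitted model

y_bar = sum(y) / len(y)

# TSS: total variability in the response before the regression
tss = sum((yi - y_bar) ** 2 for yi in y)
# RSS: variability left unexplained after the regression
rss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))

# R² = (TSS - RSS) / TSS = proportion of variability explained
r_squared = 1 - rss / tss

print(round(tss, 4), round(rss, 4), round(r_squared, 4))
```

Here R² comes out close to 1, matching the card above: almost all the variability in the response is explained by the fit.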
52
Requirements for a linear model
- Linear relationship between the independent variables and the response
- Constant variance of the errors
- Normally distributed errors
- Independent observations
53
What is the correlation of X and Y used to measure?
The strength of the relationship between X and Y
54
Multiple Linear Regression
Models the response as a function of several predictors; the coefficient of each variable is interpreted as its effect while holding the rest of the variables constant. The predictors are truly independent if we can solve for one variable, freeze the rest, and repeat the process
55
F-Statistic
Test used in multiple linear regression to check whether at least one independent variable has a significant relationship with the response variable
56
What does it mean when the F-statistic is less than 1?
We fail to reject the null hypothesis; there is no evidence that any predictor is related to the response
57
What does it mean when the F-statistic is greater than 1?
We reject the null hypothesis and at least one predictor variable is significantly related to the response variable
58
F-statistic relationship with dataset (n)
When the dataset is big, an F-statistic only slightly greater than 1 might still provide evidence against the null. If the dataset is small, a much larger F-statistic is needed to reject the null hypothesis
59
Why do we still need the F-statistic when we already have individual p-values?
With many predictors, some individual p-values will appear significant just by chance; the F-statistic assesses the model as a whole, so we can feel comfortable about the overall fit along with the associated p-values
60
Forward Selection
Start with the null model, fit p simple linear regressions, and add to the null model the variable that results in the lowest RSS; then repeat with two-variable models, and so on
61
Backward Selection
We start with all the variables and remove the variable with the largest p-value and stop when all the remaining variables have p-values below 0.05
62
Mixed Selection
A combination of forward and backward selection where we add and remove variables until all the variables have a sufficiently low p-value.
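The first step of forward selection can be sketched in a few lines of Python (the response, the candidate predictors, and their values are all invented; `simple_fit_rss` reuses the closed-form simple-regression fit):

```python
# Sketch: forward selection's first step — fit p simple linear regressions
# and add the candidate predictor with the lowest RSS. Invented data.

def simple_fit_rss(x, y):
    """RSS of the least-squares simple linear regression of y on x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
         sum((xi - xb) ** 2 for xi in x)
    b0 = yb - b1 * xb
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

y = [1.0, 2.0, 3.0, 4.0, 5.0]
predictors = {
    "x1": [1.0, 2.1, 2.9, 4.2, 4.8],  # nearly linear in y
    "x2": [5.0, 1.0, 4.0, 2.0, 3.0],  # essentially unrelated to y
}

# Starting from the null model, add the variable with the lowest RSS
best = min(predictors, key=lambda name: simple_fit_rss(predictors[name], y))
print(best)  # "x1"
```

Forward selection would then refit with "x1" included and repeat the search over the remaining candidates; backward selection runs the same idea in reverse using p-values.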
63
Predictors with only two variables
A qualitative predictor with two levels can be encoded as a dummy variable that takes on two possible numerical values (e.g., 0 and 1)
64
Additive Assumption
The association between an independent variable and the response does not depend on the values of the other predictors. We can create a new variable involving both predictors (an interaction term) and examine its significance to test whether the predictors really act independently of one another
65
Modeling for Additive Assumption Data
When the additive/linear assumption does not hold, rely on a polynomial regression to fit a better curve
66
Outliers
Data points whose standardized residuals are larger than 2 in absolute value
67
High-Leverage Points
Measure of how far away the independent variable values of an observation are from other observations in the model
68
Difference between outliers and high-leverage points
Outliers are data points far from the model, with large residuals, while high-leverage points do not necessarily have large residuals but have predictor values far from the center of the data
69
Collinearity
The situation in which two or more predictor variables are closely related to one another. This can make it difficult to separate out the individual effects of collinear variables on the response
70
Regularization of models
Constraining the coefficient estimates or equivalently shrinking them towards zero
71
Ridge Regression
When all the variables are kept but the betas are shrunk toward 0 by the shrinkage penalty. As lambda increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias
72
Lasso
When we don't want to include all predictors: some of the coefficient estimates will be exactly zero, removing the corresponding independent variables from the model
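The contrast between the two penalties shows up cleanly in the one-predictor, no-intercept special case, sketched here in Python (the closed forms below hold only for this special case, and the data values are invented):

```python
# Sketch: ridge vs. lasso shrinkage for a single predictor, no intercept.
# Closed forms are for this special case only; data values are invented.

x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.3, 2.8, 4.1]

sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

def ridge_beta(lam):
    # Ridge: the penalty inflates the denominator, shrinking beta toward 0
    # but never to exactly 0 for finite lambda.
    return sxy / (sxx + lam)

def lasso_beta(lam):
    # Lasso: soft-thresholding; a large enough lambda sets beta exactly to 0,
    # dropping the predictor from the model.
    b = sxy / sxx
    shrink = lam / (2 * sxx)
    return max(b - shrink, 0.0) if b > 0 else min(b + shrink, 0.0)

print([round(ridge_beta(l), 4) for l in (0, 10, 100)])  # steadily smaller
print(lasso_beta(0.5), lasso_beta(1000.0))              # second one is 0.0
```

This is the cards' distinction in miniature: ridge keeps every variable with smaller betas, while the lasso can zero a coefficient out entirely.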
73
Regression splines
Divide the range of X into K regions; a polynomial is fit in each region and constrained so the pieces join smoothly at the region boundaries (knots). More knots lead to an extremely flexible fit
74
Smoothing Splines
Minimizes the RSS criterion subject to a smoothness penalty. A tuning parameter controls a penalty on the second derivative, smoothing the spline and making it less wiggly
75
Local Regression
Fits the model in a smooth, overlapping way by computing the fit at each target point using only nearby training observations
76
Generalized Additive Models
A framework that extends the standard linear model by allowing a separate (possibly non-linear) function for each predictor while keeping the model additive