Linear Models Flashcards

1
Q

Explain the difference between descriptive, predictive, and prescriptive modeling

A
  • Descriptive: Focuses on what happened in the past and aims to describe/explain observed trends by identifying relationships between variables
  • Predictive: Focuses on what will happen in the future and aims to make accurate predictions
  • Prescriptive: Uses a combination of optimization and simulation to investigate and quantify impacts of prescribed actions/decisions to answer “what if” questions
2
Q

Explain the difference between supervised and unsupervised learning and what their goals are

A
  • Supervised learning: Problems where there is a target variable supervising predictive analysis. Goals are to (1) understand the relationship between the target variable and the predictors and (2) make accurate predictions for the target variable based on the predictors
  • Unsupervised learning: Problems where there is no target variable supervising predictive analysis. Goal is to identify relationships, structures, and patterns between different variables in the data
3
Q

Explain how stratified sampling contributes to a more representative sample than random sampling

A

Stratified sampling ensures that every stratum is properly represented in the collected data. This is done by dividing the underlying population into non-overlapping groups (strata) in a non-random fashion, then randomly sampling a set number of observations from each stratum (see the sketch after the notes below).
* Oversampling and undersampling → variations designed for unbalanced data
* Systematic sampling → draws observations according to a set pattern to arrive at pre-determined sampled observations (no random mechanism)
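A minimal sketch of a stratified split, assuming Python with scikit-learn; the DataFrame and the "region" column are illustrative:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy population with three strata of very different sizes
df = pd.DataFrame({
    "x": range(100),
    "region": ["urban"] * 70 + ["suburban"] * 20 + ["rural"] * 10,
})

# stratify= preserves each stratum's share in both resulting sets
train, test = train_test_split(df, test_size=0.25,
                               stratify=df["region"], random_state=1)
print(test["region"].value_counts(normalize=True))  # ~0.70 / 0.20 / 0.10
```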

4
Q

Explain the three data quality issues one should examine in practice

A
  1. Reasonableness: Do the key statistics for the variables make sense?
  2. Consistency: Are the records in the data inputted consistently?
  3. Sufficient documentation: Can other users easily gain an understanding of different aspects of the data?
5
Q

Explain the problem with target leakage in predictive analytics

A

Target leakage is when some predictors in a model leak information about the target variable that will not be available when the model is applied in practice. This causes a problem because these variables cannot serve as predictors in practice and would lead to artificially good model performance if mistakenly included.

6
Q

Explain how to use a time variable to make the training/test set split and the advantage of doing so

A

A time variable can be used to make the training/test split on the basis of time: allocate the older observations to the training set and the more recent observations to the test set. This is useful for evaluating how well a model extrapolates time trends observed in the past to future, unseen periods.
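A minimal sketch of such a split, assuming Python with pandas; the "year" column and the cutoff are illustrative:

```python
import pandas as pd

# Illustrative data with a "year" time variable
df = pd.DataFrame({"year": [2018, 2019, 2020, 2021, 2022] * 20,
                   "target": range(100)})

cutoff = 2021
train = df[df["year"] < cutoff]    # older observations -> training set
test = df[df["year"] >= cutoff]    # most recent observations -> test set
```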

7
Q

Explain what hyperparameters are and why they are important for a predictive model

A

Hyperparameters = tuning parameters, which are parameters that control some aspect of the fitting process itself. Because they are inputs to, not outputs of, the fitting procedure, they must be specified in advance, typically by tuning with cross-validation. They matter because they directly control the flexibility of the model, and hence the bias-variance trade-off and predictive performance.

8
Q

Explain the difference between bias and variance in a predictive analytics context

A

Bias = the difference between the expected value of the prediction and the true value of the signal function
* The part of the test error caused by the model not being flexible enough (underfitting)
Variance = the variability of the prediction around its expected value
* The part of the test error caused by the model being too complex (overfitting)
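For squared-error loss, the expected test error at a point x_0 decomposes into exactly these components plus the irreducible error:

```latex
\mathbb{E}\big[(Y_0 - \hat{f}(x_0))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big[\hat{f}(x_0)\big]}_{\text{variance}}
  + \underbrace{\operatorname{Var}[\varepsilon]}_{\text{irreducible error}}
```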

9
Q

Explain the difference between variables and features in a predictive analytics context

A

Variables = predictors as recorded in the original dataset, before any transformations
Features = derivations from the original variables that provide a more useful view of the information in the dataset

10
Q

Explain the difference between dimensionality and granularity

A

There are two main differences:
1. Applicability: Dimensionality is a concept specific to categorical variables. Granularity applies to both numeric and categorical variables.
2. Comparability: We can always order two categorical variables by dimensionality (their number of levels), but it is not always possible to order them by granularity.

11
Q

Explain the problem with RSS and R squared as model selection measures

A

They are merely goodness-of-fit measures of a linear model on the training data, with no explicit regard to model complexity or to predictive performance on unseen data. In fact, RSS never increases and R squared never decreases when an extra predictor is added, so selecting by these measures always favors the most complex model, even when the added predictors are useless.

12
Q

Explain the rationale behind and the difference between the AIC and BIC

A

Both the AIC and BIC trade off the goodness of fit of a model (through the maximized log-likelihood) against its complexity (through a penalty on the number of parameters), and either can be used as a model selection criterion. However, the penalty term for the BIC is higher than that for the AIC, so the BIC tends to result in a simpler final model than the AIC.
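For reference, with ℓ the maximized log-likelihood, p the number of estimated parameters, and n the number of training observations:

```latex
\mathrm{AIC} = -2\ell + 2p,
\qquad
\mathrm{BIC} = -2\ell + p\ln n
```

The BIC's per-parameter penalty ln n exceeds the AIC's penalty of 2 whenever n > e² ≈ 7.39, i.e., for virtually any real dataset.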

13
Q

Explain the advantages and disadvantages of polynomial regression

A

Pros: Polynomial regression can capture substantially more complex relationships between the target variable and the predictors than linear ones. The more polynomial terms included, the more flexible the fit.
Cons: Interpretation and the choice of the degree m. Regression coefficients in polynomial regression are difficult to interpret, and there is no simple rule for choosing m. However, m can be tuned by cross-validation, and EDA can also help (see the sketch below).
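A minimal sketch of tuning m by cross-validation, assuming Python with scikit-learn and illustrative data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=200)

# Estimate the CV error for each candidate degree and pick the smallest
for m in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree=m, include_bias=False),
                          LinearRegression())
    cv_mse = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    print(f"degree {m}: CV MSE = {cv_mse:.4f}")
```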

14
Q

Explain the meaning of interaction

A

An interaction arises if the association between one predictor and the target variable depends on the value/level of another predictor

15
Q

Explain how best subset selection works and its limitations

A

Best subset selection fits all models containing exactly one of the p predictors being considered and picks the one with the smallest deviance, then fits all p choose 2 models containing exactly two predictors and picks the one with the smallest deviance, and so forth. A single best model is then selected from these picks using a metric such as the AIC. Because the procedure searches all 2^p possible models, it attains the global minimum of the selection criterion over the candidate models, but the search space quickly becomes computationally infeasible as p increases.
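A minimal sketch of the procedure, assuming a pandas DataFrame X of candidate predictors, a response y, and statsmodels for the OLS fits. Minimizing the AIC directly across all subsets is equivalent to the two-stage description above, since models of equal size share the same penalty:

```python
from itertools import combinations
import statsmodels.api as sm

def best_subset(X, y):
    """Fit every model with at least one predictor (2^p - 1 in total)
    and return the predictor set with the lowest AIC."""
    best = None
    for k in range(1, len(X.columns) + 1):
        for subset in combinations(X.columns, k):
            fit = sm.OLS(y, sm.add_constant(X[list(subset)])).fit()
            if best is None or fit.aic < best[1]:
                best = (subset, fit.aic)
    return best
```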

16
Q

Explain how stepwise selection works and how it addresses the limitations of best subset selection

A

Stepwise selection is a computationally more efficient alternative to best subset selection, since it considers a much smaller set of models. For example, forward stepwise selection begins with a model containing no predictors and adds predictors one at a time, at each step adding the predictor that gives the greatest additional improvement to the fit by a measure such as the AIC. The algorithm stops when adding any predictor would lead to a worse model, and the best model is the one fit just before that point. Because it does not examine all 2^p models, stepwise selection is not guaranteed to find the best possible model; it may settle on a local minimum of the selection criterion.
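A minimal sketch of forward stepwise selection under the same assumptions (pandas DataFrame X, response y, statsmodels):

```python
import numpy as np
import statsmodels.api as sm

def forward_stepwise(X, y):
    """Add one predictor per step, keeping the addition that lowers the
    AIC most; stop when no addition improves the AIC."""
    selected, remaining = [], list(X.columns)
    best_aic = sm.OLS(y, np.ones(len(y))).fit().aic  # intercept-only model

    while remaining:
        aic, var = min((sm.OLS(y, sm.add_constant(X[selected + [v]])).fit().aic, v)
                       for v in remaining)
        if aic >= best_aic:   # every addition makes the model worse -> stop
            break
        best_aic = aic
        selected.append(var)
        remaining.remove(var)
    return selected
```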

17
Q

Explain why it is not a good idea to add or drop multiple features at a time when doing stepwise selection

A

Because the significance of a feature can be strongly affected by the presence or absence of other features due to their correlations. For example, a feature can be significant on its own but become insignificant in the presence of another feature. Adding or dropping one feature at a time lets each step reassess every remaining feature in light of the current model.

18
Q

Explain the difference in the model fitting processes between stepwise selection and regularization

A

Stepwise selection and regularization can both be used to reduce the complexity of a linear model. Stepwise selection goes through a list of candidate models, each fitted by OLS, and decides on a final model with respect to a certain selection criterion; the regression coefficients of the non-predictive features are then effectively set to zero by dropping those features. Regularization considers only a single model hosting all potentially useful features. Instead of OLS, the model is fitted by minimizing the sum of the training error and a penalty term that shrinks the coefficient estimates toward zero. The non-predictive features then have a weaker association with the target variable and, in some cases (e.g., under the lasso penalty), their coefficients become exactly zero and the features are dropped from the model.

19
Q

Explain how the regularization parameter λ affects a regularized model

A

The regularization parameter λ quantifies the trade-off between model fit and model complexity (see the sketch after this list).
* When λ = 0, the regularization penalty vanishes and the coefficient estimates equal the OLS estimates.
* As λ increases, there is increasing pressure for the coefficient estimates to be closer to zero. The flexibility of the model drops, which leads to decreased variance and increased bias.
* As λ → ∞, the coefficient estimates all shrink to zero, leaving the intercept-only model.
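A minimal sketch of the effect, assuming Python with scikit-learn's ridge regression (where λ is called alpha) and illustrative data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)

# Coefficients shrink toward zero as the penalty grows
for lam in [0.0, 1.0, 100.0, 1e6]:
    coefs = Ridge(alpha=lam).fit(X, y).coef_
    print(lam, np.round(coefs, 3))
# alpha = 0 reproduces the OLS estimates (numerically, scikit-learn
# advises a tiny positive value instead); a huge alpha drives every
# coefficient toward zero, approaching the intercept-only model
```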

20
Q

Explain why λ and 𝛼 are hyperparameters of a regularized model and how they are typically selected

A

λ and 𝛼 are hyperparameters, which are pre-specified inputs that go into the model fitting process and are not determined as part of the optimization procedure. They are typically selected by cross-validation. This is done by constructing a fine grid of values of (λ,𝛼) in advance, computing the cross-validation error for each pair of values of (λ,𝛼), and choosing the pair that produces the lowest cross-validation error
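A minimal sketch of this grid search using scikit-learn's ElasticNetCV; note that scikit-learn's alpha corresponds to λ here and l1_ratio corresponds to 𝛼 (data is illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, 0.0, 0.0, -2.0, 0.0]) + rng.normal(size=200)

# Grid over the mixing parameter (l1_ratio) and shrinkage strength (alphas)
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0],
                     alphas=np.logspace(-3, 1, 50),
                     cv=5).fit(X, y)
print(model.l1_ratio_, model.alpha_)  # pair with the lowest CV error
```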

21
Q

Stage 1: Define the Business Problem. What are the objectives and constraints?

A

Objectives: Is the objective prediction-focused or interpretation-focused?
Constraints: What is the availability of easily accessible, high-quality data? Are there any implementation issues?

22
Q

Stage 2: Data Collection (data design stage). Explain the relevance of data and the importance of the data source

A

Other things equal, having more data is generally desirable since more information is available, which makes model training more robust and less vulnerable to noise.
It is also necessary to source the data from the right population and time frame:
* Population: The data source should be a reasonably good proxy for the true population of interest, so that the sample is representative
* Time frame: The time period chosen should reflect the business environment in which we will be implementing our models

23
Q

What is sufficient documentation of a dataset?

A
  • A description of the dataset overall, including the data source
  • A description of each variable in the data, including its name, definition, and format
  • Notes about any past updates or other irregularities of the dataset
  • A statement of accountability for the correctness of the dataset
  • A description of the governance processes used to manage the dataset
24
Q

What are other data issues related to the collection and use of data?

A
  • PII (personally identifiable information): It is important to comply with laws, regulations, and standards of practice pertaining to personal data. Safeguards include anonymization (de-identifying the data), data security (encryption and access/transfer restrictions), and terms of use (being aware of the terms, conditions, and privacy policies governing the collection and use of the data)
  • Variables causing unfair discrimination: Using sensitive attributes such as race, ethnicity, or gender as predictors produces differential treatment that may lead to unfair discrimination and could be deemed unethical
  • Target leakage
25
Q

What are some considerations in selecting the best model?

A
  • Prediction performance
  • Interpretability
  • Ease of implementation
26
Q

What are 3 ways to reduce the dimensionality of a categorical predictor?

A
  • Combine sparse categories with others that exhibit similar behavior
  • Combine similar categories (with respect to the mean or median of the target variable)
  • Use prior knowledge (e.g., grouping hour of the day into early, morning, afternoon, and evening; see the sketch below)
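A minimal sketch of the prior-knowledge grouping, assuming Python with pandas; the bin edges and labels are illustrative:

```python
import pandas as pd

# Map 24 hourly levels onto 4 broader levels
df = pd.DataFrame({"hour": [1, 7, 14, 20, 23]})
df["time_of_day"] = pd.cut(df["hour"],
                           bins=[0, 6, 12, 18, 24],
                           labels=["early", "morning", "afternoon", "evening"],
                           include_lowest=True)
```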
27
Q

Explain separately what an R squared of 0 and 1 indicates for a linear model

A
  • R squared = 0: This implies that RSS = TSS, which in turn means that the fitted linear model is essentially the intercept-only model. The predictors collectively bring no useful information for understanding the target variable
  • R squared = 1: This implies that RSS = 0, which in turn means that the model perfectly fits each training observation. Although the goodness of fit to the training set is perfect, such a model has probably overfitted the data and may not do well on future, unseen data
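Both cases follow directly from the definition of R squared:

```latex
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}},
\qquad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\quad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2
```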
28
Q

Compare the bias and variance of a quadratic regression model and a cubic regression model

A

The cubic regression model is more complex than the quadratic regression model. This results in the cubic model having a lower squared bias but a higher variance. The additional degree of freedom coming from the cubic term provides the model with greater flexibility, but makes the model more vulnerable to overfitting

29
Q

Suggest two ways to choose between two polynomial regression models

A
  1. We can compare the AIC of the two models and choose the one with the lower AIC
  2. We can use cross-validation and choose the model with the lower cross-validation error
30
Q

What are the methods for handling non-linearity?

A
  1. Polynomial regression
  2. Binning: piecewise constant functions
  3. Piecewise linear functions (see the sketch below)
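A minimal sketch of a piecewise linear (hinge) feature, assuming Python with pandas; the breakpoint at 40 is an arbitrary illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": np.arange(20, 61)})
breakpoint_ = 40
df["age_hinge"] = np.maximum(df["age"] - breakpoint_, 0)  # (age - c)_+
# Regressing the target on age and age_hinge fits two line segments
# joined at the breakpoint, with no discontinuity
```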
31
Q

What is collinearity?

A

Collinearity exists when two or more features are closely, if not exactly, linearly related. An example of perfect collinearity is a model that includes the dummy variables of all levels of a categorical predictor: those dummy variables always sum to 1, an exact linear relationship with the intercept.
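A minimal sketch of this dummy-variable trap, assuming Python with pandas; the column "region" is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"region": ["N", "S", "E", "W", "N", "S"]})

full = pd.get_dummies(df["region"])            # one dummy per level
print(full.sum(axis=1).unique())               # always 1 -> collinear with intercept
reduced = pd.get_dummies(df["region"], drop_first=True)  # drop a baseline level
```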

32
Q

What is the problem with collinearity?

A
  1. The presence of collinear variables means that some of the variables do not bring much additional information, because their values can be largely deduced from the values of the other variables, leading to redundancy
  2. The interpretation of coefficient estimates becomes difficult: it is hard to separate the individual effects of the collinear predictors on the target variable
33
Q

How can we handle collinearity?

A
  1. Delete one of the problematic predictors causing collinearity. Due to their strong linear relationship, the deletion should have a minimal impact on the model (which one to delete is a judgment call)
  2. Pre-process the data using a dimension reduction technique such as PCA to combine the collinear predictors into a much smaller number of new predictors that are far less correlated with one another and capture different kinds of information in the data (see the sketch below)
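A minimal sketch of the PCA approach, assuming Python with scikit-learn and two nearly collinear illustrative predictors:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.05, size=200)   # nearly collinear with x1
X = np.column_stack([x1, x2])

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(X))
print(pca.explained_variance_ratio_)
# The first component captures almost all the variance, so a single
# new feature can replace the two collinear predictors
```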
34
Q

List 3 differences between stepwise selection and regularization

A
  1. Stepwise selection uses the number of features as a direct measure of model complexity. Regularization uses the regularization parameter as an indirect measure of model complexity
  2. For stepwise selection, a categorical predictor is added or dropped with all of its levels as the algorithm iterates, unless we manually binarize the categorical predictors in advance. Regularization automatically binarizes each categorical predictor into its factor levels
  3. For stepwise selection, numeric predictors are left intact without standardization, and whether or not they are standardized has no impact on the fitted model. For regularization, numeric predictors are typically standardized (dividing each predictor by its standard deviation) so that they are on a common scale when the model is fitted