Session 6 - Model Assessment Flashcards

1
Q

What do we observe in statistical modelling?

How can this relationship be written?

A

A response variable Y and one or more predictors X1, X2, …, Xp, and we assume there is some relationship between Y and X1, X2, …, Xp.

Y = f(X) + ε

with
X = (X1, X2, …, Xp)
ε = a random error term which is independent of X and has a mean of 0
f(X) = a function that describes the systematic relationship between Y and X

2
Q

What does statistical learning refer to?

A

A set of approaches for estimating f()

3
Q

In statistical learning f() can be what?

A

Unknown or known (or, better, "assumed")

4
Q

In statistical learning if f() is known, then we just need to estimate what?

A

The parameters:

Example: Linear regression
Y = B0 + B1X1 + B2X2 + … + BkXk + ε
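A minimal sketch of this case (f() assumed linear, so only the coefficients need estimating), using scikit-learn with simulated illustrative data; the variable names and numbers are not from the course material:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simulated illustrative data: two predictors and a known linear relationship
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=100)

# "Learning" f() here means estimating the parameters B0, B1, B2
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # estimates of B0 and (B1, B2)
```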

5
Q

In statistical learning, what do we need to identify if f() is unknown?

A

Variables, linear and non-linear relationships and the best methodology (“machine”) to estimate f()

Examples: (feature) variable selection, polynomial regression, generalized additive models, regularized regression, random forests, support vector machines

6
Q

If our aim is inference what are we describing?

This is usually summarised as what?

A

The way Y is affected by changes in X

Usually summarized as average changes (if we increase X1 by 1 unit, Y will change on average by B1 units).

7
Q

What is the aim of inference?

A

Understanding how Y changes as a function of X; we want to know the exact form of f().

8
Q

What kind of estimator do we want from an inferential model?

A

One that does not differ from the population parameter in a systematic manner (unbiased).

A consistent unbiased estimator reaches the true value with increasing sample size.

Among unbiased estimators we choose the one with the minimum sampling variance (efficient).

If we estimate the parameter many times, then the mean of all estimates is close to the true population value and the variance of all estimates is the smallest possible!

An unbiased estimator can be, for example, the mean or the median, but it has been shown that the mean has less sampling variance than the median.

9
Q

What does the Gauss-Markov Theorem tell us?

A

Within the class of linear and unbiased estimators, the Ordinary Least Squares (OLS) estimator is the most efficient.

10
Q

How is the Best linear unbiased estimator (BLUE) estimated?

A

By minimizing the sum of squares of the differences between observed and predicted values in a given data set (OLS method).

11
Q

With a given data set why does the OLS provide the smallest confidence intervals of all unbiased estimators?

A

Because it is unbiased and has the smallest possible Mean Squared Error (MSE) within the class of linear and unbiased estimators.

12
Q

How can we estimate prediction error?

A

An estimate of prediction accuracy for continuous outcomes is the Mean Squared Error (MSE):

MSE = Σ(Ŷi − Yi)² / n, with
Ŷi = estimated Y for case i
Yi = observed Y for case i
n = number of cases in the hold-out sample
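A minimal sketch of this calculation in Python; the observed and predicted values below are made-up illustrative numbers:

```python
import numpy as np

y_obs = np.array([3.1, 2.4, 5.0, 4.2])   # observed Y_i in the hold-out sample
y_hat = np.array([2.9, 2.8, 4.6, 4.5])   # estimated (predicted) Y_i

mse = np.mean((y_hat - y_obs) ** 2)      # MSE = sum((Yhat_i - Y_i)^2) / n
print(mse)
```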

13
Q

What are the assumptions for a linear regression?

A

Linearity: the response variable Y is linearly related to the independent variables (X’s)

Independence: errors (and hence the observations on Y) are independent of each other

Normality: errors (and hence the observations on Y) are normally distributed

Homoscedasticity: errors (and hence Y’s) have constant variance

14
Q

What is important for validity of inference?

A

Assumptions

15
Q

What happens if the assumptions for a linear regression model are fulfilled?

A

We can form 95% confidence intervals around the estimated regression coefficients B1, B2, B3..

We can test the null hypotheses that B1= 0, B2= 0, B3= 0…
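A minimal sketch of what this looks like in practice, assuming statsmodels and simulated illustrative data (not from the course material):

```python
import numpy as np
import statsmodels.api as sm

# Simulated illustrative data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=100)

# OLS fit; add_constant adds the intercept B0
fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.conf_int(alpha=0.05))  # 95% confidence intervals for B0, B1, B2
print(fit.pvalues)               # tests of the null hypotheses B_j = 0
```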

16
Q

What is statistical modelling?

A

The formalization of relationships between variables in the form of mathematical equations.

We infer the process by which the data were generated!

Theory-driven.

17
Q

What are statistical models, such as regression, typically used for?

A

Explanatory research to assess causal hypotheses that explain why and how empirical phenomena occur

Explanatory research usually infers from a random sample to an expected mean response of the underlying population.

We want unbiased estimates

18
Q

In prediction modelling what are we not interested in?

A

inference and expectations

19
Q

What is prediction modelling concerned with?

A

Reliable prediction of the outcome of unseen cases!

20
Q

What does prediction modelling aim to minimise?

A

Minimizing the difference or “loss” between predicted and observed outcomes of new cases!

21
Q

How can we estimate our model in prediction modelling?

A

By minimizing the error of new unseen cases and not the error of the original data set!

22
Q

What would we like to know once we have developed our prediction model?

A

How well it predicts new unseen cases.

23
Q

How can we get a reliable estimate of prediction accuracy?

A

Predict outcome of new cases that were not used for model development

24
Q

What is normal statistical modelling concerned with?

A

How well the model predicts seen cases: we calculate the difference between observed and predicted values, and this error is analyzed to see whether it fulfills the assumption of normally distributed errors.

25
Q

What does a prediction model usually perform better in?

Why is this a problem?

A

The sample used to develop the model (development or training sample) than in other samples, even if those samples are derived from the same population.

This is a problem because:
- We overestimate the predictive ability of our model
- Model performance is over-optimistic

26
Q

1) What is something that works well for statistical modelling but not prediction modelling?

2) What is needed instead?

A

1) Assessing the assumptions of our model does not allow us to assume that it is going to work well on data it has not seen before!

In other words, a high R2 (explained variance) in our training sample does not allow us to conclude that our model will have the desired performance if we apply it to new cases.

2) Some kind of assurance of the accuracy of the predictions our model produces: we need to validate our model.

To evaluate the performance of any prediction model, we need to test it on some unseen data.

27
Q

Whether a model performs well or not is based upon what?

A

The model's performance on unseen data.

28
Q

What are 3 model validation types?

A

Apparent validity
Internal validity
External Validity

29
Q

How do we estimate prediction error?

A

Mean Squared Error (MSE):

MSE = Σ(Ŷi − Yi)² / n, with
Ŷi = estimated Y for case i
Yi = observed Y for case i
n = number of cases in the hold-out sample

The MSE is an estimate of the expected mean squared error E[(Y − Ŷ)²] in the population!

30
Q

What can the mean squared error also be written as?

A

∑(𝑌𝑖̂− 𝑌𝑖)2 = e12 + e22+ e32…
= Residual Sum of Squares (RSS)

RSS – add up square of each residuals, sum of predicted minus observed squared

M𝑆𝐸= 𝑅𝑆𝑆/𝑛

31
Q

How do the aim of prediction modelling and inferential statistical modelling differ?

A

The aim of prediction modelling is to minimize the MSE of unseen cases, unlike inferential modelling, where we minimize the MSE of the apparent (seen) cases (the least squares method of linear regression).

32
Q

What is the problem with using the MSE, and what is a solution to this?

A

The MSE is an absolute measure of prediction error and is sometimes difficult to interpret.

Therefore we often present the root mean squared error (RMSE) = √MSE.

It represents the SD of the differences between predicted and observed values.

Still, it is often not easy to interpret.

33
Q

What is often preferred to explain variance?

A

A relative measure:
- unexplained variance (error variance) or explained variance R2 of unseen cases
- ranges from 0% to 100%
- the same as already learnt for linear regression (but there it is calculated using the same, i.e. apparent, dataset!)

34
Q

What is a relative measure of prediction accuracy?

A

RSS = residual sum of squares = Σ(Yi − Ŷi)² (observed minus predicted, squared)

TSS = total squared variation of the observed outcomes Y:

Total sum of squares (TSS) = Σ(Yi − Ȳ)²
with Ȳ = mean of all Y's

Unexplained variance = RSS / TSS = (error variance) / (variance of Y)

Explained variance of our model, R2:
R2 = 1 − unexplained variance = 1 − RSS/TSS = (TSS − RSS) / TSS
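A minimal sketch of these quantities in Python (made-up illustrative values for observed and predicted outcomes):

```python
import numpy as np

y_obs = np.array([3.1, 2.4, 5.0, 4.2])       # observed Y_i
y_hat = np.array([2.9, 2.8, 4.6, 4.5])       # predicted Y_i

rss = np.sum((y_obs - y_hat) ** 2)           # residual sum of squares
tss = np.sum((y_obs - y_obs.mean()) ** 2)    # total sum of squares
unexplained = rss / tss                      # error variance / variance of Y
r2 = 1 - rss / tss                           # explained variance R2
print(unexplained, r2)
```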

35
Q

What is an absolute measure of prediction accuracy?

A

MSE

36
Q

What model does prediction modelling try to identify?

A

One which minimizes the MSE or, equivalently, maximizes the R2 of unseen cases.

37
Q

What does inferential modelling try to identify?

A

A model which minimizes the MSE or, equivalently, maximizes the R2 of the apparent cases (the same cases as used for model building)!

38
Q

What is apparent validity?

A

Testing the model on the whole original model development sample.

39
Q

What is internal validation?

A

Reproducibility of the model for the underlying population and setting from which the development sample originated ("reproducibility").

40
Q

How can internal validation be measured?

A

Validation Set (or Hold-out or Split-Sample Validation)
k-fold Cross Validation
Bootstrap Validation

41
Q

What is external validation?

A

Generalizability of the model to populations that are plausibly related (the clinical population of interest, a different ethnicity, etc.). Testing the model on new subjects.

Assesses the transportability rather than the reproducibility of a model.

42
Q

Internal validity must always be computed, external validity is not always possible.

True or false

A

True

43
Q

When is the performance of a predictive model overestimated?

A

When simply determined on the sample of subjects that was used to construct the model.

44
Q

Usually we cannot easily collect new data to test the model on unseen cases.

How can this problem be overcome?

A

Several internal validation methods are available that aim to provide a more accurate estimate of model performance in new subjects.

Hold-out sample
n-fold cross-validation
Bootstrapping

45
Q

How can internal validation be distinguished?

A

a) with and b) without model selection

46
Q

How can we assess internal validity of a model?

A

A model’s predictive capacity can only be assessed when the model is tested on an independent data set that was not used to develop the model.

That is, there are two data sets:
1. A training set on which the model is derived (and which may be further split up into a training and a validation set for model selection), and

2. A test set: an independent data set on which the model is evaluated.
47
Q

What is a training set?

A

Data used to develop the model.

48
Q

What is a test set?

A

Data used to test the developed model.

49
Q

What is training error?

A

Error that results from applying the prediction model developed on the training set to the same training set.

50
Q

What is test error?

A

Expected error that results from applying the developed prediction model on a new observation.

51
Q

What does the hold out data or split sample approach involve?

A

The simplest approach to estimate prediction accuracy is to randomly split the data set into two.

In the larger set a model is trained, e.g. by estimating the parameters of a regression.

In the smaller test data set we then assess (test) the model by estimating the prediction accuracy.

The training set typically contains about 70% of the data; there we estimate the parameters of our regression model, and the resulting model is then tested in the test data set by estimating the prediction accuracy.

Thus we develop our model, estimate the regression coefficients, then use the model to predict the outcome for the cases in the test data set and calculate the MSE and R2.
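A minimal sketch of the split-sample approach with scikit-learn; the data are simulated for illustration and the 70/30 split mirrors the example above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Simulated illustrative data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=200)

# ~70% training data, ~30% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)  # develop the model on the training set
y_pred = model.predict(X_test)                    # predict the unseen test cases

print(mean_squared_error(y_test, y_pred))         # MSE in the test set
print(r2_score(y_test, y_pred))                   # R2 in the test set
```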

52
Q

What are the advantages of the split sample approach?

A

Simple

Easy to implement in any software

53
Q

What are the disadvantages of the split sample approach?

A

Validation result often depends on split: The test set error can be highly variable

Inefficient: only a subset of the observations is used to fit the model, so it is not well suited to small datasets.

54
Q

What does the non-random split sample involve?

A

Data set is split by some clustering factor, e.g. study site, geographic location, time, centre, etc.

Preferable to the random split-sample approach because the test data set is more likely to differ from the training data set.

In small samples, similar problems arise as with the random split-sample approach, so it should only be used if the training sample size is large.

55
Q
  1. What is not usually recommended if you have a small sample size?
  2. What is recommended instead?
A
  1. Split-sample approach
  2. Better: Resampling methods (Harrell, 2015, Hastie et al., 2009) -
    - repeatedly drawing samples from a training set
    - refitting a model of interest on each sample in order to obtain more information about the fitted model

All data are used for model development to improve statistical efficiency

Internal validation is done via cross-validation or bootstrapping

e.g., n-fold cross-validation

56
Q

What does N-fold cross-validation involve?

A
  • Usually the recommended method.
  • Use the whole data set to fit the model and then n-fold cross-validation to estimate our prediction accuracy.
  • This is pursued by dividing the single available dataset randomly into n folds (equal subsets), typically 5 to 10.
  • In turn, each fold is used as the unseen data (test set), with the remaining n−1 folds pooled together as the training set.
  • Prediction performance is the average over the n folds.
  • The difference between apparent and internal performance is the optimism.
  • Often the n-fold CV is repeated several times (e.g. 50 times) and the average of all repeats is used to get a more stable estimate.
  • The parameters of the final model are estimated using the whole data set!
  • Cross-validation simulates replication attempts to get an honest estimate of real-world performance.
  • With 5 folds we obtain 5 slightly different MSE or R2 values; the average MSE (or average R2) over the five folds is our estimate of the model's performance in unseen cases from the same population, i.e. our estimated prediction accuracy (see the sketch below).
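A minimal sketch of 5-fold cross-validation with scikit-learn (simulated illustrative data; the choice of 5 folds and a linear model is only an example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Simulated illustrative data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=200)

# Each fold serves once as the test set; the other 4 folds form the training set
cv = KFold(n_splits=5, shuffle=True, random_state=0)
mse_folds = -cross_val_score(LinearRegression(), X, y, cv=cv,
                             scoring="neg_mean_squared_error")
print(mse_folds.mean())  # average MSE over the 5 folds = estimated prediction accuracy

# The parameters of the final model are estimated using the whole data set
final_model = LinearRegression().fit(X, y)
```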
57
Q

How many folds in cross validation provide nearly unbiased estimates of prediction accuracy?

A

5-10 folds

58
Q

When is N-fold CV for internal validation valid?

A

If no model selection or model tuning (e.g. identifying the best lambda; see next lecture) is performed, for example:

  • Theory-driven models
  • Models with a small number of predictors relative to the sample size
59
Q

Often we train a model in which hyperparameters also need to be optimized using cross-validation (example ridge, lasso, elastic net).

  1. What issue does this cause?
  2. How can this be overcome?
A
  1. Choosing the parameters based on maximizing prediction accuracy using CV biases the model to the dataset, yielding an overly-optimistic score!
    - This is because testing data are part of the model building process!
  2. Nested CV estimates the generalization error of the underlying model and its (hyper)parameter search!

Nested CV efficiently uses a series of train/validation/test set splits.
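A minimal sketch of nested cross-validation with scikit-learn, assuming a ridge regression whose penalty (lambda, called alpha in scikit-learn) is tuned in the inner loop; the data and grid values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Simulated illustrative data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # tunes the hyperparameter
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)  # estimates generalization error

tuned_model = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
                           cv=inner_cv, scoring="neg_mean_squared_error")

# The outer CV evaluates the whole procedure (hyperparameter search + model fit)
nested_mse = -cross_val_score(tuned_model, X, y, cv=outer_cv,
                              scoring="neg_mean_squared_error")
print(nested_mse.mean())
```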

60
Q

What does bootstrapping involve?

A

Bootstrapping: take a bootstrap sample (sample with replacement):

  1. Fit your model in your bootstrap sample (on average 63.2% of our data)
  2. Assess your model in the cases not selected by the bootstrap (on average: 36.8% of our data): Predict outcome and calculate the MSE
  3. Repeat the bootstrap 100-1000 times and average the MSEs and use this as an estimate of internal validity!
  4. Again, the final model is estimated using the whole data set!

Better bootstrap approaches are available (e.g. the 0.632+ estimator).
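A minimal sketch of this plain bootstrap validation scheme in Python (simulated illustrative data; 200 repeats chosen as an example within the 100-1000 range mentioned above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Simulated illustrative data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=200)

n = len(y)
mses = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)           # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(n), idx)      # cases not selected (~36.8% on average)
    model = LinearRegression().fit(X[idx], y[idx])
    mses.append(mean_squared_error(y[oob], model.predict(X[oob])))

print(np.mean(mses))                           # averaged MSE = internal validity estimate

# The final model is again estimated using the whole data set
final_model = LinearRegression().fit(X, y)
```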

61
Q

Which one should be chosen: split-sample methods, cross-validation, or bootstrapping?

A

Research and simulations showed that split-sample analyses give overly pessimistic estimates of performance, with large variability. They are also inefficient, as they do not use the whole dataset to estimate the parameters.

5- to 10-fold cross-validation usually has low bias and low variability and often performs best.

Bootstrapping performs similarly well to k-fold cross-validation unless the sample size is small. It therefore usually does not matter whether k-fold cross-validation or bootstrapping is used, but cross-validation is typically preferred.

62
Q

Can bootstrapping be used for optimism correction?

A

Yes

63
Q
  1. The cross validation estimate is an estimate of what?
  2. What issues may arise with this?
A
  1. How the results will generalize to an independent data set of the same population (internal validation)
  2. However, internal validation may not be sufficient or indicative of the model's performance in future patients.

Problem: Our sample is usually not a random sample of the clinical population.

64
Q
  1. What is essential before implementing prediction models in clinical practice?
  2. How is this done?
A
  1. External validation
  2. By a completely new data set collected at a different time, different location and ideally by different researchers -> Assessing the clinical population of interest!
65
Q

What are three different types of external validity?

A

Temporal validation – e.g. the same investigators validate the model on data from more recent years

Spatial validation (other place) – e.g. the same investigators validate the model in other centres

Fully external validation – e.g. other investigators, another centre, a different time

66
Q

How is optimism calculated?

A

Optimism = Apparent validation estimate – Internal validation estimate
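For example (illustrative numbers): if the apparent R2 in the development sample is 0.80 and the internally validated (e.g. cross-validated) R2 is 0.72, the optimism is 0.80 − 0.72 = 0.08.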

67
Q

Why is the apparent validation estimate usually larger than the internal validation estimate?

A

Due to optimism

68
Q

We use all data for model development to get optimal efficiency and then apply internal validation procedures using cross-validation or bootstrapping to estimate performance in new data of the same underlying population.

What does this correct for?

A

The optimism of our performance measures derived from the apparent data set; we thereby obtain (nearly) unbiased performance estimates for the prediction of new cases from the same population.