7 - Validating and Evaluating Data Models (Advanced Business Analytics)

Flashcards in this deck: 19
1
Q

How can forecasts be evaluated?

A
  • compare the forecast to what actually happened
  • compare to a naive forecast

2
Q

How can forecasts be evaluated?

Compare the forecast to what actually happened

A
  • distance between observations and forecast should be minimal
  • fit can change if the forecasted market changes
  • careful: self-fulfilling prophecies can lead to a perfect fit that is still not helpful
3
Q

How can forecasts be evaluated?

Compare to naive forecast

A
  • naive forecast: assume what happened in the previous period will happen in this period
  • any more complex forecast should be better than the naive forecast, otherwise why pay for a complex method?
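
A minimal Python sketch of such a comparison, using invented numbers and mean absolute error as a simple stand-in for a proper error measure:

actual = [102, 98, 105, 110, 107]   # observed demand per period (invented)
model  = [100, 99, 104, 108, 109]   # forecast from some more complex method (invented)

# naive forecast: the previous period's actual value predicts the next period
naive = actual[:-1]

def mean_abs_error(forecast, observed):
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)

# compare only the periods for which a naive forecast exists (periods 2..n)
print("model MAE:", mean_abs_error(model[1:], actual[1:]))
print("naive MAE:", mean_abs_error(naive, actual[1:]))
# if the complex method is not clearly better, it is not worth paying for
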
4
Q

Measuring forecast performance

Error measures for numerical values

A

Absolute: RMSE (root mean squared error)
-> depends on scale

Percentage: MAPE (mean absolute percentage error)
-> independent of scale
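
A minimal Python sketch of both measures; the numbers are invented illustration data:

import math

def rmse(actual, forecast):
    # root mean squared error: absolute measure, depends on the scale of the data
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    # mean absolute percentage error: scale-independent (undefined if an actual value is 0)
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

actual   = [102, 98, 105, 110, 107]   # invented observations
forecast = [100, 99, 104, 108, 109]   # invented forecast
print(f"RMSE: {rmse(actual, forecast):.2f}")    # same unit as the data
print(f"MAPE: {mape(actual, forecast):.2f}%")   # percent, comparable across scales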

5
Q

Self-fulfilling forecasts can have bad consequences

Example

A
  • a firm offers several products at different prices
  • customers always buy the cheapest product, substituting it for higher-priced products
  • the firm sells fewer expensive products than expected
  • the forecast, trained on these sales, predicts little demand for expensive products
  • the firm therefore stocks more cheap products, reinforcing the substitution
  • profit spirals down
6
Q

Measuring classification performance

Error measures for categorical values

A
  • error rate per category
  • error rate across categories
  • comparing error rates
7
Q

Measuring classification performance

Error measures for categorical values

Error rate per category

A

Recall = no. of instances correctly assigned to class / no. of instances that are actually in class
(starts from the true assignment)

Precision = no. of instances correctly assigned to class / no. of instances assigned to class
(starts from the predicted assignment)
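
A minimal Python sketch of both ratios for one class, using invented labels:

def recall_and_precision(actual, predicted, cls):
    # per-class recall and precision from two label lists
    in_class  = sum(1 for a in actual if a == cls)                         # actually in the class
    assigned  = sum(1 for p in predicted if p == cls)                      # assigned to the class
    correct   = sum(1 for a, p in zip(actual, predicted) if a == p == cls) # correctly assigned
    recall    = correct / in_class if in_class else 0.0    # starts from the true assignment
    precision = correct / assigned if assigned else 0.0    # starts from the predicted assignment
    return recall, precision

actual    = ["yes", "yes", "no", "no", "yes", "no"]   # invented labels
predicted = ["yes", "no",  "no", "yes", "yes", "no"]
print(recall_and_precision(actual, predicted, "yes"))  # recall 2/3, precision 2/3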

8
Q

Measuring classification performance

Error measures for categorical values

Error rate across categories

A
  • average or weighted average
  • weighted according to the exogenous or endogenous importance of a class
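
A minimal Python sketch with invented per-class error rates and weights (the weights stand in for whatever exogenous or endogenous importance is assigned to each class):

error_rates = {"A": 0.05, "B": 0.20, "C": 0.10}   # invented per-class error rates
weights     = {"A": 0.6,  "B": 0.3,  "C": 0.1}    # invented class importance weights

simple_average   = sum(error_rates.values()) / len(error_rates)
weighted_average = sum(error_rates[c] * weights[c] for c in error_rates) / sum(weights.values())
print(simple_average, weighted_average)   # about 0.117 vs. 0.10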

9
Q

Measuring classification performance

Error measures for categorical values

Comparing error rates

A
  • error on the training set vs. validation set vs. test set
  • expected error (probability) vs. observed error vs. error from benchmark approaches

10
Q

Measuring classification performance

Benchmarking

Possible benchmarks

A
  • statistically expected error rate - probabilistic distribution of instances
  • naive rules
  • expert assignment
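
A minimal Python sketch of the first two benchmarks with an invented label distribution; the "statistically expected" rate is computed here for random assignment in proportion to the observed class distribution, which is one possible reading of that benchmark:

from collections import Counter

actual = ["no"] * 80 + ["yes"] * 20          # invented label distribution
priors = {c: n / len(actual) for c, n in Counter(actual).items()}

# naive rule: always predict the most frequent class
majority_error = 1 - max(priors.values())                 # 0.20 here

# expected error when classes are assigned at random in proportion to their distribution
random_error = 1 - sum(p ** 2 for p in priors.values())   # 1 - (0.8^2 + 0.2^2) = 0.32

print(majority_error, random_error)
# a learned classifier should beat both benchmarks to justify its cost
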
11
Q

Measuring classification performance

Benchmarking

Benchmark factors beyond accuracy

A
  • effort - computational, financial, …
  • reliability - over time, data sets, …
  • acceptance - who gets to override the model's decision?
12
Q

Cross-Validation and Bootstrapping

Splitting the data set for evaluation

Example: Decision tree

A

Training set: build the tree

Validation set: prune the tree

Test set: evaluate the tree’s predictions

13
Q

Cross-Validation and Bootstrapping

Splitting the data set for evaluation

A

Split the data set:

  • training set
  • validation set
  • test set

Build the model:
  • training set
  • validation set
(both are used while building the model)

Evaluate the model:
  • test set
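
A minimal sketch of this workflow using scikit-learn (an assumption; the deck does not name a library), with cost-complexity pruning as one concrete pruning mechanism and scikit-learn's built-in breast-cancer data as stand-in data:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# split into training (60%), validation (20%) and test (20%) sets
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# training set: candidate pruning levels for a tree built on the training data
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# validation set: pick the pruning strength that scores best on held-out validation data
best_alpha = max(
    path.ccp_alphas,
    key=lambda a: DecisionTreeClassifier(random_state=0, ccp_alpha=a)
    .fit(X_train, y_train)
    .score(X_val, y_val),
)

# test set: evaluate the pruned tree's predictions on data it has never seen
final_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
print("test accuracy:", final_tree.score(X_test, y_test))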

14
Q

Cross-Validation and Bootstrapping

Hold out 1: k-fold Cross validation

A

Split the data set into k partitions of equal size

  • use k-1 partitions for training (and validation)
  • use the k-th partition for evaluation (“hold-out”)
  • common: k=10

Repeat the cross validation k times, where the hold-out partition alternates across all k partitions

Average the result over the k repetitions for a single measure
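
A minimal sketch using scikit-learn's cross_val_score; the library, model, and data set are assumptions chosen for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# k = 10: each partition is held out once while the other 9 are used for training,
# and the 10 accuracy scores are averaged into a single measure
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=10)
print("per-fold accuracy:", [round(s, 3) for s in scores])
print("average accuracy :", round(float(scores.mean()), 3))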

15
Q

Cross-Validation and Bootstrapping

Hold out 2: Bootstrap

A
  • alternative to cross validation, applicable for small data sets
  • n = size of the original data set
  • draw n instances with replacement from the data set to generate a training set
  • drawing with replacement: the same instance can be included multiple times, while others are never drawn
  • use the instances that were never drawn as the test set
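
A minimal NumPy sketch of a single bootstrap split; the data is a stand-in:

import numpy as np

rng = np.random.default_rng(0)
data = np.arange(100)                               # stand-in for n = 100 instances
n = len(data)

train_idx = rng.integers(0, n, size=n)              # draw n indices with replacement
test_idx  = np.setdiff1d(np.arange(n), train_idx)   # indices that were never drawn

train_set = data[train_idx]                         # may contain the same instance several times
test_set  = data[test_idx]
print(len(np.unique(train_idx)), "distinct training instances,", len(test_set), "test instances")
# on average roughly 63.2% of the instances end up in the training set,
# so about a third of the data remains for testing
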
16
Q

What’s a lift factor?

A
  • describes the increase in response rate yielded by the learning tool
  • describes only the percentage increase, not an increase in absolute respondents
  • but assuming that any additional sample costs money, computing lift factors enables cost-benefit analyses
17
Q

Computing the lift factor for deterministic classification

Steps

A
  1. consider for all instances the prediction and the actual class
  2. compute the overall share of the desired class (e.g. “positive response to the newsletter”)
  3. compute the share of the desired class in those instances predicted to belong to the class
  4. lift factor = share within the instances predicted to be in the class / overall share
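
A minimal Python sketch of the four steps, using invented labels:

actual    = ["yes", "no", "no", "yes", "no", "no", "yes", "no"]   # 1. actual classes
predicted = ["yes", "yes", "no", "yes", "no", "no", "no", "no"]   #    and predictions

# 2. overall share of the desired class
overall_share = actual.count("yes") / len(actual)                 # 3/8

# 3. share of the desired class among instances predicted to be in the class
selected = [a for a, p in zip(actual, predicted) if p == "yes"]
share_in_selected = selected.count("yes") / len(selected)         # 2/3

# 4. lift factor
print("lift:", share_in_selected / overall_share)                 # about 1.78
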
18
Q

Computing the lift factor for probabilistic classification

Steps

A
  1. consider for all instances the predicted class probability and the actual class
  2. order the instances by descending probability of belonging to the desired class (e.g. “positive response to newsletter”)
  3. select a sample size and select the corresponding number of instances from the top of the ordered list
  4. compute the share of the desired class in the selected instances
  5. lift factor = share within the sample / overall share
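
A minimal Python sketch with invented probabilities, here for a sample size of 3:

actual   = ["yes", "no", "yes", "no", "no", "yes", "no", "no"]   # 1. actual classes
prob_yes = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]              #    and predicted P(desired class)

overall_share = actual.count("yes") / len(actual)                 # 3/8

# 2.-3. order by descending probability and take the chosen sample size from the top
sample_size = 3
ranked = [a for _, a in sorted(zip(prob_yes, actual), key=lambda t: t[0], reverse=True)]
top = ranked[:sample_size]

# 4.-5. share of the desired class in the sample, relative to the overall share
share_in_sample = top.count("yes") / sample_size                  # 2/3
print("lift:", share_in_sample / overall_share)                   # about 1.78
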
19
Q

Lift chart

A
  • lift charts can be computed when classification is probabilistic
  • compute the lift factor for a range of increasing sample sizes and plot it against the sample size, possibly comparing the gain to the additional cost of contacting a larger sample
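
Extending the previous sketch (same invented data), the lift factor can be traced for every sample size:

actual   = ["yes", "no", "yes", "no", "no", "yes", "no", "no"]
prob_yes = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

overall_share = actual.count("yes") / len(actual)
ranked = [a for _, a in sorted(zip(prob_yes, actual), key=lambda t: t[0], reverse=True)]

# lift factor for every possible sample size, from the top of the ranked list downwards
for sample_size in range(1, len(ranked) + 1):
    share = ranked[:sample_size].count("yes") / sample_size
    print(sample_size, round(share / overall_share, 2))
# plotting these values against the sample size gives the lift chart; comparing the
# gain with the cost of contacting a larger sample supports a cost-benefit decision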