Assessing Algorithms Flashcards
KNN Pro/Con
P - No model form to assume, so it can fit arbitrary patterns (a very small k can still overfit)
C - Can’t extrapolate
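A minimal sketch of why KNN cannot extrapolate, assuming a 1-D regression with made-up data (y = 2x) and a hypothetical helper knn_predict. A query far outside the training range still gets the average of the nearest training targets, so the prediction never leaves the range seen in training.

```python
def knn_predict(train_x, train_y, query, k=3):
    # Rank training points by distance to the query (ties keep training order)
    # and average the targets of the k nearest ones.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical training data: x = 0..9, true relationship y = 2x.
train_x = [float(x) for x in range(10)]
train_y = [2.0 * x for x in train_x]

inside = knn_predict(train_x, train_y, 4.0)     # query inside the training range
outside = knn_predict(train_x, train_y, 100.0)  # query far outside it

print(inside)   # 8.0, matches y = 2x
print(outside)  # 16.0, although y = 2x would give 200
```

The out-of-range prediction is just the mean of the three largest training targets.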
Parametric Pro/Con
P - Able to extrapolate beyond the training data
C -
RMS Error
sqrt( sum((actual - predicted)^2) / n )
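A small sketch of the RMS error calculation in plain Python. The function name rmse and the sample numbers are illustrative only. Each error is squared before summing, then the mean is taken and square-rooted.

```python
import math

def rmse(actual, predicted):
    # Both sequences must line up one-to-one and be non-empty.
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values: two perfect predictions and one that is off by 2.
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(example)  # sqrt(4 / 3)
```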
Out-of-sample error
RMSE measured on test data
That is, the error on predictions for data that was not used to train the model
Cross validation
When there isn’t enough data for a single train/test split, divide the existing data into chunks.
Then use different combinations of chunks to train and test, e.g. round 1: train(1-4), test(5); round 2: train(2-5), test(1)
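A sketch of how the chunk rotation could be generated, assuming the usual k-fold convention that each chunk is the test set exactly once while the remaining chunks are used for training. The helper name k_fold_splits is hypothetical, and it works on row indices rather than real data.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k rounds."""
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last chunk absorbs any leftover rows.
        stop = n_samples if fold == k - 1 else start + fold_size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test

# 10 rows split into 5 chunks of 2 rows each.
splits = list(k_fold_splits(10, 5))
for train, test in splits:
    print("train", train, "test", test)
```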
Cross validation and financial data
Standard cross validation fits financial data poorly: some rounds train on data that comes after the test data, so the model can “peek” at future values
Roll forward cross validation
Cross validation where the training data is ALWAYS earlier in time than the test data
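A sketch of one way to build roll forward splits, assuming rows are already sorted by time. This version uses an expanding training window, which is an assumption; a fixed-size sliding window would satisfy the same rule. The helper name roll_forward_splits is hypothetical.

```python
def roll_forward_splits(n_samples, n_chunks):
    """Yield (train_indices, test_indices) with training strictly before testing."""
    chunk = n_samples // n_chunks
    for i in range(1, n_chunks):
        train_end = i * chunk
        # The last test chunk absorbs any leftover rows.
        test_end = n_samples if i == n_chunks - 1 else train_end + chunk
        yield list(range(0, train_end)), list(range(train_end, test_end))

# 10 time-ordered rows, 5 chunks: 4 rounds, since the first chunk is only ever trained on.
rounds = list(roll_forward_splits(10, 5))
for train, test in rounds:
    print("train", train, "test", test)
```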
Correlation
For a regression algorithm:
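Look at the relationship between predicted and actual values
Scatter plot actual vs. predicted values, and calculate the correlation coefficient
Correlation != slope: it measures how tightly the points fit a line (from -1 to 1), not how steep that line is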
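A short sketch showing that correlation and slope are different things, using a hand-written Pearson correlation and made-up numbers. Two sets of predictions with slopes 0.5 and 2.0 against the actual values both have a correlation of about 1, because both lie exactly on a line.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient: covariance divided by the product of spreads.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Illustrative data only.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
half = [0.5 * a for a in actual]    # slope 0.5 against actual
double = [2.0 * a for a in actual]  # slope 2.0 against actual

corr_half = pearson(actual, half)
corr_double = pearson(actual, double)
print(corr_half, corr_double)  # both approximately 1.0
```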
Overfitting
When in-sample error decreases while out-of-sample error increases as the model becomes more complex
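A toy sketch of that pattern using KNN, where a smaller k means a more flexible model. The data is made up: the true relationship is y = x, the training targets carry alternating +1 / -1 noise, and the test targets are noise-free. Going from k=2 to k=1, in-sample error drops to zero while out-of-sample error goes up.

```python
import math

def knn_predict(train_x, train_y, query, k):
    # Average the targets of the k nearest training points (ties keep training order).
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical toy data.
train_x = [float(i) for i in range(10)]
train_y = [x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]
test_x = [i + 0.4 for i in range(9)]
test_y = list(test_x)  # noise-free targets, y = x

def errors(k):
    in_sample = rmse(train_y, [knn_predict(train_x, train_y, x, k) for x in train_x])
    out_sample = rmse(test_y, [knn_predict(train_x, train_y, x, k) for x in test_x])
    return in_sample, out_sample

in_k2, out_k2 = errors(2)  # less flexible model
in_k1, out_k1 = errors(1)  # most flexible model: memorizes the noisy training data
print("k=2 in/out:", in_k2, out_k2)
print("k=1 in/out:", in_k1, out_k1)
```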