Backtesting
Backtesting is a validation test that assesses the robustness of a model on its existing historical training data. The model is refitted over a series of iterations, and the training data used in each iteration steps back from the most recent to the oldest collected values.
A predictive model is typically fitted on a training dataset and assessed on a separate test dataset that does not overlap with the training data. Frequently, the training data is collected over time and has an explicit time dimension. In such cases, it is common practice to use the most recent data points as the test dataset, because this better mimics a real application of the model.
By applying Backtesting to a model, we can refit the model multiple times. Each refit uses a shorter time span of the training data and holds out a portion of that data as test data. As a result, instead of a single test dataset, the model is evaluated on a series of test datasets over time.
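The splitting scheme described above can be sketched in a few lines of Python. This is only an illustration, not the H2O Model Validation implementation: the function `backtest_splits` and its parameters `n_rows`, `n_splits`, and `test_size` are hypothetical names, and the sketch assumes the rows are already sorted by time with the oldest row first.

```python
def backtest_splits(n_rows, n_splits, test_size):
    """Yield (train_indices, test_indices), from the most recent test window to the oldest.

    Hypothetical helper: assumes rows are sorted by time, oldest first.
    """
    for k in range(n_splits):
        # Each iteration moves the test window back by one window length.
        test_end = n_rows - k * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough rows for the requested number of splits")
        # Training data is everything strictly before the test window.
        yield range(0, test_start), range(test_start, test_end)


for train, test in backtest_splits(n_rows=10, n_splits=3, test_size=2):
    print(list(train), list(test))
# [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
# [0, 1, 2, 3, 4, 5] [6, 7]
# [0, 1, 2, 3] [4, 5]
```

Each iteration trains on a shorter time span than the previous one, and the test window always follows the training rows in time, so no test row ever precedes the data the model was fitted on.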
Backtesting enables you to:
- Understand the variance of the model accuracy
- Understand and visualize how the model accuracy develops over time
- Identify potential reasons for any performance issues with the modeling approach in the past (e.g., problems around data collection)
During each Backtesting iteration, the model is fully refitted, which includes rerunning feature engineering and feature selection. Not fully refitting in each iteration produces an incorrect Backtesting outcome, because the features would then have been selected based on the entire data, including data collected after that iteration's training window. This is a form of data leakage, where information from the future is explicitly or implicitly reflected in the current variables.
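To make the full-refit requirement concrete, here is a minimal sketch in plain Python, under stated assumptions. It is not how H2O Model Validation is implemented: `fit_pipeline`, `backtest`, the column names, and the toy data are all hypothetical, feature selection is reduced to picking the single column most correlated with the target, and the model is a one-variable least-squares line. The point is only that selection and fitting both happen inside the loop, on that iteration's training rows alone.

```python
from statistics import mean, pstdev


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)


def fit_pipeline(train_rows, features, target):
    """Feature selection plus model fit, using only the rows passed in."""
    ys = [r[target] for r in train_rows]
    # Stand-in for feature selection: keep the single most correlated column.
    best = max(features, key=lambda f: abs(correlation([r[f] for r in train_rows], ys)))
    xs = [r[best] for r in train_rows]
    mx, my = mean(xs), mean(ys)
    var = mean((x - mx) ** 2 for x in xs)
    slope = 0.0 if var == 0 else mean((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    intercept = my - slope * mx
    return best, lambda row: intercept + slope * row[best]


def backtest(rows, features, target, n_splits, test_size):
    """Return (selected_feature, mean_absolute_error) per iteration, newest window first."""
    results = []
    for k in range(n_splits):
        test_end = len(rows) - k * test_size
        test_start = test_end - test_size
        if test_start < 2:
            raise ValueError("not enough rows for the requested number of splits")
        train, test = rows[:test_start], rows[test_start:test_end]
        # Full refit: selection and fitting never see rows at or after test_start.
        feature, predict = fit_pipeline(train, features, target)
        mae = mean(abs(predict(r) - r[target]) for r in test)
        results.append((feature, mae))
    return results


# Made-up toy data, sorted by time: "y" depends linearly on "a"; "b" is unrelated.
rows = [{"a": i, "b": (i * 7) % 5, "y": 2 * i + 1} for i in range(12)]
results = backtest(rows, ["a", "b"], "y", n_splits=2, test_size=3)
for feature, mae in results:
    print(feature, round(mae, 6))
```

If `fit_pipeline` were instead called once on all rows before the loop, the selected feature would depend on rows from every test window, which is exactly the leakage described above.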