5. Backtest Range

During the creation process, you must choose a contiguous range of years for the initial backtest of your model. For models you intend to take live, we recommend a range that extends up to the present day.

Each period of the backtest uses data that is out-of-sample with respect to the data used to train the model. Typically, at any given point in time, the model's in-sample data is the previous 8 years of data.

We generally recommend backtesting at least 8 years when possible.
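The three recommendations above (a contiguous range, ending near the present for live models, at least 8 years long) can be checked before a backtest is launched. The sketch below is a hypothetical helper, not part of the product: the function name, the warning messages, and the reading of "leading up to the present day" as "ending in the current year or the last completed year" are all assumptions.

```python
from datetime import date

# Assumed threshold, taken from the "at least 8 years" recommendation.
RECOMMENDED_MIN_YEARS = 8


def check_backtest_years(years, intended_for_live=True, current_year=None):
    """Validate a proposed backtest range (hypothetical helper).

    Raises ValueError if the years are not contiguous; otherwise returns a
    list of warning strings, which is empty when the range follows the
    recommendations.
    """
    years = sorted(set(years))
    if not years:
        raise ValueError("at least one backtest year is required")
    # The backtest range must be one unbroken run of calendar years.
    if years != list(range(years[0], years[-1] + 1)):
        raise ValueError(f"backtest years must be contiguous, got {years}")

    if current_year is None:
        current_year = date.today().year

    warnings = []
    if len(years) < RECOMMENDED_MIN_YEARS:
        warnings.append(
            f"only {len(years)} years; at least {RECOMMENDED_MIN_YEARS} are recommended"
        )
    # Assumption: "leading up to the present day" means the range ends in the
    # current year or the most recently completed one.
    if intended_for_live and years[-1] < current_year - 1:
        warnings.append(
            f"range ends in {years[-1]}; models meant to go live should be "
            f"backtested up to {current_year - 1} or later"
        )
    return warnings


if __name__ == "__main__":
    print(check_backtest_years(range(2016, 2022), current_year=2022))
```

For a 2016-2021 range checked in 2022, the only warning returned is that 6 years is shorter than the recommended 8.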

TRAINING FREQUENCY

The training frequency sets how often the underlying statistical models are retrained on new data. A higher frequency keeps the in-sample data more up to date, but it requires more computation and therefore incurs a higher cost.
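To see why a higher training frequency costs more, it helps to count the training runs in a backtest. The sketch below assumes one training run at the start of each retraining interval and assumes every run costs roughly the same, so total cost scales with the number of runs; the function name and the month-based interval are illustrative, not the product's actual settings.

```python
import math


def count_training_runs(backtest_years: int, retrain_every_months: int) -> int:
    """Hypothetical estimate of how many times the models are trained.

    Assumes a training run at the start of the backtest and a retrain at the
    start of every subsequent interval of `retrain_every_months` months.
    """
    if backtest_years <= 0 or retrain_every_months <= 0:
        raise ValueError("backtest_years and retrain_every_months must be positive")
    total_months = backtest_years * 12
    # Each run serves one interval of out-of-sample inference; a final
    # partial interval still needs its own run, hence the ceiling.
    return math.ceil(total_months / retrain_every_months)


if __name__ == "__main__":
    annual_runs = count_training_runs(6, 12)
    for label, months in [("annual", 12), ("quarterly", 3), ("monthly", 1)]:
        runs = count_training_runs(6, months)
        # Assumption: equal cost per run, so relative cost = relative run count.
        print(f"{label:>9}: {runs:3d} training runs, "
              f"{runs / annual_runs:.0f}x the annual cost")
```

For a 6-year backtest this gives 6, 24 and 72 training runs for annual, quarterly and monthly retraining, so under the equal-cost assumption monthly retraining costs about 12 times as much as annual retraining.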

The backtest is part of the model's cross-validation process. In the examples above, where the backtest years are 2016-2021 with annual retraining, 8 years of training data and 1 year of validation data, the model would first train on 2007-2014, validate on 2015, and then run inference on 2016. At the end of 2016, the model is retrained entirely on 2008-2015, validated on 2016, and used for inference on 2017, and so on.
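The rolling train/validate/infer windows described above can be generated mechanically. The sketch below assumes annual retraining and whole calendar years; the `Fold` type and the `walk_forward_folds` function are hypothetical names used for illustration only.

```python
from typing import List, NamedTuple


class Fold(NamedTuple):
    train_start: int       # first training year (inclusive)
    train_end: int         # last training year (inclusive)
    validation_start: int  # first validation year (inclusive)
    validation_end: int    # last validation year (inclusive)
    inference_year: int    # out-of-sample backtest year


def walk_forward_folds(first_year: int, last_year: int,
                       train_years: int = 8,
                       validation_years: int = 1) -> List[Fold]:
    """Annual walk-forward schedule (sketch, assumes yearly retraining).

    For each backtest year, validation uses the `validation_years` years just
    before it, and training uses the `train_years` years before validation.
    """
    folds = []
    for year in range(first_year, last_year + 1):
        validation_start = year - validation_years
        validation_end = year - 1
        train_end = validation_start - 1
        train_start = train_end - train_years + 1
        folds.append(Fold(train_start, train_end,
                          validation_start, validation_end, year))
    return folds


if __name__ == "__main__":
    for f in walk_forward_folds(2016, 2021):
        val = (str(f.validation_start) if f.validation_start == f.validation_end
               else f"{f.validation_start}-{f.validation_end}")
        print(f"train {f.train_start}-{f.train_end}, "
              f"validate {val}, infer {f.inference_year}")
```

With the example settings, the first fold trains on 2007-2014, validates on 2015 and infers on 2016; the second trains on 2008-2015, validates on 2016 and infers on 2017; the last trains on 2012-2019, validates on 2020 and infers on 2021.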