[QUESTION] Backtesting Using Global ML Models #2613
A follow-up question: I have found that if I fit the model and then call historical_forecasts with retrain=False, it produces these forecasts quite quickly. How is this the case? And are these historical forecasts true backtested forecasts (no leakage of future values)? Is this the way to backtest ML models globally in darts: first fit, then call historical_forecasts with retrain=False?
Hi @jrodenbergrheem, we updated the historical forecasts documentation a while ago, and it will be released in the next few weeks. Here is the updated description, which I hope answers your questions :)
And regarding your question:
And yes, backtesting without re-training is a good idea:
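To make the fit-once idea concrete, here is a framework-free sketch of what a backtest without re-training amounts to. It is not darts' implementation: the "global model" is a toy pooled AR(1) (one pair of parameters shared by all series), and the names fit_global_ar1, predict and backtest_no_retrain are hypothetical. The model is fit once on the data before the backtest start, and each forecast at point t only reads the history s[:t], which is why this mode only costs one fit plus cheap predictions.

```python
def fit_global_ar1(histories):
    """Pooled OLS fit of y[t] = a + b * y[t-1] over every series (toy global model)."""
    xs, ys = [], []
    for h in histories:
        xs.extend(h[:-1])
        ys.extend(h[1:])
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def predict(params, history, horizon):
    """Recursive multi-step forecast that only reads `history` (no future values)."""
    a, b = params
    last, out = history[-1], []
    for _ in range(horizon):
        last = a + b * last
        out.append(last)
    return out


def backtest_no_retrain(series_list, start, horizon, stride):
    # A single fit, using only the data before `start` in every series.
    params = fit_global_ar1([s[:start] for s in series_list])
    results = []
    for s in series_list:
        # One prediction per forecast point t, conditioned on s[:t] only.
        results.append([predict(params, s[:t], horizon)
                        for t in range(start, len(s) - horizon + 1, stride)])
    return params, results


def make_series(y0, length, a=1.0, b=0.8):
    """Noise-free AR(1) toy series."""
    ys = [y0]
    for _ in range(length - 1):
        ys.append(a + b * ys[-1])
    return ys


series_list = [make_series(0.0, 48), make_series(10.0, 48)]
params, results = backtest_no_retrain(series_list, start=32, horizon=4, stride=4)
print(len(results[0]))  # 4 forecasts per series (t = 32, 36, 40, 44) from one fit
```

The key design point is the split: fitting on s[:start] rather than on the whole series is what keeps the backtest free of leakage when no re-training happens.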
Interesting, so global backtesting (with refit) is not currently implemented, from what I understand. Got it; that explains the long training time. But if I fit my model on all the data and then backtest on all the data without refitting, I don't understand how that avoids data leakage. Under the hood, is it still a fit/predict for each input/output window? If so, I don't quite see why a retrain mode is needed: if historical_forecasts with retrain=False gives the same result as multiple fit/predict calls on different slices of the data, that is what I am after. Appreciate the response and the work you do, Jack
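For reference, "multiple fit/predict calls on different slices" done globally can be hand-rolled as a loop over cutoffs: at each cutoff t, refit one model on the history of all series, then predict every series. This is a plain-Python sketch with a toy pooled AR(1) model; global_refit_backtest and the other helpers are hypothetical names, not darts functions. The number of fits equals the number of cutoffs, independent of how many series there are.

```python
def fit_global_ar1(histories):
    """Pooled OLS fit of y[t] = a + b * y[t-1] over every series (toy global model)."""
    xs, ys = [], []
    for h in histories:
        xs.extend(h[:-1])
        ys.extend(h[1:])
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def predict(params, history, horizon):
    """Recursive forecast of `horizon` steps from the last observed value."""
    a, b = params
    last, out = history[-1], []
    for _ in range(horizon):
        last = a + b * last
        out.append(last)
    return out


def global_refit_backtest(series_list, start, horizon, stride):
    """Refit ONE global model per cutoff on all series' history, then predict every series."""
    length = min(len(s) for s in series_list)
    n_fits, forecasts = 0, [[] for _ in series_list]
    for t in range(start, length - horizon + 1, stride):
        params = fit_global_ar1([s[:t] for s in series_list])  # only data before t
        n_fits += 1
        for i, s in enumerate(series_list):
            forecasts[i].append(predict(params, s[:t], horizon))
    return n_fits, forecasts


def make_series(y0, length, a=1.0, b=0.8):
    """Noise-free AR(1) toy series."""
    ys = [y0]
    for _ in range(length - 1):
        ys.append(a + b * ys[-1])
    return ys


series_list = [make_series(y0, 48) for y0 in (0.0, 10.0, 20.0)]
n_fits, forecasts = global_refit_backtest(series_list, start=32, horizon=4, stride=4)
print(n_fits)  # 4: one fit per cutoff, shared by all 3 series
```

With a real model the same structure applies: the expensive step (the fit) runs once per cutoff instead of once per series per cutoff.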
Hi @jrodenbergrheem. Yes, if you backtest on the same series that you trained your model on, you will have data leakage. You can look at The re-train mode is required, for example, for local models, which must be re-trained in order to forecast at each step of the historical simulation. Hope it helps. Let me know if you need more information.
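One way to see the leakage is to change only the values at or after the backtest start and check whether the fitted parameters move. The sketch below does this in plain Python with a toy pooled AR(1) model and seeded random data; the helper names are hypothetical. If the parameters change, the "backtest" forecasts were produced by a model that had already seen the values it is being scored on.

```python
import random


def fit_global_ar1(histories):
    """Pooled OLS fit of y[t] = a + b * y[t-1] over every series (toy global model)."""
    xs, ys = [], []
    for h in histories:
        xs.extend(h[:-1])
        ys.extend(h[1:])
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def make_noisy_series(seed, length, a=1.0, b=0.8, noise=0.5):
    """Toy AR(1) series with Gaussian noise, reproducible via `seed`."""
    rng = random.Random(seed)
    ys = [rng.uniform(0, 10)]
    for _ in range(length - 1):
        ys.append(a + b * ys[-1] + rng.gauss(0, noise))
    return ys


series_list = [make_noisy_series(seed, 48) for seed in (1, 2, 3)]
start = 32

leaky_params = fit_global_ar1(series_list)                        # fit sees the test range
clean_params = fit_global_ar1([s[:start] for s in series_list])   # fit stops at `start`

# Probe: shift only the values at or after `start`, then refit both ways.
perturbed = [s[:start] + [v + 100.0 for v in s[start:]] for s in series_list]
leaky_after = fit_global_ar1(perturbed)
clean_after = fit_global_ar1([s[:start] for s in perturbed])

print(clean_after == clean_params)  # True: the clean fit ignores the test range
print(leaky_after != leaky_params)  # True: the leaky fit depends on the test range
```

In darts terms, the analogous precaution is to fit on the part of each series before the backtest start and pass that start to historical_forecasts with retrain=False.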
My question: when backtesting, how is it that my model fits in 2 minutes using .fit(), but when I call .historical_forecasts(), where my model should make 4 fits, it seems to take 1-2 hours to complete?
I notice that when using .historical_forecasts with verbose progress, it seems to generate forecasts for one series at a time. Is there a way to backtest globally? I didn't expect historical_forecasts to refit the model for each series, given that I am using a global model.
Here is my code; data_transformed is a list of TimeSeries objects that I created using TimeSeries.from_group_dataframe:
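```python
# retrain is not passed, so the darts default (retrain=True) applies:
# the model is re-trained at every forecast point.
ml_backtested = catboost_model.historical_forecasts(
    series=data_transformed,
    start=start_point,
    stride=horizon,
    forecast_horizon=horizon,
    show_warnings=False,
    last_points_only=False,
    verbose=True,
)
```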
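A back-of-the-envelope count helps explain the runtime gap. The sketch assumes, as the progress output described above suggests, that retrain=True refits the model once per series at every forecast point, and it only counts forecasts whose full horizon fits inside the series; darts' exact start and overlap rules may shift the count slightly. The series count of 250 is purely hypothetical, and n_forecast_points and n_fits are illustrative helpers, not darts functions.

```python
def n_forecast_points(length, start, horizon, stride):
    """Forecast start indices t = start, start + stride, ... whose horizon fits in the series."""
    last_start = length - horizon  # last index whose full horizon fits
    if start > last_start:
        return 0
    return (last_start - start) // stride + 1


def n_fits(series_lengths, start, horizon, stride, retrain):
    """Number of model fits performed inside the backtest (assumed semantics)."""
    if not retrain:
        return 0  # the already-fitted model is reused; only predictions are made
    # Assumed: one refit per series per forecast point.
    return sum(n_forecast_points(n, start, horizon, stride) for n in series_lengths)


# Hypothetical setup: 250 series of length 48, start index 32, horizon = stride = 4.
lengths = [48] * 250
print(n_forecast_points(48, 32, 4, 4))                                # 4
print(n_fits(lengths, start=32, horizon=4, stride=4, retrain=True))   # 1000
print(n_fits(lengths, start=32, horizon=4, stride=4, retrain=False))  # 0
```

So "4 fits" is 4 fits per series: the total grows linearly with the number of series, whereas a single .fit() beforehand followed by retrain=False costs one fit in total.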