Y_df parameter in reconcile() method requiring more columns than necessary when using TopDown or MiddleOut #329
Comments
I just encountered an even stranger related behavior. Following the same steps as in the notebook, I tested the following configurations for the reconcilers argument:
✅ Works: each of the four reconcilers on its own.
❌ Fails: combining these four working reconcilers causes an error.
It’s unexpected that individually working reconcilers would fail when combined, right? Moreover, why would MiddleOut with historical proportions work on its own here, apparently without needing in-sample predictions in Y_df? I am using the latest version |
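A hedged sketch of the experiment described in this comment, assuming a list `candidates` holding the four historical-proportion reconcilers (constructed as in the sketch at the end of the issue body below) and the dataframes produced by the TourismSmall notebook setup; all names here are illustrative assumptions, not a confirmed reproduction:

```python
from hierarchicalforecast.core import HierarchicalReconciliation

# `candidates` is assumed to hold the four historical-proportion reconcilers
# (TopDown / MiddleOut with "average_proportions" / "proportion_averages");
# Y_hat_df, Y_train_df, S_df and tags come from the notebook setup.
for rec in candidates:
    # Each of the four reconcilers on its own: reported to work.
    HierarchicalReconciliation(reconcilers=[rec]).reconcile(
        Y_hat_df=Y_hat_df, S=S_df, tags=tags, Y_df=Y_train_df
    )

# All four combined in a single call: reported to raise an error.
HierarchicalReconciliation(reconcilers=candidates).reconcile(
    Y_hat_df=Y_hat_df, S=S_df, tags=tags, Y_df=Y_train_df
)
```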
Thanks! This sounds like #290 |
The first behaviour is strange, I'll look into it. The second point again seems to be the behaviour of #290 |
Thanks for the quick response, Olivier. I think this might be a different behavior than #290. In #290, the discussion was about needing base forecasts for all levels in the hierarchy, even when using traditional single-level methods, which meant extra rows were required in Y_hat_df. What I am noticing here is that extra columns are required in Y_df, specifically the in-sample predictions, even though single-level methods shouldn't need them. Let me know if that makes sense. |
In #290 the same issue occurs, and it leads to having too few rows in Y_hat_df. Concretely, the following statement is incorrect:
TopDown and MiddleOut methods are based on insample proportions, and require in-sample data. It's a fair question whether this is the best behaviour, though; the current behaviour is more general, but I suspect most users don't expect it. |
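For context, the two historical-proportion methods discussed in this thread are usually defined purely in terms of the historical actuals. A minimal sketch using the textbook (Gross-Sohl) definitions rather than this library's internals, with illustrative helper names:

```python
import numpy as np

def average_proportions(y_bottom: np.ndarray, y_total: np.ndarray) -> np.ndarray:
    """Average of the historical proportions: p_j = mean_t( y_{j,t} / y_{total,t} )."""
    # y_bottom: (n_bottom_series, T) historical actuals; y_total: (T,) top-level actuals.
    return (y_bottom / y_total).mean(axis=1)

def proportion_averages(y_bottom: np.ndarray, y_total: np.ndarray) -> np.ndarray:
    """Proportion of the historical averages: p_j = mean_t(y_{j,t}) / mean_t(y_{total,t})."""
    return y_bottom.mean(axis=1) / y_total.mean()
```

Neither formula involves model in-sample predictions, which is the crux of the question: only the y column of Y_df should be needed.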
@macarw I've made a PR that addresses both issues and should fix them.
Edit: I was incorrect. The behavior is a result of a bug; TopDown only uses historical actual values. It does require the existence of the model insample predictions, but doesn't actually use them. This has a different underlying reason, which also needs to be addressed. |
Thank you, that was super quick 🚀 I’ll take a look at the PR. Before I do, I’d like to clarify my point:
I hope I have expressed myself clearly, and I will be reviewing the changes you have made. |
You're right, I am wrong! Apologies. I was put off by our own code base, which requires the insample predictions for TopDown, but doesn't actually use it to compute the top-down reconciliation, for which it uses the historical insample (actual) values. There's a weird quirk requiring the existence of the model insample predictions due to an unfortunate combination of conditional checks in the code. Anyways, apologies again, I should've read better! |
No worries at all! Thanks for taking another look and for putting together a PR that addresses the issues so quickly. I’ll be testing the new version with these changes once it’s available 👏 |
Hi! Thanks for this amazing library.
I was trying to reproduce the example notebook TourismSmall.ipynb. In the example, you use BottomUp, TopDown with "forecast_proportions", and MiddleOut with "forecast_proportions", none of which require in-sample data.
However, when attempting to use the remaining TopDown and MiddleOut methods that are based on historical proportions ("average_proportions" and "proportion_averages"), and which do require in-sample data, I encountered the following errors:
ColumnNotFoundError: The following columns were not found: ['AutoARIMA']
or
ColumnNotFoundError: The following columns were not found: ['Naive']
To clarify, the code is identical to the notebook, except that I added four additional reconcilers to the reconcilers list (sketched below).
In the notebook, the training dataset containing the columns unique_id, ds and y is passed as Y_df. This should be sufficient to compute historical proportions, yet these methods also require Y_df to include the columns from Y_hat_df (e.g., AutoARIMA and Naive).
Why do we need to include in-sample predictions to use these methods, even though they should not rely on them?
I found a similar previous issue.
I’d really appreciate any clarification.
Thanks in advance.
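A minimal reproduction sketch of the setup described above, assuming the dataframes produced earlier in the TourismSmall notebook (Y_hat_df with AutoARIMA and Naive base forecasts, Y_train_df with only unique_id, ds and y, and S_df and tags from aggregate()); the middle_level value is an illustrative guess and should be one of the level names in tags:

```python
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, TopDown, MiddleOut

reconcilers = [
    # Reconcilers used in the notebook; none of these need in-sample data.
    BottomUp(),
    TopDown(method="forecast_proportions"),
    MiddleOut(middle_level="Country/Purpose/State", top_down_method="forecast_proportions"),
    # The four additional reconcilers, based on historical proportions.
    TopDown(method="average_proportions"),
    TopDown(method="proportion_averages"),
    MiddleOut(middle_level="Country/Purpose/State", top_down_method="average_proportions"),
    MiddleOut(middle_level="Country/Purpose/State", top_down_method="proportion_averages"),
]

hrec = HierarchicalReconciliation(reconcilers=reconcilers)

# Y_train_df contains only unique_id, ds and y, which should be enough to compute
# historical proportions, yet this call fails with
# ColumnNotFoundError: The following columns were not found: ['AutoARIMA']
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, S=S_df, tags=tags, Y_df=Y_train_df)
```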