Q: how to calculate features on rolling windows but with fixed start point for each window? #100

Open
saheel1115 opened this issue May 10, 2023 · 1 comment

@saheel1115

Hi folks,

First, thank you for creating this library. It has crucial features that other libraries lack, and I love it so far.

I would like your help with two questions:

  • How do I calculate features on the entire time series?
  • How do I move the end point of the window forward by a fixed amount (say, 1 week) while keeping the start of the window fixed? My data is weekly and I am solving a forecasting problem, so I want to calculate time series features for each training sample without data leakage. That is, for each week, I want features computed on the series up to and including that week; then the same for the next week; and so on.

Data:

Index       Sales
2023-01-02  4
2023-01-09  3
2023-01-16  7
2023-01-23  5

Output should be something like this:

Features considering only 1st week of data:

Index       Sales  Length  MeanSales
2023-01-02  4      1       4

Features considering first 2 weeks of data:

Index       Sales  Length  MeanSales
2023-01-09  3      2       3.5

Features considering first 3 weeks of data:

Index       Sales  Length  MeanSales
2023-01-16  7      3       4.67

Features considering first 4 weeks of data:

Index       Sales  Length  MeanSales
2023-01-23  5      4       4.75
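
For reference, the expected output above is what an expanding (cumulative) window produces: the start stays at the first row and the end advances one week at a time. A minimal pandas sketch, assuming a DataFrame built from the sample data above. It does not use this library and only illustrates the target values:

```python
import pandas as pd

# Sample data from the question: weekly sales indexed by date.
df = pd.DataFrame(
    {"Sales": [4, 3, 7, 5]},
    index=pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-16", "2023-01-23"]),
)

# expanding() keeps the window start fixed at the first row and grows the end
# by one row at a time, so each row only sees data up to and including itself.
expanding = df["Sales"].expanding(min_periods=1)

features = df.assign(
    Length=expanding.count().astype(int),  # number of weeks seen so far
    MeanSales=expanding.mean(),            # mean of sales seen so far
)

print(features.round(2))
```

Each row of `features` corresponds to one of the per-week tables listed above.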

Thanks in advance,
Saheel.

@saheel1115
Author

Update:

I am currently using this solution to achieve what I want:

  • use a window size large enough to cover the entire dataset
  • pass the DataFrame's index as segment_end_idxs

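import numpy as np
# Assumption: the classes below are the ones exported by tsflex.features.
from tsflex.features import FeatureCollection, MultipleFeatureDescriptors

fc = FeatureCollection(
    MultipleFeatureDescriptors(
        functions=[np.max, np.mean],
        series_names=["Sales"],
        windows="700days",  # deliberately larger than the span of the dataset
        strides="7days",
    )
)

# df is the weekly Sales DataFrame shown in the original post.
fc.calculate(
    data=[df],
    include_final_window=True,
    segment_end_idxs=df.index[1:],
)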

Is there a better solution? 😃
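
One way to sanity-check the output of the workaround above, independently of the library, is to apply the same functions to explicitly growing slices of the series. The sketch below is only an assumption-laden cross-check: `expanding_features` and its `<series>__<name>` column naming are hypothetical helpers made up for illustration, not part of any library, and the loop is quadratic in the series length, so it is only suitable for small data.

```python
import numpy as np
import pandas as pd


def expanding_features(series, funcs):
    """Apply each function to series[:end] for every end position.

    Hypothetical helper for illustration: the window start is always the
    first row, and the result is indexed by the last row in each window.
    """
    rows = {}
    for end in range(1, len(series) + 1):
        window = series.iloc[:end].to_numpy()  # rows 0 .. end-1, start fixed
        rows[series.index[end - 1]] = {
            f"{series.name}__{name}": func(window) for name, func in funcs.items()
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Sample data from the original post.
sales = pd.Series(
    [4, 3, 7, 5],
    index=pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-16", "2023-01-23"]),
    name="Sales",
)

result = expanding_features(sales, {"max": np.max, "mean": np.mean})
print(result)
```

Comparing `result` with the values returned by `fc.calculate` should show whether each window really starts at the first row and which week each window ends on.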
