Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/strided dataset for torch and regression models #2624

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
aaaa234
feat: adding stride to shifted_dataset
madtoinou Dec 18, 2024
dec1636
feat: adding stride to the sequential dataset
madtoinou Dec 18, 2024
56b53ed
feat: add striding to tabularization
madtoinou Dec 18, 2024
9847469
feat: add stride to RegressioModel fit API
madtoinou Dec 18, 2024
d055824
fix: bug in striding implementation
madtoinou Dec 18, 2024
67fd550
feat: updating the tabularization tests
madtoinou Dec 19, 2024
d615552
fix: bug
madtoinou Dec 19, 2024
1162c2c
Merge branch 'master' into feat/strided_dataset
madtoinou Dec 19, 2024
9ff31a4
feat: adding test for stride in torch datasets
madtoinou Dec 19, 2024
69fbc47
update changelog
madtoinou Dec 19, 2024
911b8c5
fix: missing test and small bug
madtoinou Dec 20, 2024
7c3e5a2
Merge branch 'master' into feat/strided_dataset
madtoinou Dec 30, 2024
e8ac91d
update changelog
madtoinou Dec 30, 2024
c375832
Merge branch 'master' into feat/strided_dataset
madtoinou Dec 31, 2024
198edd5
doc: update some docstrings
madtoinou Jan 3, 2025
a38b9f5
feat: stride is now applied from the end of the series
madtoinou Jan 9, 2025
c546991
feat: added stride support to the horizon based dataset
madtoinou Jan 9, 2025
dd8ab50
feat: updated unit tests
madtoinou Jan 9, 2025
2fc77e0
feat: adding support of stride in the fit method of torch models
madtoinou Jan 9, 2025
1a7e42d
fix: update the test so that the stride is applied from the end
madtoinou Jan 9, 2025
ad6bd31
Merge branch 'master' into feat/strided_dataset
madtoinou Jan 9, 2025
dd07ed6
Merge branch 'master' into feat/strided_dataset
madtoinou Jan 25, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
- Added more resampling methods to `TimeSeries.resample()`. This allows to aggregate values when down-sampling and to fill or keep the holes when up-sampling. [#2654](https://github.com/unit8co/darts/pull/2654) by [Jonas Blanc](https://github.com/jonasblanc)
- Added general function `darts.slice_intersect()` to intersect a sequence of `TimeSeries` along the time index. [#2592](https://github.com/unit8co/darts/pull/2592) by [Yoav Matzkevich](https://github.com/ymatzkevich).
- Added new time aggregated metric `wmape()` (Weighted Mean Absolute Percentage Error). [#2544](https://github.com/unit8co/darts/pull/2648) by [He Weilin](https://github.com/cnhwl).
- Added a `stride` argument to the `Dataset` classes and the `fit()/predict()` methods of the `RegressionModels` and torch-based models to reduce the size of the training set or apply elaborate training approaches. [#2624](https://github.com/unit8co/darts/pull/2529) by [Antoine Madrona](https://github.com/madtoinou)

**Fixed**
- Fixed a bug when performing optimized historical forecasts with `stride=1` using a `RegressionModel` with `output_chunk_shift>=1` and `output_chunk_length=1`, where the forecast time index was not properly shifted. [#2634](https://github.com/unit8co/darts/pull/2634) by [Mattias De Charleroy](https://github.com/MattiasDC).
Expand Down
2 changes: 2 additions & 0 deletions darts/models/forecasting/global_baseline_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -256,6 +256,7 @@ def _build_train_dataset(
future_covariates: Optional[Sequence[TimeSeries]],
sample_weight: Optional[Sequence[TimeSeries]],
max_samples_per_ts: Optional[int],
stride: int,
) -> MixedCovariatesTrainingDataset:
return MixedCovariatesSequentialDataset(
target_series=target,
Expand All @@ -264,6 +265,7 @@ def _build_train_dataset(
input_chunk_length=self.input_chunk_length,
output_chunk_length=0,
output_chunk_shift=self.output_chunk_shift,
stride=stride,
max_samples_per_ts=max_samples_per_ts,
use_static_covariates=self.uses_static_covariates,
sample_weight=sample_weight,
Expand Down
14 changes: 12 additions & 2 deletions darts/models/forecasting/regression_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -600,6 +600,7 @@ def _create_lagged_data(
future_covariates: Sequence[TimeSeries],
max_samples_per_ts: int,
sample_weight: Optional[Union[TimeSeries, str]] = None,
stride: int = 1,
last_static_covariates_shape: Optional[tuple[int, int]] = None,
):
(
Expand All @@ -624,6 +625,7 @@ def _create_lagged_data(
check_inputs=False,
concatenate=False,
sample_weight=sample_weight,
stride=stride,
)

expected_nb_feat = (
Expand Down Expand Up @@ -675,6 +677,7 @@ def _fit_model(
future_covariates: Sequence[TimeSeries],
max_samples_per_ts: int,
sample_weight: Optional[Union[Sequence[TimeSeries], str]],
stride: int,
val_series: Optional[Sequence[TimeSeries]] = None,
val_past_covariates: Optional[Sequence[TimeSeries]] = None,
val_future_covariates: Optional[Sequence[TimeSeries]] = None,
Expand All @@ -692,6 +695,7 @@ def _fit_model(
max_samples_per_ts=max_samples_per_ts,
sample_weight=sample_weight,
last_static_covariates_shape=None,
stride=stride,
)

if self.supports_val_set and val_series is not None:
Expand Down Expand Up @@ -741,6 +745,7 @@ def fit(
max_samples_per_ts: Optional[int] = None,
n_jobs_multioutput_wrapper: Optional[int] = None,
sample_weight: Optional[Union[TimeSeries, Sequence[TimeSeries], str]] = None,
stride: int = 1,
madtoinou marked this conversation as resolved.
Show resolved Hide resolved
**kwargs,
):
"""
Expand Down Expand Up @@ -774,6 +779,10 @@ def fit(
`"linear"` or `"exponential"` decay - the further in the past, the lower the weight. The weights are
computed globally based on the length of the longest series in `series`. Then for each series, the weights
are extracted from the end of the global weights. This gives a common time weighting across all series.
stride
The number of time steps between consecutive samples (windows of lagged values extracted from the target
series), applied starting from the end of the series. This should be used with caution as it might
introduce bias in the forecasts.
**kwargs
Additional keyword arguments passed to the `fit` method of the model.
"""
Expand Down Expand Up @@ -952,6 +961,7 @@ def fit(
sample_weight=sample_weight,
val_sample_weight=val_sample_weight,
max_samples_per_ts=max_samples_per_ts,
stride=stride,
**kwargs,
)
return self
Expand Down Expand Up @@ -993,11 +1003,11 @@ def predict(
If set to `True`, the model predicts the parameters of its `likelihood` instead of the target. Only
supported for probabilistic models with a likelihood, `num_samples = 1` and `n<=output_chunk_length`.
Default: ``False``
show_warnings
Optionally, control whether warnings are shown. Not effective for all models.
**kwargs : dict, optional
Additional keyword arguments passed to the `predict` method of the model. Only works with
univariate target series.
show_warnings
Optionally, control whether warnings are shown. Not effective for all models.
"""
if series is None:
# then there must be a single TS, and that was saved in super().fit as self.training_series
Expand Down
2 changes: 2 additions & 0 deletions darts/models/forecasting/rnn_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -566,12 +566,14 @@ def _build_train_dataset(
future_covariates: Optional[Sequence[TimeSeries]],
sample_weight: Optional[Sequence[TimeSeries]],
max_samples_per_ts: Optional[int],
stride: int,
) -> DualCovariatesShiftedDataset:
return DualCovariatesShiftedDataset(
target_series=target,
covariates=future_covariates,
length=self.training_length,
shift=1,
stride=stride,
max_samples_per_ts=max_samples_per_ts,
use_static_covariates=self.uses_static_covariates,
sample_weight=sample_weight,
Expand Down
2 changes: 2 additions & 0 deletions darts/models/forecasting/tcn_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -538,12 +538,14 @@ def _build_train_dataset(
future_covariates: Optional[Sequence[TimeSeries]],
sample_weight: Optional[Sequence[TimeSeries]],
max_samples_per_ts: Optional[int],
stride: int,
) -> PastCovariatesShiftedDataset:
return PastCovariatesShiftedDataset(
target_series=target,
covariates=past_covariates,
length=self.input_chunk_length,
shift=self.output_chunk_length + self.output_chunk_shift,
stride=stride,
max_samples_per_ts=max_samples_per_ts,
use_static_covariates=self.uses_static_covariates,
sample_weight=sample_weight,
Expand Down
2 changes: 2 additions & 0 deletions darts/models/forecasting/tft_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -1162,6 +1162,7 @@ def _build_train_dataset(
future_covariates: Optional[Sequence[TimeSeries]],
sample_weight: Optional[Sequence[TimeSeries]],
max_samples_per_ts: Optional[int],
stride: int,
) -> MixedCovariatesSequentialDataset:
raise_if(
future_covariates is None and not self.add_relative_index,
Expand All @@ -1179,6 +1180,7 @@ def _build_train_dataset(
input_chunk_length=self.input_chunk_length,
output_chunk_length=self.output_chunk_length,
output_chunk_shift=self.output_chunk_shift,
stride=stride,
max_samples_per_ts=max_samples_per_ts,
use_static_covariates=self.uses_static_covariates,
sample_weight=sample_weight,
Expand Down
16 changes: 15 additions & 1 deletion darts/models/forecasting/torch_forecasting_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -567,6 +567,7 @@ def _build_train_dataset(
future_covariates: Optional[Sequence[TimeSeries]],
sample_weight: Optional[Union[Sequence[TimeSeries], str]],
max_samples_per_ts: Optional[int],
stride: int,
) -> TrainingDataset:
"""
Each model must specify the default training dataset to use.
Expand Down Expand Up @@ -659,6 +660,8 @@ def fit(
val_sample_weight: Optional[
Union[TimeSeries, Sequence[TimeSeries], str]
] = None,
stride: int = 1,
eval_stride: int = 1,
) -> "TorchForecastingModel":
"""Fit/train the model on one or multiple series.

Expand Down Expand Up @@ -732,7 +735,12 @@ def fit(
are extracted from the end of the global weights. This gives a common time weighting across all series.
val_sample_weight
Same as for `sample_weight` but for the evaluation dataset.

stride
The number of time steps between consecutive samples (windows of lagged values extracted from the target
series), applied starting from the end of the series. This should be used with caution as it might
introduce bias in the forecasts.
eval_stride
Same as for `stride` but for the evaluation dataset.
Returns
-------
self
Expand All @@ -750,10 +758,12 @@ def fit(
past_covariates=past_covariates,
future_covariates=future_covariates,
sample_weight=sample_weight,
stride=stride,
val_series=val_series,
val_past_covariates=val_past_covariates,
val_future_covariates=val_future_covariates,
val_sample_weight=val_sample_weight,
eval_stride=eval_stride,
trainer=trainer,
verbose=verbose,
epochs=epochs,
Expand All @@ -774,12 +784,14 @@ def _setup_for_fit_from_dataset(
past_covariates: Optional[Union[TimeSeries, Sequence[TimeSeries]]] = None,
future_covariates: Optional[Union[TimeSeries, Sequence[TimeSeries]]] = None,
sample_weight: Optional[Union[TimeSeries, Sequence[TimeSeries], str]] = None,
stride: int = 1,
val_series: Optional[Union[TimeSeries, Sequence[TimeSeries]]] = None,
val_past_covariates: Optional[Union[TimeSeries, Sequence[TimeSeries]]] = None,
val_future_covariates: Optional[Union[TimeSeries, Sequence[TimeSeries]]] = None,
val_sample_weight: Optional[
Union[TimeSeries, Sequence[TimeSeries], str]
] = None,
eval_stride: int = 1,
trainer: Optional[pl.Trainer] = None,
verbose: Optional[bool] = None,
epochs: int = 0,
Expand Down Expand Up @@ -855,6 +867,7 @@ def _setup_for_fit_from_dataset(
future_covariates=future_covariates,
sample_weight=sample_weight,
max_samples_per_ts=max_samples_per_ts,
stride=stride,
)

if val_series is not None:
Expand All @@ -864,6 +877,7 @@ def _setup_for_fit_from_dataset(
future_covariates=val_future_covariates,
sample_weight=val_sample_weight,
max_samples_per_ts=max_samples_per_ts,
stride=eval_stride,
)
else:
val_dataset = None
Expand Down
Loading
Loading