GPVAR estimator: GP-Copula distr_output #607
Replies: 11 comments
-
Hi @StatMixedML, using LowrankGPOutput is fine. The GPLowrankMultivariateGaussianOutputTransformed is just an artifact of an earlier implementation and is effectively the same as LowrankGPOutput in our GluonTS implementation: it does not do anything there, because the transformation used is the identity. That being said, you can still use GPLowrankMultivariateGaussianOutputTransformed if you would like to play around with custom transformations of the distribution. The TransformedDistribution class allows you to transform a distribution using the change-of-variables formula. Hope that helps. Let me know if you have more questions. Best,
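The change-of-variables formula that TransformedDistribution relies on can be illustrated with a small NumPy sketch (a hypothetical standalone example, not GluonTS code): if Y = g(X) with g invertible, then p_Y(y) = p_X(g⁻¹(y)) · |d g⁻¹(y)/dy|. Taking X standard normal and g = exp recovers the log-normal density:

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Standard closed-form Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def transformed_pdf(y):
    """Density of Y = exp(X) via change of variables:
    inverse is log(y), Jacobian term is |d log(y)/dy| = 1/y."""
    return normal_pdf(np.log(y)) * (1.0 / y)

def lognormal_pdf(y):
    """Closed-form log-normal density, for comparison."""
    return np.exp(-0.5 * np.log(y) ** 2) / (y * np.sqrt(2 * np.pi))

ys = np.linspace(0.1, 5.0, 50)
assert np.allclose(transformed_pdf(ys), lognormal_pdf(ys))
```

With the identity transform, g⁻¹ is the identity and the Jacobian term is 1, so the transformed density equals the base density — which is why the two output classes behave the same here.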
-
Thanks @mbohlkeschneider! Really appreciate your input. I'll close the issue.
-
Sorry for re-opening the issue. I do have a question concerning the running time of the GPVAREstimator: it seems to take excessively long to train a model, see the screenshot below. The data set has 133 monthly time series, each with 12 years of history, across several hierarchies, resulting in ~55,000 observations. I am training the model on a desktop machine with 32 GB RAM and a 6-core i7 CPU. Do you have any experience with running times for the GPVAREstimator? By comparison, DeepAR is blazingly fast, finishing 200 iterations on the same data set in around 5 minutes. I appreciate your comments!
-
Hi @StatMixedML, the model is quite a bit slower than DeepAR, for sure (it is also quite a bit more complex)! Using the parameters from our paper (especially
-
I am asking since the first iteration of the GPVAREstimator took around 5,000 seconds (rank = 32), while it only takes around 5 seconds for DeepAR. I am also not sure whether I am passing the data to the estimator correctly. What I do is as follows:
Step 1.) is the same as for DeepAR. Step 2.) transforms the univariate time series to multivariate using the grouper function. I then also append feat_dynamic_real_train and feat_static_cat, as they do not appear in the initial train_ds = grouper_train(train_ds). Is that the right way of doing it?
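The grouping step described above essentially stacks K aligned univariate series into one multivariate target of shape (K, T). A minimal sketch of the idea (a hypothetical helper, not GluonTS's actual MultivariateGrouper implementation):

```python
import numpy as np

def group_series(series_list, start="2000-01"):
    """Stack K aligned univariate series of length T into a single
    dataset entry with a (K, T) multivariate target -- a sketch of
    what a multivariate grouper does for already-aligned series.
    The 'start' field is a placeholder timestamp."""
    target = np.stack([np.asarray(s, dtype=float) for s in series_list])
    return {"target": target, "start": start}

entry = group_series([[1, 2, 3], [4, 5, 6]])
assert entry["target"].shape == (2, 3)  # K=2 series, T=3 timesteps
```

Features such as feat_dynamic_real and feat_static_cat are not part of this stacking, which is consistent with them missing from the grouped dataset and having to be attached afterwards, as described above.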
-
Something seems iffy to me. Can you share the exact values of the hyperparameters?
-
This is expected, actually: MXNet takes a long time to create large computational graphs. To confirm that this is the culprit, you can run with hybridize=False; training should then start instantly (but it will be slow). If this is the issue, I unfortunately have no suggestion other than to open an issue in MXNet to report the slowness when creating large graphs.
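For reference, the check suggested above amounts to passing the flag through the Trainer. A rough configuration sketch (untested; import paths vary across GluonTS versions, and the hyperparameter values here are placeholders, not the values from the thread):

```python
# Sketch only: assumes a GluonTS version where GPVAREstimator and the
# MXNet Trainer live at these paths; adjust imports for your version.
from gluonts.model.gpvar import GPVAREstimator
from gluonts.mx.trainer import Trainer

estimator = GPVAREstimator(
    freq="1M",
    prediction_length=12,     # placeholder horizon
    target_dim=133,           # number of series in the multivariate target
    trainer=Trainer(
        epochs=1,
        hybridize=False,      # skip MXNet graph compilation to test startup time
    ),
)
```

If training starts immediately with hybridize=False but stalls for a long time with hybridize=True, graph construction is the bottleneck.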
-
Hi @mbohlkeschneider & @geoalgo, given the above data structure, I use the following setup: `np.random.seed(123) estimator = gpvar.GPVAREstimator(freq = "1M",` @geoalgo: if I add hybridize = False to the Trainer(), model estimation indeed starts right away. This is how it looks: if I use deepvar.DeepVAREstimator instead of gpvar.GPVAREstimator without the hybridize option, model training starts right away, with an iteration speed similar to hybridize = True for gpvar.GPVAREstimator. Not sure if that helps clarify the problem. How do we proceed? Would you open the issue on MXNet, as the authors of the package?
-
AFAIK, this issue is known to the MXNet team (I cannot find an exact issue # right now, though). Feel free to open another one!
-
OK, thanks, will do. I'll leave the issue open for now so I can refer to it when opening the one on MXNet.
-
@StatMixedML, can you share your complete implementation? I think I have a similar problem to solve, and it might be helpful to me. Thank you!
-
Hi,
I was wondering what the correct distr_output argument to choose is for the GP-Copula model described in the paper High-Dimensional Multivariate Forecasting with Low-Rank Gaussian Copula Processes. I am not planning to replicate the paper's results, but want to better understand the subtleties of the distr_output argument.
On the official GluonTS repo, the GPVAREstimator says that it uses LowrankGPOutput as the default for the distr_output argument. It also says that
I checked the implementation of @mbohlkeschneider, and he is using GPLowrankMultivariateGaussianOutputTransformed for the distr_output argument (line 213). Line 78 shows that the latter is a subclass of the former: GPLowrankMultivariateGaussianOutputTransformed(LowrankGPOutput).
Any advice on what the difference between GPLowrankMultivariateGaussianOutputTransformed(LowrankGPOutput) and LowrankGPOutput is, and which one to use?
Highly appreciate any advice!
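The subclass relationship in question (a derived output class that wraps the base output with a transformation, which here happens to be the identity) can be sketched in plain Python. The class names mirror the thread for readability, but the bodies are hypothetical stand-ins, not the actual GluonTS code:

```python
class LowrankGPOutput:
    """Stand-in for the base low-rank GP distribution output."""
    def distribution(self, params):
        return {"family": "lowrank_gp", "params": params}

class GPLowrankMultivariateGaussianOutputTransformed(LowrankGPOutput):
    """Applies a transformation on top of the base distribution.
    With the default identity transform it behaves exactly like
    the base class -- which is why the two are interchangeable here."""
    def __init__(self, transform=lambda d: d):  # identity by default
        self.transform = transform

    def distribution(self, params):
        return self.transform(super().distribution(params))

base = LowrankGPOutput()
transformed = GPLowrankMultivariateGaussianOutputTransformed()
assert transformed.distribution([1.0]) == base.distribution([1.0])
```

Passing a non-identity transform is the only case where the subclass would produce a different distribution, matching the answer in the thread.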