Sparse GPs #4

Open
theogf opened this issue May 27, 2020 · 14 comments
Comments

@theogf
Member

theogf commented May 27, 2020

Do we want to include Sparse Gaussian Processes in this package or create an extra one for it?
In my experience a lot of things can be written once for both sparse and non-sparse models, so having them in the same package might simplify things.

@sharanry
Contributor

@theogf By Sparse GPs, are you referring to approximation using pseudo-points? And in this case, with non-Gaussian likelihoods?

I think this is the right place for them. @willtebbutt What are your thoughts on this?

@theogf
Member Author

theogf commented May 30, 2020

Yes, I meant the inducing points approach. The Gaussian likelihood is also a use case.

@willtebbutt
Member

willtebbutt commented May 30, 2020

Yeah, sparse approximations are definitely something that we should include here.

It's worth noting that we've already got some infrastructure for this in AbstractGPs. It might be worth considering how we could build sparse approximations for problems involving non-Gaussian likelihoods on top of this. Currently it assumes that you just want the closed-form solution, which isn't necessarily the case and should probably be relaxed. For example, you might want to do stochastic variational inference (SVI) and iteratively find the optimal values for the approximate posterior mean and precision at the diagonalised pseudo-points. This kind of thing would also be helpful for this package, because you're typically going to be optimising the parameters of your approximate posterior process.
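For reference, the objective that an SVI scheme of this kind typically optimises can be written as below. This is the standard sparse variational bound rather than anything specified in this thread, and the symbols are introduced here for illustration only: u are the function values at the pseudo-points, q(u) is the Gaussian approximate posterior being optimised, N is the number of observations, and B is a minibatch of indices drawn uniformly at random.

```latex
\mathcal{L}(q) = \sum_{n=1}^{N} \mathbb{E}_{q(f_n)}\!\left[\log p(y_n \mid f_n)\right]
  - \mathrm{KL}\!\left(q(u) \,\|\, p(u)\right),
\qquad
\hat{\mathcal{L}}(q) = \frac{N}{|B|} \sum_{n \in B} \mathbb{E}_{q(f_n)}\!\left[\log p(y_n \mid f_n)\right]
  - \mathrm{KL}\!\left(q(u) \,\|\, p(u)\right)
```

Under that uniform-sampling assumption the minibatch estimator is unbiased for the full bound, which is what makes iterative optimisation of the parameters of q(u) feasible on large datasets.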

Thinking about this has made me realise that it might have been a mistake to make the LatentGP type wrap a FiniteGP. Specifically, it might be better to rename LatentGP to FiniteLatentGP, and have LatentGP be a thing that wraps an AbstractGP and a function that spits out a likelihood function when indexed at a particular collection of indices x.

The issue is that under the current LatentGP interface we've not got a way to talk about making predictions at input locations other than those in the wrapped FiniteGP. This is problematic even without considering pseudo-points, but is really problematic when you consider pseudo-points because it leaves you with no way to discuss a collection of pseudo-points relative to a collection of training points. I'll reference this comment in our general design issue. Thanks for raising this point @theogf.

edit: Hmm I might have been a bit hasty above. We might be totally fine as we are.

@willtebbutt
Member

willtebbutt commented May 30, 2020

With our current API, the following is possible:

# Set up prior and initialise approximate posterior.
f = GP(k)
f_approx_post = ApproxPosteriorGP(f, some_parameters_to_be_optimised)
lf_approx_post = LatentGP(f_approx_post(x), BernoulliLikelihood())

# Maybe you're doing SVI, so you compute an estimator of the ELBO.
elbo_estimator(lf_approx_post, y)

# Make some predictions at new locations under the approximate posterior.
rand(f_approx_post(x_pred), BernoulliLikelihood())

This approach has the advantage of not requiring any new objects in LatentGPs other than LatentGP. It has the disadvantage of meaning that you need to pass around the likelihood that you want to make predictions under, rather than it coming packaged together with your model.

If we renamed LatentGP to LatentFiniteGP (as per my previous suggestion) and introduced a LatentGP that is constructed via a GP and a likelihood we would get the following slightly better API:

# Set up prior and initialise approximate posterior.
f = GP(k)
f_approx_post = ApproxPosteriorGP(f, some_parameters_to_be_optimised)
lf_approx_post = LatentGP(f_approx_post, BernoulliLikelihood()) # note no `x`

# Maybe you're doing SVI, so you compute an estimator of the ELBO.
elbo_estimator(lf_approx_post(x), y) # Note the introduction of `x` here.

# Make some predictions at new locations under the approximate posterior.
rand(lf_approx_post(x_pred))

Note that lf_approx_post(x) and lf_approx_post(x_pred) would be LatentFiniteGPs. The crucial difference is in the last line -- there's no need to continue to know what likelihood you want to use when predicting at new locations, because that information is stored in the lf_approx_post object. In my experience this can be quite a nice feature of an API, but I'm not sure that it's really crucial from a user's perspective.
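To make the second proposal concrete, here is a minimal, self-contained sketch of the "index to get a finite object that carries the likelihood" pattern. Every name in it (ToyGP, ToyFiniteGP, ToyLatentGP, ToyLatentFiniteGP, ToyBernoulliLikelihood, draw, sqexp) is a hypothetical stand-in invented for illustration, not a type or function from AbstractGPs or this package, and it samples from a prior rather than an approximate posterior.

```julia
using LinearAlgebra, Random

# Toy stand-ins only; these are NOT the AbstractGPs / LatentGPs types.
struct ToyGP{K}
    kernel::K
end

struct ToyFiniteGP{Tf<:ToyGP,Tx<:AbstractVector}
    f::Tf
    x::Tx
end

# Indexing a process at inputs gives a finite-dimensional object.
(f::ToyGP)(x::AbstractVector) = ToyFiniteGP(f, x)

struct ToyBernoulliLikelihood end

# Process plus likelihood, with no inputs attached yet.
struct ToyLatentGP{Tf,Tlik}
    f::Tf
    lik::Tlik
end

struct ToyLatentFiniteGP{Tfx,Tlik}
    fx::Tfx
    lik::Tlik
end

# Indexing forwards the stored likelihood, so callers never pass it again.
(lf::ToyLatentGP)(x::AbstractVector) = ToyLatentFiniteGP(lf.f(x), lf.lik)

# Draw latent values from the zero-mean toy GP at the stored inputs.
function draw(rng::AbstractRNG, fx::ToyFiniteGP)
    K = [fx.f.kernel(a, b) for a in fx.x, b in fx.x]
    L = cholesky(Symmetric(K + 1e-6I)).L  # small jitter for numerical stability
    return L * randn(rng, length(fx.x))
end

logistic(f) = 1 / (1 + exp(-f))

# Draw latent values, then binary observations through a logistic link.
function draw(rng::AbstractRNG, lfx::ToyLatentFiniteGP{<:Any,ToyBernoulliLikelihood})
    f = draw(rng, lfx.fx)
    y = rand(rng, length(f)) .< logistic.(f)
    return (f = f, y = y)
end

sqexp(a, b) = exp(-abs2(a - b) / 2)
lf = ToyLatentGP(ToyGP(sqexp), ToyBernoulliLikelihood())
x_pred = collect(range(0.0, 4.0; length = 5))
sample = draw(MersenneTwister(0), lf(x_pred))  # note: no likelihood argument here
```

The last line mirrors rand(lf_approx_post(x_pred)) above: the likelihood travels with the model object, so the prediction call only needs the new inputs.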

@sharanry @theogf @devmotion any preferences between the above two approaches?

edit: the second approach would introduce a discrepancy between how AbstractGPs and LatentGPs do posterior predictions. AbstractGPs currently requires you to know how much observation noise you want when making posterior predictions, and in my experience that's okay albeit a little annoying. So maybe this suggests that the AbstractGPs design should be refined a little to reflect the fact that you maybe want to include the observation noise process in with the original GP object itself?

@devmotion
Member

I think I'd prefer the second approach, it feels natural to me to only state the likelihood once.

@sharanry
Contributor

I agree with @devmotion; the second approach seems cleaner and more natural. I also think we should update how AbstractGPs handles observation noise to have a more uniform API.

@yebai had also suggested considering AbstractGPs as a special case of LatentAbstractGPs with Gaussian likelihood, and maybe merging the packages at some point. We could then dispatch the already implemented posterior, elbo, logpdf, mean and cov functions based on the defined likelihood. This way we could enable an even more unified API without the need for two extra structs.

@yebai
Contributor

yebai commented Jun 29, 2020

Yes, it would be great to keep all the types and interface methods related to GP in one place, i.e. AbstractGP. I still think the term LatentGP is slightly confusing although I understand the motivations for it. In practice, GP is always associated with some likelihoods, therefore always a latent variable model.

It is ok to have a separate package for non-Gaussian LikelihoodFunctions which implements the inference-specific details of each likelihood for GPs.

@willtebbutt
Member

In practice, GP is always associated with some likelihoods, therefore always a latent variable model.

This is not true, and it's not the case that incorporating one's likelihood in with a GP type is strictly a win from an API perspective.

First the API: the main distinction is that when working with the hierarchical LatentGP model, by definition one needs to distinguish between the latent and observed processes, and in general a user may want one or both of these things, depending upon the application. Conversely, an AbstractGP is not to be interpreted as a hierarchical model, it's just a GP that maybe has some Gaussian noise when you make observations of it -- from the perspective of LatentGPs and of users of GPs, this noise is an important trick to help with conditioning, and is something you want to be able to use even if you're working with non-Gaussian likelihoods.

Having a larger API would make implementing new AbstractGP subtypes in other packages more cumbersome -- having to worry about likelihoods at all would be a loss in such settings. For example, TemporalGPs and Stheno will both be ported over to the AbstractGPs API when I get the time, and it makes little sense for them to have to concern themselves with likelihood functions. Under the current interface, they just need to know about AbstractGP objects, and different noise levels that can be associated with FiniteGPs. There's no notion of latent / observed processes.

This lack of observed / latent processes in the AbstractGP API means that it's unambiguous what cov(f(x, S)) means -- it's the covariance matrix associated with the GP f at inputs x observed under noise S. This is not the case in LatentGPs, nor would it be the case if one included the likelihoods inside the AbstractGP object -- you would need to answer whether it refers to the process that you observe or the latent one.
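Written out, and assuming the observation noise is additive, zero-mean Gaussian and independent of f (which is how it is described above), the quantity in question is the following, where k denotes the kernel of f and k(x, x) its kernel matrix at the inputs x:

```latex
\operatorname{cov}\big(f(x, S)\big) = k(x, x) + S
```

There is only one process involved, so there is only one candidate covariance. With a non-Gaussian likelihood, the covariance of the latent values and the covariance of the observations are different objects, which is the ambiguity being pointed out.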

Also, a LatentGP is not a GP, so including the likelihood in the AbstractGP type simply doesn't make sense. As discussed above, keeping the conceptual distinction between the two is helpful.

Furthermore, the distinction between an AbstractGP and a LatentGP has implications from a probabilistic programming perspective in two regards:

  1. Turing / Soss / Gen integration: you would almost certainly want to implement the likelihood-related stuff in Turing / Soss directly, rather than using something from LatentGPs. In an ideal world, we wouldn't even have LatentGPs, we would just have AbstractGPs and Turing / Soss / Gen integration. And again you typically need the noise bit in FiniteGP for the sake of conditioning.
  2. Stheno: As discussed above, I'm going to refactor Stheno so that the programmes that it constructs sub-type AbstractGP, so that you can do all of the fancy stuff that Stheno enables, but within a framework that plays nicely with any custom approximate inference things we develop in this package for non-Gaussian likelihoods. This becomes quite a bit harder if we don't have a separate GP object without a notion of a likelihood -- indeed this is one of the cases when most of the processes in your model won't have a likelihood function defined for them because they're latent. For example, what would GP(m1, k1, non_gaussian_likelihood) + GP(m2, k2, non_gaussian_likelihood2) represent? The idea that a GPPP is a GP no longer holds if you decide to conflate GPs and the hierarchical models represented by LatentGPs.
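The closure property that the Stheno point relies on can be stated for the simplest case as follows. This sketch assumes the two processes are independent; that assumption is made here for brevity and is not a restriction stated in the thread.

```latex
f_1 \sim \mathcal{GP}(m_1, k_1), \quad f_2 \sim \mathcal{GP}(m_2, k_2), \quad f_1 \perp f_2
\;\;\Longrightarrow\;\;
f_1 + f_2 \sim \mathcal{GP}(m_1 + m_2,\; k_1 + k_2)
```

No comparable rule is given in this discussion for summing two objects that each carry their own non-Gaussian likelihood, which is why the example expression above has no obvious meaning.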

In short, LatentGPs and AbstractGPs solve two quite different problems, and neither API is strictly superior to the other -- both are very important. For these reasons, I am on-board with modifying the LatentGP API, but not with modifying the AbstractGP API.

I still think the term LatentGP is slightly confusing although I understand the motivations for it.

I am always open to suggestions for better names.

@yebai
Contributor

yebai commented Jun 29, 2020

There is some confusion around my suggestion I think. Firstly, I am not suggesting removing the current GP type. Assume that we keep the name LatentGP for the sake of discussion. What I am suggesting is to place LatentGP inside AbstractGPs package. This wouldn't hurt if the user wants to write his own likelihood inside another package, e.g. Stheno, Turing or AugmentedGaussianProcesses.

Secondly, what I mean there is always some likelihood is probably not clear. I didn't mean the standard regression or classification setting, where when specifying a GP model, we often simultaneously also define a likelihood model. For example, in a time-series setting, where we apply a state-space model. In such a case, the likelihood is multi-level, the latent transitioning dynamics could have a Gaussian likelihood, while the observation process often has a separate likelihood model. It is also possible, the latent dynamics don't have a likelihood at all, while the observation process has a Gaussian likelihood. However, this setting still significantly differs from a GP with a Gaussian likelihood. In summary, the likelihood always exists but sometimes the specification is delayed (i.e. not immediately specified after the GP prior).

In a more standard use case of GPs for regression and classification, the user can combine AbstractGPs + KernelFunctions + LikelihoodFunctions to get the most functionality of existing GP libraries such as GPML.

@yebai
Contributor

yebai commented Jun 29, 2020

Regarding the name LatentGP, maybe we don't need a separate name. We can leverage parametric types such that the GP type includes all GP specialisations:

  • no likelihood, i.e. a GP prior only
  • Gaussian likelihood
  • non-Gaussian likelihoods
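One way to read this suggestion is as a single type parameterised by its likelihood, with behaviour selected by dispatch. The sketch below is only an illustration of that idea: ToyModel, AbstractToyLikelihood, NoLikelihood, ToyGaussian, ToyBernoulli and describe are all hypothetical names, and nothing here reflects an agreed or existing API.

```julia
# Hypothetical likelihood hierarchy, for illustration only.
abstract type AbstractToyLikelihood end

struct NoLikelihood <: AbstractToyLikelihood end

struct ToyGaussian <: AbstractToyLikelihood
    noise_var::Float64
end

struct ToyBernoulli <: AbstractToyLikelihood end

# A single model type whose second type parameter records the likelihood.
struct ToyModel{Tk,Tlik<:AbstractToyLikelihood}
    kernel::Tk
    lik::Tlik
end

# Omitting the likelihood gives the prior-only specialisation.
ToyModel(kernel) = ToyModel(kernel, NoLikelihood())

# Dispatch on the likelihood parameter selects the behaviour.
describe(::ToyModel{<:Any,NoLikelihood}) = "GP prior only"
describe(::ToyModel{<:Any,ToyGaussian}) = "Gaussian likelihood"
describe(::ToyModel{<:Any,<:AbstractToyLikelihood}) = "non-Gaussian likelihood"

sqexp(a, b) = exp(-abs2(a - b) / 2)
prior_only = ToyModel(sqexp)
regression = ToyModel(sqexp, ToyGaussian(0.1))
classifier = ToyModel(sqexp, ToyBernoulli())
```

The fallback method on the abstract likelihood type handles every likelihood without a more specific method, so new non-Gaussian likelihoods would not require touching the model type itself.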

@willtebbutt
Member

I'm also a bit confused now haha.

What I am suggesting is to place LatentGP inside AbstractGPs package.

I don't have particularly strong feelings either way here. Provided that LatentGP is not a subtype of AbstractGP I'm happy --- AbstractGP (at least conceptually) remains a collection of jointly-Gaussian random variables. If LatentGP moves to AbstractGPs though, I can't see why we wouldn't have the likelihoods there since there's not a lot you can do with LatentGPs unless you've got likelihoods.

This wouldn't hurt if the user wants to write his own likelihood inside another package, e.g. Stheno, Turing or AugmentedGaussianProcesses.

Agreed.

Secondly, what I mean there is always some likelihood is probably not clear. I didn't mean the standard regression or classification setting, where when specifying a GP model, we often simultaneously also define a likelihood model. For example, in a time-series setting, where we apply a state-space model. In such a case, the likelihood is multi-level, the latent transitioning dynamics could have a Gaussian likelihood, while the observation process often has a separate likelihood model. It is also possible, the latent dynamics don't have a likelihood at all, while the observation process has a Gaussian likelihood. However, this setting still significantly differs from a GP with a Gaussian likelihood. In summary, the likelihood always exists but sometimes the specification is delayed (i.e. not immediately specified after the GP prior).

I agree with this.

In a more standard use case of GPs for regression and classification, the user can combine AbstractGPs + KernelFunctions + LikelihoodFunctions to get the most functionality of existing GP libraries such as GPML.

I agree with this but, again, I think it makes sense to define the likelihoods wherever we define the LatentGP type, since you can't really do much with it without the likelihoods.

@yebai
Contributor

yebai commented Jun 29, 2020

I agree with this but, again, I think it makes sense to define the likelihoods wherever we define the LatentGP type, since you can't really do much with it without the likelihoods.

I am thinking that AbstractGPs can define the interface (abstract types and methods) for likelihoods, while LikelihoodFunctions provides the actual implementations. I agree that the likelihood interface in AbstractGPs should be minimal if possible. More methods can be added in LikelihoodFunctions.
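A rough sketch of that split, using two local modules to stand in for the two packages. The module names (ToyInterface, ToyLikelihoods), the abstract type, the loglik function and the BernoulliLogit type are all assumptions made for illustration; they are not the actual contents of AbstractGPs or of any likelihoods package.

```julia
# Stand-in for the interface package: an abstract type and an empty generic function.
module ToyInterface
export AbstractLikelihood, loglik

abstract type AbstractLikelihood end

# Declared with no methods; implementation packages add them.
function loglik end
end

# Stand-in for the implementation package: concrete likelihoods and their methods.
module ToyLikelihoods
using ..ToyInterface
import ..ToyInterface: loglik
export BernoulliLogit

struct BernoulliLogit <: AbstractLikelihood end

# Log-probability of a binary observation y given latent value f, logistic link.
loglik(::BernoulliLogit, y::Bool, f::Real) = y ? -log1p(exp(-f)) : -log1p(exp(f))
end

using .ToyInterface, .ToyLikelihoods

ll_true = loglik(BernoulliLogit(), true, 0.0)
ll_false = loglik(BernoulliLogit(), false, 0.0)
```

The interface module has no dependencies of its own, which is the property being asked for: downstream code can be written against AbstractLikelihood and loglik without loading any concrete likelihoods.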

@willtebbutt
Member

I am thinking that AbstractGPs can define the interface (abstract types and methods) for likelihoods, while LikelihoodFunctions provides the actual implementations. I agree that the likelihood interface in AbstractGPs should be minimal if possible. More methods can be added in LikelihoodFunctions.

Okay. We should probably define the GaussianLikelihood in AbstractGPs though, just so that we've got something to test against. I agree that defining likelihoods in a separate package makes a lot of sense if it helps to keep the number of dependencies in AbstractGPs to a minimum.

@yebai
Contributor

yebai commented Jul 2, 2020

Okay. We should probably define the GaussianLikelihood in AbstractGPs though, just so that we've got something to test against. I agree that defining likelihoods in a separate package makes a lot of sense if it helps to keep the number of dependencies in AbstractGPs to a minimum.

Sounds good -- I suggest that we move latent_gp to AbstractGPs and rename this package to LikelihoodFunctions.
