Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tempered mcmc #180

Closed
wants to merge 23 commits into from
Closed

Tempered mcmc #180

wants to merge 23 commits into from

Conversation

JesusTorrado
Copy link
Contributor

[WIP]

Missing testing and getdist side of things.

if self.temperature:
raise LoggedError(
self.log,
"Temperature != 1 and dragging are not compatible at the moment.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a quick glance, I think dragging is correct with just the current PR changes (since temperature just scales all the logposts by the same linear factor)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should, I guess, for any sampling algorithm, since it's just importance reweighting? Anyway, will test before merging with baseline Planck.

@codecov-commenter
Copy link

codecov-commenter commented Sep 8, 2021

Codecov Report

Merging #180 (8e8a5f4) into master (ac3ed01) will decrease coverage by 0.04%.
The diff coverage is 75.40%.

@@            Coverage Diff             @@
##           master     #180      +/-   ##
==========================================
- Coverage   87.59%   87.55%   -0.05%     
==========================================
  Files          91       91              
  Lines        8345     8390      +45     
==========================================
+ Hits         7310     7346      +36     
- Misses       1035     1044       +9     
Impacted Files Coverage Δ
cobaya/theories/camb/camb.py 92.00% <ø> (ø)
cobaya/collection.py 85.39% <68.29%> (-1.54%) ⬇️
cobaya/samplers/mcmc/mcmc.py 91.16% <87.50%> (-0.31%) ⬇️
cobaya/model.py 93.92% <100.00%> (+0.03%) ⬆️
cobaya/sampler.py 89.36% <0.00%> (+0.42%) ⬆️
cobaya/prior.py 97.87% <0.00%> (+1.41%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@JesusTorrado
Copy link
Contributor Author

JesusTorrado commented Sep 8, 2021

@cmbant Mostly done. As expected by getdist, now storing tempered weights (integer) and logpost in the collection. Statistical functions such as mean, cov take into account the temperature with which the sample was generated and return the statistics corresponding to the original pdf (unless otherwise requested with ignore_temperature=True, as convergence checks in tempered MCMC do).

A couple of missing things:

  • Collection.thin_samples: should it care about temperature? I reckon it should not at this point in the code, right? (that's the point of integer weights?) But not sure.
  • Handling of covmats I/O: I would say we would load/save the temperature=1 covmats, and modify them internally, since we may want to reuse them in sampling processes with different temperatures?
  • What to do exactly about getdist: now one can load a tempered sample, call MCSamples.cool() with the temperature, and recover the original pdf. But since the temperature is known at the time of loading the sample (stored in the updated.yaml), wouldn't it make sense that getdist is smater about it, e.g. having a default value for the cool argument in MCSamples.cool(), or even calling it automatically before plotting, even if weights are still stored tempered/integer for chain operations purposes?

@cmbant
Copy link
Collaborator

cmbant commented Sep 17, 2021

back from hols now. I agree thinning should just use the integer weights, and saving covmats with T=1 makes sense.
Could certainly add some temperature meta data tag to .properties.ini and yaml equivalent, though people may also want to look at the high-temperature results (e.g. to visually check tail convergence, or see what results look like with effectively larger errors).

@JesusTorrado
Copy link
Contributor Author

Thanks!

  • Then I don't think there is anything to be done then on the thinning side, since the relevant auxiliary GetDist chains are already initialised with weights and logposterior of the tempered posterior (so integer weights and exponentiated minuslogposterior).

  • CovMat I/O is now done for the T=1 pdf.

  • About GetDist:

    • The question is about default behaviour: when we plot a tempered chain, do we want to get plots/statistics for the original pdf (maybe together with a warning) if no kwarg is overriden, or the other way around? I think it should be the first one, but your call. Of course, ideally in both cases one should be able to get plots/statistics of both the tempered and original pdf.
    • No need for now for metadata: the temperature is in the updated.yaml file, and I have already wrote a function for GetDist to extract it (just not sure where to put it, because GetDist has a few different ways/places for loading chains). As stated above, once set as an attribute of the chain, the temperature could be the default value for the arg of MCSamples.cool().

I'll open a GetDist issue about this.

@JesusTorrado
Copy link
Contributor Author

@cmbant do you agree with the changes in the docs here, or have anything to add? 89b3188

@cmbant
Copy link
Collaborator

cmbant commented Sep 22, 2021

Looks good to me thanks, though not very sure what you mean by "by weighting with samples with their probability". Can be useful for evaluating small tail probabilities. In high dimensions I'm using a single higher temperature usually probably doesn't help much for importance sampling (you'd really want to flatten just in a small number of relevant directions).

Did you fix the issue with the covariance scaling?

I don't have a very strong opinion about default GetDist behaviour. Ideally temperature, cooling, thinning , and burn-in removal metadata should probably be propagated consistently (including when saving MCSamples.saveAsText, and in Cobaya importance-sampled outputs).

@JesusTorrado
Copy link
Contributor Author

Thanks a lot for the review!

Looks good to me thanks, though not very sure what you mean by "by weighting with samples with their probability".

The first "with" should not be there, sorry. I mean when computing any quantity for which one weights samples with their posterior/prior/likelihood, e.g. in the GP sampler we try to do quick estimates of the covariance matrix from small high-temperature samples, too small to be fair, but sparse enough to get a decent estimation (hopefully).

Can be useful for evaluating small tail probabilities.

I'll add that.

In high dimensions I'm using a single higher temperature usually probably doesn't help much for importance sampling (you'd really want to flatten just in a small number of relevant directions).

Ok, I guess I was thinking too naively about this. Do you think it's still worth mentioning that it may be useful in low dimensions, or should we remove any mention of re-weighting there?

Did you fix the issue with the covariance scaling?

Yes, it was indeed the prior, thanks!

I don't have a very strong opinion about default GetDist behaviour.

My strongest opinion is that, if the default is plotting/using the high-temperature chain, users should get at least a warning when loading the chain with an explanation of how to cool it down. Another strong opinion: some hint/checkbox in the GUI for cooling down on the spot (even it the cooled-down chain only stays in RAM). Otherwise users would have to load the chains by hand in a Python shell/notebook and cool them to be able to plot them? Not sure how you did that before with CosmoMC.

In general, I think GetDist should work with it more like Cobaya's collections: stored as high-temperature, but cooled-down weights and posterior can be requested via methods without modifying the internal high-temperature data. This way you don't need to cool down the sample to get "cool" statistics and plots.

Ideally temperature, cooling, thinning , and burn-in removal metadata should probably be propagated consistently (including when saving MCSamples.saveAsText, and in Cobaya importance-sampled outputs).

I'll turn this into another issue and get to work on it over the following weeks.

As for post processing, the easiest way to go at the moment is to automatically cool on load, and thus not preserve temperature. Unless you strongly disagree, I'd leave it like that for now and open another issue to be worked on in the following weeks. (I need this one merged soon).

@cmbant
Copy link
Collaborator

cmbant commented Sep 23, 2021

I started support for auto-cool in getdist at https://github.com/cmbant/getdist/tree/autocool.
Cobaya temperature setting could be propagated to properties as other metadata settings like sampler. Does this work?

GetDist gui currently doesn't have a good way to change settings per chain (since you often operate on many at once), but this I think works analogously to ignore_rows/burn_removed, in that set globally, but chain metadata specified whether should be applied to each specific chain.

@JesusTorrado
Copy link
Contributor Author

Thanks! Could work. Since this is new, we can test it ourselves and see whether it feels convenient. I am going to be using it this coming week myself.

To get the temperature from a yaml, assuming you have got the sampler name with the get_sampler function that is already there, you can use:

def get_sample_temperature(filename_or_info, sampler):
    return yaml_file_or_dict(filename_or_info).get(_sampler, {}).get(sampler, {})\
        .get("temperature", 1)

So if I got it correctly, GetDistGUI will cool down on load unless the metadata requests otherwise. That would look good to me then.

Any comment on my answer above re documentation?

@cmbant
Copy link
Collaborator

cmbant commented Sep 27, 2021

Doc changes sound OK, can certainly mention could be useful for importance sampling in some cases.

@cmbant
Copy link
Collaborator

cmbant commented Feb 24, 2022

TODO (at least): post processing, how to store temperature/cool status (#202), getdist setting of temperature property on cobaya load.

I pushed a few of the tempering-independent fixes in this PR to master.

@cmbant
Copy link
Collaborator

cmbant commented Oct 6, 2022

@JesusTorrado: @AndreasNygaard is interested in using temperatures for training Connect - how far off do you think this is from merge?
(I re-activated travis account again..)

@JesusTorrado JesusTorrado deleted the mcmc_tempered branch March 29, 2023 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants