
Scale rates via datacard parser #1015

Closed
IzaakWN wants to merge 1 commit

Conversation

IzaakWN
Contributor

IzaakWN commented Nov 8, 2024

This PR allows users to scale rates in the datacard parser via the command line, for combineCards.py or text2workspace.py.

I ran into a use case where I want to combine cards with signal PDFs normalized to 1 pb in order to set limits on the cross section in units of pb. When the cards are combined, the signal yields/rates need to be reweighted by their relative fractions of the total cross section. With this PR, one can do

combineCards.py datacard_wh.txt datacard_zh.txt --scale-rate='wh_.*=0.6,zh_.*=0.4'

where 0.6 ≈ 1.37/(1.37+0.88) and 0.4 ≈ 0.88/(1.37+0.88) for σ(WH) = 1.37 pb and σ(ZH) = 0.88 pb.

The implementation supports regular expressions by default, and the bin can be specified as well, e.g.

--scale-rate='ch[12]/wh.*=0.6,ch[12]/wz.*=0.4'

Mathematical expressions native to Python are possible because the values are parsed with eval(), e.g.

--scale-rate='wh.*=1.37/(1.37+0.88),zh.*=1-1.37/(1.37+0.88)'
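
For illustration, here is a minimal sketch of how such an option string could be parsed; this is not the actual code in the PR, and parse_scale_rates/scale_rate are hypothetical names:

import re

def parse_scale_rates(arg):
    # Turn a --scale-rate string like 'ch[12]/wh.*=0.6,zh.*=1-1.37/(1.37+0.88)'
    # into a list of (bin_pattern, process_pattern, scale) tuples.
    rules = []
    for item in arg.split(','):  # caveat: breaks if an expression itself contains a comma
        pattern, _, expr = item.partition('=')
        binpat, _, procpat = pattern.rpartition('/')  # optional 'bin/' prefix
        scale = eval(expr)  # allows Python math such as '1.37/(1.37+0.88)'
        rules.append((binpat or '.*', procpat, scale))
    return rules

def scale_rate(rules, bin_name, process, rate):
    # Multiply the rate by the scale of the first rule matching (bin, process).
    for binpat, procpat, scale in rules:
        if re.fullmatch(binpat, bin_name) and re.fullmatch(procpat, process):
            return rate * scale
    return rate

For example, scale_rate(parse_scale_rates('wh_.*=0.6'), 'ch1', 'wh_sig', 10.0) would return 6.0.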

I hope this can be useful to others. I think it should work for simple counting datacards (just rates, no shapes) and for datacards with parametric PDFs, but not for datacards with shape histograms (see documentation).

@adewit
Collaborator

adewit commented Nov 8, 2024

Thanks for the initiative - I'm wondering why a physics model wouldn't work for this?

If a physics model wouldn't suffice, I think it's probably better to rescale in the input datacards, e.g. by parsing them with CombineHarvester and adding a rateParam frozen to the scaling value, or by adjusting the original datacards, if the code used to make those cards is still available.
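
For reference, a rough sketch of that CombineHarvester route, here scaling the rates directly with set_rate instead of adding a rateParam line; the file names and the 0.6 factor are placeholders:

import CombineHarvester.CombineTools.ch as ch

cb = ch.CombineHarvester()
cb.SetFlag('filters-use-regex', True)  # let process() patterns be regexes
cb.ParseDatacard('datacard_wh.txt')

# Scale every process whose name matches wh_.* by the desired fraction
cb.cp().process(['wh_.*']).ForEachProc(lambda p: p.set_rate(p.rate() * 0.6))

cb.WriteDatacard('datacard_wh_scaled.txt', 'datacard_wh_scaled.root')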

You seem to say that the option wouldn't work in all cases, and I'd be a bit worried about adding text2workspace runtime options that don't have a fully defined behaviour (we've already seen problems when the nuisance edit directive has been overused in situations for which it was not designed). Maybe I misunderstood what you said :-)

@IzaakWN
Contributor Author

IzaakWN commented Nov 8, 2024

> Thanks for the initiative - I'm wondering why a physics model wouldn't work for this?

Yes, it should not be too hard to create a physics model for this. Do you happen to know if one exists that allows you to scale specified processes?

> If a physics model wouldn't suffice, I think it's probably better to rescale in the input datacards, e.g. by parsing them with CombineHarvester and adding a rateParam frozen to the scaling value, or by adjusting the original datacards, if the code used to make those cards is still available.

Yes, that would also be a good solution. I encountered this use case when reviewing simple cards with signal PDFs, made with a custom datacard writer, for a SUS search. I thought this would be a very simple way for a user to scale the rates on the fly when combining individual cards with combineCards.py, while ensuring the total cross section remains 1 pb for computing limits.

> You seem to say that the option wouldn't work in all cases, and I'd be a bit worried about adding text2workspace runtime options that don't have a fully defined behaviour (we've already seen problems when the nuisance edit directive has been overused in situations for which it was not designed). Maybe I misunderstood what you said :-)

No, I think you are right and understood correctly. My understanding is that if a process's shape is taken from a ROOT histogram, the yield in the rate line must match the integral of the respective histogram, unless -1 is used. If the rate differs from the histogram integral, text2workspace.py will complain here:

if self.DC.exp[b][p] == -1:
    self.DC.exp[b][p] = norm
elif self.DC.exp[b][p] > 0 and abs(norm - self.DC.exp[b][p]) > 0.01 * max(1, self.DC.exp[b][p]):
    if not self.options.noCheckNorm:
        raise RuntimeError("Mismatch in normalizations for bin %s, process %s: rate %f, shape %f" % (b, p, self.DC.exp[b][p], norm))
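
(For reference, a hypothetical shape-datacard excerpt where rate -1 lets text2workspace.py take the yield from the histogram integral; ch1, wh_sig and shapes.root are placeholder names:)

shapes *        ch1   shapes.root   $CHANNEL/$PROCESS
bin             ch1
process         wh_sig
process         0
rate            -1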

I think the added runtime would be negligible if --scale-rate is not used, but I understand if you'd prefer not to add options without fully defined behavior. Otherwise, I could print a warning on the command line?

@IzaakWN
Contributor Author

IzaakWN commented Nov 8, 2024

I was overthinking it. Adding the lines

f_wh rateParam * wh_* 0.6
f_zh rateParam * zh_* 0.4

is a perfectly good solution, and more transparent and universal as well. It's easy to do on the fly:

combineCards.py datacard_wh.txt datacard_zh.txt > datacard_comb.txt
sed -i '$a f_wh rateParam * wh_* 0.6' datacard_comb.txt
sed -i '$a f_zh rateParam * zh_* 0.4' datacard_comb.txt
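
Note that, as adewit suggested, these rateParams should be kept frozen at their values so they are not left to float in the fit; with recent combine versions this can be done at runtime, e.g.

combine -M AsymptoticLimits datacard_comb.txt --freezeParameters f_wh,f_zh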

I am closing this PR.

IzaakWN closed this Nov 8, 2024