CorrectionSet containing CorrectionSet #55
Comments
Option 1 implies a schema change. Are the sets to be arbitrarily deep? I originally thought we might make the year an input, assuming that for a given correction the inputs would be the same or very similar across run years?
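To make the "year as an input" idea concrete, here is a minimal pure-Python sketch (no correctionlib dependency) of a single correction whose first input is the run year, dispatched through a category node. The node layout loosely follows the schema-v2 `category` node (`input`, and `content` entries with `key`/`value`), but the tiny evaluator, the correction name, and all numbers are hypothetical. It also makes the constraint explicit: every year has to share one input list.

```python
from typing import Dict, List, Union

# Hypothetical correction in the spirit of the schema-v2 JSON layout:
# the run year is just another (string) input, handled by a category node.
TAU_SF = {
    "name": "DeepTauVSjet_example",  # hypothetical name
    "inputs": [
        {"name": "year", "type": "string"},
        {"name": "pt", "type": "real"},  # declared but unused by this toy content
    ],
    "output": {"name": "sf", "type": "real"},
    "data": {
        "nodetype": "category",
        "input": "year",
        "content": [
            {"key": "2017", "value": 0.95},  # placeholder numbers
            {"key": "2018", "value": 0.97},
        ],
    },
}

Node = Union[float, dict]


def evaluate_node(node: Node, values: Dict[str, Union[str, float]]) -> float:
    """Evaluate a tiny subset of node types: constants and category."""
    if isinstance(node, (int, float)):
        return float(node)
    if node["nodetype"] == "category":
        key = values[node["input"]]
        for item in node["content"]:
            if item["key"] == key:
                return evaluate_node(item["value"], values)
        raise KeyError(f"no category entry for {node['input']}={key!r}")
    raise NotImplementedError(node["nodetype"])


def evaluate(correction: dict, *args: Union[str, float]) -> float:
    """Bind positional arguments to the declared inputs, then evaluate."""
    names: List[str] = [inp["name"] for inp in correction["inputs"]]
    if len(args) != len(names):
        raise TypeError(f"expected inputs {names}, got {len(args)} values")
    return evaluate_node(correction["data"], dict(zip(names, args)))


print(evaluate(TAU_SF, "2018", 40.0))  # prints 0.97
```

If one year needed an extra variable, it would have to be added to `inputs` for all years, which is the situation the nested-set proposal tries to avoid.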
If I understand correctly, the use case that @IzaakWN would like to capture is, for example, one where a correction in one year needs additional input variables, right?
Yes, I agree :)
Yes, some type of base class or dynamic casting?
No, in our own use case there would be only two layers: year, and then "SF type" (see below).
No, at least for tau corrections, it is negligible. What is more important is that the TauPOG has many different types of corrections (SFs for DeepTauVSjet, SFs for DeepTauVSe, trigger SFs, tau energy scales, ...) that depend on different sets of tau variables (pt, eta, decay mode, ...) [1]. If we want one single JSON file, option 3 is similar to option 2, in which case I would do it like BTV and JME, who split their JSONs by jet type: one JSON file per correction type, which is a correction set containing one correction object per year:
Users load it as
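For this per-correction-type layout, each file is an ordinary schema-v2 correction set whose corrections are named by year. A minimal standard-library sketch, assuming a hypothetical in-memory `DeepTauVSjet` file with years as correction names and constant placeholder data, of how the year is then picked by key (with correctionlib itself the analogous access would be `correctionlib.CorrectionSet.from_file(path)[year]`):

```python
import json
from typing import Dict

# Hypothetical contents of one file per correction type (e.g. DeepTauVSjet):
# a schema-v2-style set with one correction object per year.
DOCUMENT = """
{
  "schema_version": 2,
  "corrections": [
    {"name": "2017", "version": 1,
     "inputs": [{"name": "pt", "type": "real"}],
     "output": {"name": "sf", "type": "real"}, "data": 0.95},
    {"name": "2018", "version": 1,
     "inputs": [{"name": "pt", "type": "real"}],
     "output": {"name": "sf", "type": "real"}, "data": 0.97}
  ]
}
"""


def index_by_name(text: str) -> Dict[str, dict]:
    """Parse a correction-set document and index its corrections by name."""
    cset = json.loads(text)
    if cset.get("schema_version") != 2:
        raise ValueError("expected a schema-v2 document")
    index: Dict[str, dict] = {}
    for corr in cset["corrections"]:
        if corr["name"] in index:
            raise ValueError(f"duplicate correction name {corr['name']!r}")
        index[corr["name"]] = corr
    return index


cset = index_by_name(DOCUMENT)
print(sorted(cset))          # prints ['2017', '2018']
print(cset["2018"]["data"])  # prints 0.97
```

The SF type is encoded in the file name, the year in the key; each year's correction can still declare its own inputs.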
Without the year in the name, and because the versatile JSON schema now lets us merge certain SFs, this is already a significant reduction in the number of files compared with what we had before in our repos [2–4]. If we don't want this, I would propose option 1, where users load as follows:
Both have the same downside that has been discussed before: analyzers need to match the right string in either the file name or the key. However, this seems unavoidable to me, because they need to know which tau ID or which type of correction to select anyway. The question is whether you want to show it in the file name, or "hide" it in the correction set so that users have to browse for it. My personal preference would be option 2.

[1] Slide 11 in https://indico.cern.ch/event/1020470/#2-cms-universal-json-format-fo
[2] Slide 20 in https://indico.cern.ch/event/1020470/#2-cms-universal-json-format-fo
[3] https://github.com/cms-tau-pog/TauIDSFs/tree/master/data
[4] https://github.com/cms-tau-pog/TauTriggerSFs/tree/master/data
I'm making this a v2 item before we release the final version. My reluctance to immediately put

```python
class CorrectionSet(Model):
    schema_version: Literal[VERSION] = Field(description="The overall schema version")
    corrections: List[Union[Correction, "CorrectionSet"]]  # string forward reference for the self-reference
```

is that if we ever do end up with some sort of database that can be queried, the query key will need to support some sort of nesting syntax. Perhaps we forbid
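To illustrate the query-key concern: once sets can nest, a flat correction name no longer identifies a correction, so a queryable store would need some path syntax. A minimal sketch, assuming a hypothetical "/"-separated key and a hypothetical nested layout in which a set's `corrections` list may hold further named sets (which schema v2 does not allow), that flattens a nested set into path keys a database could index:

```python
from typing import Dict, Iterator, Tuple

# Hypothetical nested layout (NOT valid schema v2): a set's "corrections"
# list may hold either corrections or further named sets.
NESTED = {
    "schema_version": 2,
    "corrections": [
        {"name": "2018", "corrections": [
            {"name": "DeepTauVSjet", "data": 0.97},
            {"name": "tau_energy_scale", "data": 1.01},
        ]},
        {"name": "2017", "corrections": [
            {"name": "DeepTauVSjet", "data": 0.95},
        ]},
    ],
}


def iter_paths(node: dict, prefix: str = "") -> Iterator[Tuple[str, dict]]:
    """Yield (path, correction) pairs, joining nested set names with '/'."""
    for entry in node["corrections"]:
        path = f"{prefix}/{entry['name']}" if prefix else entry["name"]
        if "corrections" in entry:  # a nested set: recurse into it
            yield from iter_paths(entry, path)
        else:                       # a leaf correction
            yield path, entry


def flatten(cset: dict) -> Dict[str, dict]:
    """Build a flat path -> correction index, e.g. for a queryable store."""
    index: Dict[str, dict] = {}
    for path, corr in iter_paths(cset):
        if path in index:
            raise ValueError(f"duplicate key {path!r}")
        index[path] = corr
    return index


index = flatten(NESTED)
print(sorted(index))
# prints ['2017/DeepTauVSjet', '2018/DeepTauVSjet', '2018/tau_energy_scale']
```

A scheme like this only works if names never contain the separator (or it is escaped), which is one more rule the schema would have to enforce.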
Just for the record: we discussed this issue in XPOG and opted for "Option 4", which is using subdirectories in the XPOG GitLab repo like
which is a fast and elegant alternative to nested CorrectionSets. However, I still like the idea of nested CorrectionSets.
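Option 4 moves the nesting from the schema into the file system. The actual XPOG directory listing is not reproduced here; as a minimal sketch, assuming a hypothetical `<year>/<sf>.json` layout, a user could index such a tree with `pathlib` so that `files[year][sf]` gives the path of the file to load:

```python
import tempfile
from pathlib import Path
from typing import Dict


def index_tree(root: Path) -> Dict[str, Dict[str, Path]]:
    """Map <year>/<sf>.json files under root to files[year][sf] -> path."""
    files: Dict[str, Dict[str, Path]] = {}
    for path in sorted(root.glob("*/*.json")):
        year, sf = path.parent.name, path.stem
        files.setdefault(year, {})[sf] = path
    return files


# Build a throwaway tree with the hypothetical layout and index it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for year in ("2017", "2018"):
        (root / year).mkdir()
        for sf in ("DeepTauVSjet", "tau_energy_scale"):
            (root / year / f"{sf}.json").write_text("{}")
    files = index_tree(root)
    print(sorted(files))                       # prints ['2017', '2018']
    print(files["2018"]["DeepTauVSjet"].name)  # prints DeepTauVSjet.json
```

Each file stays a flat, valid correction set, so no schema change is needed; the cost is that the year and SF type live in paths rather than keys.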
OK, it may well be that we find nested sets useful after gaining experience. But at least for now, let's move it to a later schema version.
@nsmith- @gouskos

In the XPOG `README.md` you see that they expect a structure something like this, right? In this way you could load one `Corrections` object like `cset[year][sf]`, where `sf` for the TauPOG is something like `DeepTauVSjet`, `DeepTauVSmu`, `tau_energy_scale`, or whatever. The only problem is that currently, it does not seem like a correction set can contain a list of correction sets:
Also see `correctionlib/src/correctionlib/schemav2.py` (line 217 at 9c8d10f) and `correctionlib/src/correction.cc` (lines 450 to 455 at 9c8d10f).
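What is being asked for amounts to two-level item access. A minimal sketch of that access pattern, assuming a hypothetical nested layout (not valid under schema v2, where `corrections` holds only corrections) and a hypothetical wrapper class `NestedSet` whose `__getitem__` returns either a nested set or a leaf correction:

```python
from typing import Iterator, Union


class NestedSet:
    """Hypothetical wrapper giving cset[year][sf]-style access."""

    def __init__(self, data: dict):
        self._entries = {e["name"]: e for e in data["corrections"]}

    def __getitem__(self, name: str) -> Union["NestedSet", dict]:
        entry = self._entries[name]  # KeyError for unknown names
        # Entries that themselves hold "corrections" are nested sets.
        return NestedSet(entry) if "corrections" in entry else entry

    def __iter__(self) -> Iterator[str]:
        return iter(self._entries)


cset = NestedSet({
    "corrections": [
        {"name": "2018", "corrections": [
            {"name": "DeepTauVSjet", "data": 0.97},
            {"name": "DeepTauVSmu", "data": 1.02},
        ]},
    ],
})
print(cset["2018"]["DeepTauVSmu"]["data"])  # prints 1.02
print(list(cset["2018"]))  # prints ['DeepTauVSjet', 'DeepTauVSmu']
```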
The problem is that each type of correction can have totally different inputs: you want to avoid passing the evaluator a full list of inputs that covers all cases, and you don't want to load the whole collection of SFs if you only need one or two. I see two possible solutions:
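The input mismatch can be seen in a small sketch: if each correction declares its own inputs, the caller only ever supplies the values that correction needs. The input lists and the helper `select_args` below are hypothetical; the helper picks a correction's declared inputs, in order, from a dictionary of available per-tau quantities:

```python
from typing import Dict, List, Union

Value = Union[str, int, float]

# Hypothetical input declarations for two tau corrections that depend
# on different variables.
INPUTS: Dict[str, List[str]] = {
    "DeepTauVSjet": ["pt", "dm", "genmatch"],
    "DeepTauVSe": ["eta", "genmatch"],
}


def select_args(name: str, event: Dict[str, Value]) -> List[Value]:
    """Return, in declared order, only the inputs correction `name` needs."""
    missing = [v for v in INPUTS[name] if v not in event]
    if missing:
        raise KeyError(f"{name} needs {missing}")
    return [event[v] for v in INPUTS[name]]


event = {"pt": 35.0, "eta": 1.2, "dm": 1, "genmatch": 5}
print(select_args("DeepTauVSjet", event))  # prints [35.0, 1, 5]
print(select_args("DeepTauVSe", event))    # prints [1.2, 5]
```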
cset[year][sf]