LHCb Z 13 TeV dimuon 2022 #1996
Conversation
Hi @achiefa could you cherry-pick or rebase the last few commits? (so that the branch contains only your changes)
Force-pushed from 8f11b34 to 0dbe4c9.
Hi @scarlehoff, done. Now the branch history should be clean. I apologise for that - last night I didn't pay much attention to the rebase procedure and I messed it up.
Should this be on top of …?
My understanding from the last pheno meeting was to rebase on top of …
I wasn't there so I don't know what was discussed, but rebasing on master indeed seems reasonable to me.
Is there a report (ideally comparefits) that includes this dataset?
No, there isn't. How can I produce such a report?
At the moment you won't be able to produce a report using this dataset since the theory is missing.
There are a few things to fix, though.
@@ -0,0 +1,104 @@
setname: "LHCB_Z_13TEV_DIMUON_2022"
This can be either Z0 (for Z production) or Z0J (for Z + jet production). You cannot have both processes in the same dataset even if they come from the same paper; please break it down into two distinct datasets.
Also, I'd avoid using 2022 in the name of the dataset unless it is necessary for disambiguation.
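For illustration only, the split could look like two separate dataset folders, each with its own metadata.yaml; the setnames below are placeholders, not the names finally adopted in this PR:

```yaml
# Hypothetical sketch: one folder per process, names illustrative only.
# LHCB_Z0_13TEV_DIMUON/metadata.yaml  (Z production)
setname: "LHCB_Z0_13TEV_DIMUON"
---
# LHCB_Z0J_13TEV_DIMUON/metadata.yaml  (Z + jet production)
setname: "LHCB_Z0J_13TEV_DIMUON"
```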
The paper deals with Z production only, so I'll change it to Z0.
The 2022 label I thought was sort of necessary to tell it apart from the older LHCB_Z0 datasets, which date back to 2016 and are already implemented.
ZpT is Z + jet; that's why we have those as Z0J.
Yes, but legacy datasets from the same experiment are Z0; see for instance LHCB_Z0_13TEV. So I suppose I should do the same here, shouldn't I?
Moreover, since the same name is used for the implementation linked above, should I keep the 2022 label?
With respect to the 2022 label, I guess we can live with that. But I'd prefer something that includes some information (like, for instance, the integrated luminosity, since that is what we already have in a few places: XYPB).
With respect to the process names, at the moment we have no Z+J process for LHCB, so this would be the first one.
Note that the process label Z0 (or Z0J) refers in practice to the leading-order process being considered. For the Z rapidity this corresponds to Z0, while for the Z pT it is instead Z0J.
The difference is important since, for instance, this corresponds to distinct processes when computing predictions with matrix / nnlojet / madgraph.
Ok, now I understand. It wasn't clear to me that Z0 and Z0J must be broken down into two different datasets (i.e. two different folders), even if they come from the same measurement. I would suggest specifying that in the documentation, because it's not trivial for a novice (as I am).
Re the 2022 label, I can change it to the integrated luminosity if you wish. Note that the legacy implementation for Z0 does not report this information in the name.
The documentation needs a lot of updates ^^U
setname: "LHCB_Z_13TEV_DIMUON_2022"
nnpdf_metadata:
  nnpdf31_process: "Z0PT"
Suggested change:
- nnpdf31_process: "Z0PT"
+ nnpdf31_process: "NC DY"
This is probably what you want here, but check the other Z0 and Z0J datasets to see what to write.
This information is used for the theory covariance matrix, so it is important that it matches between datasets.
I just checked ATLAS_Z0J_8TEV and LHCB_Z0_13TEV and it seems that nnpdf31_process is DY NC for both Z0 and Z0J.
Anyway, I'd appreciate it if you could also explain what these values actually do.
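Purely as a hedged illustration of the point above, the metadata block would then carry the same process string as the existing DY datasets; copy the exact value from those files rather than from this sketch:

```yaml
nnpdf_metadata:
  nnpdf31_process: "DY NC"   # illustrative; use whatever string the existing Z0/Z0J datasets use
  experiment: "LHCB"         # hypothetical value, shown only for context
```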
These are used to group datasets. For instance, for the theory covariance matrix, datasets with the same process correlate both muR and muF, while different processes correlate only muF.
Beyond that, it is also used for plotting and organizational purposes.
label: '$\frac{d\sigma}{dp_T^Z}$',
units: "[pb]"}
observable_name: PT
process_type: Z0PT
This process type doesn't exist yet.
You will need to add a new process type for this dataset (see https://github.com/NNPDF/nnpdf/blob/master/validphys2/src/validphys/process_options.py; at the bottom of the module you can find the ones currently implemented).
This part is not really clear to me. In LHCB_Z0_13TEV, process_type is set to EWK_RAP for the rapidity distribution. However, this does not appear at the bottom of the module that you mentioned. The same holds for ATLAS_Z0J_8TEV, although there for the double-differential distributions in pT. Am I missing something?
Basically, old datasets had the kinematic variables fixed to always be 3 variables, and always the same ones (for a given process; note that in this case "process" refers to both the physical process and the observable/measurement, so two datasets can have the same nnpdf31_process and different process_type... it's confusing, I know).
Anyway, the process_type basically defines how the variables are read by validphys and autogenerates some plotting labels. In the LHCB datasets this process_type still refers to the old process type because the kinematic variables are the same and written in the same order as before...
I guess we can talk about this on Wednesday, but basically the summary is that in order to add (new) DY data you will need to add a new process to that list that is able to understand the kinematic variables that you are using.
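As a rough, hedged sketch of how the pieces fit together: once a new process type is registered at the bottom of process_options.py (in this PR it ends up being called DY_Z_Y, see the diff further down), the observable in the metadata simply refers to it by name. The observable_name below is a placeholder:

```yaml
# Hypothetical sketch: the metadata must reference a process_type name
# that is registered in validphys' process_options.py
observable_name: Y
process_type: DY_Z_Y
```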
file: kinematics_ZpT_y.yaml
data_central: data_ZpT_y.yaml
data_uncertainties: uncertainties_ZpT_y.yaml
kinematic_coverage: [pT, yZ, sqrt_s]
I think (but again, check with the other LHCB_Z datasets that @Radonirinaunimi implemented) that for the rapidity the variable to use is y.
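For concreteness, a minimal sketch of that change, simply renaming yZ to y in the coverage shown above (whether sqrt_s or another spelling is expected should also be checked against the other LHCB datasets):

```yaml
kinematic_coverage: [pT, y, sqrt_s]
```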
Ah, I assumed that came with the PR. I should have read the first message more carefully.
Could you please move this from …?
@@ -252,6 +252,7 @@ def _displusjet_xq2map(kin_dict):
    "HQP_PTQ": HQP_PTQ,
    "HERAJET": HERAJET,
    "HERADIJET": dataclasses.replace(HERAJET, name="HERADIJET", description="DIS + jj production"),
    "DY_Z_Y" : dataclasses.replace(DY_W_ETA, name="DY_Z_Y", description="DY Z -> ll rapidity")
I made use of the process type implemented by MC, but I changed the name and description. Moreover, note that the accepted kinematic variable in DY_W_ETA is eta. Should I use y instead, since the LHCb data is in the boson rapidity?
I think both are ok. Actually, we could also call it DY_RAP or something like that and unify both with a common description.
But changing these (internal) names is "as easy" as a search & replace.
Should I rebase on top of master again? Currently DY_W_ETA is not in process_options.
Yes, if you need to use it.
@@ -50,18 +51,19 @@ implemented_observables:
    label: '$\frac{d^2\sigma}{d p_T^Z y^Z}$',
    units: "[pb]"}
  observable_name: DIMUON-PT-Y
  process_type: EWK_PTRAP
  process_type: _
Here I left it blank, because we are not going to use the double-differential distribution. But if you wish I can implement the corresponding process type anyway.
Why are we not going to use it?
(if we know already that we are not going to use it... why do we want to have it implemented?)
The problem is that the correlation matrix provided by the experimentalists is not positive definite. And ERN hasn't heard back from them for a while, so we cannot use this data set as it is.
I see, can you remove it then?
Just leave it implemented in the filter, but don't output this part.
That way, once we get it (if we get it), most of the work is already done.
Otherwise we risk using it by mistake.
Do you mean removing this distribution from the metadata? Observe that this part is not outputted in the yaml files exactly for that reason.
Yes, and don't include the kinematics/data/uncertainties files (which are at least referenced in the metadata).
treatment: MULT
type: CORR
bins:
- Statistical_uncertainty:
I have a feeling that I misunderstood how uncertainties must be listed in the file. Could you please check if they are implemented correctly, @scarlehoff?
The uncertainties need to be listed one per bin:

```yaml
bins:
- stats: 0.10
  some_name: 11.0
  some_other_name: 143
- stats: 0.20
  some_name: 143
```

and so on.
The statistical uncertainty should always be named stat.
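A slightly fuller sketch, combining the per-bin layout above with the treatment/type fields visible in the diff; this assumes the uncertainties file also carries a definitions block, and every name other than stat is a placeholder:

```yaml
definitions:
  stat:
    description: Statistical uncertainty   # illustrative treatment/type, to be checked
    treatment: ADD
    type: UNCORR
  sys_corr_1:
    description: Correlated systematic (placeholder name)
    treatment: MULT
    type: CORR
bins:
- stat: 0.10
  sys_corr_1: 11.0
- stat: 0.20
  sys_corr_1: 143
```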
But I've got correlated statistical uncertainties, so I guess I should write stat_corr_1, stat_corr_2, ..., right?
Force-pushed from 5a0cb48 to e0d8b90.
max: 2.2
sqrts:
  min: null
  mid: 13
Ah, gotcha! The problem you have is that you are using sqrts in TeV; if you write 13000 that should solve the problem you're seeing.
The same is probably true in the rapidity case as well. Make sure that the units are always GeV.
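In other words, keeping the structure shown in the diff above, the kinematics entry would look something like this (the max: null line is assumed to follow the same pattern as min):

```yaml
sqrts:
  min: null
  mid: 13000   # GeV, not TeV
  max: null
```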
Yes, really my bad. I've fixed that and now points are all within 1. Thanks for spotting it!
- pT:
    min: 0.0
    mid: 1.1
    max: 2.2
@enocera I'm a bit worried about the kinematics here. For sure no fixed order calculation will be reliable at pT ~ 0, but I'm not even sure pT ~ 10 will be ok...
Indeed. The cut on pT that we have for the ATLAS and CMS 8 TeV data sets is at 30 GeV, therefore anything lower than that cannot be described by FO pert QCD (but needs resummation). So are you saying that the maximum pT in that measurement is 10 GeV? If this is the case, then we may forget about it until we do pT resummation.
Looking at the paper we may have, say, 3 bins above pT~30 GeV. Anyways, at this point I'd complete the implementation of the data set - that may be useful for the future.
Resolved (outdated) review threads on:
nnpdf_data/nnpdf_data/new_commondata/LHCB_Z0_13TEV_2022/metadata.yaml
nnpdf_data/nnpdf_data/new_commondata/LHCB_Z0_13TEV_2022/eigenvalues.yaml
Force-pushed from 90e6aec to 612e2dc.
Force-pushed from bd4eac7 to 591ee36.
I think the PR is now complete. A few comments before merging: …
Thanks! I think this can be merged now.
This looks good (in that it looks like what I would expect: CMS is central while LHCB accesses the extremes).
This PR implements the new measurements from LHCb for Z production at 13 TeV in the new commondata format. Please observe the following: …