LHCb Z 13 TeV dimuon 2022 #1996
Conversation
Hi @achiefa could you cherry-pick or rebase the last few commits? (so that the branch contains only your changes)
Force-pushed from 8f11b34 to 0dbe4c9.
Hi @scarlehoff, done. Now the branch history should be clean. I apologise for that - last night I didn't pay much attention to the rebase procedure and I messed it up.
Should this be on top of …?
My understanding from the last pheno meeting was to rebase on top of …
I wasn't there so I don't know what was discussed, but rebasing on master indeed seems reasonable to me.
Is there a report (ideally comparefits) that includes this dataset?
No, there isn't. How can I produce such a report?
At the moment you won't be able to produce a report using this dataset since the theory is missing.
There are a few things to fix, though.
@@ -0,0 +1,104 @@
setname: "LHCB_Z_13TEV_DIMUON_2022"
This can be either Z0 (for Z production) or Z0J (for Z + jet production). You cannot have both processes in the same dataset even if they come from the same paper; please break it down into two distinct datasets.
Also, I'd avoid using 2022 in the name of the dataset unless it is necessary for disambiguation.
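For illustration only, the split could look like two separate dataset folders, each with its own metadata.yaml; the setnames below are placeholders, not the names finally adopted in this PR:

```yaml
# Hypothetical sketch: one folder per process, names illustrative only.
# LHCB_Z0_13TEV_DIMUON/metadata.yaml  (Z production)
setname: "LHCB_Z0_13TEV_DIMUON"
---
# LHCB_Z0J_13TEV_DIMUON/metadata.yaml  (Z + jet production)
setname: "LHCB_Z0J_13TEV_DIMUON"
```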
The paper deals with Z production only, so I'll change it to Z0.
The 2022 label I thought was sort of necessary to tell it apart from the older LHCB_Z0 datasets, which date back to 2016 and are already implemented.
ZpT is Z + jet; that's why we have those as Z0J.
Yes, but legacy datasets from the same experiment are Z0; see for instance LHCB_Z0_13TEV. So I suppose I should do the same here, shouldn't I?
Moreover, since the same name is used for the implementation linked above, should I keep the 2022 label?
With respect to the 2022 label, I guess we can live with that. But I'd prefer something that includes some information (like, for instance, the integrated luminosity, since that is what we already have in a few places: XYPB).
With respect to the process names, at the moment we have no Z+J process for LHCB, so this would be the first one.
Note that the process label Z0 (or Z0J) refers in practice to the leading-order process being considered. For the Z rapidity this corresponds to Z0, while for the Z pT it is instead Z0J.
The difference is important since, for instance, this corresponds to distinct processes when computing predictions with matrix / nnlojet / madgraph.
Ok, now I understand. It wasn't clear to me that Z0 and Z0J must be broken down into two different datasets (i.e. two different folders), even if they come from the same measurement. I would suggest specifying that in the documentation, because it's not trivial for a novice (as I am).
Re the 2022 label, I can change it to the integrated luminosity if you wish. Note that the legacy implementation for Z0 does not report this information in the name.
The documentation needs a lot of updates ^^U
setname: "LHCB_Z_13TEV_DIMUON_2022"
nnpdf_metadata:
  nnpdf31_process: "Z0PT"
Suggested change:
- nnpdf31_process: "Z0PT"
+ nnpdf31_process: "NC DY"
This is probably what you want here, but check the other Z0 and Z0J datasets to see what to write.
This information is used for the theory covariance matrix, so it is important that it matches between datasets.
I just checked ATLAS_Z0J_8TEV and LHCB_Z0_13TEV and it seems that nnpdf31_process is DY NC for both Z0 and Z0J.
Anyway, I'd appreciate it if you could also explain what these values actually do.
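Purely as a hedged illustration of the point above, the metadata block would then carry the same process string as the existing DY datasets; copy the exact value from those files rather than from this sketch:

```yaml
nnpdf_metadata:
  nnpdf31_process: "DY NC"   # illustrative; use whatever string the existing Z0/Z0J datasets use
  experiment: "LHCB"         # hypothetical value, shown only for context
```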
These are used to group datasets. For instance, for the theory covariance matrix, datasets with the same process correlate both muR and muF, while different processes correlate only muF.
Beyond that, it is also used for plotting and organizational purposes.
label: '$\frac{d\sigma}{dp_T^Z}$',
units: "[pb]"}
observable_name: PT
process_type: Z0PT
This process type doesn't exist yet.
You will need to add a new process type for this dataset (see https://github.com/NNPDF/nnpdf/blob/master/validphys2/src/validphys/process_options.py; at the bottom of the module you can find the ones currently implemented).
This part is not really clear to me. In LHCB_Z0_13TEV, process_type is set to EWK_RAP for the rapidity distribution. However, this does not appear at the bottom of the module that you mentioned. The same holds for ATLAS_Z0J_8TEV, although there for the double-differential distributions in pT. Am I missing something?
Basically, old datasets had the kinematic variables fixed to always be 3 variables, and always the same ones (for a given process; note that in this case "process" refers to both the physical process and the observable/measurement, so two datasets can have the same nnpdf31_process and different process_type... it's confusing, I know).
Anyway, the process_type basically defines how the variables are read by validphys and autogenerates some plotting labels. In the LHCB datasets this process_type still refers to the old process type because the kinematic variables are the same and written in the same order as before...
I guess we can talk about this on Wednesday, but basically the summary is that in order to add (new) DY data you will need to add a new process to that list that is able to understand the kinematic variables that you are using.
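As a rough, hedged sketch of how the pieces fit together: once a new process type is registered at the bottom of process_options.py (in this PR it ends up being called DY_Z_Y, see the diff further down), the observable in the metadata simply refers to it by name. The observable_name below is a placeholder:

```yaml
# Hypothetical sketch: the metadata must reference a process_type name
# that is registered in validphys' process_options.py
observable_name: Y
process_type: DY_Z_Y
```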
file: kinematics_ZpT_y.yaml
data_central: data_ZpT_y.yaml
data_uncertainties: uncertainties_ZpT_y.yaml
kinematic_coverage: [pT, yZ, sqrt_s]
I think (but again, check with the other LHCB_Z datasets that @Radonirinaunimi implemented) that for the rapidity the variable to use is y.
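For concreteness, a minimal sketch of that change, simply renaming yZ to y in the coverage shown above (whether sqrt_s or another spelling is expected should also be checked against the other LHCB datasets):

```yaml
kinematic_coverage: [pT, y, sqrt_s]
```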
Ah, I assumed that came with the PR. I should have read the first message more carefully.
Could you please move this from …?
@@ -252,6 +252,7 @@ def _displusjet_xq2map(kin_dict):
    "HQP_PTQ": HQP_PTQ,
    "HERAJET": HERAJET,
    "HERADIJET": dataclasses.replace(HERAJET, name="HERADIJET", description="DIS + jj production"),
    "DY_Z_Y" : dataclasses.replace(DY_W_ETA, name="DY_Z_Y", description="DY Z -> ll rapidity")
I made use of the process type implemented by MC, but I changed the name and description. Moreover, note that the accepted kinematic variable in DY_W_ETA is eta. Should I use y instead, since the LHCb data is in the boson rapidity?
I think both are ok. Actually, we could also call it DY_RAP or something like that and unify both with a common description.
But changing these (internal) names is "as easy" as a search & replace.
Should I rebase on top of master again? Currently DY_W_ETA is not in process_options.
Yes, if you need to use it.
@@ -50,18 +51,19 @@ implemented_observables:
    label: '$\frac{d^2\sigma}{d p_T^Z y^Z}$',
    units: "[pb]"}
  observable_name: DIMUON-PT-Y
  process_type: EWK_PTRAP
  process_type: _
Here I left it blank, because we are not going to use the double-differential distribution. But if you wish I can implement the corresponding process type anyway.
Why are we not going to use it?
(if we know already that we are not going to use it... why do we want to have it implemented?)
The problem is that the correlation matrix provided by the experimentalists is not positive definite. And ERN hasn't heard back from them for a while, so we cannot use this data set as it is.
I see, can you remove it then?
Just leave it implemented in the filter, but don't output this part.
That way, once we get it (if we get it), most of the work is already done.
Otherwise we risk using it by mistake.
Do you mean removing this distribution from the metadata? Observe that this part is not outputted in the yaml files exactly for that reason.
Yes, and don't include the kinematics/data/uncertainties files (which are at least referenced in the metadata).
treatment: MULT
type: CORR
bins:
- Statistical_uncertainty:
I have a feeling that I misunderstood how uncertainties must be listed in the file. Could you please check if they are implemented correctly, @scarlehoff?
The uncertainties need to be listed one per bin:

```yaml
bins:
- stats: 0.10
  some_name: 11.0
  some_other_name: 143
- stats: 0.20
  some_name: 143
```

and so on.
The statistical uncertainty should always be named stat.
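A slightly fuller sketch, combining the per-bin layout above with the treatment/type fields visible in the diff; this assumes the uncertainties file also carries a definitions block, and every name other than stat is a placeholder:

```yaml
definitions:
  stat:
    description: Statistical uncertainty   # illustrative treatment/type, to be checked
    treatment: ADD
    type: UNCORR
  sys_corr_1:
    description: Correlated systematic (placeholder name)
    treatment: MULT
    type: CORR
bins:
- stat: 0.10
  sys_corr_1: 11.0
- stat: 0.20
  sys_corr_1: 143
```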
But I've got correlated statistical uncertainties, so I guess I should write stat_corr_1, stat_corr_2, ..., right?
Force-pushed from 5a0cb48 to e0d8b90.
max: 2.2
sqrts:
  min: null
  mid: 13
Ah, gotcha! The problem you have is that you are using sqrts in TeV; if you write 13000 that should solve the problem you're seeing.
The same is probably true in the rapidity case as well. Make sure that the units are always GeV.
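In other words, keeping the structure shown in the diff above, the kinematics entry would look something like this (the max: null line is assumed to follow the same pattern as min):

```yaml
sqrts:
  min: null
  mid: 13000   # GeV, not TeV
  max: null
```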
Yes, really my bad. I've fixed that and now points are all within 1. Thanks for spotting it!
- pT:
    min: 0.0
    mid: 1.1
    max: 2.2
@enocera I'm a bit worried about the kinematics here. For sure no fixed order calculation will be reliable at pT ~ 0, but I'm not even sure pT ~ 10 will be ok...
Indeed. The cut on pT that we have for the ATLAS and CMS 8 TeV data sets is at 30 GeV, therefore anything lower than that cannot be described by FO pert QCD (but needs resummation). So are you saying that the maximum pT in that measurement is 10 GeV? If this is the case, then we may forget about it until we do pT resummation.
Looking at the paper we may have, say, 3 bins above pT~30 GeV. Anyways, at this point I'd complete the implementation of the data set - that may be useful for the future.
Resolved (outdated) review threads on:
nnpdf_data/nnpdf_data/new_commondata/LHCB_Z0_13TEV_2022/metadata.yaml
nnpdf_data/nnpdf_data/new_commondata/LHCB_Z0_13TEV_2022/eigenvalues.yaml
Force-pushed from 90e6aec to 612e2dc.
Force-pushed from bd4eac7 to 591ee36.
I think the PR is now complete. A few comments before merging: …
Thanks! I think this can be merged now.
This looks good (in that it looks like what I would expect: CMS is central while LHCB accesses the extremes).
This PR implements the new measurements from LHCb for Z production at 13 TeV in the new commondata format. Please observe the following: …