Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedding update for run3 #47299

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

winterchristian
Copy link

PR description:

This PR updates the tau embedding method (TauAnalysis/MCEmbeddingTools) so that it is possible to produce RUN 3 tau embedding samples. It is together with #43871 part of the ongoing effort to produce RUN 3 tau embedding samples.

The tau embedding method is used to estimate the genuine di-tau background from data. It is a common method used in analyses with tau leptons. More information about tau embedding can be found in the paper: https://doi.org/10.1088/1748-0221/14/06/P06032
In principle, one can split the method into 4 steps, but technically there are at least 6 steps that have to be executed.

The following things where changed in this pull request:

  • For the embedding merging step: removing some collections, which were only needed for Run2
  • For the embedding generator HLT step: replacing a filter by a simple dummy that ignores the fact that the tau-leptons are simulated in an else empty detector.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 7, 2025

cms-bot internal usage

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 7, 2025

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47299/43620

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@@ -639,6 +608,19 @@ def customiseGenerator_HLT(process, changeProcessname=True, reselect=False):
process.embeddingHltPixelVertices.clone()
)

# replace the original detector state filters in the HLT with a dummy module
process.hltPixelTrackerHVOn = cms.EDFilter("EmbeddingDetectorStateFilter",
Copy link
Contributor

@mmusich mmusich Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit I am having difficulties to understand the logic here (@cms-sw/hlt-l2 FYI).
If you want to replace these two filters with something that always pass, I guess you can just use this configuration:

process.hltPixelTrackerHVOn = cms.EDFilter("HLTBool",
    result = cms.bool(True)
)

without the need of creating a new module EmbeddingDetectorStateFilter that does nothing.

But more in general can you explain what goes wrong with the regular module we use at HLT?
In the embedded event from where do the unpacked online metadata digis come from? From the real data event or the embedded MC event?

For the record, in the current menu we configure these modules as:

fragment.hltPixelTrackerHVOn = cms.EDFilter( "DetectorStateFilter",
    DebugOn = cms.untracked.bool( False ),
    DetectorType = cms.untracked.string( "pixel" ),
    acceptedCombinations = cms.untracked.vstring(  ),
    DcsStatusLabel = cms.untracked.InputTag( "" ),
    DCSRecordLabel = cms.untracked.InputTag( "hltOnlineMetaDataDigis" )
)

fragment.hltStripTrackerHVOn = cms.EDFilter( "DetectorStateFilter",
    DebugOn = cms.untracked.bool( False ),
    DetectorType = cms.untracked.string( "sistrip" ),
    acceptedCombinations = cms.untracked.vstring(  ),
    DcsStatusLabel = cms.untracked.InputTag( "" ),
    DCSRecordLabel = cms.untracked.InputTag( "hltOnlineMetaDataDigis" )
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for giving us this hint. We are going to change the implementation according to your proposal! The reason why we introduced a module which always evaluates to True is an issue with the DetectorStateFilter when processing embedded events. The two filters reject 100% of our events. (The results of corresponding study can be found in this presentation: https://indico.cern.ch/event/1389181/contributions/5841911/attachments/2817132/4918642/2024-03-11-tau-cqm-meeting-triggers-in-mu-to-tau-embedded-events.pdf.) To our understanding, this is related to the fact that embedded events are "real data" events, but the detector in the HLT simulation is a simulated detector. Thus, the module always tries to read out the detector status of the real detector, while a simulated detector is present in our HLT simulation of the embedded event. With our implementation, we want to emulate the behavior of this module on MC events, which always pass this filter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moritzmolch is there a cmsDriver that can be used to reproduce such behavior?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you want to achieve with a cmsDriver command.
First of all the tau embedding datasets are completely separated from normal cms or MC datasets.
You can find those datasets for Run2 UL here as described in this twiki.
In those some HLT filters perform worse or have an efficiency of 0%.

This is because the tau embedding method simulates two tau decays in an otherwise empty detector. If the HLT step now runs on these events, it makes perfect sense that some filters that expect a busy background will not work properly in this empty detector with only the tau decays.

This is therefore expected behavior up to a certain point. We solve this problem by switching off these problematic filters or changing them so that they allow all events.

Sorry if this was not clear and feel free to ask if you want to know more about this.
I also plan to give a presentation about the tau embedding method in the next RECO meeting.

Copy link
Contributor

@mmusich mmusich Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the HLT step now runs on these events, it makes perfect sense that some filters that expect a busy background will not work properly in this empty detector with only the tau decays.

OK, but in this PR we're discussing specifically these two filters: hltPixelTrackerHVOn and hltStripTrackerHVOn.
These filters don't care about event occupancy. That you have 10k tracks or none at all, it doesn't matter.
What matters is the aggregate DCS state of the detector which can be fully ON, partially ON or fully OFF.
Now, in MC events the filter is designed to accept all the events, see

} else {
detectorOn_ = true;
nSelectedEvents_++;
if (verbose_) {
edm::LogInfo("DetectorStatusFilter") << "Total MC Events " << nEvents_ << " Selected Events " << nSelectedEvents_
<< " Detector States " << detectorOn_ << std::endl;
}

so you wouldn't need to bypass them.
In real data, it depends on what was the actual state of the detector in that particular event, but assuming you are able to reconstruct taus in it, it looks to me very unlikely that any of the tracker partitions was OFF.
So all in all, I am sorry, but this doesn't make any sense to me.

Now coming back to this:

I'm not sure what you want to achieve with a cmsDriver command.

what I want to achieve is to see for myself the answer to the question I asked at #47299 (comment)

In the embedded event from where do the unpacked online metadata digis come from? From the real data event or the embedded MC event?

as I haven't received any answer.

Copy link
Author

@winterchristian winterchristian Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, in MC events the filter is designed to accept all the events, ... In real data, it depends on what was the actual state of the detector in that particular event, but assuming you are able to reconstruct taus in it, it looks to me very unlikely that any of the tracker partitions was OFF. So all in all, I am sorry, but this doesn't make any sense to me.

Ah, now this makes sense. Thanks for pointing that out. While looking at the code you linked, I realized that this is decided by evt.isRealData(), which calls a function in EventAuxiliary. I looked into our embedding files and there isRealData is always True. So that could be the reason this trigger performs not as expected.

what I want to achieve is to see for myself the answer to the question I asked at #47299 (comment)

Sorry for this, I'm not an expert on triggers and didn't understand your goal with this.
I will try to answer at my understanding:

But more in general can you explain what goes wrong with the regular module we use at HLT?

I guess it is connected to the evt.isRealData() as described above.

In the embedded event from where do the unpacked online metadata digis come from? From the real data event or the embedded MC event?

Can you please tell me what are unpacked online metadata digis?

And for the cmsDriver commands: I have described how to produce embedding samples in my presentation at a TAU POG meeting last week. You can find the slides here and on slide 25 the cmsDriver command for the HLT step. As input file for the HLT step you can use root://xrootd-cms.infn.it//store/user/cwinter/run3_embedding/2022postEE/MuMu/gensim/0_gensim.root.

And also thanks a lot for your effort to get to the ground of our problems here, I'm very grateful of your help with all this.

* Revert changes of commit 064da79 and replace detector state checks with simple 'True' filters

* add a explaining comment

---------

Co-authored-by: Chris Winter <[email protected]>
@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47299/43634

  • Found files with invalid states:
    • TauAnalysis/MCEmbeddingTools/plugins/EmbeddingDetectorStateFilter.cc:

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @winterchristian for master.

It involves the following packages:

  • TauAnalysis/MCEmbeddingTools (simulation)

@civanch, @cmsbuild, @kpedro88, @mdhildreth can you please review it and eventually sign? Thanks.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@kpedro88
Copy link
Contributor

please test

@kpedro88
Copy link
Contributor

The current policy/procedure is to try to retain backward compatibility. Removing Run 2 collections would potentially break that. In the offline workflows, we handle per-run changes using Eras. I am not sure if it should be done that way here because of the interplay with HLT. @mmusich ?

@mmusich
Copy link
Contributor

mmusich commented Feb 10, 2025

I am not sure if it should be done that way here because of the interplay with HLT.

TSG doesn't guarantee one given HLT menu to run in any other release other than it's native release while taking data.
Having said that I don't see removal of HLT collections in this PR.

@civanch
Copy link
Contributor

civanch commented Feb 11, 2025

@kpedro88 , should we agree with this PR?

@winterchristian
Copy link
Author

winterchristian commented Feb 11, 2025

Before you merge, I would like to implement what @kpedro88 suggested. So that we could use the same code for different eras. Therefore, I will convert it to draft until I implemented these small changes.

@winterchristian winterchristian marked this pull request as draft February 11, 2025 16:55
@kpedro88
Copy link
Contributor

thanks @winterchristian . It is still not clear to me if anything will break with the current PR version, but it's always better to make sure.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47299/43670

  • Found files with invalid states:
    • TauAnalysis/MCEmbeddingTools/plugins/EmbeddingDetectorStateFilter.cc:

@cmsbuild
Copy link
Contributor

Pull request #47299 was updated.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47299/43681

  • Found files with invalid states:
    • TauAnalysis/MCEmbeddingTools/plugins/EmbeddingDetectorStateFilter.cc:

@cmsbuild
Copy link
Contributor

Pull request #47299 was updated.

@winterchristian winterchristian marked this pull request as ready for review February 13, 2025 16:48
@winterchristian
Copy link
Author

I thought I have implemented it as @kpedro88 suggested, but it doesn't work. All toModify() functions are executed even if I use the newest CMMSW version with run3 conditions. Can you please tell me what I've done wrong?

I'm using the following cmsDriver command:

cmsDriver.py TauAnalysis/MCEmbeddingTools/python/EmbeddingPythia8Hadronizer_cfi.py \
    --step GEN,SIM,DIGI,L1,DIGI2RAW \
    --mc \
    --beamspot Realistic25ns13p6TeVEarly2022Collision \
    --geometry DB:Extended \
    --era Run3 \
    --conditions auto:phase1_2022_realistic_postEE \
    --eventcontent RAWSIM \
    --datatier RAWSIM \
    --customise \
    TauAnalysis/MCEmbeddingTools/customisers.customiseGenerator_preHLT \
    --customise_commands 'process.generator.HepMCFilter.filterParameters.MuMuCut = cms.string("(Mu.Pt > 18 && Had.Pt > 18 && Mu.Eta < 2.2 && Had.Eta < 2.4)");process.generator.HepMCFilter.filterParameters.Final_States = cms.vstring("MuHad");process.generator.nAttempts = cms.uint32(1000);' \
    --filein file:lhe_and_cleaned.root \
    --fileout file:simulated_and_cleaned_prehlt.root \
    -n -1 \
    --python_filename generator_preHLT.py

and it already fails before loading the input file with the following error: AttributeError: 'PSet' object has no attribute 'castor'

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47299/43685

  • Found files with invalid states:
    • TauAnalysis/MCEmbeddingTools/plugins/EmbeddingDetectorStateFilter.cc:

@cmsbuild
Copy link
Contributor

Pull request #47299 was updated. @civanch, @cmsbuild, @kpedro88, @mdhildreth can you please check and sign again.

@kpedro88
Copy link
Contributor

@winterchristian the Eras are defined in a sequential manner, so Run3 builds on Run2_2018: https://github.com/cms-sw/cmssw/blob/master/Configuration/Eras/python/Era_Run3_cff.py; and Run2_2018 builds on Run2_2017: https://github.com/cms-sw/cmssw/blob/master/Configuration/Eras/python/Era_Run2_2018_cff.py; etc. If you want something to happen only for Run2, then you either have to use one of the specific modifiers that gets dropped in Run3, or you have to do something like (run2_common & ~run3_common).toModify(...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants