Use central phi functions instead LST ones #146

Merged


VourMa
Collaborator

@VourMa VourMa commented Jan 20, 2025

This is the follow-up to #142. I have created it as a draft PR because, before it can be submitted to cms-sw, we first need to merge #145 (for proper naming of the reducePhiRange functions) and #141 (so that we can have meaningful tests). That said, I think we can start discussing and testing this internally.

@VourMa
Collaborator Author

VourMa commented Jan 20, 2025

/run all


There was a problem while building and running in standalone mode. The logs can be found here.


There was a problem while building and running with CMSSW. The logs can be found here.

@VourMa
Collaborator Author

VourMa commented Jan 21, 2025

@ariostas Sorry, I forgot, did we move to 15_0? Could you tell me the exact version, so that I can write it on our repo, and then update this PR appropriately?

@ariostas
Member

@VourMa I set it up so that now it always uses the latest release, so it's using 15_0_0_pre2. For some reason AlpakaMath didn't make it into that release, so I'll just make the CI check out that package so that this PR works

@VourMa
Collaborator Author

VourMa commented Jan 21, 2025

@VourMa I set it up so that now it always uses the latest release, so it's using 15_0_0_pre2. For some reason AlpakaMath didn't make it into that release, so I'll just make the CI check out that package so that this PR works

Oops, I didn't think of that. Feel free to just run the tests when you sort it out. Thank you for taking care of it!

@ariostas
Member

It should work now.

/run all


The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

[Comparison plots: efficiency, fake rate, and duplicate rate, each vs pT and vs eta]

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     45.9    399.0    187.8    151.7    146.5    548.9    122.8    233.5    153.8      3.1    1992.9    1398.2+/- 387.5     529.6   explicit[s=4] (target branch)
   avg     48.5    377.1    188.7    151.7    166.9    702.5    122.4    226.7    177.8      3.5    2166.0    1414.9+/- 395.6     574.0   explicit[s=4] (this PR)


The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@GNiendorf
Member

ALPAKA_FN_ACC ALPAKA_FN_INLINE float delta_phi(const float phi1, const float phi2) {
  float delta = phi1 - phi2;
  // Adjust delta to be within the range [-M_PI, M_PI]
  if (delta > kPi) {
    delta -= 2 * kPi;
  } else if (delta < -kPi) {
    delta += 2 * kPi;
  }

Can you remove this one in the inference code I added as well? Should probably check that it doesn't affect the performance plots. I tried using another implementation of the delta phi function and it gave some weird results, not sure if it was just a bug in my old code.

@VourMa VourMa force-pushed the CMSSW_14_2_0_pre4_workflowsAndGeneralFunctions_squashed branch from eec20cb to 42dd567 Compare January 21, 2025 23:09
@VourMa
Collaborator Author

VourMa commented Jan 21, 2025

ALPAKA_FN_ACC ALPAKA_FN_INLINE float delta_phi(const float phi1, const float phi2) {
  float delta = phi1 - phi2;
  // Adjust delta to be within the range [-M_PI, M_PI]
  if (delta > kPi) {
    delta -= 2 * kPi;
  } else if (delta < -kPi) {
    delta += 2 * kPi;
  }

Can you remove this one in the inference code I added as well? Should probably check that it doesn't affect the performance plots. I tried using another implementation of the delta phi function and it gave some weird results, not sure if it was just a bug in my old code.

Sure, I replaced it in the new version I pushed. Let's see if it works out.

@VourMa
Collaborator Author

VourMa commented Jan 21, 2025

/run all


The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

[Comparison plots: efficiency, fake rate, and duplicate rate, each vs pT and vs eta]

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     46.3    396.2    189.6    153.9    149.5    551.2    124.8    235.2    151.9      3.5    2001.9    1404.4+/- 387.5     529.4   explicit[s=4] (target branch)
   avg     46.8    379.0    190.3    153.3    160.4    708.8    124.4    226.5    180.3      3.1    2173.1    1417.5+/- 394.8     573.2   explicit[s=4] (this PR)


The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@GNiendorf
Member

Looks good, thanks.

@VourMa
Collaborator Author

VourMa commented Jan 26, 2025

Now that cms-sw#47154 is merged and given that I have followed up on the comments of this PR and the tests have passed, I am rebasing and making the PR to cms-sw.

@VourMa VourMa force-pushed the CMSSW_14_2_0_pre4_workflowsAndGeneralFunctions_squashed branch from 42dd567 to d585909 Compare January 26, 2025 12:21
@VourMa VourMa marked this pull request as ready for review January 26, 2025 12:21
@VourMa
Collaborator Author

VourMa commented Jan 26, 2025

/run all


There was a problem while building and running in standalone mode. The logs can be found here.


There was a problem while building and running with CMSSW. The logs can be found here.

@VourMa
Collaborator Author

VourMa commented Jan 26, 2025

This is all because of cms-sw#47119, so LST cannot run in master, unless we update (but then we won't be able to rely on a pre-release for running - the next pre-release is due on February 4th, so quite a few days away). We need to discuss how to proceed.

@slava77

slava77 commented Jan 28, 2025

This is all because of cms-sw#47119, so LST cannot run in master, unless we update (but then we won't be able to rely on a pre-release for running - the next pre-release is due on February 4th, so quite a few days away). We need to discuss how to proceed.

I think that, as we did once before, we should move the CI to a recent IB; the alpaka version update entered in CMSSW_15_0_X_2025-01-21-2300. Our master is now from Jan 24.
IIUC, the CI starts with setting up 15_0_0_pre2; so, we depend on its alpaka version in the CI.

@slava77

slava77 commented Jan 28, 2025

@ariostas
it may be useful to have a way to pass the CMSSW version to the /run command.
This would imply that the version is at least as recent as a known default release.
Do we still depend, in setup.sh or similar, on explicitly hard-coded paths of the needed externals?

@ariostas
Member

it may be useful to have a way to pass the CMSSW version to the /run command.
This would imply that the version is at least as recent as a known default release.
Do we still depend, in setup.sh or similar, on explicitly hard-coded paths of the needed externals?

Currently it uses the latest release (bypassing whatever is in setup.sh). I'll change it to use the latest IB release, since those get updated pretty often. That should solve things, right?

@ariostas
Member

Should work now

/run all

@slava77

slava77 commented Jan 29, 2025

Currently it uses the latest release (bypassing whatever is in setup.sh). I'll change it to use the latest IB release, since those get updated pretty often. That should solve things, right?

Yes, selecting the latest should work most of the time.
Occasionally there is a broken IB, but that's usually solved in one cycle (12hrs or less)


There was a problem while building and running in standalone mode. The logs can be found here.

@slava77

slava77 commented Jan 29, 2025

There was a problem while building and running in standalone mode. The logs can be found here.

apparently we are missing the complexity of patch build IBs; some paths need updating

@slava77

slava77 commented Jan 29, 2025

apparently we are missing the complexity of patch build IBs; some paths need updating

CMSSW_FULL_RELEASE_BASE should be added so that it's checked after CMSSW_RELEASE_BASE

@ariostas
Member

Thank you, Slava. To play it safe I'll just restrict to non-patch releases and see how it goes. If we see a need to include patch releases I'll do it.

/run standalone


The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

[Comparison plots: efficiency, fake rate, and duplicate rate, each vs pT and vs eta]

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     46.8    397.6    190.7    153.4    166.2    745.4    123.1    233.1    176.7      3.8    2236.7    1444.6+/- 401.3     589.6   explicit[s=4] (target branch)
   avg     47.3    378.7    189.8    152.3    165.5    709.6    124.4    227.8    178.4      3.2    2176.8    1419.9+/- 395.6     574.5   explicit[s=4] (this PR)


The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@VourMa
Collaborator Author

VourMa commented Jan 29, 2025

Thanks for all the updates, @ariostas! I will fix the conflict, rerun the tests and make the PR to cms-sw.

@VourMa VourMa force-pushed the CMSSW_14_2_0_pre4_workflowsAndGeneralFunctions_squashed branch from d585909 to 717caf8 Compare January 29, 2025 17:56
@VourMa
Collaborator Author

VourMa commented Jan 29, 2025

/run all


The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

[Comparison plots: efficiency, fake rate, and duplicate rate, each vs pT and vs eta]

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     49.3    396.1    191.8    154.6    169.7    695.6    131.0    244.8    177.5      3.5    2213.9    1469.0+/- 405.7     585.2   explicit[s=4] (target branch)
   avg     47.3    381.5    191.1    155.9    156.3    699.8    131.8    254.5    179.7      3.1    2201.0    1453.9+/- 402.5     582.1   explicit[s=4] (this PR)


There was a problem while building and running with CMSSW. The logs can be found here.

@VourMa
Collaborator Author

VourMa commented Jan 29, 2025

There was a problem while building and running with CMSSW. The logs can be found here.

The error seems to be:

ModuleNotFoundError: No module named 'HLTrigger.Configuration.HLTrigger_EventContent_cff'

Glitch? Because I can see the file there in the master:
[screenshot of the file in master]

@ariostas
Member

Glitch? Because I can see the file there in the master

I'm not sure. #148 failed in the same way

@GNiendorf
Member

Do you have to do git cms-addpkg HLTrigger/Configuration? Similar to how you told me in my PR to get it to run locally?

@ariostas
Member

Do you have to do git cms-addpkg HLTrigger/Configuration? Similar to how you told me in my PR to get it to run locally?

The release already includes that file, so it shouldn't be necessary. I'll look into it

@slava77

slava77 commented Jan 29, 2025

Do you have to do git cms-addpkg HLTrigger/Configuration? Similar to how you told me in my PR to get it to run locally?

The release already includes that file, so it shouldn't be necessary. I'll look into it

I tried to follow the steps from run.sh on cgpu-1 and the cmsDriver command ran OK.
Oddly enough, the cmssw test worked before the update to 23-1100 in #146 (comment) as seen in the job outputs

@slava77

slava77 commented Jan 29, 2025

I tried to follow the steps from run.sh on cgpu-1 and the cmsDriver command ran OK.

actually, after I followed more literally, it indeed breaks

@slava77

slava77 commented Jan 29, 2025

/run cmssw

@slava77

slava77 commented Jan 29, 2025

I tried to follow the steps from run.sh on cgpu-1 and the cmsDriver command ran OK.

actually, after I followed more literally, it indeed breaks

it looks like something is glitchy


There was a problem while building and running with CMSSW. The logs can be found here.

@slava77

slava77 commented Jan 29, 2025

I think the problem starts at least as early as with

------- copying files from src/HLTrigger/Configuration/scripts -------

@GNiendorf
Member

GNiendorf commented Jan 29, 2025

I think the problem starts at least as early as with

------- copying files from src/HLTrigger/Configuration/scripts -------

Are you saying this because it's empty (i.e., no files are copied below that)? I am able to reproduce this locally. If I don't include git cms-addpkg HLTrigger/Configuration, then it is empty. If I do include it, then I get the following:

------- copying files from src/HLTrigger/Configuration/scripts -------
>> copied edmPluginCoverage
>> copied hltCheckPrescaleModules
>> copied hltCompareResults
>> copied hltConfigFromDB
>> copied hltDumpStream
>> copied hltFindDuplicates
>> copied hltGetConfiguration
>> copied hltIntegrationTests
>> copied hltListPaths
>> copied hltPhase2UpgradeIntegrationTests
>> copied hltPrintMenuVersions

Oh but then it just goes to the next missing one:

  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02874/el9_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-28-2300/src/L1Trigger/L1TCalorimeter/python/simDigis_cff.py", line 43, in <module>
    from L1Trigger.L1TCaloLayer1.simCaloStage2Layer1Digis_cfi import simCaloStage2Layer1Digis
ModuleNotFoundError: No module named 'L1Trigger.L1TCaloLayer1.simCaloStage2Layer1Digis_cfi'

@slava77

slava77 commented Jan 29, 2025

Oh but then it just goes to the next missing one:

I have a fix proposed in SegmentLinking/TrackLooper-actions#19

@slava77

slava77 commented Jan 29, 2025

Oh but then it just goes to the next missing one:

I have a fix proposed in SegmentLinking/TrackLooper-actions#19

I'm not sure how our setup worked before. The apparent symptom is that a bunch of directories containing only a .gitignore file (I counted 17 packages) are pulled into the current directory; scram thinks that these are cleaned-up packages and diligently refuses to pick up the files available in the base release.

@ariostas
Member

Thank you, Slava! Let's see if it works now.

/run cmssw


The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@VourMa
Collaborator Author

VourMa commented Jan 30, 2025

Thank you all for making it work! Submitting this to cms-sw now.

@@ -162,7 +162,7 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE::lst {
 float eta2 = __H2F(quintuplets.eta()[jx]);
 float phi2 = __H2F(quintuplets.phi()[jx]);
 float dEta = alpaka::math::abs(acc, eta1 - eta2);
-float dPhi = calculate_dPhi(phi1, phi2);
+float dPhi = reco::deltaPhi(phi1, phi2);

@VourMa
please remind me of the rationale for using reco::deltaPhi(T phi1, T phi2) vs. cms::alpakatools::deltaPhi(TAcc const& acc, T phi1, T phi2)

Collaborator Author


Closely following what was there before. That said, I am not sure why we ended up with calculate_dPhi(phi1, phi2) and not something alpaka related.


uhm, but isn't the implementation different (the old one was a single 2π shift, while the new one uses the somewhat pseudo-constexpr reduceRange),
unless by "closely" you mean "it didn't have acc" before

Collaborator Author


About following closely, it is the latter.
About your first point, my understanding is that using reduceRange is safer, as it accounts for multiple "wraparounds" of φ, but let me know if I didn't get it right.
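
To make the wraparound point concrete, here is a minimal standalone sketch; it is not the LST or CMSSW code. The names deltaPhiSingleShift, reduceToPiRange and deltaPhiReduced are hypothetical, and the rounding-based reduction is only an assumption about what "accounting for multiple wraparounds" means here. If both inputs already lie in [-π, π], their difference stays within [-2π, 2π] and the two variants agree up to rounding; they differ only for inputs outside that range.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical constant and function names, for illustration only.
constexpr float kPi = 3.14159265358979323846f;

// Single 2*pi shift, as in the snippet quoted earlier in the thread.
// The result lands in [-pi, pi] only when |phi1 - phi2| <= 3*pi.
inline float deltaPhiSingleShift(float phi1, float phi2) {
  float delta = phi1 - phi2;
  if (delta > kPi) {
    delta -= 2 * kPi;
  } else if (delta < -kPi) {
    delta += 2 * kPi;
  }
  return delta;
}

// Rounding-based reduction (assumed approach): subtract the nearest whole
// number of turns, so any number of wraparounds is removed in one step.
inline float reduceToPiRange(float x) {
  if (std::abs(x) <= kPi) {
    return x;
  }
  const float n = std::round(x / (2 * kPi));
  return x - n * 2 * kPi;
}

inline float deltaPhiReduced(float phi1, float phi2) {
  return reduceToPiRange(phi1 - phi2);
}
```

For example, with phi1 = 7 and phi2 = -7 the single-shift variant returns about 7.72, which is outside [-π, π], while the reduced variant returns about 1.43.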


I'd aim to use the same code that defines dphi from x,y "directly" and after computing phi. My concern is that the computation path x,y -> phi followed by a call to dPhi for a pair of items is different from computing it directly from a pair of two x,ys in the same parts of the code base. ... also I'm still bothered by pseudo-constexpr in the reco case as was discussed with Matti.
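
One way to read the concern above is that the coordinate-based delta phi and the cached-phi delta phi should share a single definition. The sketch below assumes that reading; phiFromXY, reduceToPiRange, deltaPhi and deltaPhiXY are hypothetical names, not the reco:: or cms::alpakatools:: functions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, for illustration only.
constexpr float kPi = 3.14159265358979323846f;

// Azimuthal angle of the point (x, y).
inline float phiFromXY(float x, float y) { return std::atan2(y, x); }

// Bring an angle back into [-pi, pi].
inline float reduceToPiRange(float x) {
  if (std::abs(x) <= kPi) {
    return x;
  }
  const float n = std::round(x / (2 * kPi));
  return x - n * 2 * kPi;
}

// Path 1: phi values were computed earlier (and possibly cached).
inline float deltaPhi(float phi1, float phi2) {
  return reduceToPiRange(phi1 - phi2);
}

// Path 2: start from coordinates, but reuse exactly the helpers of path 1,
// so both paths go through the same phi and range-reduction code.
inline float deltaPhiXY(float x1, float y1, float x2, float y2) {
  return deltaPhi(phiFromXY(x1, y1), phiFromXY(x2, y2));
}
```

Because deltaPhiXY is written in terms of phiFromXY and deltaPhi, a value computed from cached phis and one computed directly from coordinates cannot drift apart through differing implementations.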

Collaborator Author


Ok, my bad, I thought it was pT5s we were looking at.

Do you want to change this? If so, do you want to do it in this PR?

Collaborator Author


no; if we have to cache phi, as in this case of T5s or MDs, they should be cached. I mean, eventually, that cms::alpakatools::deltaPhi should be preferred, especially if the inputs are likely also computed with cms::alpakatools:: or accelerator-dependent code

Ok. Would you like to make a comment in the cms-sw PR to mention that we'd better use the alpaka version, and then I can follow up on that?


Do you want to change this? If so, do you want to do it in this PR?

it's better in this PR


Ok. Would you like to make a comment in the cms-sw PR to mention that we'd better use the alpaka version, and then I can follow up on that?

I stayed away from bringing this up in the cms-sw PR due to the review availability issues for heterogeneous, where weeks long delays are typical

Collaborator Author


Ok, then I will force push the reco::deltaPhi(T phi1, T phi2) -> cms::alpakatools::deltaPhi(TAcc const& acc, T phi1, T phi2) change a bit later today.
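
The practical difference between the two signatures is the extra acc argument, which lets the math calls be routed through the accelerator, in the same style as the alpaka::math::abs(acc, ...) call in the diff above. The following is a rough standalone sketch of that pattern only: HostAccSketch, absOn, roundOn, reducePhiRangeSketch and deltaPhiSketch are all hypothetical stand-ins, and nothing here is the actual cms::alpakatools implementation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for an accelerator type; not part of alpaka.
struct HostAccSketch {};

// Hypothetical math shims. With a real accelerator these would forward to
// accelerator-specific math instead of <cmath>.
template <typename TAcc, typename T>
T absOn(TAcc const&, T x) {
  return std::abs(x);
}

template <typename TAcc, typename T>
T roundOn(TAcc const&, T x) {
  return std::round(x);
}

// Range reduction to [-pi, pi], with every math call taking acc.
template <typename TAcc, typename T>
T reducePhiRangeSketch(TAcc const& acc, T x) {
  constexpr T pi = static_cast<T>(3.14159265358979323846);
  if (absOn(acc, x) <= pi) {
    return x;
  }
  const T n = roundOn(acc, x / (2 * pi));
  return x - n * 2 * pi;
}

// Same shape as deltaPhi(TAcc const& acc, T phi1, T phi2).
template <typename TAcc, typename T>
T deltaPhiSketch(TAcc const& acc, T phi1, T phi2) {
  return reducePhiRangeSketch(acc, phi1 - phi2);
}
```

A call site would then pass the accelerator it already has in scope, for example deltaPhiSketch(acc, phi1, phi2).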

4 participants