Use central phi functions instead of LST ones #146

VourMa · 2025-01-20T23:04:57Z

This is the follow-up to #142. I have opened it as a draft PR because, before it can be submitted to cms-sw, we first need to merge #145 (for proper naming of the reducePhiRange functions) and #141 (so that we can have meaningful tests). That said, I think we can start discussing and testing it internally.

VourMa · 2025-01-20T23:05:15Z

/run all

github-actions · 2025-01-20T23:11:12Z

There was a problem while building and running in standalone mode. The logs can be found here.

github-actions · 2025-01-20T23:22:58Z

There was a problem while building and running with CMSSW. The logs can be found here.

VourMa · 2025-01-21T08:51:38Z

@ariostas Sorry, I forgot: did we move to 15_0? Could you tell me the exact version, so that I can record it in our repo and then update this PR accordingly?

ariostas · 2025-01-21T14:28:59Z

@VourMa I set it up so that it now always uses the latest release, so it's using 15_0_0_pre2. For some reason AlpakaMath didn't make it into that release, so I'll just make the CI check out that package so that this PR works.

VourMa · 2025-01-21T14:32:22Z

> @VourMa I set it up so that now it always uses the latest release, so it's using 15_0_0_pre2. For some reason AlpakaMath didn't make it into that release, so I'll just make the CI check out that package so that this PR works

Oops, I didn't think of that. Feel free to just run the tests when you sort it out. Thank you for taking care of it!

ariostas · 2025-01-21T14:33:55Z

It should work now.

/run all

github-actions · 2025-01-21T14:55:35Z

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt   Hits    MD      LS      T3      T5      pLS     pT5     pT3     TC      Reset   Event   Short             Rate
   avg   45.9    399.0   187.8   151.7   146.5   548.9   122.8   233.5   153.8   3.1     1992.9  1398.2 +/- 387.5  529.6   explicit[s=4] (target branch)
   avg   48.5    377.1   188.7   151.7   166.9   702.5   122.4   226.7   177.8   3.5     2166.0  1414.9 +/- 395.6  574.0   explicit[s=4] (this PR)

github-actions · 2025-01-21T16:48:27Z

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks

The full set of validation and comparison plots can be found here.

GNiendorf · 2025-01-21T16:59:33Z

cmssw/RecoTracker/LSTCore/src/alpaka/NeuralNetwork.h

Lines 41 to 48 in eec20cb

    
    ALPAKA_FN_ACC ALPAKA_FN_INLINE float delta_phi(const float phi1, const float phi2) {
      float delta = phi1 - phi2;
      // Adjust delta to be within the range [-M_PI, M_PI]
      if (delta > kPi) {
        delta -= 2 * kPi;
      } else if (delta < -kPi) {
        delta += 2 * kPi;
      }

Can you remove this one in the inference code I added as well? We should probably check that it doesn't affect the performance plots. I tried using another implementation of the delta phi function and it gave some weird results; I'm not sure whether that was just a bug in my old code.
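For reference, the single-shift logic in the excerpt above can be exercised on the host with the minimal sketch below. It assumes the function simply returns `delta` after line 48, defines `kPi` locally as a float (assumed to match the LST constant), drops the ALPAKA_FN_ACC/ALPAKA_FN_INLINE qualifiers so it compiles as plain C++, and uses a hypothetical name, `deltaPhiSingleShift`. One shift is enough whenever both inputs already lie in [-π, π], because their difference is then within [-2π, 2π]; inputs that have wrapped further can stay out of range.

```cpp
#include <cassert>
#include <cmath>

// Local stand-in for the LST kPi constant (assumed: pi stored as a float).
constexpr float kPi = 3.14159265358979323846f;

// Hypothetical host-only copy of the single-shift delta_phi logic:
// at most one 2*pi correction is applied to phi1 - phi2.
inline float deltaPhiSingleShift(float phi1, float phi2) {
  float delta = phi1 - phi2;
  if (delta > kPi) {
    delta -= 2 * kPi;
  } else if (delta < -kPi) {
    delta += 2 * kPi;
  }
  return delta;  // assumed: the excerpt ends before the return statement
}
```

With inputs in [-π, π] (for example 3 and -3) the result is back in range after one shift; with an input such as 10 it is not.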

VourMa · 2025-01-21T23:11:17Z

> Can you remove this one in the inference code I added as well? Should probably check that it doesn't affect the performance plots. I tried using another implementation of the delta phi function and it gave some weird results, not sure if it was just a bug in my old code.

Sure, I replaced it in the new version I pushed. Let's see if it works out.

VourMa · 2025-01-21T23:11:23Z

/run all

github-actions · 2025-01-21T23:27:03Z

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt   Hits    MD      LS      T3      T5      pLS     pT5     pT3     TC      Reset   Event   Short             Rate
   avg   46.3    396.2   189.6   153.9   149.5   551.2   124.8   235.2   151.9   3.5     2001.9  1404.4 +/- 387.5  529.4   explicit[s=4] (target branch)
   avg   46.8    379.0   190.3   153.3   160.4   708.8   124.4   226.5   180.3   3.1     2173.1  1417.5 +/- 394.8  573.2   explicit[s=4] (this PR)

github-actions · 2025-01-22T00:49:16Z

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks

The full set of validation and comparison plots can be found here.

GNiendorf · 2025-01-22T02:06:27Z

Looks good, thanks.

VourMa · 2025-01-26T12:06:34Z

Now that cms-sw#47154 is merged, and given that I have addressed the comments on this PR and the tests have passed, I am rebasing and opening the PR to cms-sw.

VourMa · 2025-01-26T12:21:48Z

/run all

github-actions · 2025-01-26T12:26:27Z

There was a problem while building and running in standalone mode. The logs can be found here.

github-actions · 2025-01-26T12:31:05Z

There was a problem while building and running with CMSSW. The logs can be found here.

VourMa · 2025-01-26T12:47:47Z

This is all because of cms-sw#47119: LST cannot run in master unless we update, but then we won't be able to rely on a pre-release for running (the next pre-release is due on February 4th, which is quite a few days away). We need to discuss how to proceed.

slava77 · 2025-01-28T23:40:13Z

> This is all because of cms-sw#47119, so LST cannot run in master, unless we update (but then we won't be able to rely on a pre-release for running - the next pre-release is due on February 4th, so quite a few days away). We need to discuss how to proceed.

I think that, as we did once before, we should move the CI to a recent IB: the alpaka version update entered in CMSSW_15_0_X_2025-01-21-2300, and our master is now from Jan 24.
IIUC, the CI starts by setting up 15_0_0_pre2, so the CI depends on that release's alpaka version.

slava77 · 2025-01-28T23:43:58Z

@ariostas
It may be useful to have a way to pass the CMSSW version to the /run command.
This would imply that the version is at least as recent as a known default release.
Do we still depend, in setup.sh or similar, on explicitly decoded paths of the needed externals?
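If /run accepted a CMSSW version, the "at least as recent as a known default" check could be as simple as comparing the date-time suffix of IB names such as CMSSW_15_0_X_2025-01-21-2300. A minimal sketch, assuming IB names always end in a fixed-width `YYYY-MM-DD-HHMM` stamp, so that string comparison orders IBs of the same cycle chronologically; `ibTimestamp` and `isAtLeastAsRecent` are hypothetical helpers, not part of the CI:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical: extract the trailing "YYYY-MM-DD-HHMM" stamp of an IB name,
// e.g. "CMSSW_15_0_X_2025-01-21-2300" -> "2025-01-21-2300".
std::optional<std::string> ibTimestamp(const std::string& release) {
  constexpr std::size_t kStampLen = 15;  // length of "YYYY-MM-DD-HHMM"
  if (release.size() < kStampLen + 1) return std::nullopt;
  const std::string stamp = release.substr(release.size() - kStampLen);
  // Digits everywhere except dashes at positions 4, 7 and 10.
  for (std::size_t i = 0; i < kStampLen; ++i) {
    const bool dashPos = (i == 4 || i == 7 || i == 10);
    const char c = stamp[i];
    if (dashPos ? c != '-' : (c < '0' || c > '9')) return std::nullopt;
  }
  if (release[release.size() - kStampLen - 1] != '_') return std::nullopt;
  return stamp;
}

// Hypothetical: is `requested` an IB of the same cycle that is not older than `minimum`?
bool isAtLeastAsRecent(const std::string& requested, const std::string& minimum) {
  const auto a = ibTimestamp(requested);
  const auto b = ibTimestamp(minimum);
  if (!a || !b) return false;  // not IB names; full releases would need separate handling
  // Require the same release-cycle prefix, e.g. "CMSSW_15_0_X_".
  const std::string prefixA = requested.substr(0, requested.size() - a->size());
  const std::string prefixB = minimum.substr(0, minimum.size() - b->size());
  if (prefixA != prefixB) return false;
  // Fixed-width, zero-padded stamps compare chronologically as strings.
  return *a >= *b;
}
```

A full release such as CMSSW_15_0_0_pre2 is rejected by this sketch rather than compared, since mapping pre-releases onto IB dates would need extra information.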

ariostas · 2025-01-29T13:50:59Z

> it may be useful to have a way to pass the CMSSW version to the /run command
> This would imply that the version is at least as recent as a known default release
> Do we still depend in the setup.sh or similar on explicitly decoded paths of needed externals?

Currently it uses the latest release (bypassing whatever is in setup.sh). I'll change it to use the latest IB release, since those get updated pretty often. That should solve things, right?

ariostas · 2025-01-29T13:57:27Z

Should work now

/run all

slava77 · 2025-01-29T13:59:40Z

> Currently it uses the latest release (bypassing whatever is in setup.sh). I'll change it to use the latest IB release, since those get updated pretty often. That should solve things, right?

Yes, selecting the latest should work most of the time.
Occasionally there is a broken IB, but that's usually fixed within one cycle (12 hours or less).

github-actions · 2025-01-29T14:04:10Z

There was a problem while building and running in standalone mode. The logs can be found here.

slava77 · 2025-01-29T14:15:02Z

> There was a problem while building and running in standalone mode. The logs can be found here.

Apparently we are missing the complexity of patch-build IBs; some paths need updating.

slava77 · 2025-01-29T14:18:44Z

> apparently we are missing the complexity of patch build IBs; some paths need updating

CMSSW_FULL_RELEASE_BASE should be added so that it's checked after CMSSW_RELEASE_BASE.
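The lookup order described here can be illustrated with a small sketch: search the local area first, then the release base, then the full release that a patch release is built on. The ordering CMSSW_BASE, CMSSW_RELEASE_BASE, CMSSW_FULL_RELEASE_BASE is our reading of the comment above, and `searchRootsFromEnv`/`findInRoots` are hypothetical helpers; the actual CI does this in its own scripts.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical: collect search roots in priority order from the environment,
// skipping variables that are unset or empty.
std::vector<fs::path> searchRootsFromEnv() {
  std::vector<fs::path> roots;
  for (const char* var : {"CMSSW_BASE", "CMSSW_RELEASE_BASE", "CMSSW_FULL_RELEASE_BASE"}) {
    if (const char* value = std::getenv(var); value != nullptr && *value != '\0') {
      roots.emplace_back(value);
    }
  }
  return roots;
}

// Return the first root (in the given order) that contains `relative`,
// e.g. "src/HLTrigger/Configuration"; std::nullopt if none does.
std::optional<fs::path> findInRoots(const std::vector<fs::path>& roots, const fs::path& relative) {
  for (const auto& root : roots) {
    const fs::path candidate = root / relative;
    std::error_code ec;  // treat unreadable paths as "not found" instead of throwing
    if (fs::exists(candidate, ec)) return candidate;
  }
  return std::nullopt;
}
```

In a patch IB, a file that only exists in the full release is found only if the third root is present in the list, which is the gap the comment points at.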

ariostas · 2025-01-29T15:06:46Z

Thank you, Slava. To play it safe I'll just restrict to non-patch releases and see how it goes. If we see a need to include patch releases I'll do it.

/run standalone

github-actions · 2025-01-29T15:21:44Z

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt   Hits    MD      LS      T3      T5      pLS     pT5     pT3     TC      Reset   Event   Short             Rate
   avg   46.8    397.6   190.7   153.4   166.2   745.4   123.1   233.1   176.7   3.8     2236.7  1444.6 +/- 401.3  589.6   explicit[s=4] (target branch)
   avg   47.3    378.7   189.8   152.3   165.5   709.6   124.4   227.8   178.4   3.2     2176.8  1419.9 +/- 395.6  574.5   explicit[s=4] (this PR)

github-actions · 2025-01-29T15:29:49Z

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks

The full set of validation and comparison plots can be found here.

VourMa · 2025-01-29T17:40:38Z

Thanks for all the updates, @ariostas! I will fix the conflict, rerun the tests and make the PR to cms-sw.

VourMa · 2025-01-29T17:56:44Z

/run all

github-actions · 2025-01-29T18:15:27Z

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt   Hits    MD      LS      T3      T5      pLS     pT5     pT3     TC      Reset   Event   Short             Rate
   avg   49.3    396.1   191.8   154.6   169.7   695.6   131.0   244.8   177.5   3.5     2213.9  1469.0 +/- 405.7  585.2   explicit[s=4] (target branch)
   avg   47.3    381.5   191.1   155.9   156.3   699.8   131.8   254.5   179.7   3.1     2201.0  1453.9 +/- 402.5  582.1   explicit[s=4] (this PR)

github-actions · 2025-01-29T18:23:27Z

There was a problem while building and running with CMSSW. The logs can be found here.

VourMa · 2025-01-29T18:28:31Z

> There was a problem while building and running with CMSSW. The logs can be found here.

The error seems to be:

ModuleNotFoundError: No module named 'HLTrigger.Configuration.HLTrigger_EventContent_cff'

A glitch? I can see the file there in master.

ariostas · 2025-01-29T18:42:55Z

> Glitch? Because I can see the file there in the master

I'm not sure. #148 failed in the same way.

GNiendorf · 2025-01-29T18:46:05Z

Do you have to run git cms-addpkg HLTrigger/Configuration, similar to what you told me to do in my PR to get it to run locally?

ariostas · 2025-01-29T19:00:02Z

> Do you have to do git cms-addpkg HLTrigger/Configuration? Similar to how you told me in my PR to get it to run locally?

The release already includes that file, so it shouldn't be necessary. I'll look into it.

slava77 · 2025-01-29T19:43:11Z

> Do you have to do git cms-addpkg HLTrigger/Configuration? Similar to how you told me in my PR to get it to run locally?

> The release already includes that file, so it shouldn't be necessary. I'll look into it

I tried to follow the steps from run.sh on cgpu-1, and the cmsDriver command ran OK.
Oddly enough, the cmssw test worked before the update to 23-1100 in #146 (comment), as seen in the job outputs.

slava77 · 2025-01-29T19:59:08Z

> I tried to follow the steps from run.sh on cgpu-1 and the cmsDriver command ran OK.

Actually, after I followed the steps more literally, it does break.

slava77 · 2025-01-29T20:38:18Z

/run cmssw

slava77 · 2025-01-29T20:44:50Z

> I tried to follow the steps from run.sh on cgpu-1 and the cmsDriver command ran OK.

> actually, after I followed more literally, it indeed breaks

It looks like something is glitchy.

github-actions · 2025-01-29T21:10:03Z

There was a problem while building and running with CMSSW. The logs can be found here.

slava77 · 2025-01-29T22:13:07Z

I think the problem starts at least as early as this step:

------- copying files from src/HLTrigger/Configuration/scripts -------

GNiendorf · 2025-01-29T23:01:24Z

> I think the problem starts at least as early as with
> ------- copying files from src/HLTrigger/Configuration/scripts -------

Are you saying this because it's empty (i.e., no files are copied below it)? I am able to reproduce this locally. If I don't include git cms-addpkg HLTrigger/Configuration, it is empty. If I do include it, I get the following:

------- copying files from src/HLTrigger/Configuration/scripts -------
>> copied edmPluginCoverage
>> copied hltCheckPrescaleModules
>> copied hltCompareResults
>> copied hltConfigFromDB
>> copied hltDumpStream
>> copied hltFindDuplicates
>> copied hltGetConfiguration
>> copied hltIntegrationTests
>> copied hltListPaths
>> copied hltPhase2UpgradeIntegrationTests
>> copied hltPrintMenuVersions

Oh but then it just goes to the next missing one:

  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02874/el9_amd64_gcc12/cms/cmssw/CMSSW_15_0_X_2025-01-28-2300/src/L1Trigger/L1TCalorimeter/python/simDigis_cff.py", line 43, in <module>
    from L1Trigger.L1TCaloLayer1.simCaloStage2Layer1Digis_cfi import simCaloStage2Layer1Digis
ModuleNotFoundError: No module named 'L1Trigger.L1TCaloLayer1.simCaloStage2Layer1Digis_cfi'

slava77 · 2025-01-29T23:49:29Z

> Oh but then it just goes to the next missing one:

I have a fix proposed in SegmentLinking/TrackLooper-actions#19

slava77 · 2025-01-29T23:53:28Z

> Oh but then it just goes to the next missing one:

> I have a fix proposed in SegmentLinking/TrackLooper-actions#19

I'm not sure how our setup worked before; the apparent symptom is that a bunch of directories containing only a .gitignore file (I counted 17 packages) are pulled into the current directory: scram thinks these are cleaned-up packages and diligently refuses to pick up the files available in the base release.
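The symptom described here can be checked for with a small diagnostic. A minimal sketch, assuming the usual src/<Subsystem>/<Package> layout of a CMSSW working area; `findGitignoreOnlyPackages` is a hypothetical helper, not something the CI or scram provides:

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical diagnostic: list <srcDir>/<Subsystem>/<Package> directories whose only
// entry is named ".gitignore". Per the comment above, scram treats such directories as
// checked-out (but cleaned-up) packages, hiding the base-release files behind them.
std::vector<fs::path> findGitignoreOnlyPackages(const fs::path& srcDir) {
  std::vector<fs::path> result;
  if (!fs::is_directory(srcDir)) return result;
  for (const auto& subsystem : fs::directory_iterator(srcDir)) {
    const std::string name = subsystem.path().filename().string();
    // Skip plain files and hidden directories such as .git at the top of src/.
    if (!subsystem.is_directory() || name.empty() || name.front() == '.') continue;
    for (const auto& package : fs::directory_iterator(subsystem.path())) {
      if (!package.is_directory()) continue;
      std::size_t entries = 0;
      bool onlyGitignore = true;
      for (const auto& entry : fs::directory_iterator(package.path())) {
        ++entries;
        if (entry.path().filename().string() != ".gitignore") onlyGitignore = false;
      }
      if (entries == 1 && onlyGitignore) result.push_back(package.path());
    }
  }
  return result;
}
```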

ariostas · 2025-01-30T02:50:58Z

Thank you, Slava! Let's see if it works now.

/run cmssw

github-actions · 2025-01-30T04:37:46Z

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks

The full set of validation and comparison plots can be found here.

VourMa · 2025-01-30T10:54:02Z

Thank you all for making it work! Submitting this to cms-sw now.

slava77 · 2025-01-30T13:34:36Z

RecoTracker/LSTCore/src/alpaka/Kernels.h

@@ -162,7 +162,7 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE::lst {
            float eta2 = __H2F(quintuplets.eta()[jx]);
            float phi2 = __H2F(quintuplets.phi()[jx]);
            float dEta = alpaka::math::abs(acc, eta1 - eta2);
-            float dPhi = calculate_dPhi(phi1, phi2);
+            float dPhi = reco::deltaPhi(phi1, phi2);


@VourMa
Please remind me of the rationale for using reco::deltaPhi(T phi1, T phi2) rather than cms::alpakatools::deltaPhi(TAcc const& acc, T phi1, T phi2).

I was closely following what was there before. That said, I am not sure why we ended up with calculate_dPhi(phi1, phi2) and not something alpaka-related.

Hmm, but isn't the implementation different (the old one was a single 2π shift, while the new one uses the somewhat pseudo-constexpr reduceRange)? Unless by "closely" you mean "it didn't have acc before".

Regarding "following closely", I meant the latter.
Regarding your first point, my understanding is that using reduceRange is safer, as it accounts for multiple "wraparounds" of φ, but let me know if I didn't get it right.
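To make the difference concrete: a single 2π shift restores the range only when |Δφ| ≤ 3π, while subtracting the nearest multiple of 2π removes any number of wraparounds; for |Δφ| up to 3π the two give the same result. The sketch below approximates the latter idea with `std::round`; it is not a copy of reco::reduceRange, whose exact implementation (and its constexpr handling) may differ, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

constexpr double kTwoPi = 2 * 3.14159265358979323846;

// Hypothetical: one 2*pi correction, as in the old LST helpers.
template <typename T>
T reduceSingleShift(T x) {
  const T pi = T(kTwoPi / 2);
  if (x > pi) return x - T(kTwoPi);
  if (x < -pi) return x + T(kTwoPi);
  return x;
}

// Hypothetical: subtract the nearest multiple of 2*pi, so any number of wraparounds
// is removed and the result lands in [-pi, pi] (up to rounding at the boundaries).
template <typename T>
T reduceNearestMultiple(T x) {
  const T pi = T(kTwoPi / 2);
  if (std::abs(x) <= pi) return x;  // fast path: already in range
  const T n = std::round(x / T(kTwoPi));
  return x - n * T(kTwoPi);
}

// Hypothetical delta-phi built on the multi-wrap reduction.
template <typename T>
T deltaPhiSketch(T phi1, T phi2) {
  return reduceNearestMultiple(phi1 - phi2);
}
```

For an argument of 10, the single shift leaves about 3.72 (still above π), while the nearest-multiple reduction returns 10 - 4π, about -2.57.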

I'd aim to use the same code for dphi whether it is computed from x,y "directly" or after computing phi. My concern is that the computation path x,y -> phi followed by a call to dPhi for a pair of items differs from computing it directly from two (x,y) pairs in the same parts of the code base. Also, I'm still bothered by the pseudo-constexpr in the reco case, as discussed with Matti.
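The "direct" path can be illustrated as follows: for two transverse vectors, φ1 - φ2 equals atan2 of their cross and dot products, which is already in [-π, π] and needs no range reduction. A minimal host-side sketch using `std::atan2` (an accelerator version would use the backend's math functions instead); both function names are hypothetical and this is not the LST code:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical: phi1 - phi2 computed directly from two (x, y) points.
// With v1 = r1 (cos a, sin a) and v2 = r2 (cos b, sin b):
//   x2*y1 - y2*x1 = r1*r2*sin(a - b),  x1*x2 + y1*y2 = r1*r2*cos(a - b),
// so atan2 returns a - b already wrapped into [-pi, pi].
inline double deltaPhiFromXY(double x1, double y1, double x2, double y2) {
  return std::atan2(x2 * y1 - y2 * x1, x1 * x2 + y1 * y2);
}

// Two-step path for comparison: compute each phi, then apply one 2*pi shift.
// One shift suffices here because each atan2 result lies in [-pi, pi].
inline double deltaPhiViaPhi(double x1, double y1, double x2, double y2) {
  const double pi = 3.14159265358979323846;
  double d = std::atan2(y1, x1) - std::atan2(y2, x2);  // in [-2*pi, 2*pi]
  if (d > pi) {
    d -= 2 * pi;
  } else if (d < -pi) {
    d += 2 * pi;
  }
  return d;
}
```

Both paths agree up to floating-point rounding; the direct form is also insensitive to the vectors' lengths.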

Ok, my bad, I thought it was pT5s we were looking at.

Do you want to change this? If so, do you want to do it in this PR?

No; if we have to cache phi, as in this case of T5s or MDs, it should be cached. What I mean is that, eventually, cms::alpakatools::deltaPhi should be preferred, especially if the inputs are also likely to be computed with cms::alpakatools:: or other accelerator-dependent code.

Ok. Would you like to make a comment in the cms-sw PR to mention that we'd better use the alpaka version, and then I can follow up on that?

> Do you want to change this? If so, do you want to do it in this PR?

It's better to do it in this PR.

> Ok. Would you like to make a comment in the cms-sw PR to mention that we'd better use the alpaka version, and then I can follow up on that?

I avoided bringing this up in the cms-sw PR because of the review-availability issues for heterogeneous, where weeks-long delays are typical.

Ok, then I will force push the reco::deltaPhi(T phi1, T phi2) -> cms::alpakatools::deltaPhi(TAcc const& acc, T phi1, T phi2) change a bit later today.

VourMa force-pushed the CMSSW_14_2_0_pre4_workflowsAndGeneralFunctions_squashed branch from eec20cb to 42dd567 on January 21, 2025 23:09

VourMa force-pushed the CMSSW_14_2_0_pre4_workflowsAndGeneralFunctions_squashed branch from 42dd567 to d585909 on January 26, 2025 12:21

VourMa marked this pull request as ready for review January 26, 2025 12:21

Use central phi functions instead LST ones

717caf8

VourMa force-pushed the CMSSW_14_2_0_pre4_workflowsAndGeneralFunctions_squashed branch from d585909 to 717caf8 on January 29, 2025 17:56

slava77 reviewed Jan 30, 2025


slava77 mentioned this pull request Jan 30, 2025

Dynamic Memory Limits for LST Objects #148

Open

slava77 merged commit 59c5d67 into master Jan 31, 2025
3 checks passed

VourMa mentioned this pull request Jan 31, 2025

Switching reco::deltaPhi(T phi1, T phi2) to cms::alpakatools::deltaPhi(TAcc const& acc, T phi1, T phi2) #150

Open
