-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use central phi functions instead LST ones #146
Use central phi functions instead LST ones #146
Conversation
/run all |
There was a problem while building and running in standalone mode. The logs can be found here. |
There was a problem while building and running with CMSSW. The logs can be found here. |
@ariostas Sorry, I forgot, did we move to 15_0? Could you tell me the exact version, so that I can write it on our repo, and then update this PR appropriately? |
@VourMa I set it up so that now it always uses the latest release, so it's using 15_0_0_pre2. For some reason |
Oops, I didn't think of that. Feel free to just run the tests when you sort it out. Thank you for taking care of it! |
It should work now. /run all |
The PR was built and ran successfully in standalone mode. Here are some of the comparison plots. The full set of validation and comparison plots can be found here. Here is a timing comparison:
|
The PR was built and ran successfully with CMSSW. Here are some plots. OOTB All TracksThe full set of validation and comparison plots can be found here. |
cmssw/RecoTracker/LSTCore/src/alpaka/NeuralNetwork.h Lines 41 to 48 in eec20cb
Can you remove this one in the inference code I added as well? Should probably check that it doesn't affect the performance plots. I tried using another implementation of the delta phi function and it gave some weird results, not sure if it was just a bug in my old code. |
eec20cb
to
42dd567
Compare
Sure, I replaced it in the new version I pushed. Let's see if it works out. |
/run all |
The PR was built and ran successfully in standalone mode. Here are some of the comparison plots. The full set of validation and comparison plots can be found here. Here is a timing comparison:
|
The PR was built and ran successfully with CMSSW. Here are some plots. OOTB All TracksThe full set of validation and comparison plots can be found here. |
Looks good, thanks. |
Now that cms-sw#47154 is merged and given that I have followed up on the comments of this PR and the tests have passed, I am rebasing and making the PR to cms-sw. |
42dd567
to
d585909
Compare
/run all |
There was a problem while building and running in standalone mode. The logs can be found here. |
There was a problem while building and running with CMSSW. The logs can be found here. |
This is all because of cms-sw#47119, so LST cannot run in master, unless we update (but then we won't be able to rely on a pre-release for running - the next pre-release is due on February 4th, so quite a few days away). We need to discuss how to proceed. |
I think that as we did once, we should move the CI to a recent IB, the alpaka version update entered in CMSSW_15_0_X_2025-01-21-2300. Our master now is from Jan 24. |
@ariostas |
Currently it uses the latest release (bypassing whatever is in |
Should work now /run all |
Yes, selecting the latest should work most of the time. |
There was a problem while building and running in standalone mode. The logs can be found here. |
apparently we are missing the complexity of patch build IBs; some paths needs updating |
CMSSW_FULL_RELEASE_BASE should be added so that it's checked after CMSSW_RELEASE_BASE |
Thank you, Slava. To play it safe I'll just restrict to non-patch releases and see how it goes. If we see a need to include patch releases I'll do it. /run standalone |
The PR was built and ran successfully in standalone mode. Here are some of the comparison plots. The full set of validation and comparison plots can be found here. Here is a timing comparison:
|
The PR was built and ran successfully with CMSSW. Here are some plots. OOTB All TracksThe full set of validation and comparison plots can be found here. |
Thanks for all the updates, @ariostas! I will fix the conflict, rerun the tests and make the PR to cms-sw. |
d585909
to
717caf8
Compare
/run all |
The PR was built and ran successfully in standalone mode. Here are some of the comparison plots. The full set of validation and comparison plots can be found here. Here is a timing comparison:
|
There was a problem while building and running with CMSSW. The logs can be found here. |
The error seems to be:
|
I'm not sure. #148 failed in the same way |
Do you have to do |
The release already includes that file, so it shouldn't be necessary. I'll look into it |
I tried to follow the steps from run.sh on cgpu-1 and the cmsDriver command ran OK. |
actually, after I followed more literally, it indeed breaks |
/run cmssw |
it looks like something is glitchy |
There was a problem while building and running with CMSSW. The logs can be found here. |
I think the problem starts at least as early as with
|
Are you saying this because it's empty (ie there's no files copied below that)? I am able to reproduce this locally. If I don't include
Oh but then it just goes to the next missing one:
|
I have a fix proposed in SegmentLinking/TrackLooper-actions#19 |
I'm not sure how our setup worked before; the apparent symptoms are that a bunch of directories with only |
Thank you, Slava! Let's see if it works now. /run cmssw |
The PR was built and ran successfully with CMSSW. Here are some plots. OOTB All TracksThe full set of validation and comparison plots can be found here. |
Thank you all for making it work! Submitting this to cms-sw now. |
@@ -162,7 +162,7 @@ namespace ALPAKA_ACCELERATOR_NAMESPACE::lst { | |||
float eta2 = __H2F(quintuplets.eta()[jx]); | |||
float phi2 = __H2F(quintuplets.phi()[jx]); | |||
float dEta = alpaka::math::abs(acc, eta1 - eta2); | |||
float dPhi = calculate_dPhi(phi1, phi2); | |||
float dPhi = reco::deltaPhi(phi1, phi2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VourMa
remind me please the rationale of using reco::deltaPhi(T phi1, T phi2)
vs cms::alpakatools::deltaPhi(TAcc const& acc, T phi1, T phi2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closely following what was there before. That said, I am not sure why we ended up with calculate_dPhi(phi1, phi2)
and not something alpaka
related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uhm, but isn't the implementation different (the old was a single 2π shift vs new using somewhat pseudo-constexpr reducedRange),
unless by "closely" you mean "it didn't have acc" before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About following closely, it is the latter.
About your first point, my understanding is that using reduceRange
is safer, as it accounts for multiple "wraparounds" of φ, but let me know if I didn't get it right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd aim to use the same code that defines dphi from x,y "directly" and after computing phi. My concern is that the computation path x,y -> phi followed by a call to dPhi for a pair of items is different from computing it directly from a pair of two x,y
s in the same parts of the code base. ... also I'm still bothered by pseudo-constexpr in the reco case as was discussed with Matti.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, my bad, I thought it was pT5s we were looking at.
Do you want to change this? If so, do you want to do it in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no; if we have to cache phi, as in this case of T5s or MDs, they should be cached. I mean, eventually, that
cms::alpakatools::deltaPhi
should be preferred, especially if the inputs are likely also computed withcms::alpakatools::
or accelerator-dependent code
Ok. Would you like to make a comment in the cms-sw PR to mention that we'd better use the alpaka version, and then I can follow up on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to change this? If so, do you want to do it in this PR?
it's better in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Would you like to make a comment in the cms-sw PR to mention that we'd better use the alpaka version, and then I can follow up on that?
I stayed away from bringing this up in the cms-sw PR due to the review availability issues for heterogeneous, where weeks long delays are typical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then I will force push the reco::deltaPhi(T phi1, T phi2)
-> cms::alpakatools::deltaPhi(TAcc const& acc, T phi1, T phi2)
change a bit later today.
This is the follow up for #142. I have created it as a draft PR, as, to be submitted to cms-sw, we would need to first merge #145 (for proper naming of the
reducePhiRange
functions) and #141 (so that we can have meaningful tests). Having said that, I think we can start discussing and testing this internally.