Dynamic Memory Limits for LST Objects #148

Open · wants to merge 1 commit into base: master

Conversation

@GNiendorf (Member) commented Jan 27, 2025

Dynamic Memory Limits

For each module I compute a dynamic maximum: the largest number of objects that can possibly be created given the counts of lower-level objects (for example, the maximum number of T5's in a given module, given the total number of T3's in the modules whose T3's can be connected to it). This upper bound yields a large reduction in the occupancies, and it introduces no new truncations, since the dynamic max is by construction the maximum number of objects that can be created.
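
As a rough illustration (not the actual kernel code), the T5 case could be sketched on the host side as below; the container and map names (nTripletsPerModule, moduleConnections, maxQuintupletsPerModule) are placeholders I am using for illustration only.

```cpp
// Hedged sketch: a per-module upper bound on the number of T5's, derived from
// the per-module T3 counts and a module connection map. All names here are
// illustrative placeholders, not the real LST containers.
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<unsigned int> dynamicT5Limits(const std::vector<unsigned int>& nTripletsPerModule,
                                          const std::vector<std::vector<std::uint16_t>>& moduleConnections) {
  const std::size_t nModules = nTripletsPerModule.size();
  std::vector<unsigned int> maxQuintupletsPerModule(nModules, 0u);
  for (std::size_t mod = 0; mod < nModules; ++mod) {
    // A T5 in this module pairs one of its T3's with a T3 from a module it
    // can be connected to, so the product of the two counts bounds the total.
    unsigned int connectableT3s = 0u;
    for (std::uint16_t other : moduleConnections[mod])
      connectableT3s += nTripletsPerModule[other];
    maxQuintupletsPerModule[mod] = nTripletsPerModule[mod] * connectableT3s;
  }
  return maxQuintupletsPerModule;
}
```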

For the segments, I had to rearrange the addPixelSegmentToEvent() function so that the segments container is instantiated after the MDs container is filled. Currently the segments container is instantiated during the pLS loading, which happens before the MDs container is filled.
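
Schematically, the new ordering is sketched below; EventSketch and the numbered steps are placeholders of mine, not the actual addPixelSegmentToEvent() implementation.

```cpp
// Hedged sketch of the reordering: the segments container is sized from the
// per-module MD counts, so it can only be instantiated after the MDs
// container is filled. All names are illustrative stand-ins.
#include <vector>

struct EventSketch {
  std::vector<unsigned int> nMDsPerModule;    // filled during MD creation
  std::vector<unsigned int> segmentCapacity;  // per-module segment limits
};

void buildEventSketch(EventSketch& evt, const std::vector<unsigned int>& mdCounts) {
  evt.nMDsPerModule = mdCounts;  // 1. fill the MDs container first
  // 2. instantiate the segments container, sized from the MD multiplicities
  //    (the real sizing would also fold in the module connection map);
  //    previously this step ran during the pLS loading, before step 1.
  evt.segmentCapacity = evt.nMDsPerModule;
  // 3. the pixel (pLS) segments are then added to the now-existing container.
}
```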

Object Size Reduction

Reduction in object container sizes (essentially the total occupancy) at the current max occupancies (averaged over 10 events, 0.8 GeV):

  • 682,404 -> 303,891 (55% decrease) for MD's.
  • 1,930,994 -> 857,880 (56% decrease) for Segments.
  • 1,980,235 -> 737,502 (63% decrease) for T3's.
  • 502,053 -> 212,953 (58% decrease) for T5's.

Total Memory Decrease

For single stream, 100 events at current occupancies:

1695 MiB -> 1275 MiB (25% decrease, 0.8 GeV)

@GNiendorf mentioned this pull request Jan 27, 2025
@GNiendorf marked this pull request as ready for review January 28, 2025 19:47
@GNiendorf requested a review from slava77 January 28, 2025 19:47
@GNiendorf (Member Author) commented Jan 28, 2025

Performance plots look unchanged, and the timing changes are negligible.

@slava77 commented Jan 28, 2025

/run all

There was a problem while building and running with CMSSW. The logs can be found here.

There was a problem while building and running in standalone mode. The logs can be found here.

@GNiendorf (Member Author)

Should I rebase to master?

@slava77 commented Jan 28, 2025

Should I rebase to master?

I don't think it will help by itself; the CI is not pulling in the right version of alpaka.
... I guess if the target branch is not master but instead something older than Jan 21, the tests may work

@GNiendorf changed the base branch from master to some_branch January 29, 2025 01:51
@GNiendorf (Member Author)

/run all

There was a problem while building and running with CMSSW. The logs can be found here.

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

[Comparison plots: efficiency, fake rate, and duplicate rate vs pT and eta]

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     46.5    403.0    189.1    150.9    147.1    552.7    123.4    236.1    152.7      3.8    2005.4    1406.1+/- 384.0     528.4   explicit[s=4] (target branch)
   avg     31.3    390.5    183.2    142.2    138.4    551.8    121.3    232.4    151.2      2.2    1944.5    1361.4+/- 387.2     518.3   explicit[s=4] (this PR)

@slava77 commented Jan 29, 2025

/run all

this almost worked with switching to some_branch; the CMSSW part failed due to a missing HeterogeneousCore/AlpakaMath; I thought that addpkg was not throwing.
A better guess in this context is to use CMSSW_14_2_0_pre4_workflowsAndGeneralFunctions_squashed to run the tests ... although that's a Dec 4 master and dynamic_occ is from Dec 16 and can pull in some extra stuff.
Still, it's a hack and it's better to get the CI fixed.

There was a problem while building and running with CMSSW. The logs can be found here.

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

[Comparison plots: efficiency, fake rate, and duplicate rate vs pT and eta]

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     46.3    396.6    188.5    154.4    148.1    547.7    124.5    238.0    152.0      3.4    1999.6    1405.5+/- 392.0     531.5   explicit[s=4] (target branch)
   avg     32.0    389.3    184.2    143.7    137.9    555.4    123.3    230.8    150.7      2.0    1949.4    1361.9+/- 387.5     520.0   explicit[s=4] (this PR)

@ariostas changed the base branch from some_branch to master January 29, 2025 14:09
@ariostas (Member)

I changed the target branch because I thought that the CI should work now that it uses the latest IB release, but I just noticed that it failed for another PR. I'll look into it.

@ariostas (Member)

/run cmssw

There was a problem while building and running with CMSSW. The logs can be found here.

@ariostas (Member)

I'll change the CI so that it does the merge before checking out the packages

@ariostas (Member)

There's going to have to be some (hopefully minor) rebasing now that my PR was merged into CMSSW. I'll update our master to see if there are conflicts.

@slava77 commented Jan 29, 2025

There's going to have to be some (hopefully minor) rebasing now that my PR was merged into CMSSW. I'll update our master to see if there are conflicts.

5 files with conflicts now

There was a problem while building and running with CMSSW. The logs can be found here.

@GNiendorf (Member Author) commented Jan 29, 2025

Should I just rebase to master and fix the conflicts? Seems like the CMSSW run failed for reasons unrelated to this PR.

@GNiendorf (Member Author)

Just squashed the commits, will rebase now to fix the conflicts if that is fine.

@slava77 commented Jan 29, 2025

Should I just rebase to master and fix the conflicts? Seems like the CMSSW run failed for reasons unrelated to this PR.

I'd rebase (to master, or rebasing to SegmentLinking:ariostas/integration_pr_followups should also work)

There was a problem while building and running with CMSSW. The logs can be found here.

@GNiendorf (Member Author)

/run cmssw

The PR was built and ran successfully with CMSSW. Here are some plots.

[Plots: OOTB all tracks; efficiency and fake rate vs pT, eta, and phi]

The full set of validation and comparison plots can be found here.

@GNiendorf (Member Author)

@slava77 Do you have any comments for this PR, or should I submit it to CMSSW?

@slava77 commented Jan 30, 2025

@slava77 Do you have any comments for this PR, or should I submit it to CMSSW?

  • I missed the part when PixelSegmentSoA was added and will need some time to check the code.
  • I'm not sure I follow the comment in the PR description, "It looks like the process of moving the segments over to this method resulted in a net increase in memory usage", while above that the segments are stated to decrease.
  • Please review the description to cover the above (and be self-consistent)

what's the plan with #147? I thought that you wanted that more or less together with this.
Does this PR interfere with changes in #146?

@ariostas (Member)

I went through the changes and they look good to me

@GNiendorf (Member Author)

  • I missed the part when PixelSegmentSoA was added and will need some time to check the code.
  • I'm not sure I follow the comment in the PR description, "It looks like the process of moving the segments over to this method resulted in a net increase in memory usage", while above that the segments are stated to decrease.

I moved some arrays (the pLS hit indices and their delta phi) onto the GPU in the pixelSegments SoA but corrected this with a later update. I removed the edit from the PR description and just list the current total memory savings for 0.8 GeV at 100 events to be self-consistent.

what's the plan with #147 ? I thought that you wanted that more or less together with this. Does this PR interfere with changes in #146 ?

I think it makes sense to merge this PR separately from #147. I think it's still unclear what the occupancies should be set to, and that will involve its own efficiency and memory changes. Therefore I think it's better to merge this in first before addressing the occupancy tuning in that PR.

@GNiendorf changed the title from Dynamic Memory Allocation to Dynamic Memory Limits for LST Objects Jan 30, 2025
@slava77 left a comment

this will also likely need a rebase very soon, once the upstream variant of #146 is merged

RecoTracker/LSTCore/src/alpaka/Quintuplet.h (outdated, resolved)
RecoTracker/LSTCore/src/alpaka/Segment.h (outdated, resolved)
RecoTracker/LSTCore/src/alpaka/Segment.h (outdated, resolved)
RecoTracker/LSTCore/src/alpaka/Triplet.h (outdated, resolved)
RecoTracker/LSTCore/src/alpaka/Triplet.h (resolved)
@slava77 commented Jan 30, 2025

For the segments, I had to rearrange the createPixelSegmentToEvent() function so that the segments object is created after the mini-doublets are created. Right now in the code the segments are created during the pLS loading, which is done before the MDs are created.

Mostly for the purpose of the CMSSW PR, for more clarity: perhaps instead of "MDs are created", say "the MD container is instantiated". Our create* methods in most cases also run the algorithms and populate the relevant containers.

@GNiendorf force-pushed the dynamic_occ branch 3 times, most recently from d9ef1f9 to c7b126c on January 31, 2025 15:14
@GNiendorf (Member Author)

/run all

@SegmentLinking deleted a comment from github-actions bot Jan 31, 2025

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

[Comparison plots: efficiency, fake rate, and duplicate rate vs pT and eta]

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     49.1    382.4    193.8    160.3    158.9    699.0    132.8    253.4    177.7      3.8    2211.0    1463.0+/- 402.1     584.0   explicit[s=4] (target branch)
   avg     33.4    382.9    189.2    154.4    165.2    703.8    130.9    245.8    175.7      1.9    2183.2    1446.0+/- 403.3     576.1   explicit[s=4] (this PR)

@slava77 commented Jan 31, 2025

@GNiendorf
please run timing tests on GPU. The host-side per-object loops may show some significant slowdown

The PR was built and ran successfully with CMSSW. Here are some plots.

[Plots: OOTB all tracks; efficiency and fake rate vs pT, eta, and phi]

The full set of validation and comparison plots can be found here.

@ariostas (Member)

Working with IB releases definitely makes building the standalone part a bit more confusing. If we keep having to use IB builds we might need to tweak the setup.sh script.

Here's a timing comparison on cgpu-1.

This PR (2967c14)
Total Timing Summary
Average time for map loading = 761.337 ms
Average time for input loading = 8295.82 ms
Average time for lst::Event creation = 0.00356492 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg      1.4      0.5      0.4      1.6      1.3      0.5      0.9      0.6      1.1      0.0       8.2       6.2+/-  1.4       9.3   explicit[s=1]
   avg      1.3      0.7      0.6      1.9      1.5      0.6      1.2      0.7      1.4      0.0       9.9       8.0+/-  1.9       6.1   explicit[s=2]
   avg      3.1      1.1      1.0      2.4      2.0      1.1      2.2      1.3      2.2      0.0      16.5      12.3+/-  3.8       5.8   explicit[s=4]
   avg      3.6      1.9      1.7      3.6      3.3      1.5      3.3      1.5      3.7      0.0      24.2      19.1+/-  5.8       4.6   explicit[s=6]
   avg      5.5      2.7      2.5      4.8      4.3      2.0      4.2      2.0      4.8      0.0      32.8      25.3+/-  6.7       4.5   explicit[s=8]

master (7905921)
Total Timing Summary
Average time for map loading = 574.156 ms
Average time for input loading = 7868.15 ms
Average time for lst::Event creation = 0.00286102 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg      2.7      0.4      0.3      1.5      1.2      0.4      1.3      1.0      1.1      0.0      10.1       7.1+/-  1.4      11.1   explicit[s=1]
   avg      1.6      0.6      0.6      1.8      1.4      0.4      1.2      0.7      1.5      0.0       9.9       7.8+/-  1.6       6.1   explicit[s=2]
   avg      3.1      1.0      0.8      2.7      2.2      0.6      2.2      1.1      2.7      0.0      16.4      12.7+/-  2.4       4.7   explicit[s=4]
   avg      6.0      1.5      1.1      3.9      3.2      0.7      3.4      1.8      3.9      0.0      25.5      18.8+/-  4.7       4.7   explicit[s=6]
   avg      9.1      2.1      1.4      4.8      4.2      0.8      4.2      2.2      4.9      0.0      33.7      23.8+/-  4.9       4.6   explicit[s=8]

@GNiendorf (Member Author)

Thank you @ariostas!

@slava77 commented Jan 31, 2025

Here's a timing comparison on cgpu-1.

Thank you; for the single-stream case perhaps I should ignore the Hits column.
