
Move to Alpaka Memory + Alpaka 0.9 + CMSSW Alpaka Caching Allocator #292

Merged: 44 commits merged into SegmentLinking:alpaka_move on Jul 17, 2023

Conversation

@GNiendorf (Member) commented May 31, 2023

This PR moves the codebase's memory management over to Alpaka. It also removes unused memory functions in a few files and gets rid of the unused PrintUtil files in SDL (along with a couple of unused print functions). Lastly, since Alpaka is now included in the main makefile, some of the required Alpaka statements move from Event.cuh to Constants.cuh. See the timing and performance plots below. Note that the timing is slower because the Alpaka caching allocator is not yet in place.
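To illustrate the kind of change this migration implies, here is a hedged, self-contained sketch (not code from this PR; the buffer name, size, and the CUDA back-end choice are assumptions): raw cudaMalloc/cudaMemcpy/cudaFree calls become Alpaka buffers and queue operations.

```cpp
// Hedged sketch (Alpaka 0.9 API, CUDA back-end assumed; names are illustrative).
// Device memory is owned by Alpaka buffers instead of raw pointers from cudaMalloc,
// so it is released automatically, and copies are issued on an Alpaka queue.
#include <alpaka/alpaka.hpp>

int main() {
    using Dim = alpaka::DimInt<1u>;
    using Idx = unsigned int;
    using Acc = alpaka::AccGpuCudaRt<Dim, Idx>;  // assumption: CUDA back-end, as in this PR

    auto const devHost = alpaka::getDevByIdx<alpaka::PltfCpu>(0u);
    auto const devAcc = alpaka::getDevByIdx<alpaka::Pltf<alpaka::Dev<Acc>>>(0u);
    auto queue = alpaka::Queue<Acc, alpaka::NonBlocking>{devAcc};

    Idx const nHits = 1024u;  // illustrative size
    auto const extent = alpaka::Vec<Dim, Idx>{nHits};

    // Host and device buffers; both free their memory when they go out of scope,
    // so there is no cudaFree to forget.
    auto hitsHost = alpaka::allocBuf<float, Idx>(devHost, extent);
    auto hitsDev = alpaka::allocBuf<float, Idx>(devAcc, extent);

    for (Idx i = 0; i < nHits; ++i)
        alpaka::getPtrNative(hitsHost)[i] = 0.5f * static_cast<float>(i);

    // Asynchronous host-to-device copy on the queue, then wait for it to complete.
    alpaka::memcpy(queue, hitsDev, hitsHost, extent);
    alpaka::wait(queue);
}
```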

@GNiendorf (Member, Author) commented Jun 1, 2023

Timing (Ignore it being slower - this is without the alpaka caching allocator!)
[Timing screenshot, 2023-06-01]

Validation Plots - Here
Comparison to Master (pre-DNN) - Here

@GNiendorf GNiendorf marked this pull request as ready for review June 2, 2023 15:44
@GNiendorf GNiendorf requested a review from VourMa June 2, 2023 15:44
@GNiendorf GNiendorf linked an issue Jun 2, 2023 that may be closed by this pull request
@GNiendorf GNiendorf mentioned this pull request Jun 2, 2023
@GNiendorf (Member, Author) commented

I fixed a small bug in the last commit: segmentsInGPU was not being deleted when the Event destructor was called at the end of a run, which led to a mismatch between the number of cudaMalloc and cudaFree calls.
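For context, this is exactly the kind of bookkeeping the Alpaka buffers remove. A hedged sketch with hypothetical names (not the actual Event/segments code): holding the device memory as a buffer member means it is released when the event object is destroyed, so allocations and frees stay matched automatically.

```cpp
// Hedged sketch with hypothetical names, not the PR's actual Event class.
// An Alpaka buffer held by value (here inside std::optional, so it can be created
// lazily) releases its device memory on destruction, so the destructor cannot
// forget a cudaFree the way a raw-pointer member could.
#include <alpaka/alpaka.hpp>
#include <optional>

template <typename TDev, typename TIdx>
struct EventSketch {
    using Dim1 = alpaka::DimInt<1u>;
    using SegmentsBuf = alpaka::Buf<TDev, float, Dim1, TIdx>;

    std::optional<SegmentsBuf> segmentsBuf;  // replaces a raw segmentsInGPU-style pointer

    void createSegments(TDev const& dev, TIdx nSegments) {
        segmentsBuf.emplace(
            alpaka::allocBuf<float, TIdx>(dev, alpaka::Vec<Dim1, TIdx>{nSegments}));
    }
    // No user-written destructor needed: the buffer frees itself when the event is destroyed.
};
```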

@GNiendorf changed the title from "Segments.cu Alpaka Memory" to "Segments.cu + Hit.cu Alpaka Memory" on Jun 7, 2023
@GNiendorf GNiendorf marked this pull request as draft June 7, 2023 23:09
@GNiendorf changed the title from "Segments.cu + Hit.cu Alpaka Memory" to "Move to Alpaka Memory" on Jun 7, 2023
@GNiendorf (Member, Author) commented

Putting this back as a Draft PR since moving over all of the files to Alpaka memory shouldn't take long.

Here are the current timing and validation plots:
[Timing screenshot, 2023-06-07]

Validation Plots - Here
Comparison to Master - Here

@GNiendorf GNiendorf linked an issue Jun 8, 2023 that may be closed by this pull request
@GNiendorf changed the title from "Move to Alpaka Memory" to "Move to Alpaka Memory + Higher Version of Alpaka" on Jun 27, 2023
@GNiendorf changed the title from "Move to Alpaka Memory + Higher Version of Alpaka" to "Move to Alpaka Memory + Alpaka 0.9" on Jun 27, 2023
@GNiendorf (Member, Author) commented

Timing (now on CGPU-1, so not directly comparable to the previous timings yet)
[Timing screenshot, 2023-06-27]

@GNiendorf (Member, Author) commented Jul 5, 2023

I got things to work on lnx7188 again. Here are the current timing results. Keep in mind that the caching allocator is still buggy, so take these timings with a grain of salt for now: the TC timing blows up with the caching allocator because a very long CUDA stream sync is being triggered for no apparent reason, and the caching-allocator version also throws a runtime error during ntuple writing.

Relevant Master Timing (CUDA only, with the CUDA caching allocator)
[Timing screenshot, 2023-07-05]

This PR, all Alpaka, no caching allocator
[Timing screenshot, 2023-07-05]

This PR, all Alpaka, with the Alpaka caching allocator
[Timing screenshot, 2023-07-05]

Current Alpaka branch (with Alpaka kernels but CUDA memory and the CUDA caching allocator)
[Timing screenshot, 2023-05-17]

@GNiendorf changed the title from "Move to Alpaka Memory + Alpaka 0.9" to "Move to Alpaka Memory + Alpaka 0.9 + CMSSW Alpaka Caching Allocator" on Jul 5, 2023
@slava77 (Contributor) commented Jul 5, 2023

> Notice how the TC timing goes crazy with the caching allocator because there is a super long cuda stream sync being called for no apparent reason

Do you happen to know when it's called? Before or after the TC kernel, or around the related alloc/read/writes?

@GNiendorf (Member, Author) commented

> Notice how the TC timing goes crazy with the caching allocator because there is a super long cuda stream sync being called for no apparent reason
>
> Do you happen to know when it's called? Before or after the TC kernel, or around the related alloc/read/writes?

It's hanging on the removeDupQuintupletsInGPUBeforeTC kernel. I have to look into it more.
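(For reference, and not from this PR: one way to narrow down where a hang like this sits is to bracket the kernel launch and the queue wait with host-side timers, which shows whether the block happens at launch time or only when the queue is drained. A hedged, self-contained sketch with a dummy kernel; all names are illustrative and the CUDA back-end is assumed.)

```cpp
// Hedged sketch: time the kernel launch and the queue wait separately to see
// whether a suspected stream sync blocks at launch or only at synchronization.
#include <alpaka/alpaka.hpp>
#include <chrono>
#include <cstdio>

struct DummyKernel {
    template <typename TAcc>
    ALPAKA_FN_ACC void operator()(TAcc const& acc, float* out, unsigned int n) const {
        auto const i = alpaka::getIdx<alpaka::Grid, alpaka::Threads>(acc)[0u];
        if (i < n) out[i] = 2.0f * static_cast<float>(i);
    }
};

int main() {
    using Dim = alpaka::DimInt<1u>;
    using Idx = unsigned int;
    using Acc = alpaka::AccGpuCudaRt<Dim, Idx>;  // assumption: CUDA back-end
    using Clock = std::chrono::steady_clock;

    auto const devAcc = alpaka::getDevByIdx<alpaka::Pltf<alpaka::Dev<Acc>>>(0u);
    auto queue = alpaka::Queue<Acc, alpaka::NonBlocking>{devAcc};

    Idx const n = 1u << 20;
    auto buf = alpaka::allocBuf<float, Idx>(devAcc, alpaka::Vec<Dim, Idx>{n});

    auto const workDiv = alpaka::WorkDivMembers<Dim, Idx>{
        alpaka::Vec<Dim, Idx>{(n + 255u) / 256u},  // blocks
        alpaka::Vec<Dim, Idx>{256u},               // threads per block
        alpaka::Vec<Dim, Idx>{1u}};                // elements per thread

    alpaka::wait(queue);  // start from an idle queue
    auto const t0 = Clock::now();
    alpaka::exec<Acc>(queue, workDiv, DummyKernel{}, alpaka::getPtrNative(buf), n);
    auto const t1 = Clock::now();  // launch returned (should be quick on a non-blocking queue)
    alpaka::wait(queue);           // blocks until the kernel has actually finished
    auto const t2 = Clock::now();

    auto us = [](auto d) {
        return static_cast<long>(
            std::chrono::duration_cast<std::chrono::microseconds>(d).count());
    };
    std::printf("launch: %ld us, wait: %ld us\n", us(t1 - t0), us(t2 - t1));
}
```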

@GNiendorf (Member, Author) commented Jul 5, 2023

Fixed it. New timing looks great:
[Timing screenshot, 2023-07-05]

Another timing run. This is the first time we can see the full effect of the caching allocator.
[Timing screenshot, 2023-07-05]

@GNiendorf (Member, Author) commented

Unfortunately, the ntuple-writing runtime error was not fixed by moving to a newer version of the caching allocator. I'm looking into it more, but I may open the PR for review even if I can't find a solution, since it only affects ntuple writing with the caching allocator turned on.

SDL/MiniDoublet.cuh: two review threads (outdated, resolved)
@GNiendorf GNiendorf marked this pull request as ready for review July 14, 2023 17:02
@GNiendorf (Member, Author) commented Jul 14, 2023

Good to go now @VourMa.

Timings below are averages over 10 runs and are shown next to their corresponding standard uncertainties.
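(For reference, assuming the quoted standard uncertainty is the standard error of the mean over the N = 10 runs, it would be computed as:)

```latex
\bar{t} = \frac{1}{N}\sum_{i=1}^{N} t_i, \qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\bigl(t_i - \bar{t}\bigr)^2}, \qquad
\sigma_{\bar{t}} = \frac{s}{\sqrt{N}}, \quad N = 10 .
```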

Timing - lnx7188
[Timing plot: without_half]

Timing - cgpu1
[Timing plot: no_half_cgpu]

Validation Plots - Here
Comparison to Relevant Master - Here

@GNiendorf (Member, Author) commented

Turning off the half precision code allowed the compiler to find two unused variables in one of the smaller kernels.

@GNiendorf (Member, Author) commented

After speaking with @VourMa over Skype, we've agreed to merge this PR into the Alpaka branch first, rebase onto the current master, and then do a final review of the Alpaka branch before merging it into master.

@GNiendorf GNiendorf merged commit e73ab28 into SegmentLinking:alpaka_move Jul 17, 2023
@VourMa (Contributor) left a comment

First set of comments is there - to be continued...

I have a general question: What are the files under code/alpaka_interface supposed to do? Are they helper functions for the Alpaka memory management? If so, could you add a short description of their purpose?

Review comment (Contributor):

Please note that this file is obsolete, replaced by setup_ucsd.sh. I'm not sure whether this was meant to be fixed in the rebase or was missed.

Also, has the whole setup been tested on any of the UCSD machines?

Also, I see that setup_lnx7188.sh has switched to el8. Could the two files be reconciled once again?

Review comment (Contributor):

Since part of the Alpaka migration is the cleanup of the CPU code, is this file (and its corresponding implementation) being used anywhere, or can they be removed (instead of moved)?

@VourMa (Contributor) commented Jul 18, 2023

Sorry, wrong PR, copying to the proper one...

Development

Successfully merging this pull request may close these issues:
Incorrect and Ambiguous Uses of Cuda Memset
Considerations on memory management
3 participants