Remove device initialization from library #385

ariostas · 2024-04-04T20:33:27Z

Currently, a device and queue are initialized within the library as global variables. This can cause various issues.

Since the device is initialized as a global variable, the initialization runs every time the library is loaded. This is what's currently causing edmWriteConfigs to crash when compiling the LST package for ROCm, since the machines we use don't have any AMD cards.
The queue part was mostly fine, but we were still creating a new queue that was not assigned to us by CMSSW. It's probably better to stick with the queue that they give us.

As far as I understand, both of these could result in using more (or different) computational resources than what CMSSW assigned to the job, so the job could end up being killed.
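A minimal sketch of the problem and the fix, using hypothetical stand-in types (Device, Queue, runAlgorithm, tryRun) rather than the real alpaka or SDL classes. A global device object is constructed during static initialization, so a failure there happens as soon as the library is loaded. Moving device and queue creation to the caller means the library only ever uses the queue it is given.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an accelerator device handle. Construction fails
// when no matching hardware is present (e.g. no AMD GPU on the build node).
struct Device {
  std::string name;
  explicit Device(bool hardwarePresent) : name("gpu0") {
    if (!hardwarePresent)
      throw std::runtime_error("no accelerator found");
  }
};

// Hypothetical queue, bound to exactly one device.
struct Queue {
  const Device& dev;
  explicit Queue(const Device& d) : dev(d) {}
};

// Problematic pattern (kept as a comment so this file stays loadable):
//   Device gDevice(probeHardware());  // runs at library load time
//   Queue  gQueue(gDevice);           // extra queue nobody assigned to us
// A throw during static initialization aborts the process before any
// framework code runs, even for tools that never touch the accelerator.

// Preferred pattern: the library never creates devices or queues; it only
// enqueues work on the queue handed to it by the caller.
int runAlgorithm(Queue& queue, int nHits) {
  (void)queue;       // real code would enqueue kernels/copies here
  return nHits * 2;  // placeholder result
}

// Caller side (standalone driver or framework): device selection happens
// here, where a missing device can be handled instead of crashing on load.
bool tryRun(bool hardwarePresent, int nHits, int& result) {
  try {
    Device dev(hardwarePresent);
    Queue queue(dev);
    result = runAlgorithm(queue, nHits);
    return true;
  } catch (const std::runtime_error&) {
    return false;  // the library itself loaded fine; only this run is skipped
  }
}
```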

This PR is still a bit rough, and I haven't worked on the CMSSW side. I'll give a better description of the changes once it's ready.

slava77 · 2024-04-04T20:57:40Z

SDL/EndcapGeometry.cc

 }

-void SDL::EndcapGeometry<SDL::Dev>::load(std::string filename) {
+template <typename TQueue>
+void SDL::EndcapGeometry<SDL::Dev>::load(TQueue& queue, std::string filename) {


does this need to be a template? can SDL::QueueAcc be used instead?

Yeah, I was also thinking about this. I think we no longer really need any templates, since we could just use SDL::Dev, SDL::DevAcc and SDL::QueueAcc. I was undecided whether to stick with templated functions, as in most of the code.
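A minimal sketch of the two options being weighed, with a hypothetical stand-in for SDL::QueueAcc (the real alias comes from the alpaka/CMSSW setup). Both functions behave identically; the difference is only in how they are declared and compiled.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the real SDL::QueueAcc alias.
namespace SDL {
  struct QueueAcc {
    std::vector<std::string> log;  // records what was "enqueued"
  };
}

// Option A: templated on the queue type, as in the diff under review.
template <typename TQueue>
void loadTemplated(TQueue& queue, const std::string& filename) {
  queue.log.push_back("load " + filename);
}

// Option B: concrete alias. Since the backend is fixed at compile time,
// SDL::QueueAcc already names the only queue type that will ever be passed.
void loadConcrete(SDL::QueueAcc& queue, const std::string& filename) {
  queue.log.push_back("load " + filename);
}
```

One practical consequence: the templated version has to be visible in a header or explicitly instantiated for SDL::QueueAcc, whereas the concrete version can simply be declared in the header and defined in the .cc file.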

slava77 · 2024-04-04T21:12:23Z

SDL/LST.cc

@@ -74,7 +76,8 @@ void SDL::LST<SDL::Acc>::loadAndFillES(alpaka::QueueCpuBlocking& queue, struct m
                           pLStoLayer);
 }

-void SDL::LST<SDL::Acc>::run(SDL::QueueAcc& queue,
+void SDL::LST<SDL::Acc>::run(SDL::Dev& devAccIn,
+                             SDL::QueueAcc& queue,


does it make sense to have a queue and a device separately? Isn't a queue uniquely attached to a device?

That's a good point. I'll tidy things up.
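A minimal sketch of passing only the queue and deriving the device from it. In alpaka this is what alpaka::getDev(queue) provides; the types and getDev below are hypothetical mocks so the example is self-contained.

```cpp
#include <string>

// Hypothetical device type.
struct Dev {
  std::string name;
};

// Hypothetical queue: created on one device and keeps a pointer to it, so the
// device can always be recovered from the queue.
class Queue {
public:
  explicit Queue(const Dev& dev) : dev_(&dev) {}
  const Dev& dev() const { return *dev_; }

private:
  const Dev* dev_;
};

// Free-function accessor in the spirit of alpaka::getDev(queue).
inline const Dev& getDev(const Queue& queue) { return queue.dev(); }

// Before: run(Dev&, Queue&) allowed passing a device that does not match the
// queue. After: only the queue is passed and the device is derived from it.
std::string run(Queue& queue) {
  const Dev& dev = getDev(queue);  // e.g. allocate buffers on this device
  return "running on " + dev.name;
}
```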

ariostas · 2024-04-22T17:42:20Z

This should be working now, but I'll check if there are things to clean up.

/run standalone
/run cmssw 21

github-actions · 2024-04-22T17:57:35Z

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

The full set of validation and comparison plots can be found here.

github-actions · 2024-04-22T18:57:06Z

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks

The full set of validation and comparison plots can be found here.

slava77 · 2024-04-23T12:44:56Z

@ariostas
what else remains to be done to make this non-draft?

ariostas · 2024-04-23T15:21:11Z

This is ready for review. I'll elaborate a bit more regarding the changes.

As the PR title suggests, the devices (both host and accelerator) are no longer initialized in the library and are no longer global variables. Instead, accelerator queues are passed as arguments both in standalone and CMSSW. When the host device is needed, it now uses a CMSSW function to get it.
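A minimal sketch of replacing a global host device with an on-demand accessor. We assume the CMSSW helper follows a similar pattern (our understanding is that it is cms::alpakatools::host(), but the code below does not call it); DevCpu, QueueAcc, host and loadAndFill here are hypothetical mocks.

```cpp
#include <string>

// Hypothetical host-device and accelerator-queue types.
struct DevCpu {
  std::string name;
};
struct QueueAcc {
  int id;
};

// Hypothetical accessor: the host device is a function-local static, created
// on first use (thread-safe since C++11) rather than as a library global
// that is initialized when the library is loaded.
inline const DevCpu& host() {
  static const DevCpu instance{"cpu"};
  return instance;
}

// Library entry point: the accelerator queue comes from the caller, and the
// host device is fetched only when a host-side staging buffer is needed.
std::string loadAndFill(QueueAcc& queue) {
  const DevCpu& h = host();
  return h.name + "->queue" + std::to_string(queue.id);
}
```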

Apart from that, I also made some changes so that we use types defined in CMSSW instead of defining our own. They mostly match, but SDL::Idx changed from size_t to uint32_t, which causes a bunch of annoyances. Lines like alpaka::memcpy(queue, dst, src, 1); failed to compile, since the size now explicitly needs to be unsigned, so it would have to be replaced by 1u or simply omitted. I took this as a chance to do a bit of cleanup and removed the size argument wherever both buffers have the same size and the intended size is obvious from the context. This makes the code a bit shorter and less prone to errors if some size is changed later.
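A minimal sketch of the size-argument issue with a hypothetical Buf type and copy function (not the real alpaka::memcpy). The explicit-extent overload requires the extent to have exactly the buffer's index type, which mirrors the compile error described above: a plain 1 is an int and is rejected, while 1u is accepted on platforms where uint32_t is unsigned int (as on typical Linux x86-64). The extent-free overload copies the whole buffer, so no size needs to be spelled out.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <vector>

using Idx = std::uint32_t;  // SDL::Idx after the switch (previously size_t)

// Hypothetical 1D buffer standing in for an alpaka buffer.
template <typename T>
struct Buf {
  std::vector<T> data;
  Idx extent() const { return static_cast<Idx>(data.size()); }
};

// Explicit-extent copy: the extent must have exactly type Idx, so
// copy(dst, src, 1) fails to compile while copy(dst, src, 1u) compiles.
template <typename T, typename TExtent>
void copy(Buf<T>& dst, const Buf<T>& src, TExtent n) {
  static_assert(std::is_same_v<TExtent, Idx>, "extent must use Idx");
  if (n > dst.extent() || n > src.extent())
    throw std::out_of_range("extent larger than buffer");
  for (Idx i = 0; i < n; ++i)
    dst.data[i] = src.data[i];
}

// Extent-free copy: when both buffers have the same size, copy all of it,
// so a later change to the buffer size cannot leave a stale size argument.
template <typename T>
void copy(Buf<T>& dst, const Buf<T>& src) {
  copy(dst, src, src.extent());
}
```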

slava77 · 2024-04-23T17:07:34Z

/run cmssw 21

I think the last commit in 21 was after the last test was triggered

github-actions · 2024-04-23T18:16:08Z

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks

The full set of validation and comparison plots can be found here.

ariostas added 2 commits April 4, 2024 12:13

Removed device initialization from library

0ca32be

Standalone now constructs device and queue outside library

fb69c00

slava77 reviewed Apr 4, 2024


ariostas added 8 commits April 5, 2024 09:02

A bit of cleanup

87664ad

Merge branch 'master' into move_device_initialization

0cac5cd

Removed global host device

a7aabe1

Switched to using types defined in cmssw

74fbc80

Format code

c5d4d8c

Get device from queue

fe64b69

Minor makefile fix

4a229a0

Merge branch 'master' into move_device_initialization

e059a86

ariostas mentioned this pull request Apr 22, 2024

Remove device initialization from library (CMSSW side) SegmentLinking/cmssw#21

Merged

Fix loadAndFillES inputs

41d85f6

ariostas marked this pull request as ready for review April 23, 2024 15:21

slava77 approved these changes Apr 23, 2024


slava77 merged commit 0d5fafa into master Apr 23, 2024
3 checks passed

ariostas deleted the move_device_initialization branch April 30, 2024 13:42

ariostas mentioned this pull request May 3, 2024

LSTCore development SegmentLinking/cmssw#23

Closed
