Skip to content
This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

Remove device initialization from library #385

Merged
merged 11 commits into from
Apr 23, 2024
Merged

Conversation

ariostas
Copy link
Member

@ariostas ariostas commented Apr 4, 2024

Currently, a device and queue are initialized within the library as global variables. This can cause various issues.

  • Since the device is initialized as a global variable, it is done every time the library is loaded. This is what's currently causing edmWriteConfigs to crash when compiling the LST package for ROCm since the machines we use don't have any AMD cards.
  • The queue part was mostly fine, but we were still creating a new queue that was not assigned to us by CMSSW. It's probably better to stick with the queue that they give us.

As far as I understand, these two things could result in using a higher (or different) computational resources than what CMSSW assigned for the job, so it can end up being killed.

This PR is still a bit rough, and I haven't worked on the CMSSW side. I'll give a better description of the changes once it's ready.

}

void SDL::EndcapGeometry<SDL::Dev>::load(std::string filename) {
template <typename TQueue>
void SDL::EndcapGeometry<SDL::Dev>::load(TQueue& queue, std::string filename) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be a template? can SDL::QueueAcc be used instead?

Copy link
Member Author

@ariostas ariostas Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was also thinking about this. I think now we don't really need any templates since we could just use SDL::Dev, SDL::DevAcc and SDL::QueueAcc. So I was deciding whether to sticking with templated functions like in most of the code.

SDL/LST.cc Outdated
@@ -74,7 +76,8 @@ void SDL::LST<SDL::Acc>::loadAndFillES(alpaka::QueueCpuBlocking& queue, struct m
pLStoLayer);
}

void SDL::LST<SDL::Acc>::run(SDL::QueueAcc& queue,
void SDL::LST<SDL::Acc>::run(SDL::Dev& devAccIn,
SDL::QueueAcc& queue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to have a queue and a device separately? Isn't a queue uniquely attached to a device?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I'll tidy things up.

@ariostas
Copy link
Member Author

This should be working now, but I'll check if I there are things to clean up.

/run standalone
/run cmssw 21

Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Copy link

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@slava77
Copy link
Contributor

slava77 commented Apr 23, 2024

@ariostas
what else remains to be done to make this non-draft?

@ariostas
Copy link
Member Author

This is ready for review. I'll elaborate a bit more regarding the changes.

As the PR title suggests, the devices (both host and accelerator) are no longer initialized in the library and are no longer global variables. Instead, accelerator queues are passed as arguments both in standalone and CMSSW. When the host device is needed, it now uses a CMSSW function to get it.

Apart from that, I also made some changes so that we use types defined in CMSSW instead of defining our own types. They mostly match, but SDL::Idx changed from size_t to uint32_t, which causes a bunch of annoyances. Lines like alpaka::memcpy(queue, dst, src, 1); failed to compile since the size now explicitly needs to be unsigned, so it would have to be replaced by 1u or simply omitted. So I took this as a chance to do a bit of cleanup and remove the size argument when both buffers have the same size and it is obvious what it should be from the context. This make the code a bit shorter and less prone to errors if at some point some size is changed.

@ariostas ariostas marked this pull request as ready for review April 23, 2024 15:21
@slava77
Copy link
Contributor

slava77 commented Apr 23, 2024

/run cmssw 21

I think the last commit in 21 was after the last test was triggered

Copy link

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@slava77 slava77 merged commit 0d5fafa into master Apr 23, 2024
3 checks passed
@ariostas ariostas deleted the move_device_initialization branch April 30, 2024 13:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants