Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][NFC] Move SYCL Module Splitting to library. Part 1 #13054

Merged
merged 6 commits into from
Mar 27, 2024

Conversation

maksimsab
Copy link
Contributor

@maksimsab maksimsab commented Mar 18, 2024

This patch is a part 1 of moving sycl-post-link functionality to library for New Offloading initiative.
This patch replaces std::vector<char> with MemoryBuffer as a container for accessing device image in the SYCLImage class. MemoryBuffer uses mmap that allows not to hold the whole device image in the RAM.

This patch is a part 1 of moving sycl-post-link functionality to library for
New Offloading initiative.
This patch consist of introducing LazyMemoryBuffer that allows to keep
MemoryBuffer on the Disk for RAM economy.
@maksimsab maksimsab requested a review from a team as a code owner March 18, 2024 13:27
@asudarsa
Copy link
Contributor

Hi @maksimsab

Can you please provide a high-level description of why we need LazyMemoryBuffer and how it will be used in the SYCL compilation flow?

Thanks

@maksimsab
Copy link
Contributor Author

Sure.
Currently, sycl-post-link has a trick where all splitted modules are being saved on the Disk because the total amount of data is too large to keep it in RAM.
clang-linker-wrapper is not an exception in that matter and the current interface of Offload Wrapping requires keeping all data in RAM while it is impossible for big images. This patch changes the interface taking into account this case.

Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it seems like something that should maybe go directly to the upstream

@maksimsab
Copy link
Contributor Author

Btw, I could try to add Lazy mode to MemoryBuffer apart from the work of upstreaming.
The current LazyMemoryBuffer is useless in upstream without its usage which is SYCL offload wrapping.

@AlexeySachkov
Copy link
Contributor

Btw, I could try to add Lazy mode to MemoryBuffer apart from the work of upstreaming. The current LazyMemoryBuffer is useless in upstream without its usage which is SYCL offload wrapping.

Don't we want to unify the paths taken for both SYCL and non-SYCL offloading? I mean, it is possible to maybe switch existing linker-wrapper code to use lazy buffers? Would it make sense?

@maksimsab
Copy link
Contributor Author

Probably, we could accept usual MemoryBuffer as they already do here:

/// Creates the object file containing the device image and runtime
/// registration code from the device images stored in \p Images.
Expected<StringRef>
wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
const ArgList &Args, OffloadKind Kind) {

Because, MemoryBuffer uses mmap and file's content is not loaded in the beginning.

@AlexeySachkov
Copy link
Contributor

Btw, I could try to add Lazy mode to MemoryBuffer apart from the work of upstreaming. The current LazyMemoryBuffer is useless in upstream without its usage which is SYCL offload wrapping.

Don't we want to unify the paths taken for both SYCL and non-SYCL offloading? I mean, it is possible to maybe switch existing linker-wrapper code to use lazy buffers? Would it make sense?

To clarify here what I mean: my idea was to introduce LazyMemoryBuffer to the upstream and start using it in CUDA & HIP flows right away to justify addition of the components. I'm not saying that we should rewrite everything in SYCL flow right away.

@maksimsab
Copy link
Contributor Author

My concern is that MemoryBuffer is already satisfies performance requirement that I've tried to achieve with LazyMemoryBuffer. Because MemoryBuffer uses mmap here.

std::unique_ptr<MB> Result(
new (NamedBufferAlloc(Filename)) MemoryBufferMMapFile<MB>(
RequiresNullTerminator, FD, MapSize, Offset, EC));

And this is how CUDA and HIP flows are potentially do not experience RAM issues.
It isn't clear from the MemoryBuffer's doxygen page. It is clear from the source code though.

@bader
Copy link
Contributor

bader commented Mar 18, 2024

And this is how CUDA and HIP flows are potentially do not experience RAM issues.

Just me speculating as I don't know for sure.
CUDA and HIP flows might experience RAM issues in fullLTO mode, but they also support thinLTO, which consumes much less memory. In addition to that they do not "split" module. Each "split" duplicates parts of LLVM module, which makes SYCL flow to demand more memory comparing to CUDA and HIP flows.

@maksimsab maksimsab changed the title Move SYCL Module Splitting to library. Part 1 [SYCL][NFC] Move SYCL Module Splitting to library. Part 1 Mar 21, 2024
@maksimsab
Copy link
Contributor Author

@asudarsa @AlexeySachkov
Updated PR and description. Please, take a look.

@@ -30,7 +31,20 @@ enum class SYCLBinaryImageFormat {
};

struct SYCLImage {
Copy link
Contributor

@asudarsa asudarsa Mar 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to reuse or extend llvm::OffloadBinary::OffloadingImage data structure for this? On cursory look, it seems feasible. ImageKind is the same as Format here. OffloadKind is OFK_SYCL. Entries and Properties could be moved into StringData.
https://llvm.org/docs/doxygen/structllvm_1_1object_1_1OffloadBinary_1_1OffloadingImage.html

I am Ok with adding a TODO here and doing this in part 2 of changes.

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. I will think through it after this patch.

Copy link
Contributor

@asudarsa asudarsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a couple of clarifications requested. Please check the comment about 'assert' before merging this change.

Thanks

@asudarsa
Copy link
Contributor

Test failures not due to this change. Merging this. Thanks

@asudarsa asudarsa merged commit bb48303 into intel:sycl Mar 27, 2024
11 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants