-
Notifications
You must be signed in to change notification settings - Fork 753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Docs] Add sycl_ext_oneapi_syclbin extension #16784
base: sycl
Are you sure you want to change the base?
Conversation
This commit adds the sycl_ext_oneapi_syclbin for loading a new SYCLBIN file format to `kernel_bundle`. Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
_Throws:_ | ||
|
||
* An `exception` with the `errc::invalid` error code if the contents of `bytes` | ||
is not in the SYCLBIN format, as defined by the SYCL implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also say that the exception is thrown if the SYCLBIN format "state" does not match State
.
My current thinking is that each SYCLBIN will define a single "state" (input, object, or executable). When reading a SYCLBIN, I think the state needs to exactly match the State
of the kernel_bundle
. Possibly, we could allow certain mismatches. For example, it might be legal to read an "input" SYCLBIN into an "executable" kernel_bundle
by JIT compiling the IR. However, when I was sketching out the algorithm, this was getting complicated in some cases. In any case, certain combinations will never be possible. For example, it will never be possible to read an "input" kernel_bundle
from an "executable" SYCLBIN.
FWIW, here is the algorithm I sketched out for this operation:
- Find the set of unique device architectures ARCHes from “devs”
- If State == input and “bytes” is input
- For each ARCH
- For each abstract module that uses features compatible with ARCH
- For each IR module type that is compatible with ARCH
- urProgramCreateWithIL to create a ur_program_handle_t from IR.
- Remember this module for later serialization
- For each IR module type that is compatible with ARCH
- Abstract modules that use features incompatible with ARCH are silently skipped.
- For each abstract module that uses features compatible with ARCH
- Notes
- Here and below, if kernel bundle is serialized, serialization contains IR modules (device images) noted as “remembered for later serialization”.
- For each ARCH
- If State == executable and “bytes” is executable
- For each ARCH
- For each abstract module in “bytes” that uses features compatible with ARCH
- If there is a device image whose arch == ARCH
- urProgramCreateWithBinary to create a ur_program_handle_t from native code.
- Remember this device image for later serialization
- For each IR module whose format is compatible with ARCH
- Remember this IR module for later serialization
- Else if there is an IR module whose format is compatible with ARCH
- urProgramCreateWithIL to create a ur_program_handle_t from IR
- Compile it with urProgramCompile
- Remember this IR module for later serialization
- If there is a device image whose arch == ARCH
- Abstract modules that use features incompatible with ARCH are silently skipped
- If “bytes” has the global “link-all” metadata
- If the ARCH device does not support linking of native device code images, the operation fails
- Note that the SYCLBIN file is only generated with “link-all” set to true if all devices support linking of device code native images. Therefore, this error would only occur if the SYCL RT that wrote the SYCLBIN had that ability and the SYCL RT that reads it does not. Perhaps this could happen if a SYCLBIN from a new SYCL RT is read by an old SYCL RT?
- Call urProgramLink to link all the ur_program_handle_t together. If there are any undefined symbols, the operation fails.
- If the ARCH device does not support linking of native device code images, the operation fails
- Else
- Use the list of imported / exported SYCL_EXTERNAL symbols to link the ur_program_handle_t together as per the shared library design. If there are any undefined symbols, the operation fails.
- For each abstract module in “bytes” that uses features compatible with ARCH
- Notes
- In the case when there is no native code device image for ARCH and also no IR module, this algorithm silently skips ARCH, and the bundle just doesn’t contain any device images for it. I think this is what we want, but it might be confusing to some users. Users will still get an error if they try to get any kernel from the bundle for ARCH.
- For each ARCH
- If State == object and “bytes” is object
- For each ARCH
- For each abstract module in “bytes” that uses features compatible with ARCH
- If there is a device image whose arch == ARCH
- urProgramCreateWithBinary to create a ur_program_handle_t from native code.
- Remember this device image for later serialization
- For each IR module whose format is compatible with ARCH
- urProgramCreateWithIL to create a ur_program_handle_t from IR
- (Do not call urProgramCompile)
- Remember this module for later serialization
- If there is a device image whose arch == ARCH
- Abstract modules that use features incompatible with ARCH are silently skipped
- For each abstract module in “bytes” that uses features compatible with ARCH
- Notes
- We do not call urProgramCompile because the user might want fast-link later. Calling urProgramCompile in that case would be unnecessary and slow.
- The call to urProgramCreateWithIL is also unnecessary in the fast-link case. The call to urProgramCreateWithBinary is unnecessary when not using fast-link. Should we delay these also? If the operations are fast, there’s no need to delay them.
- For each ARCH
- Otherwise
- Fail: The kernel bundle cannot be created from “bytes”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will address this ASAP.
In the meantime, what would be the case be for having a SYCLBIN in executable state? Seems to me like the layout of a SYCLBIN file in executable state is the same as one in object state, given there would be no difference in the binaries. The user can always load them directly into a kernel_bundle
with executable state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also say that the exception is thrown if the SYCLBIN format "state" does not match
State
.My current thinking is that each SYCLBIN will define a single "state" (input, object, or executable). When reading a SYCLBIN, I think the state needs to exactly match the
State
of thekernel_bundle
. Possibly, we could allow certain mismatches. For example, it might be legal to read an "input" SYCLBIN into an "executable"kernel_bundle
by JIT compiling the IR. However, when I was sketching out the algorithm, this was getting complicated in some cases. In any case, certain combinations will never be possible. For example, it will never be possible to read an "input"kernel_bundle
from an "executable" SYCLBIN.
I agree with this from an implementation point-of-view. It also fits well with how normal kernel_bundle
s work, i.e. the binaries need to be in compatible states to be loaded into the corresponding kernel_bundle
. That said, since the SYCLBIN format is defined by the implementation, I don't think it makes sense to talk about the "state" the SYCLBIN contents is in. I've added an exception case that simply says:
* An `exception` with the `errc::invalid` error code if the contents of the file
specified by `filename` is incompatible with `State`.
It is a little vague, but not much different from what the specification says about loading regular kernel_bundle
I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the meantime, what would be the case be for having a SYCLBIN in executable state?
I think this is a common use case for CUDA modules. For example, an application might want to cache a module (SYCLBIN) in executable state that was RTC generated, so that the application can just load the module (SYCLBIN) next time without recompiling it with RTC.
I don't think it makes sense to talk about the "state" the SYCLBIN contents is in
I think your wording is an improvement, but I'm not sure the "state" of a SYCLBIN should be undocumented. If you call ext_oneapi_get_content
on a kernel bundle in (e.g.) "input" state, there should be a guarantee that the resulting SYCLBIN is also in "input" state. Therefore, there is an expectation that you can read that SYCLBIN later with get_kernel_bundle
to construct an "input" state kernel bundle. Therefore, I think a SYCLBIN file does have a documented "state".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a common use case for CUDA modules. For example, an application might want to cache a module (SYCLBIN) in executable state that was RTC generated, so that the application can just load the module (SYCLBIN) next time without recompiling it with RTC.
Discussed offline. Makes sense to me to have an executable state in SYCLBIN now.
I think your wording is an improvement, but I'm not sure the "state" of a SYCLBIN should be undocumented. If you call
ext_oneapi_get_content
on a kernel bundle in (e.g.) "input" state, there should be a guarantee that the resulting SYCLBIN is also in "input" state. Therefore, there is an expectation that you can read that SYCLBIN later withget_kernel_bundle
to construct an "input" state kernel bundle. Therefore, I think a SYCLBIN file does have a documented "state".
I'm reluctant for the extension to say too much about the format, but I see what you mean. I've mentioned the state in the APIs, but I don't know if there is a place in the overview or elsewhere that would benefit from talking about this state too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wanted to add something to the Overview, you could say something like:
Since SYCLBIN files can be created from a
kernel_bundle
, each SYCLBIN file has an associated state which matches the kernel bundle's state. These possible states are "input", "object", and "executable".
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
This commit adds the sycl_ext_oneapi_syclbin for loading a new SYCLBIN file format to
kernel_bundle
.