Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Docs] Add sycl_ext_oneapi_syclbin extension #16784

Open
wants to merge 4 commits into
base: sycl
Choose a base branch
from

Conversation

steffenlarsen
Copy link
Contributor

This commit adds the sycl_ext_oneapi_syclbin for loading a new SYCLBIN file format to kernel_bundle.

This commit adds the sycl_ext_oneapi_syclbin for loading a new SYCLBIN
file format to `kernel_bundle`.

Signed-off-by: Larsen, Steffen <[email protected]>
@steffenlarsen steffenlarsen requested a review from a team as a code owner January 27, 2025 11:19
Signed-off-by: Larsen, Steffen <[email protected]>
_Throws:_

* An `exception` with the `errc::invalid` error code if the contents of `bytes`
is not in the SYCLBIN format, as defined by the SYCL implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also say that the exception is thrown if the SYCLBIN format "state" does not match State.

My current thinking is that each SYCLBIN will define a single "state" (input, object, or executable). When reading a SYCLBIN, I think the state needs to exactly match the State of the kernel_bundle. Possibly, we could allow certain mismatches. For example, it might be legal to read an "input" SYCLBIN into an "executable" kernel_bundle by JIT compiling the IR. However, when I was sketching out the algorithm, this was getting complicated in some cases. In any case, certain combinations will never be possible. For example, it will never be possible to read an "input" kernel_bundle from an "executable" SYCLBIN.

FWIW, here is the algorithm I sketched out for this operation:


  • Find the set of unique device architectures ARCHes from “devs”
  • If State == input and “bytes” is input
    • For each ARCH
      • For each abstract module that uses features compatible with ARCH
        • For each IR module type that is compatible with ARCH
          • urProgramCreateWithIL to create a ur_program_handle_t from IR.
          • Remember this module for later serialization
      • Abstract modules that use features incompatible with ARCH are silently skipped.
    • Notes
      • Here and below, if kernel bundle is serialized, serialization contains IR modules (device images) noted as “remembered for later serialization”.
  • If State == executable and “bytes” is executable
    • For each ARCH
      • For each abstract module in “bytes” that uses features compatible with ARCH
        • If there is a device image whose arch == ARCH
          • urProgramCreateWithBinary to create a ur_program_handle_t from native code.
          • Remember this device image for later serialization
          • For each IR module whose format is compatible with ARCH
            • Remember this IR module for later serialization
        • Else if there is an IR module whose format is compatible with ARCH
          • urProgramCreateWithIL to create a ur_program_handle_t from IR
          • Compile it with urProgramCompile
          • Remember this IR module for later serialization
      • Abstract modules that use features incompatible with ARCH are silently skipped
      • If “bytes” has the global “link-all” metadata
        • If the ARCH device does not support linking of native device code images, the operation fails
          • Note that the SYCLBIN file is only generated with “link-all” set to true if all devices support linking of device code native images. Therefore, this error would only occur if the SYCL RT that wrote the SYCLBIN had that ability and the SYCL RT that reads it does not. Perhaps this could happen if a SYCLBIN from a new SYCL RT is read by an old SYCL RT?
        • Call urProgramLink to link all the ur_program_handle_t together. If there are any undefined symbols, the operation fails.
      • Else
        • Use the list of imported / exported SYCL_EXTERNAL symbols to link the ur_program_handle_t together as per the shared library design. If there are any undefined symbols, the operation fails.
    • Notes
      • In the case when there is no native code device image for ARCH and also no IR module, this algorithm silently skips ARCH, and the bundle just doesn’t contain any device images for it. I think this is what we want, but it might be confusing to some users. Users will still get an error if they try to get any kernel from the bundle for ARCH.
  • If State == object and “bytes” is object
    • For each ARCH
      • For each abstract module in “bytes” that uses features compatible with ARCH
        • If there is a device image whose arch == ARCH
          • urProgramCreateWithBinary to create a ur_program_handle_t from native code.
          • Remember this device image for later serialization
        • For each IR module whose format is compatible with ARCH
          • urProgramCreateWithIL to create a ur_program_handle_t from IR
          • (Do not call urProgramCompile)
          • Remember this module for later serialization
      • Abstract modules that use features incompatible with ARCH are silently skipped
    • Notes
      • We do not call urProgramCompile because the user might want fast-link later. Calling urProgramCompile in that case would be unnecessary and slow.
      • The call to urProgramCreateWithIL is also unnecessary in the fast-link case. The call to urProgramCreateWithBinary is unnecessary when not using fast-link. Should we delay these also? If the operations are fast, there’s no need to delay them.
  • Otherwise
    • Fail: The kernel bundle cannot be created from “bytes”

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will address this ASAP.

In the meantime, what would be the case be for having a SYCLBIN in executable state? Seems to me like the layout of a SYCLBIN file in executable state is the same as one in object state, given there would be no difference in the binaries. The user can always load them directly into a kernel_bundle with executable state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also say that the exception is thrown if the SYCLBIN format "state" does not match State.

My current thinking is that each SYCLBIN will define a single "state" (input, object, or executable). When reading a SYCLBIN, I think the state needs to exactly match the State of the kernel_bundle. Possibly, we could allow certain mismatches. For example, it might be legal to read an "input" SYCLBIN into an "executable" kernel_bundle by JIT compiling the IR. However, when I was sketching out the algorithm, this was getting complicated in some cases. In any case, certain combinations will never be possible. For example, it will never be possible to read an "input" kernel_bundle from an "executable" SYCLBIN.

I agree with this from an implementation point-of-view. It also fits well with how normal kernel_bundles work, i.e. the binaries need to be in compatible states to be loaded into the corresponding kernel_bundle. That said, since the SYCLBIN format is defined by the implementation, I don't think it makes sense to talk about the "state" the SYCLBIN contents is in. I've added an exception case that simply says:

* An `exception` with the `errc::invalid` error code if the contents of the file
  specified by `filename` is incompatible with `State`.

It is a little vague, but not much different from what the specification says about loading regular kernel_bundle I believe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meantime, what would be the case be for having a SYCLBIN in executable state?

I think this is a common use case for CUDA modules. For example, an application might want to cache a module (SYCLBIN) in executable state that was RTC generated, so that the application can just load the module (SYCLBIN) next time without recompiling it with RTC.

I don't think it makes sense to talk about the "state" the SYCLBIN contents is in

I think your wording is an improvement, but I'm not sure the "state" of a SYCLBIN should be undocumented. If you call ext_oneapi_get_content on a kernel bundle in (e.g.) "input" state, there should be a guarantee that the resulting SYCLBIN is also in "input" state. Therefore, there is an expectation that you can read that SYCLBIN later with get_kernel_bundle to construct an "input" state kernel bundle. Therefore, I think a SYCLBIN file does have a documented "state".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a common use case for CUDA modules. For example, an application might want to cache a module (SYCLBIN) in executable state that was RTC generated, so that the application can just load the module (SYCLBIN) next time without recompiling it with RTC.

Discussed offline. Makes sense to me to have an executable state in SYCLBIN now.

I think your wording is an improvement, but I'm not sure the "state" of a SYCLBIN should be undocumented. If you call ext_oneapi_get_content on a kernel bundle in (e.g.) "input" state, there should be a guarantee that the resulting SYCLBIN is also in "input" state. Therefore, there is an expectation that you can read that SYCLBIN later with get_kernel_bundle to construct an "input" state kernel bundle. Therefore, I think a SYCLBIN file does have a documented "state".

I'm reluctant for the extension to say too much about the format, but I see what you mean. I've mentioned the state in the APIs, but I don't know if there is a place in the overview or elsewhere that would benefit from talking about this state too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted to add something to the Overview, you could say something like:

Since SYCLBIN files can be created from a kernel_bundle, each SYCLBIN file has an associated state which matches the kernel bundle's state. These possible states are "input", "object", and "executable".

Signed-off-by: Larsen, Steffen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants