What does it mean for an atomic_ref to have work_item scope? #665
Comments
This is an area of the consistency model that we should probably clarify. OpenCL defines some of these things a little more precisely, but I don't think we should adopt the OpenCL wording.
I think the behavior should be: 1) the operation is performed atomically; 2) there are no ordering constraints across work-items.
Probably, but sequential consistency across threads can be achieved using either an
In OpenCL, Regardless, I don't think we would want to adopt exactly the same wording. OpenCL says you can't use
I think this may be implementation- and device-specific, so it's hard to answer in general. But if an implementation can guarantee atomicity and ordering within a work-item without use of special instructions,
@Pennycook: We talked about this a bit in the Friday meeting after you left. It seems like the key question is whether the Your response above:
seems to imply that you think the If we decide that We think there is probably a similar issue with OpenCL, but we haven't checked to see what OpenCL says here. Regardless, this is something we should clarify in the SYCL spec. Does
It's really difficult to talk about any of this stuff precisely, because the memory consistency model is underspecified. We still have this note in the specification:
I agree we should try and improve things here, but I think any discussions we have about this will involve a lot of hand-waving until we can find somebody with both the time and necessary expertise to help us formalize the model. But, to try and answer your question... I don't think we want to get in a situation where only some operations performed via an
When I said that the accesses were atomic, above, I didn't mean to imply that they were (necessarily) visible to other work-items. I really just meant that any accesses via For example, if we say that image operations initiated by a work-item are also "atomic", then the purpose of a work-item scoped atomic operation would be to ensure that an application sees either the results of updating memory via some image operation or the results of updating it via an Rather than trying to incorporate scope into the meaning of "atomic", I think what we want to do here is modify the definition of terms like "data race" and "synchronizes-with", to say something like:
I'm hand-waving my way through "compatible scopes" because I'm not sure what compatibility means. Things are very easy to define when the scopes are equal, but there are some cases that should still be well-defined even if the scopes are different (e.g., I'd expect a release operation with work-group scope to synchronize-with an acquire operation with device scope, if the work-items executing the operations were in the same work-group). It looks like CUDA took a similar approach (see here) but I'm not sure about their wording.
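To make the "synchronizes-with" relationship mentioned above concrete, here is a minimal sketch in standard C++ rather than SYCL. Standard C++ has no memory scopes, so this only illustrates the easy case where both sides effectively use the same scope; it says nothing about how mixed scopes should be defined. The function name `pass_message` is hypothetical and exists only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: a producer publishes `value` through a plain variable,
// and a consumer reads it after observing a flag.
int pass_message(int value) {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // flag used for synchronization
    int observed = -1;

    std::thread producer([&] {
        payload = value;                               // ordinary write
        ready.store(true, std::memory_order_release);  // release: publishes the write
    });
    std::thread consumer([&] {
        // The acquire load that reads `true` synchronizes-with the release store.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;                            // ordered after the producer's write
    });

    producer.join();
    consumer.join();
    assert(observed == value);  // holds because of the release/acquire pairing
    return observed;
}
```

In a SYCL version of this pattern, the open question from the comment above is which combinations of scopes on the release and acquire sides should still give this guarantee.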
Let's consider a concrete example:
Note that the memory order is relaxed, so this is not a synchronization operation. In C++, Also note that the memory scope is Does the code snippet above guarantee that each work-item gets a unique value for I think @bashbaug and I are arguing that the answer should be "no". Are you saying "yes"?
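For comparison, here is a minimal sketch of the same question in standard C++, which is not the snippet from this comment (that snippet was lost) and is not SYCL. In standard C++ a relaxed `fetch_add` on a `std::atomic` is still an atomic read-modify-write with respect to every thread, so each call returns a distinct value even though no other memory accesses are ordered. Standard C++ has no counterpart to `work_item` scope, and the thread so far leans toward "no" for that case. The helper name `tickets_are_unique` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <set>
#include <thread>
#include <vector>

// Hypothetical helper: each thread claims `per_thread` tickets from a shared
// counter with a relaxed fetch_add; returns true if no ticket was issued twice.
bool tickets_are_unique(int num_threads, int per_thread) {
    assert(num_threads > 0 && per_thread > 0);
    std::atomic<int> counter{0};
    std::vector<std::vector<int>> claimed(num_threads);  // one slot per thread

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, &claimed, t, per_thread] {
            claimed[t].reserve(per_thread);
            for (int i = 0; i < per_thread; ++i) {
                // Relaxed: the increment is atomic, but it imposes no ordering
                // on any other memory access.
                claimed[t].push_back(
                    counter.fetch_add(1, std::memory_order_relaxed));
            }
        });
    }
    for (auto& th : threads) th.join();

    // Collect every ticket and check for duplicates.
    std::set<int> seen;
    for (const auto& tickets : claimed) {
        for (int ticket : tickets) {
            if (!seen.insert(ticket).second) return false;
        }
    }
    return seen.size() ==
           static_cast<std::size_t>(num_threads) * static_cast<std::size_t>(per_thread);
}
```

The uniqueness here comes from atomicity across threads, not from the memory order, which is why the choice of scope matters for the SYCL question.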
I'm also saying "No", but I think we might disagree on how to describe the behavior. I think we should describe your snippet as performing atomic operations -- via an atomic reference of the memory location pointed to by I think you and Ben are suggesting that we say something like "atomics with
I'm not suggesting that we write that into the spec. I think the spec should say that the I think an outcome of this wording is that
I think we're in agreement about which behaviors are invalid, but I'm still not sure that atomicity is necessarily only relevant between different work-items -- I don't understand why OpenCL says that there's no guarantee of read-write coherence within a work-item when images are involved, and I think we need to understand that before deciding I want to continue discussing the wording later, but I know what you mean and agree with the intent.
I looked into this once. OpenCL has special rules around the APIs that read and write images. The API that writes an image is not guaranteed to be coherent with subsequent APIs that read the same image, even if the write and read operations are in the same work-item. If you want the effect of a write operation to be visible to a subsequent read, you need to make a special call to I don't know the history of this limitation. I'm guessing that some older hardware didn't provide coherence between writes and reads to image memory. I'm not sure if SYCL needs anything similar here. It could be that SYCL is intended to run on newer, more capable hardware. I think this is a separate topic, though. The issue in OpenCL with image writes/reads isn't really related to atomic operations as far as I can tell. The fact that it uses an API with the name "atomic" in it seems like a misnomer to me. If we do need some operation like this in SYCL, I think we should find a better name -- maybe something like
While on the SYCL WG call today I made a point that I thought I had already raised here, but I can't see it anywhere... Even if we decide that the work-item scope is kind of weird in OpenCL, we still need it in SYCL because of generic programming and the way that we're defining things like the Group concept. The KHR group interface (#638) defines a
What is the expected behaviour of an atomic_ref with work_item scope? Does it make sense to apply a per-thread atomic?
The memory ordering enabled by the atomic_ref may still provide some benefit to individual threads, however, couldn't the same be achieved with atomic_fences instead of using an atomic_ref?
Are there any cases where work_item scoped atomics are meaningful, other than the aforementioned case?
How should an implementation handle work_item scoped atomics?