-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRAFT] OpenMP offload directives for #196 #206
base: refresh_openmp_inv_trans
Are you sure you want to change the base?
[DRAFT] OpenMP offload directives for #196 #206
Conversation
!$OMP& SHARED(IBEG,IEND,IINC,KF_UV,R,OFFSET_VAR,G,D,KF_FS,KUV_OFFSET,F,PREEL_COMPLEX) & | ||
!$OMP& MAP(TO:IBEG,IEND,IINC,KF_UV,OFFSET_VAR,KF_FS,KUV_OFFSET,ZACHTE2) | ||
!$OMP& PRIVATE(IGLG,IOFF_LAT,IOFF_UV,ZACHTE2,JM,JF,KGL) & | ||
!$OMP& FIRSTPRIVATE(IBEG,IEND,IINC,OFFSET_VAR,KF_UV,KUV_OFFSET,KF_FS) & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason to have these variables as FIRSTPRIVATE
instead of SHARED
? I ask because none of the variables in my SHARED
clause are modified in the loop below, therefore is it not safe to have them shared rather than have every thread make its own copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep OpenACC and OpenMP as similar as possible with regards to overall behavior, but yes SHARED
here would be safe as well. I'm reviewing this module more carefully after realizing this comment: #206 (comment)
src/trans/gpu/internal/fsc_mod.F90
Outdated
@@ -116,9 +120,9 @@ SUBROUTINE FSC(ALLOCATOR,HFSC,PREEL_COMPLEX, KF_FS, KF_UV, KF_SCALARS, KUV_OFFSE | |||
!* 1.1 U AND V. | |||
#ifdef OMPGPU | |||
!$OMP TARGET TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(3) DEFAULT(NONE) & | |||
!$OMP& PRIVATE(IGLG,IOFF_LAT,IOFF_UV,ZACHTE2) & | |||
!$OMP& SHARED(IBEG,IEND,IINC,KF_UV,R,OFFSET_VAR,G,D,KF_FS,KUV_OFFSET,F,PREEL_COMPLEX) & | |||
!$OMP& MAP(TO:IBEG,IEND,IINC,KF_UV,OFFSET_VAR,KF_FS,KUV_OFFSET,ZACHTE2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the removal of MAP
, how do we ensure IBEG
etc. are mapped from host to device properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I believe I mistakenly assumed OMP's FIRSTPRIVATE
here handles implicit mapping of variables like OpenACC's FIRSTPRIVATE
. After reviewing documentation, explicit maps will likely be required. I will address this.
src/trans/gpu/internal/leinv_mod.F90
Outdated
#ifdef _CRAYFTN | ||
!$OMP& | ||
#else | ||
!$OMP& NOWAIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we just remove all NOWAIT
clauses for now, until we have numerical correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I'm going through all my changes and making comments as TODO items. I am disabling all async directives
src/trans/gpu/internal/ltinv_mod.F90
Outdated
@@ -386,6 +386,7 @@ SUBROUTINE LTINV(ALLOCATOR,HLTINV,KF_UV,KF_SCALARS,& | |||
ENDIF | |||
|
|||
#ifdef OMPGPU | |||
!$OMP TASKWAIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be paired with a NOWAIT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I guess I must have thrown this in due to the presence of ACC WAIT(1)
in the corresponding directive below, but your comment also applies to the OpenACC branch, as I do not see any ASYNC
statements here. I'm disabling this but leaving a comment for later.
5044bb5
to
1177da8
Compare
This PR adds and restores OMP Target directives in the relevant modules for building and running the test program exercising the inverse transform, see #196 for reference