Examples of shared memory bugs (with fixes)
This page lists some examples of shared-memory-related bugs that have been found and fixed, with explanations of their causes.
From `charge_exchange.jl` (comments removed for brevity):
```julia
@loop_s is begin
    if is ∈ composition.neutral_species_range
        isn = is
        @loop_z iz begin
            if moments.evolve_ppar
                wpa_shift_norm = moments.vth[iz,isn]
            else
                wpa_shift_norm = 1.0
            end
            for isi ∈ composition.ion_species_range
                wpa_shift = (fvec_in.upar[iz,isi] - fvec_in.upar[iz,isn])/wpa_shift_norm
                for ivpa ∈ 1:vpa.n
                    vpa.scratch[ivpa] = vpa.grid[ivpa] + wpa_shift
                end
                vpa.scratch .= interpolate_to_grid_vpa(vpa.scratch, view(fvec_in.pdf,:,iz,isn), vpa, spectral)
                for ivpa ∈ 1:vpa.n
                    f_out[ivpa,iz,isi] += dt*charge_exchange_frequency*fvec_in.density[iz,isn] * ## <== bug here
                        (vpa.scratch[ivpa] - fvec_in.pdf[ivpa,iz,isi])
                end
            end
        end
    elseif is ∈ composition.ion_species_range
        isi = is
        @loop_z iz begin
            if moments.evolve_ppar
                wpa_shift_norm = moments.vth[iz,isi]
            else
                wpa_shift_norm = 1.0
            end
            for isn ∈ composition.neutral_species_range
                wpa_shift = (fvec_in.upar[iz,isn] - fvec_in.upar[iz,isi])/wpa_shift_norm
                for ivpa ∈ 1:vpa.n
                    vpa.scratch[ivpa] = vpa.grid[ivpa] + wpa_shift
                end
                vpa.scratch .= interpolate_to_grid_vpa(vpa.scratch, view(fvec_in.pdf,:,iz,isi), vpa, spectral)
                for ivpa ∈ 1:vpa.n
                    f_out[ivpa,iz,isn] += dt*charge_exchange_frequency*fvec_in.density[iz,isi] * ## <== bug here
                        (vpa.scratch[ivpa] - fvec_in.pdf[ivpa,iz,isn])
                end
            end
        end
    end
end
```
The bug results in errors from `debug_test/sound_wave_tests.jl` like

```
Shared memory array written at CartesianIndex(1, 1, 1) from multiple ranks between calls to _block_synchronize(). Array was created at:
moment_kinetics.communication.DebugMPISharedArray(array::Array{Float64, 3}) at communication.jl:78
macro expansion at communication.jl:209 [inlined]
macro expansion at debugging.jl:47 [inlined]
allocate_shared(T::Type, dims::Tuple{Int64, Int64, Int64}) at communication.jl:207
allocate_shared_float(::Int64, ::Vararg{Int64}) at array_allocation.jl:38
setup_scratch_arrays(moments::moment_kinetics.velocity_moments.moments, pdf_in::moment_kinetics.communication.DebugMPISharedArray{Float64, 3}, n_rk_stages::Int64) at time_advance.jl:244
setup_time_advance!(pdf::moment_kinetics.initial_conditions.pdf_struct, vpa::moment_kinetics.coordinates.coordinate, z::moment_kinetics.coordinates.coordinate, composition::moment_kinetics.input_structs.species_composition, drive_input::moment_kinetics.input_structs.drive_input, moments::moment_kinetics.velocity_moments.moments, t_input::moment_kinetics.input_structs.time_input, collisions::moment_kinetics.input_structs.collisions_input, species::Vector{moment_kinetics.input_structs.species_parameters}) at time_advance.jl:112
... etc ...
```
This shows that the problem is multiple writes to the same shared-memory array element from different ranks (a race condition, because the order of those writes is undefined, and they might even happen at the same time, causing undefined behaviour), and that the affected array is the distribution function (which is created at `time_advance.jl:112` as part of the `scratch_pdf` structs).
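As a standalone illustration of why such unordered writes matter (a minimal sketch using Julia threads rather than MPI shared-memory arrays, so this is not moment_kinetics code), the `+=` update below is a separate read, add and write, so concurrent updates to the same element can interleave and lose increments:

```julia
using Base.Threads

# Hypothetical illustration: many workers increment the same element with no
# synchronization. Each `+=` is a separate read, add and write, so concurrent
# updates can interleave and overwrite each other.
a = zeros(1)
@threads for i in 1:100_000
    a[1] += 1.0
end
println(a[1])  # typically less than 100000.0 when Julia is started with several threads
```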
The array being written to is `f_out`. In this region of the code, the loops over the species index `is` and the spatial dimension `iz` are split among processes, and each process should write to only a subset of the points in each dimension. However, the 'split-up' index `is` is used to index the neutrals in the first branch (`isn = is`), while `f_out` is written at `isi`, which takes all values in `composition.ion_species_range`. The result is that the index `isi` can take values within the species dimension that do not 'belong to' the process, which allows multiple processes to write to the same array location - an error! The second (`elseif`) branch has the same issue, but with the roles of ions and neutrals swapped.
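To make the overlap concrete, the following standalone sketch (plain Julia with made-up species ranges and a made-up rank-to-index assignment, not the real `@loop_s` machinery) works out which species indices of `f_out` each rank ends up writing under the buggy pattern:

```julia
# Hypothetical set-up: 4 species (ions 1:2, neutrals 3:4) and 4 ranks, with the
# split species loop giving each rank exactly one value of `is`.
ion_species_range = 1:2
neutral_species_range = 3:4
owned_is(rank) = rank + 1   # rank 0 owns is=1, rank 1 owns is=2, etc.

# Species indices of f_out written by a rank under the BUGGY pattern: a rank
# that owns a neutral species writes at *every* ion index, and vice versa.
function buggy_writes(rank)
    is = owned_is(rank)
    if is in neutral_species_range
        return collect(ion_species_range)
    else
        return collect(neutral_species_range)
    end
end

for r1 in 0:3, r2 in (r1 + 1):3
    overlap = intersect(buggy_writes(r1), buggy_writes(r2))
    isempty(overlap) || println("ranks $r1 and $r2 both write species $overlap")
end
# prints:
#   ranks 0 and 1 both write species [3, 4]
#   ranks 2 and 3 both write species [1, 2]
```

Any non-empty overlap corresponds to elements that two ranks may write between synchronization points, which is exactly what the debug check above reports.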
The fix is to rearrange the species loops (swapping the order of the loops over ion and neutral species) so that the index set from `is` is always the one used to access the array being written to, `f_out`. This does require moving the initialization of `wpa_shift_norm` one level down in the nested loops, which is slightly sub-optimal (but only very slightly, since it is just a scalar look-up and the loop it moves inside is a species loop with few iterations). The fixed code is:
```julia
@loop_s is begin
    if is ∈ composition.ion_species_range ## <== fix here
        isi = is ## <== here
        @loop_z iz begin
            for isn ∈ composition.neutral_species_range ## <== and here
                if moments.evolve_ppar
                    wpa_shift_norm = moments.vth[iz,isn]
                else
                    wpa_shift_norm = 1.0
                end
                wpa_shift = (fvec_in.upar[iz,isi] - fvec_in.upar[iz,isn])/wpa_shift_norm
                for ivpa ∈ 1:vpa.n
                    vpa.scratch[ivpa] = vpa.grid[ivpa] + wpa_shift
                end
                vpa.scratch .= interpolate_to_grid_vpa(vpa.scratch, view(fvec_in.pdf,:,iz,isn), vpa, spectral)
                for ivpa ∈ 1:vpa.n
                    f_out[ivpa,iz,isi] += dt*charge_exchange_frequency*fvec_in.density[iz,isn] *
                        (vpa.scratch[ivpa] - fvec_in.pdf[ivpa,iz,isi])
                end
            end
        end
    elseif is ∈ composition.neutral_species_range ## <== fix here
        isn = is ## <== here
        @loop_z iz begin
            for isi ∈ composition.ion_species_range ## <== and here
                if moments.evolve_ppar
                    wpa_shift_norm = moments.vth[iz,isi]
                else
                    wpa_shift_norm = 1.0
                end
                wpa_shift = (fvec_in.upar[iz,isn] - fvec_in.upar[iz,isi])/wpa_shift_norm
                for ivpa ∈ 1:vpa.n
                    vpa.scratch[ivpa] = vpa.grid[ivpa] + wpa_shift
                end
                vpa.scratch .= interpolate_to_grid_vpa(vpa.scratch, view(fvec_in.pdf,:,iz,isi), vpa, spectral)
                for ivpa ∈ 1:vpa.n
                    f_out[ivpa,iz,isn] += dt*charge_exchange_frequency*fvec_in.density[iz,isi] *
                        (vpa.scratch[ivpa] - fvec_in.pdf[ivpa,iz,isn])
                end
            end
        end
    end
end
```
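As a small hypothetical check of the fixed pattern (using the same made-up rank-to-index assignment as the sketch above), each rank now writes `f_out` only at the species index it owns, so no two ranks' write sets overlap:

```julia
# Hypothetical check: 4 ranks, each owning one species index; with the fixed
# loop ordering a rank writes f_out only at its own species index.
owned_is(rank) = rank + 1
fixed_writes(rank) = [owned_is(rank)]
for r1 in 0:3, r2 in (r1 + 1):3
    @assert isempty(intersect(fixed_writes(r1), fixed_writes(r2)))
end
println("no overlapping writes between any pair of ranks")
```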