I am getting a memory access error after a few steps of the PIC update when running on GPUs. The backtrace is:
0: ../build-gpu-dbg/mfix() [0x107c916]
amrex::BLBackTrace::print_backtrace_info(_IO_FILE*)
.../subprojects/amrex/Src/Base/AMReX_BLBackTrace.cpp:205:25
1: ../build-gpu-dbg/mfix() [0x107c444]
amrex::BLBackTrace::handler(int)
.../subprojects/amrex/Src/Base/AMReX_BLBackTrace.cpp:103:7
2: ../build-gpu-dbg/mfix() [0xe3827a]
amrex::ParallelDescriptor::Abort(int, bool)
.../subprojects/amrex/Src/Base/AMReX_ParallelDescriptor.cpp:225:21
3: ../build-gpu-dbg/mfix() [0xdd6774]
amrex::Error_host(char const*, char const*)
.../subprojects/amrex/Src/Base/AMReX.cpp:261:1
4: ../build-gpu-dbg/mfix() [0xdd66b2]
amrex::Abort(char const*) inlined at .../subprojects/amrex/Src/Base/AMReX.cpp:232:6 in amrex::Abort(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
.../subprojects/amrex/Src/Base/AMReX.H:176:1
amrex::Abort(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
.../subprojects/amrex/Src/Base/AMReX.cpp:232:6
5: ../build-gpu-dbg/mfix() [0xe936a9]
amrex::Gpu::Device::synchronize()
.../subprojects/amrex/Src/Base/AMReX_GpuDevice.cpp:675:437
6: ../build-gpu-dbg/mfix() [0x5dc28a]
amrex::Gpu::synchronize()
.../subprojects/amrex/Src/Base/AMReX_GpuDevice.H:234:1
7: ../build-gpu-dbg/mfix() [0xc56831]
MFIXParticleContainer::PICHydroStep(int, bool, bool, bool, double, double, amrex::RealVect&, amrex::Vector<std::array<amrex::MultiFab*, 3ul>, std::allocator<std::array<amrex::MultiFab*, 3ul> > >&, amrex::MultiFab&, std::array<amrex::MultiFab*, 3ul>&, amrex::MultiFab const*, amrex::FabArray<amrex::EBCellFlagFab> const*, amrex::EBFArrayBoxFactory*, int, amrex::MultiFab const*)
.../src/des/pic/mfix_pc_deposit_pic.cpp:251:50
8: ../build-gpu-dbg/mfix() [0xc5a495]
mfix::pic_iteration(bool, bool, bool, double, double, amrex::RealVect&, amrex::Vector<std::array<amrex::MultiFab*, 3ul>, std::allocator<std::array<amrex::MultiFab*, 3ul> > >&, amrex::Vector<amrex::MultiFab*, std::allocator<amrex::MultiFab*> >&, amrex::EBFArrayBoxFactory*, int, amrex::MultiFab const*)
.../src/des/pic/mfix_pic_iteration.cpp:90:19
9: ../build-gpu-dbg/mfix() [0xc2d8d6]
mfix::EvolveParcels(double, double, amrex::RealVect&, int, LoadBalance*)
.../src/des/pic/mfix_evolve_parcels.cpp:150:26
10: ../build-gpu-dbg/mfix() [0xbddb5b]
mfix::EvolveSolids(double, double, int&, double&, double&, amrex::Vector<amrex::MultiFab*, std::allocator<amrex::MultiFab*> > const&, amrex::Vector<amrex::MultiFab const*, std::allocator<amrex::MultiFab const*> > const&)
.../src/timestepping/mfix_evolve_solids.cpp:143:6
11: ../build-gpu-dbg/mfix() [0xbd03fb]
mfix::Evolve()
.../src/timestepping/mfix_evolve.cpp:203:126
12: ../build-gpu-dbg/mfix() [0x5c44af]
main
.../src/main.cpp:213:48
13: /lib64/libc.so.6(+0x3feb0) [0x146e46c3feb0]
14: /lib64/libc.so.6(__libc_start_main+0x80) [0x146e46c3ff60]
15: ../build-gpu-dbg/mfix() [0x415265]
_start at ??:?
I have probably overlooked something in the setup that causes a PIC parcel to leave the domain. Could one of you advise on what might be wrong here?
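The backtrace ends in `amrex::Gpu::synchronize()` called from `mfix_pc_deposit_pic.cpp:251`. Because GPU kernel launches are asynchronous, that line is likely only where the error was reported, not necessarily where the bad access happened. One way to test the "parcel left the domain" hypothesis is to copy parcel positions to the host before deposition and flag any that fall outside the domain bounds. The sketch below assumes a simplified, hypothetical `Parcel`/`Domain` layout and a hypothetical helper `find_escaped_parcels`. None of these are MFIX or AMReX APIs. In a real run the positions would come from the particle container.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical parcel type (not MFIX's): only the position is needed here.
struct Parcel {
    std::array<double, 3> pos;
};

// Hypothetical axis-aligned domain bounds, half-open interval [lo, hi) per direction.
struct Domain {
    std::array<double, 3> lo;
    std::array<double, 3> hi;
};

// Return the indices of parcels whose position lies outside [lo, hi)
// in at least one coordinate direction.
std::vector<std::size_t> find_escaped_parcels(const std::vector<Parcel>& parcels,
                                              const Domain& dom) {
    std::vector<std::size_t> escaped;
    for (std::size_t i = 0; i < parcels.size(); ++i) {
        for (int d = 0; d < 3; ++d) {
            const double x = parcels[i].pos[d];
            if (x < dom.lo[d] || x >= dom.hi[d]) {
                escaped.push_back(i);
                break;  // one out-of-range coordinate is enough to flag the parcel
            }
        }
    }
    return escaped;
}
```

If this reports escaped parcels just before the failing step, the inflow/outflow or wall boundary setup is a likely place to look. If it reports none, the invalid access may instead come from a different kernel launched earlier in the step.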
cuda700mwe.zip (15.9 KB)