Error when running my project with the DMP solver

solar_03_2022-08-09T013222.922157.zip (200.4 KB)

Dear developers,
Something went wrong when I used the DMP solver to run my project.

BUG REPORT

Type of issue
solver crash

Description
When I use the DMP solver to run my project, I get the following errors:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:

Could not print backtrace:
unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace:
unrecognized DWARF version in .debug_info at 6
#0 0x7f26e9ee82ed in ???
#1 0x7f26e9ee7503 in ???
#2 0x7f273780af0f in ???
#3 0x7f26ca646a1f in ???
#4 0x7f26ca5f92d2 in ???
#5 0x7f26ca5f532a in ???
#6 0x7f26ca72cb0d in ???
#7 0x7f273706f4c2 in ???
#8 0x7f2737066910 in ???
#9 0x7f273705f020 in ???
#10 0x7f26cc7bf63a in ???
#11 0x7f27375581dd in ???
#12 0x7f27387d027d in ???
#13 0x7f27387f12aa in ???
#14 0x7f26ea2f0967 in ???
#15 0x7f26ea599f08 in __parallel_mpi_MOD_parallel_init
at /home/u/solar_03_2022-07-14T094752.541719/build/pymfix/parallel_mpi_mod.f90:46
#16 0x7f26ea58c2f9 in f2py_rout_mfixsolver_parallel_mpi_parallel_init
at /home/u/solar_03_2022-07-14T094752.541719/build/f2pywrappers/mfixsolvermodule.c:1827

Could not print backtrace: unrecognized DWARF version in .debug_info at 6
#0 0x7fcf7588e2ed in ???
#1 0x7fcf7588d503 in ???
#2 0x7fcfc31b0f0f in ???
#3 0x7fcf55feca1f in ???
#4 0x7fcf55f9f2d2 in ???
#5 0x7fcf55f9b32a in ???
#6 0x7fcf560d2b0d in ???
#7 0x7fcfc2a154c2 in ???
#8 0x7fcfc2a0c910 in ???
#9 0x7fcfc2a05020 in ???
#10 0x7fcf5816563a in ???
#11 0x7fcfc2efe1dd in ???
#12 0x7fcfc417627d in ???
#13 0x7fcfc41972aa in ???
#14 0x7fcf75c96967 in ???
#15 0x7fcf75f3ff08 in __parallel_mpi_MOD_parallel_init
at /home/u/solar_03_2022-07-14T094752.541719/build/pymfix/parallel_mpi_mod.f90:46
#16 0x7fcf75f322f9 in f2py_rout_mfixsolver_parallel_mpi_parallel_init
at /home/u/solar_03_2022-07-14T094752.541719/build/f2pywrappers/mfixsolvermodule.c:1827

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6
#0 0x7f6f544392ed in ???
#1 0x7f6f54438503 in ???
#2 0x7f6fa1d62f0f in ???
#3 0x7f6f34b97a1f in ???
#4 0x7f6f34b4a2d2 in ???
#5 0x7f6f34b4632a in ???
#6 0x7f6f34c7db0d in ???
#7 0x7f6fa15c74c2 in ???
#8 0x7f6fa15be910 in ???
#9 0x7f6fa15b7020 in ???
#10 0x7f6f36d1063a in ???
#11 0x7f6fa1ab01dd in ???
#12 0x7f6fa2d2827d in ???
#13 0x7f6fa2d492aa in ???
#14 0x7f6f54841967 in ???
#15 0x7f6f54aeaf08 in __parallel_mpi_MOD_parallel_init
at /home/u/solar_03_2022-07-14T094752.541719/build/pymfix/parallel_mpi_mod.f90:46
#16 0x7f6f54add2f9 in f2py_rout_mfixsolver_parallel_mpi_parallel_init
at /home/u/solar_03_2022-07-14T094752.541719/build/f2pywrappers/mfixsolvermodule.c:1827

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6

Could not print backtrace: unrecognized DWARF version in .debug_info at 6
#0 0x7f2d3eec22ed in ???
#1 0x7f2d3eec1503 in ???
#2 0x7f2d8c7e4f0f in ???
#3 0x7f2d1b508a1f in ???
#4 0x7f2d1b4bb2d2 in ???
#5 0x7f2d1b4b732a in ???
#6 0x7f2d1b5eeb0d in ???
#7 0x7f2d8c0494c2 in ???
#8 0x7f2d8c040910 in ???
#9 0x7f2d8c039020 in ???
#10 0x7f2d2179963a in ???
#11 0x7f2d8c5321dd in ???
#12 0x7f2d8d7aa27d in ???
#13 0x7f2d8d7cb2aa in ???
#14 0x7f2d3f2ca967 in ???
#15 0x7f2d3f573f08 in __parallel_mpi_MOD_parallel_init
at /home/u/solar_03_2022-07-14T094752.541719/build/pymfix/parallel_mpi_mod.f90:46
#16 0x7f2d3f5662f9 in f2py_rout_mfixsolver_parallel_mpi_parallel_init
at /home/u/solar_03_2022-07-14T094752.541719/build/f2pywrappers/mfixsolvermodule.c:1827
#17 0x560e748d7707 in ???
at /usr/local/src/conda/python-3.10.5/Objects/call.c:215
#18 0x560e748d3216 in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:112
#19 0x560e748de84e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#20 0x560e748ce0bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#21 0x560e74983521 in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#22 0x560e74983466 in ???
at /usr/local/src/conda/python-3.10.5/Python/ceval.c:1134
#23 0x560e7498a62e in ???
at /usr/local/src/conda/python-3.10.5/Python/bltinmodule.c:1056
#24 0x560e748dea40 in ???
at /usr/local/src/conda/python-3.10.5/Objects/methodobject.c:430
#25 0x560e748ce0bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#26 0x560e748de84e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#27 0x560e748ce0bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#28 0x560e748de84e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#29 0x560e749a843a in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:297
#30 0x560e749a7d70 in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:585
#31 0x560e749767a8 in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:1090
#32 0x7f27377edc86 in ???
#33 0x560e749766b0 in ???
#17 0x55b4328e2707 in ???
at /usr/local/src/conda/python-3.10.5/Objects/call.c:215
#18 0x55b4328de216 in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:112
#19 0x55b4328e984e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#20 0x55b4328d90bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#21 0x55b43298e521 in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#22 0x55b43298e466 in ???
at /usr/local/src/conda/python-3.10.5/Python/ceval.c:1134
#23 0x55b43299562e in ???
at /usr/local/src/conda/python-3.10.5/Python/bltinmodule.c:1056
#24 0x55b4328e9a40 in ???
at /usr/local/src/conda/python-3.10.5/Objects/methodobject.c:430
#25 0x55b4328d90bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#26 0x55b4328e984e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#27 0x55b4328d90bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#28 0x55b4328e984e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#29 0x55b4329b343a in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:297
#30 0x55b4329b2d70 in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:585
#31 0x55b4329817a8 in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:1090
#32 0x7f6fa1d45c86 in ???
#33 0x55b4329816b0 in ???
#17 0x55f5a193e707 in ???
at /usr/local/src/conda/python-3.10.5/Objects/call.c:215
#18 0x55f5a193a216 in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:112
#19 0x55f5a194584e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#20 0x55f5a19350bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#21 0x55f5a19ea521 in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#22 0x55f5a19ea466 in ???
at /usr/local/src/conda/python-3.10.5/Python/ceval.c:1134
#23 0x55f5a19f162e in ???
at /usr/local/src/conda/python-3.10.5/Python/bltinmodule.c:1056
#24 0x55f5a1945a40 in ???
at /usr/local/src/conda/python-3.10.5/Objects/methodobject.c:430
#25 0x55f5a19350bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#26 0x55f5a194584e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#27 0x55f5a19350bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#28 0x55f5a194584e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#29 0x55f5a1a0f43a in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:297
#17 0x55f04c7f5707 in ???
at /usr/local/src/conda/python-3.10.5/Objects/call.c:215
#18 0x55f04c7f1216 in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:112
#19 0x55f04c7fc84e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#20 0x55f04c7ec0bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#21 0x55f04c8a1521 in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#22 0x55f04c8a1466 in ???
at /usr/local/src/conda/python-3.10.5/Python/ceval.c:1134
#23 0x55f04c8a862e in ???
at /usr/local/src/conda/python-3.10.5/Python/bltinmodule.c:1056
#24 0x55f04c7fca40 in ???
at /usr/local/src/conda/python-3.10.5/Objects/methodobject.c:430
#25 0x55f04c7ec0bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#26 0x55f04c7fc84e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#27 0x55f04c7ec0bc in ???
at /usr/local/src/conda/python-3.10.5/Include/cpython/abstract.h:114
#28 0x55f04c7fc84e in ???
at /usr/local/src/conda/python-3.10.5/Include/internal/pycore_ceval.h:46
#29 0x55f04c8c643a in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:297
#30 0x55f04c8c5d70 in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:585
#31 0x55f04c8947a8 in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:1090
#32 0x7f2d8c7c7c86 in ???
#33 0x55f04c8946b0 in ???
#30 0x55f5a1a0ed70 in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:585
#31 0x55f5a19dd7a8 in ???
at /usr/local/src/conda/python-3.10.5/Modules/main.c:1090
#32 0x7fcfc3193c86 in ???
#33 0x55f5a19dd6b0 in ???
Floating point exception (core dumped)
Floating point exception (core dumped)

Primary job terminated normally, but 1 process returned
a non-zero exit code… Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[44245,1],2]
Exit code: 136

I don’t know why these errors occur or how to fix them. I have modified some of the code to achieve my goal, and it runs well without the DMP solver. I have attached my bug report.
Thank you very much!

The job didn’t even start for me in DMP mode. It immediately exited with this error:

Process exit mpirun --use-hwthread-cpus -mca mpi_warn_on_fork 0 -np 4 /tmp/solar_03_2022-08-09T013222.922157/mfixsolver -s -f /tmp/solar_03_2022-08-09T013222.922157/solar_05.mfx
Abort was called at 39 line in file:
/var/tmp/portage/dev-libs/intel-compute-runtime-21.46.21636-r1/work/compute-runtime-21.46.21636/shared/source/gmm_helper/client_context/gmm_client_context.cpp

I have not seen this error before. I’m looking into it now, will let you know what I find.

– Charles

I have tried both the interactive and batch solvers on Linux (gnu8.4); both run fine.

I updated the intel-compute-runtime package on my development machine and now I can no longer reproduce the error.

I don’t know why it won’t run with my project; that’s so strange. Is the problem related to MPI?

zxc - yes, it looks like some sort of MPI problem

Thank you very much, Charles! I solved it by reinstalling MPI.
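
For anyone who lands here with the same output: the "Exit code: 136" that mpirun reports in the first post follows the usual shell convention that a status above 128 means the process was killed by signal number (status - 128). Here that is signal 8, SIGFPE, which matches the "Floating point exception (core dumped)" lines. A minimal sketch of that decoding follows; the helper name `decode_exit_code` is made up for illustration and is not part of MFiX or Open MPI.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe a shell-style exit status (hypothetical helper, for illustration).

    By convention, a status above 128 means the process was terminated
    by signal number (code - 128).
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


# The exit code reported by mpirun in this thread
print(decode_exit_code(136))
```

This only confirms which signal killed the rank; it does not say why the floating-point exception was raised during `parallel_init`.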