Hi everyone, I am currently experiencing a problem with MFIX-22.3 running on Linux. After running for about 10 minutes, the software stops and exits with an error message like the one shown below. I am not sure what is causing it. If anyone has encountered this kind of error, please reply - thank you very much!
I am able to reproduce this problem. When the program crashes, we get a core file. Examining that file:
mfix-22.4:) file core
core: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style,
from '/usr/bin/mpirun --use-hwthread-cpus -mca mpi_warn_on_fork 0 -np 8 /tmp/12345678', real uid: 103, effective uid: 103, real gid: 1000, effective gid: 1000, execfn: '/usr/bin/mpirun', platform: 'x86_64'
That is a bit unusual - the core file is not from mfix itself but from mpirun, which I'm not used to seeing. The bad_alloc message indicates a memory allocation problem, but not in the MFiX solver itself - bad_alloc is a C++ exception, and the solver is not C++. Did mpirun itself run out of memory?
I should also point out that the NetworkReply message is a red herring - this just means that the solver exited and the GUI was not able to contact the solver process.
Running this again, and looking at top, sorted by resident set size, I think I see the problem:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6433 cgw 23 3 8151968 3.9g 193988 R 64.6 25.5 19:21.77 mfix
6583 cgw 23 3 1780288 174608 24196 S 89.4 1.1 28:21.70 python
6585 cgw 23 3 1614356 155728 21008 S 92.4 1.0 27:35.13 python
6587 cgw 23 3 1614352 155384 20808 S 92.4 1.0 27:26.48 python
6595 cgw 23 3 1609104 154560 20936 S 95.0 1.0 27:51.38 python
6589 cgw 23 3 1609104 154368 20988 S 84.1 1.0 27:55.36 python
6592 cgw 23 3 1609104 154360 20808 S 88.4 1.0 27:45.71 python
6593 cgw 23 3 1609104 154120 20712 S 95.7 1.0 27:45.15 python
6596 cgw 23 3 1609104 154020 20700 S 76.5 1.0 27:38.98 python
6577 cgw 23 3 175576 18656 14936 S 0.3 0.1 0:04.91 mpirun
The process listed as mfix is the GUI itself; the eight python processes are the solver jobs kicked off by mpirun. Neither the solver jobs nor mpirun is using much memory, but the memory used by the GUI keeps growing over time (a memory leak). I'm not sure yet why this is happening, but eventually the GUI uses so much memory that mpirun is no longer able to allocate buffers.
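To confirm that the growth is in the GUI process rather than in the solvers, one can poll its resident set size over time instead of watching top. Here is a minimal sketch using psutil; the PID and sampling interval are placeholders, not part of the original report. A steadily climbing number confirms the leak:

```python
import time
import psutil

GUI_PID = 6433      # hypothetical: PID of the "mfix" GUI process from top
INTERVAL = 30       # seconds between samples

proc = psutil.Process(GUI_PID)
try:
    while True:
        rss_mb = proc.memory_info().rss / 1024 ** 2
        print(f"{time.strftime('%H:%M:%S')}  GUI resident memory: {rss_mb:.0f} MB")
        time.sleep(INTERVAL)
except psutil.NoSuchProcess:
    print("GUI process exited")
```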
I will follow up on this, but since we’re all out of the office next week for the holidays, it might be a while.
And since all of those messages in the console are sitting in memory, this adds up to gigabytes - if I flush the console (the x button), the memory usage drops from about 4 GB to 2 GB.
This is fine for debugging, but for real runs you should disable it. If you remove or comment out those lines, you will see that the job runs without problems.
Perhaps we should add code to the GUI to drop messages when the total size gets too big. However, that could result in losing important information from the beginning of the run, so I hesitate to do it.
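One possibility, sketched very roughly below, is a message store that keeps the first part of the run verbatim plus a rolling tail of the most recent messages, so capping memory does not discard the beginning of the run. The class name, limits, and API here are hypothetical, not the actual GUI code:

```python
from collections import deque

class ConsoleBuffer:
    """Hypothetical sketch of a size-capped console message store.

    Keeps the first `head_size` messages of the run verbatim (so the
    start of the run is never lost) plus a rolling window of the most
    recent `tail_size` messages; everything in between is dropped.
    """

    def __init__(self, head_size=1000, tail_size=10000):
        self.head = []                       # first messages, kept forever
        self.head_size = head_size
        self.tail = deque(maxlen=tail_size)  # recent messages, oldest dropped
        self.dropped = 0                     # count of discarded messages

    def append(self, msg):
        if len(self.head) < self.head_size:
            self.head.append(msg)
        else:
            if len(self.tail) == self.tail.maxlen:
                self.dropped += 1
            self.tail.append(msg)

    def messages(self):
        marker = [f"... {self.dropped} messages dropped ..."] if self.dropped else []
        return self.head + marker + list(self.tail)
```

Whether the limits should be counted in messages or bytes, and what the defaults should be, is open; the point is only that a cap does not have to throw away the start of the run.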