Pepiot & Desjardins benchmark

I saw a LinkedIn post from Professor Bruno Blais this weekend, running his higher-order FEM CFD-DEM code on a simple benchmark problem from Pepiot & Desjardins (2012). I've never actually set this case up in exa, so I figured, let's do it. Basics are:

  • dp = 0.21 mm
  • ρp = 3300 kg/m³
  • ρg = 1.13 kg/m³
  • μg = 1.77 × 10⁻⁵ Pa·s
  • domain: 0.075 × 0.225 × 0.075 m
  • mesh: 192 × 576 × 192 (level 0, static; ≈21M cells)
  • doubly periodic in x and z
  • mi (mass inflow) / po (pressure outflow) in y
  • Uin = 0.555 m/s
  • ic: φ = 0.4 for y ≤ 0.075 m (≈30M particles)
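For anyone sanity-checking the numbers, here's a quick back-of-envelope sketch (my arithmetic, assuming the initial bed fills the full x-z cross-section up to y = 0.075 m) for the cell count, the grid-to-particle size ratio, and a particle-count estimate:

```python
# Back-of-envelope check of the setup numbers above.
# Assumes the initial bed fills the full x-z cross-section up to y = 0.075 m.
import math

dp = 0.21e-3                             # particle diameter [m]
Lx, Ly, Lz = 0.075, 0.225, 0.075         # domain extents [m]
nx, ny, nz = 192, 576, 192               # level-0 mesh
y_bed, phi = 0.075, 0.4                  # initial bed height and solids fraction

ncell = nx * ny * nz
dx = Lx / nx                             # cell size [m]
vol_p = math.pi / 6.0 * dp**3            # particle volume [m^3]
np_est = phi * Lx * Lz * y_bed / vol_p   # particle count estimate

print(f"cells    : {ncell/1e6:.1f} M")   # ~21.2 M
print(f"dx/dp    : {dx/dp:.2f}")         # ~1.86
print(f"particles: {np_est/1e6:.1f} M")  # ~35 M, close to the ~36.4 M actually seeded (see notes)
```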

A couple of notes:

  • One of the biggest parameters affecting the time to solution will be the DEM dt. It seems like we won't be able to get a one-to-one comparison here due to differences in the numerical methods. Here, I've set the DEM dt = 2 × 10⁻⁶ s with 32 subcycling steps, which gives a spring constant of kn = 19.3 N/m (see the sketch after this list).
  • The domain in the reference paper had an aspect ratio of 2. I followed Bruno's setup with an aspect ratio of 3. Extending to 4 might be even better at this high superficial velocity.
  • Drag law is BVK2 (Tang et al. 2015)
  • I wasn't sure what the collision parameters were supposed to be, so I guessed 0.9 for restitution, 0.3 for kinetic friction, and ignored rolling friction. The first two ended up being correct, but he also used rolling friction with a coefficient of 0.2 and the EPSD model (:thinking: I'm not sure what that is…)
  • It looks like he reflected particles at the outlet to maintain inventory, so I did that too.
  • Particles were seeded in an hcp (hexagonal close-packed) arrangement, which produced about 36.4M particles.
  • The grid size is 32 × 16 × 32 cells, which decomposes the mesh into 6 × 36 × 6 = 1296 grids, mapped 1:1 to 1296 CPUs. He mentioned running on approximately 1000 cores, but we like grid sizes that are nice powers of 2.
  • I'm not diffusing solids information, just using the "standard" (to MFIX-Exa) 8-cell, tri-linear deposition.
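On that kn value: it falls out of the usual linear spring-dashpot contact-duration relation if you assume one collision is resolved over the 32 DEM substeps (tcol = 32 × dt). Here's a minimal sketch of that arithmetic (my reconstruction, not the MFIX-Exa source); it reproduces the 19.3 N/m quoted above:

```python
# Back out the linear spring-dashpot stiffness from the DEM step size,
# assuming one collision is resolved over the 32 substeps (tcol = 32*dt).
# This is my reconstruction of the arithmetic, not the MFIX-Exa source.
import math

dp   = 0.21e-3                       # particle diameter [m]
rhop = 3300.0                        # particle density [kg/m^3]
e    = 0.9                           # restitution coefficient
dt   = 2.0e-6                        # DEM time step [s]
nsub = 32                            # DEM substeps spanning one collision

m    = rhop * math.pi / 6.0 * dp**3  # particle mass [kg]
meff = m / 2.0                       # effective mass, equal particles
tcol = nsub * dt                     # collision duration [s]

# Standard LSD contact-duration relation: tcol = pi / omega_d
kn = meff * (math.pi**2 + math.log(e)**2) / tcol**2
print(f"kn = {kn:.1f} N/m")          # ~19.3 N/m
```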

inputs_pdfb.txt (8.4 KB)
:backhand_index_pointing_up: Here are the inputs. The case was run to a simulation time of 1 s. Checkpointing every 4 hours; not saving plot files; not pulling any data through monitors; Ascent vis is the only output. Time to solution was 59580 s ≈ 16.5 hours. Here's the animation :backhand_index_pointing_down:. I also blindly decomposed this into 64³ grids and ran on one GPU node (8 H100s managing about 10 grids each). The time to solution was 39454 s ≈ 11 hrs. Not bad.

It's interesting that, for us, the particles got stuck at the outlet because they slammed into it so forcefully and uniformly. I've seen this a number of times before; it often leads to a crash. I'm a little surprised the code chewed through this. Even beyond plugging up the po, the dynamics of the remaining bed look qualitatively quite different from what Lethe predicts. It would be nice to have some quantitative metrics to compare.


As a bit of a follow-up, I couldn't help but notice this is not how I would have set the problem up. Specifically, I think the grid resolution is finer than it needs to be for unresolved CFD-DEM, dx/dp ≈ 1.86. If we reduce the mesh to 128 × 384 × 128 we still have a reasonable resolution, dx/dp ≈ 2.8, with a nice base mesh, and the slightly larger dx should give a slightly larger fluid dt, which is CFL-limited to 0.9 with MFIX-Exa's default Godunov scheme. I did a quick-and-dirty strong scaling study for just the initial 10 steps using only cubic grid sizes from 128³ down to 8³. The sweet spot for CPUs is usually about 16³, but I just wanted to see how far up the Yupeng hill that put me for this case.
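The rank counts behind that scan are just the mesh divided by the cubic grid size, assuming a 1:1 grid-to-rank mapping; a quick sketch of the bookkeeping (the same arithmetic applies to the GPU table further down):

```python
# Rank counts for the strong-scaling scan on the coarser 128 x 384 x 128 mesh,
# assuming one cubic grid per CPU rank (1:1 mapping).
mesh = (128, 384, 128)

for g in (128, 64, 32, 16, 8):        # cubic grid edge sizes
    grids = 1
    for n in mesh:
        grids *= n // g               # grids along each direction
    print(f"{g:>3}^3 grids -> {grids:>6} ranks")
# 128^3 ->     3,  64^3 ->    24,  32^3 ->   192,
#  16^3 ->  1536,   8^3 -> 12288
```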


As expected, 16³ is off ideal but still reasonable. You can still get an almost 2x speed-up going down to 8³, but now you're running on 12k CPUs, a considerable expense. So, I ran this case with 16³ grids mapped 1:1 to 1536 CPUs. At first glance this might seem like a lot more CPUs than before, but it perfectly fills 12 nodes (128 CPUs per node for us), whereas the previous distribution still needed 11 nodes with some unused cores. When I first ran this case, it crashed when the slug slammed into the po :backhand_index_pointing_down:. Note that this happened on both CPU and GPU (see below) at this mesh resolution.

It's a bit unfortunate that this case is so dependent on the initial configuration (:eyes: @Sarah {ps shame on you for not having an MFS account!}), which is throw-away data anyway. There are a number of ways to overcome this; the simplest, which is what I did, is to add a little perturbation to the particle array with

ic.init-bed.granular_temperature = 1.0e-2

This worked. It ran. The time to solution was reduced to 45127 s ≈ 12.5 hrs. It's not substantially faster, but a bit of an improvement for only one extra node.
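In case it isn't obvious what that keyword does: my reading (not the exact MFIX-Exa implementation) is that it seeds each particle with a small random velocity whose per-component variance equals the specified granular temperature, so the nudge here is only about 0.1 m/s. A rough sketch:

```python
# Rough illustration of the granular-temperature perturbation above.
# Assumes an isotropic Gaussian velocity fluctuation with per-component
# variance equal to the granular temperature (my reading, not the exact
# MFIX-Exa implementation).
import numpy as np

theta = 1.0e-2                        # granular temperature [m^2/s^2]
sigma = np.sqrt(theta)                # ~0.1 m/s per velocity component
rng = np.random.default_rng(0)

dvel = rng.normal(0.0, sigma, size=(5, 3))   # fluctuations for 5 particles
print(dvel)                           # small compared to Uin = 0.555 m/s
```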

I also did some quick GPU timings for the initial 10 steps. (Because the original case above was already running, I only had one node / 8 H100s to use for this testing.) The results are summarized in the table below (nx, ny, nz are the per-grid sizes).

| GPUs | grids | nx  | ny  | nz  | time (s) |
|------|-------|-----|-----|-----|----------|
| 1    | 1     | 128 | 384 | 128 | 72.2     |
| 2    | 2     | 128 | 384 | 64  | 46.8     |
| 3    | 3     | 128 | 128 | 128 | 71.1     |
| 6    | 6     | 128 | 64  | 128 | 44.8     |
| 4    | 4     | 64  | 384 | 64  | 33.0     |
| 8    | 24    | 64  | 64  | 64  | 56.0     |

Not surprisingly, a 1:1 distribution mapping is better than making the GPUs manage multiple grids. I ran this case (including the initial perturbation) on 4 H100s with the 2-1-2 decomposition (four 64 × 384 × 64 grids). The time to solution was 33205 s ≈ 9.2 hrs. Even if we just said the CPU and GPU times to solution were approximately the same, I wish I could put a $ value on 1536 AMD CPUs vs. 4 NVIDIA GPUs, both MSRP and cost of ownership, to know which is the better solution strategy. But that's beyond my purview. Oh well. Animation below. Notice no slug sticking to the po this time :grin:.


Very nice, Will! Can you please post the Ascent yaml file ascent_actions.yaml as well?

Jeff - thanks, sure:

-
  action: "add_scenes"
  scenes:
    s1:
      plots:
        p1:
          type: "pseudocolor"
          field: "vely"
          points:
            radius:   105.0e-6
          min_value:  -1.0
          max_value:   1.0
          color_table:
            name: "Blue to Orange" #"Inferno"
            reverse: "false"
            annotation: "false"
      renders:
        r1:
          image_width:   1080
          image_height:  2400
          image_prefix: "./pngs/vis_p_v_%06d"
          camera:
            position: [ 0.389024, 0.250646, 0.389808]
            look_at:  [ 0.0243748,  0.113295,  0.0221593]
            up: [-0.184551,  0.966559, -0.178055]
            zoom: 2.2
          bg_color: [ 0.0,  0.0,  0.0]
          fg_color: [ 1.0,  1.0,  1.0]
          world_annotations: "false"
          render_bg: "true"
          shading: "enabled"

Very cool animation and very nice results! I am the author of the LinkedIn post you mentioned. I agree with you that the benchmark is not very well articulated because of the dependence on the initial condition and the triggering of the Rayleigh-Taylor instability.

Maybe it's an indication that we should come up with new large-scale CFD-DEM benchmark problems :)!

I'd be more than happy to work with you folks towards that. I think the entire CFD-DEM ecosystem would greatly benefit from this kind of thing.


Hello Bruno, welcome to our MFIX forum :waving_hand:. Yes, another benchmark problem would be great. IMO it should be something that

  • is simple to set up (as this benchmark was)
  • is relatively quick to run and quickly loses its dependence on initial conditions
  • is characterized by easily processed quantities of interest, e.g., mean & std ΔP, particle velocity distribution function (see the sketch below)
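For example, a particle velocity distribution function is cheap to extract from any particle dump; a minimal sketch (the file name and column layout are hypothetical, just to show how light the post-processing would be):

```python
# Minimal sketch: particle velocity PDF from a particle dump.
# The file name and column layout are hypothetical; any per-particle
# velocity column would do.
import numpy as np

data = np.loadtxt("particles_0001.csv", delimiter=",", skiprows=1)
vely = data[:, 4]                               # assumed y-velocity column [m/s]

pdf, edges = np.histogram(vely, bins=100, range=(-1.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

print(f"mean = {vely.mean():.3f} m/s, std = {vely.std():.3f} m/s")
np.savetxt("vely_pdf.csv", np.column_stack([centers, pdf]), delimiter=",")
```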

Obviously, experimental data would be nice to have too, but we've been burned by that before.

Hello William,

I fully agree with you. Maybe a system that is slightly smaller or with slightly larger particles (and a system for which the particles do not hit the top wall). We could also define an initial condition for the particles that breaks down the instability quickly. Then we could establish comparisons based on first- and second-order statistics for both the particle and fluid phases. That would be very interesting.

I think some of the differences you and I observe in the Desjardins case might arise from the way we filter the equations. We explicitly filter the equations using a cell-independent filter (in our case a top-hat, but it could be a Gaussian), and we apply the same process to the particle-fluid forces. I presume in MFIX it's a particle-in-cell coupling?

Anyhow, maybe we could have a chat on this topic in a few weeks if you are interested, and we could set up a benchmark. I think from a scientific perspective this is very relevant, without it being a competition exercise :slight_smile:.

I am not sure about experiments. It's always very tricky to get any form of "local" measurement in these types of systems, I fear. We've tried our luck in the past with liquid fluidized beds and it was not that conclusive.

Cheers!

Bruno
