
Conversation

@jeremyfirst22 (Contributor) commented Oct 24, 2025

Bugfixes

  • add8c60: CRITICAL: Extended the fix for the GPU race condition in MPI communication (#98) to the remaining call sites in ScaLBL.cpp.
  • 1681cb6: The timestep in Restart.db was never read, so a restart started at timestep = 0 and overwrote previous output files.
  • 38a21f7: The min_steady_timesteps database parameter never triggered an exit on converged Ca, because Ca_previous was never updated, so (fabs((Ca - Ca_previous) / Ca) < tolerance) was always false. Also added a break to exit once the steady point is reached, rather than continuing to write steady-point information until EXIT_TIMESTEP.
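
For illustration, the corrected exit logic looks roughly like this (a standalone sketch, not the simulator code: the function name, the stand-in Ca relaxation, and the numeric values are all illustrative):

```cpp
#include <cmath>

// Sketch of the corrected steady-state exit logic: Ca_previous is now
// updated at every check, and an early return exits once convergence is
// reached past min_steady_timesteps. Returns the exit timestep.
int run_until_steady(double tolerance, int min_steady_timesteps,
                     int timestepMax) {
    double Ca_previous = 0.0;
    for (int timestep = 1; timestep <= timestepMax; ++timestep) {
        // Stand-in for an LBM step: Ca relaxes toward 2e-5.
        double Ca = 2e-5 * (1.0 - std::exp(-timestep / 50.0));
        if (timestep >= min_steady_timesteps &&
            std::fabs((Ca - Ca_previous) / Ca) < tolerance) {
            return timestep;  // steady point reached: stop here
        }
        Ca_previous = Ca;  // the update that was missing before the fix
    }
    return timestepMax;  // never converged: ran to timestepMax
}
```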

Features

  • a4dee77: Added a lbpm_color_unsteady_simulator module to run an unsteady simulation until the water saturation converges below the supplied tolerance (default = 0, i.e., always run to timestepMax)
  • 1ee867e: Adjusted the body force search algorithm. In the steady-state calculations, the body force is proportionally adjusted to achieve the desired capillary number, but because these properties are nonlinearly related, the proportional scaling was prone to 'straddling' the target and bouncing back and forth indefinitely (as identified in #75). Now, once the proportional scaling brackets the target, we switch to the bisection method for guaranteed convergence. Resolves #75 (endless overshoot/undershoot of the capillary number using the fractional flow protocol).
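
The hybrid search can be sketched as follows (illustrative names throughout; Ca_of stands in for a full steady-state run at a given body force, and the power-law map is an arbitrary nonlinear monotone relation, not the real physics):

```cpp
#include <algorithm>
#include <cmath>

// Arbitrary nonlinear, monotone stand-in for "run to steady state at
// body force F and measure the capillary number".
double Ca_of(double F) { return 1e-3 * std::pow(F, 1.7); }

// Proportional scaling until the target Ca is bracketed, then bisection.
double find_body_force(double Ca_target, double F0, double tol) {
    double F = F0, F_lo = 0.0, F_hi = 0.0;  // 0.0 = bracket side unset
    for (int iter = 0; iter < 200; ++iter) {
        double Ca = Ca_of(F);
        if (std::fabs(Ca - Ca_target) / Ca_target < tol)
            return F;  // close enough to the target capillary number
        // Tighten whichever side of the bracket this sample falls on.
        if (Ca < Ca_target) F_lo = std::max(F_lo, F);
        else F_hi = (F_hi == 0.0) ? F : std::min(F_hi, F);
        if (F_lo > 0.0 && F_hi > 0.0)
            F = 0.5 * (F_lo + F_hi);  // bracketed: bisection step
        else
            F *= Ca_target / Ca;      // not yet bracketed: proportional
    }
    return F;
}
```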

Cleanup

  • ebe2ae5: Switched Ca and body_force print statements to use scientific notation
  • 7e060fc: Consolidated the actual LBM run logic into a ForwardStep function to share between the two Run functions.
  • 4f3be0c: Removed vestigial variables flagged by compiler warnings

I've targeted this PR at dev, but the bugfixes may be better merged into master (particularly the first). Please let me know if you'd like me to split this PR and merge the bugfixes directly into master.

@diogosiebert (Collaborator)

Hello @jeremyfirst22,

Thanks for the pull requests — we really appreciate your effort in contributing to the improvement of LBPM.

When you submitted the first bug fix, it occurred to me that the best approach might be to include the barrier code inside the PACK function, which would ensure the same behavior as in the CPU code.
Additionally, this would avoid the need for a ScaLBL_DeviceBarrier() after each ScaLBL_D3Q19_Pack() call.

For future pull requests, I just ask that you separate them by individual feature or bug fix so we can discuss and evaluate each change more easily.

For example, in your latest submission, you proposed three different bug fixes. It would be much better if each of these fixes were submitted as separate pull requests, which would make the discussion and testing of each change simpler and more effective.

Best Regards,

Diogo Nardelli Siebert

@JamesEMcClure (Collaborator) commented Nov 4, 2025

As implemented, there is no race condition in ScaLBL_D3Q19_Pack() because the data packed into the communication buffers at a given timestep is unique (i.e., there is a unique mapping between the distributions and the buffer locations). My recommendation would be to add synchronization within MPI_COMM_SCALBL.Isend, because the race condition is introduced at this point (i.e., because MPI is not ensuring the data is synchronized). The ideal solution would be for the Isend command to take a build-time flag that determines whether device synchronization is needed, since this depends on the MPI implementation and the underlying communication details.
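
Roughly, the idea is something like this (hypothetical names, not the actual ScaLBL API; comm_isend stands in for MPI_COMM_SCALBL.Isend, and the counter exists only to make the flag's effect observable):

```cpp
int device_barrier_calls = 0;  // instrumentation for this sketch only

// Stand-in for ScaLBL_DeviceBarrier(): the real one waits for all
// queued device kernels to finish before returning.
void device_barrier() { ++device_barrier_calls; }

// Hypothetical Isend wrapper: when SCALBL_SYNC_BEFORE_ISEND is defined
// at build time, packing kernels are forced to complete before MPI may
// read the buffer; otherwise (e.g. a CUDA-aware MPI that already orders
// the transfer correctly), no synchronization cost is paid.
void comm_isend(const double *buf, int count, int dest) {
#ifdef SCALBL_SYNC_BEFORE_ISEND
    device_barrier();  // race-condition guard, compiled in on demand
#endif
    (void)buf; (void)count; (void)dest;  // MPI_Isend(...) would go here
}
```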

@jeremyfirst22 (Contributor, Author)

@diogosiebert The reason I did not put the cudaDeviceSynchronize call within ScaLBL_D3Q19_Pack itself is that it would force the buffer-packing kernel to complete before the function returns. That may not be ideal: other work could proceed concurrently, or multiple packing kernels could be launched on different cudaStreams to pack buffers simultaneously.

It could just as easily be nested within the MPI_COMM_SCALBL.Isend routine itself (as @JamesEMcClure suggests), which would keep the code open to overlapping work with communication. But I do not see any benefit to wrapping the DeviceBarrier in preprocessor guards like #ifdef USE_GPU. ScaLBL_DeviceBarrier is already overloaded for the GPUs, so there should be no need for build-time flags, right?

And noted about splitting each bugfix into a separate PR. I will close this PR and submit a PR per feature/bugfix. Thank you both for the comments!

-- Jeremy

