This project solve Poisson's equation on a 2D grid using either:
- A parallel openMP/openACC implementation of the Gauss-Seidel method
- or LAPACK
The goal was to practice openMP and openACC parallelization. This was a learning exercise and is not intended to serve as a comparison between LAPACK and GS method.
There is a build script scripts/build.sh.
The binary bin/solver outputs the discrete grid u_grid to a fort file fort.10. You can check that both methods produce the
same answer.
- I have only tested on NVIDIA GPUs
- For NVIDIA GPUs and openMP offloading a GPU with Compute Capability >= 7.0 is required
For problem size:
ugrid[200,200]tol = 1e-11(GS only)
For this test run I compiled the code with ifort version 2021.7.0 (oneapi) but the code also works with GNU compiler
gfortran.
The test was run on CPU Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz with 16 Gb RAM.
Walltime (s)
| Procs | LAPACK | Gauss-Seidel |
|---|---|---|
| 1 | 823 | 25 |
| 2 | 434 | 16 |
| 4 | 265 | 11 |
Maximum Memory
| LAPACK | Gauss-Seidel |
|---|---|
| 11.6Gb | 22Mb |
The test was run on arm CPU Neoverse-N1 with 512 Gb RAM.
Problem size u_grid dims = [300,300]
number of iterations = 192018
| Cores | Time (s) |
|---|---|
| 80 | 15.4646 |
| 64 | 14.1203 |
| 32 | 15.4587 |
| 16 | 21.6295 |
| 8 | 38.4833 |
| 4 | 70.8669 |
| 2 | 136.7466 |
| 1 | 270.7637 |
The test was run on nvidia GPU NVIDIA A100 with 40 Gb RAM connected to the arm system above.
Problem size u_grid dims = [300,300]
number of iterations = 192018
| GPU | Time (s) |
|---|---|
| na | 17.7678 |
Continue to profile openACC version. Check for unneccessary memory transfers.