CI: self hosting by Maxwell-Rosen · Pull Request #941 · ammarhakim/gkeyll

Maxwell-Rosen · 2026-02-04T19:44:59Z

Documentation Changes

Purpose

The current Mac CI build fails intermittently and inconsistently. Rather than fixing that CI file, we can solve this issue permanently by hosting CI on a dedicated machine. @JunoRavin has volunteered to host this on his super server at PPPL. Self-hosting gives us more control over the CI environment and lets us run more extensive tests on that machine.

CI is expanded to ensure that all unit tests pass and to catch builds that trigger compiler warnings and errors.

Both CPU and GPU builds are tested.

Affected Areas

.github/workflows and the files within it, which define the CI.

Failing unit tests are commented out so we can establish a baseline. I am not going to spend time doing the hard work of fixing these unit tests or identifying why they are failing. That work is left to the individuals to whom these unit tests are relevant. There is an open issue about this (#845).

Failing unit tests are:

gyrokinetic/unit/ctest_gyrokinetic_cross_prim_moms_bgk.c test_1x2v_p1
gyrokinetic/unit/ctest_mom_gyrokinetic.c test_2x2v_p1 test_2x2v_p1_cu
gyrokinetic/unit/ctest_rescale_ghost_jacf.c test_1x1v_ho test_2x2v_ho and device versions
gyrokinetic/unit/ctest_dg_gyrokinetic_kern_tm.c -- all tests fail and are removed.
gyrokinetic/unit/ctest_dg_interpolate.c -- all tests related to _gk_ fail.
gyrokinetic/unit/ctest_dg_rad_gyrokinetic.c -- All tests under _Li1 have erroneous print statements and warnings that they are not set up correctly.
moments/unit/ctest_gr_spacetime.c test_gr_schwarzschild test_gr_kerr
moments/unit/ctest_wv_gr_mhd.c test_gr_mhd_waves_schwarzschild
moments/unit/ctest_wv_gr_mhd_tetrad.c test_gr_mhd_tetrad_waves_schwarzschild
moments/unit/ctest_wv_gr_ultra_rel_euler.c test_gr_ultra_rel_euler_waves_schwarzschild
moments/unit/ctest_wv_gr_ultra_rel_euler_tetrad.c test_gr_ultra_rel_euler_tetrad_waves_schwarzschild
vlasov/unit/ctest_hyper3x_dg.c -- Has some print statements which are removed
vlasov/unit/ctest_dg_em_vars.c -- All GPU tests are commented out, as well as the test_2x and 3x tensor p2 tests.
core/unit/ctest_cudss.cu -- test_simple has some print statements removed

Additional Notes

So far, I have decided not to run regression tests, to keep CI fast.
To enhance CI, a few regression tests could be run and compared against main; however, these should be a selective sample. Rather than building main for each CI run, it would be more efficient to have a cron job initialize the runregression system after each push to main (or every day, but that seems excessive).

We can add Valgrind testing to the CPU build and/or memory sanitizer checks to the GPU build.

This work is a step toward using @JunoRavin's super server for enhanced Gkeyll robustness and testing. Future work will include nightly runregression testing, powered by cron jobs.

Relevant issues:
#913
#784
fixes #116 (I just found out that using the keywords of "fixes ###" adds this issue to the "development" tab and that issue will be closed when the PR is merged)

Checklist

I have reviewed the documentation for accuracy.
All technical terms and code examples have been double-checked.
The update aligns with the overall style and tone of the documentation.
The updated documentation builds correctly (if applicable).

…ing in ctest_gk_geometry_tok which needed to be updated after the filepath of the .geqdsk was moved in a recent PR. It's really difficult to tell if I broke something in this branch because so many unit tests are failing that the errors exceed my terminal context length. I will update the issue about failing unit tests. It's very important that our unit tests pass so that we can have reliable checks that we didn't break anything. It would be really great to have nightly reminders about any unit tests which are broken on main. Also, it's really annoying that when some unit tests fail, they spit out like 10 thousand lines of failures instead of just one line saying that the unit test failed.
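
One way to get a one-line-per-test summary is a small wrapper that silences each test executable and reports only pass or fail. This is a minimal sketch, not part of the PR: `run_quiet` is a hypothetical helper, and it assumes each unit test is a standalone executable that exits non-zero on failure.

```shell
# Hypothetical wrapper: run each test command, hide its output,
# and print exactly one line per test plus a final tally.
run_quiet() {
  failed=0
  for t in "$@"; do
    if "$t" >/dev/null 2>&1; then
      echo "PASS $t"
    else
      echo "FAIL $t"
      failed=$((failed + 1))
    fi
  done
  echo "$failed of $# tests failed"
  # Return success only when nothing failed.
  [ "$failed" -eq 0 ]
}

# Stand-in commands for illustration; real use would pass unit test executables.
run_quiet true false || echo "some tests failed"
```

The full output of a failing test can still be recovered by rerunning that one executable directly.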

…king baseline for CI. People really need to fix their failing unit tests. These are only the CPU versions, but I'm sure the GPU versions fail too. Disable pkpm unit testing because pkpm has zero unit tests. That's kind of concerning.

…PU will be quick and easy, but the GPU one will take a bit more time. I'm pretty sure I configured the scripts correctly, but I'd like Jimmy to ensure that the standard configure.linux.###.sh works on his server with the correct modules. We do not need to mkdeps, which saves time.

…x build. Maybe the reason it was failing was a timeout because we were building everything all at once. Also, the maximum number of make -j processes we can use is 3, according to https://docs.github.com/en/actions/reference/runners/github-hosted-runners#single-cpu-runners, since the runner uses an M1 Mac with the arm64 architecture. Maybe using -j 3 will help with this issue too.
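
The job count could also be derived from the machine instead of hard-coded. The sketch below is an illustration only: `pick_jobs` is a hypothetical helper, the cap of 3 is the figure quoted in the commit message, and the snippet only prints the make command it would run.

```shell
# Hypothetical helper: use the number of online CPUs, capped at a maximum.
pick_jobs() {
  max="$1"
  # getconf reports the online CPU count on both Linux and macOS;
  # fall back to 1 if the query is unavailable.
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  if [ "$n" -gt "$max" ]; then
    n="$max"
  fi
  echo "$n"
}

# Print the command rather than running it, so the sketch has no side effects.
echo "make -j $(pick_jobs 3)"
```

With this approach the same script can serve both a small hosted runner and a larger self-hosted machine by changing only the cap.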

…ore, I was just testing, but the mac build shouldn't launch for drafts. I have it set so only the linux one launches for drafts. We have some limits on how many times per month we can launch the mac build, so we should be more stringent about its use cases, but we can run lots of CI jobs on Jimmy's cluster since it can do several at a time.

…ments from unit tests. There were a few warnings I was able to fix, but there are a lot that I don't know how to fix, and others in the code should fix them on their own time. Failing unit tests should not be a reason that we do not have a working CI baseline. CI that does not work is useless to us all. ctest_cudss.cu has some print statement checks and I'm not sure why they're necessary. The other cuda unit tests do not check their accuracy in this way. There are some tests in ctest_dg_rad_gyrokinetic which have a warning print statement deep down, so something is wrong with the unit test, but I don't have the knowledge to fix them.

…e genuinely not valgrind clean and I had to do a few releases. The valcheck takes quite a while, maybe 15 minutes on my laptop, so we should consider making some of the heavier unit tests lighter. dg_em_vars has a very heavy unit test. I made a few of the tests lighter, with fewer cells, but I didn't gain much performance. Now core, moments, and vlasov all pass valcheck.

Maxwell-Rosen added 8 commits December 24, 2025 08:22

Remove the position map null new methods from the geometry generators and have every unit test create a null position map

51a44e1

Remove flags from position map

d77282c

First commit about using a self hosted build for CI. Have the computer build unit, build regression, make check, for all modules

ca207a6

Move moments after core. I didn't know the order since I've never built moments

931bb10

Cancel previous CI runs in the same branch if they're not done and a new commit is made

f9ce33a

Maxwell-Rosen changed the title ~~CI: Hosting on a local computer~~ CI: self hosting Feb 5, 2026

Maxwell-Rosen added 18 commits February 5, 2026 10:40

CI test

8694085

CI test

67acf73

CI test

c0141a6

Fix a directory label oversight from when I moved files around

9140ef9

Must make the bash scripts executable

1eeea4d

Update make module to correctly scan the syntax of the errors. Claude says it's because my laptop has bash 4 but CI mac uses bash 3 which didn't support the ^^ logic

1d6cb3c
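
The `${var^^}` upper-casing expansion was added in bash 4.0, so it fails as a bad substitution under bash 3. A portable alternative is to pipe through `tr`. The sketch below is illustrative and not taken from the PR's scripts; `to_upper` and the `module` value are hypothetical names.

```shell
# Hypothetical helper: upper-case a string without the bash 4+ "${var^^}" expansion.
# tr with POSIX character classes behaves the same under bash 3 and bash 4.
to_upper() {
  printf '%s' "$1" | tr '[:lower:]' '[:upper:]'
}

module="gyrokinetic"   # example value, not taken from the CI scripts
echo "Building $(to_upper "$module")"
```

The cost is one extra process per call, which is negligible in a build script.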

Fix two compiler warnings on the mac build that were single line hotfixes. I think it's important that we remove the logs at the end of each make-module so that we don't hit our storage limits for CI. We are relatively constrained in this and those build logs can be big files

fa4e7dc
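
The idea of scanning a build log for diagnostics and then deleting the log can be sketched as follows. This is an assumption about the general approach, not the PR's actual make-module script: `build_and_scan` is a hypothetical name, and the pattern only matches the common `file:line: warning: ...` and `file:line: error: ...` forms emitted by gcc and clang.

```shell
# Hypothetical sketch: build_and_scan <log-file> <command...>
# Runs the command with all output captured in the log, fails if the command
# fails or the log contains compiler diagnostics, and always removes the log.
build_and_scan() {
  bs_log="$1"
  shift
  "$@" >"$bs_log" 2>&1
  bs_status=$?
  # grep -c prints 0 when nothing matches.
  bs_count=$(grep -c -i -E '(warning|error):' "$bs_log")
  if [ "$bs_status" -ne 0 ] || [ "$bs_count" -gt 0 ]; then
    # Show only the diagnostic lines, not the whole log.
    grep -i -E '(warning|error):' "$bs_log"
    rm -f "$bs_log"
    return 1
  fi
  rm -f "$bs_log"
  return 0
}

# Stand-in for a real build command that emits one warning.
build_and_scan "$(mktemp)" sh -c 'echo "foo.c:3:5: warning: unused variable"' \
  || echo "build flagged"
```

Deleting the log on both paths keeps disk usage flat regardless of outcome, at the cost of losing the full log when a build fails.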

Found a bug where one of the paths to the make-check was not right. PKPM does not have unit tests, so it doesn't have to make check or make unit. Format the mac build to have consistent indenting

f688032

Add valcheck to the cpu build. It was a little silly for the CPU build to do unit tests since the same machine is doing the GPU unit tests. The CPU build can do valgrind checks so that it can complement the GPU build

b088580

Valcheck clean moments. The unit test ctest_wv_gr_euler_tetrad is very very not valgrind clean, so I'm disabling it

8cf64ee

For some reason, valcheck-core was being run from inside /core, so any tests which were reading files were not reading them correctly

5a347ff

Merge branch 'main' into fix-pmap

53b9609

Merge branch 'fix-pmap' into ci-local-host

601b8c9

Valgrind cleaning gyrokinetic. Make -j6 gyrokinetic-valcheck passes and runs in 15 minutes. I did have to merge a fix for position map that has been sitting around for a month in order to get everything valgrind clean.

216c58b

Maxwell-Rosen wants to merge 26 commits into main from ci-local-host
