Draft
… and have every unit test create a null position map
…ing in ctest_gk_geometry_tok which needed to be updated after the filepath of the .geqdsk was moved in a recent PR. It's really difficult to tell if I broke something in this branch because so many unit tests are failing that the errors exceed my terminal's scrollback. I will update the issue about failing unit tests. It's very important that our unit tests pass so that we have reliable checks that we didn't break anything. It would be really great to have nightly reminders about any unit tests that are broken on main. Also, it's really annoying that when some unit tests fail, the output is something like ten thousand lines of failures instead of a single line saying that the unit test failed.
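One way to get a single line per failing test is to capture each test's output in a log file and print only its status. This is a minimal sketch, assuming each unit test is a standalone executable that exits non-zero on failure. `run_quiet` and the log path are hypothetical names, not part of the existing build system:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one test command, keep its full output in a log
# file, and print a single PASS/FAIL line instead of the raw output.
run_quiet() {
  local log="$1"; shift
  if "$@" >"$log" 2>&1; then
    echo "PASS: $*"
  else
    echo "FAIL: $* (full output in $log)"
    return 1
  fi
}
```

A call such as `run_quiet /tmp/some_test.log ./some_test` (placeholder names) would then print one line, and the full output stays in the log for anyone who needs it.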
…r build unit, build regression, make check, for all modules
…king baseline for CI. People really need to fix their failing unit tests. These are only the CPU versions, but I'm sure the GPU versions fail too. Disable pkpm unit testing because pkpm has zero unit tests. That's kind of concerning
…new commit is made
…PU will be quick and easy, but the GPU one will take a bit more time. I'm pretty sure I configured the scripts correctly, but I'd like Jimmy to ensure that the standard configure.linux.###.sh works on his server with the correct modules. We do not need to mkdeps, which saves time.
…x build. Maybe the reason it was failing was a timeout error because we were building everything all at once. Also, according to https://docs.github.com/en/actions/reference/runners/github-hosted-runners#single-cpu-runners, the maximum number of make -j processes we can use is 3 on the runner, which uses an M1 Mac arm64 architecture. Maybe using -j 3 will help with this issue too
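A small sketch of capping the parallel job count, taking the limit of 3 quoted above as given. `capped_jobs` is a hypothetical helper name, and falling back to 1 when CPU detection fails is an assumption:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the smaller of the detected CPU count and a cap.
capped_jobs() {
  local cap="$1"
  local ncpu
  # getconf _NPROCESSORS_ONLN works on both Linux and macOS;
  # fall back to 1 if detection fails.
  ncpu="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
  if [ "$ncpu" -lt "$cap" ]; then
    echo "$ncpu"
  else
    echo "$cap"
  fi
}
```

The build step could then call `make -j "$(capped_jobs 3)"`, which never asks for more jobs than the runner has cores.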
… says it's because my laptop has bash 4 but the CI Mac uses bash 3, which doesn't support the ^^ syntax
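The `^^` form (as in `${var^^}`) is a case-conversion expansion that was added in bash 4, so it is not available in bash 3. A portable alternative is to pipe the value through `tr`. This is a minimal sketch, and `to_upper` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: uppercase a string without the bash 4 ${var^^} expansion.
to_upper() {
  printf '%s' "$1" | tr '[:lower:]' '[:upper:]'
}
```

For example, `name="$(to_upper "$name")"` behaves the same on bash 3 and bash 4.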
…ixes. I think it's important that we remove the logs at the end of each make-module so that we don't hit our storage limits for CI. We are relatively constrained here, and those build logs can be big files
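A sketch of the log cleanup step. It assumes the build logs end in `.log` and live under the module's build directory. Both the file pattern and the helper name `clean_build_logs` are assumptions, not taken from the actual workflow:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every *.log file under the given directory,
# leaving all other build products untouched.
clean_build_logs() {
  local dir="$1"
  find "$dir" -type f -name '*.log' -delete
}
```

Calling this after each module finishes building keeps the disk footprint from growing across modules.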
…ore, I was just testing, but the Mac build shouldn't launch for drafts. I have it set so only the Linux one launches for drafts. We have some limits on how many times per month we can launch the Mac build, so we should be more stringent about its use cases, but we can run lots of CI jobs on Jimmy's cluster since it can do several at a time
…KPM does not have unit tests, so it doesn't have to make check or make unit. Format the mac build to have consistent indenting
…ments from unit tests. There were a few warnings I was able to fix, but there are a lot that I don't know how to fix, and others working on the code should fix them on their own time. Failing unit tests should not be a reason that we do not have a working CI baseline. CI that does not work is useless to us all. ctest_cudss.cu has some print statement checks and I'm not sure why they're necessary. The other CUDA unit tests do not check their accuracy in this way. There are some tests in ctest_dg_rad_gyrokinetic which have a warning print statement deep down, so something is wrong with the unit test, but I don't have the knowledge to fix them.
…d to do unit tests since the same machine is doing the GPU unit tests. The CPU build can do valgrind checks so that it can complement the GPU build
…y very not valgrind clean, so I'm disabling it
…y tests which were reading files were not reading them correctly
…e genuinely not valgrind clean and I had to do a few releases. The valcheck takes quite a while, maybe 15 minutes on my laptop, so we should consider making some of the heavier unit tests lighter. dg_em_vars has a very heavy unit test. I made a few of the tests lighter, with fewer cells, but that didn't improve performance much. Now, core, moments, and vlasov all pass valcheck
…nd runs in 15 minutes. I did have to merge a fix for position map that has been sitting around for a month in order to get everything valgrind clean.
Documentation Changes
Purpose
The current Mac CI build fails intermittently and inconsistently. Rather than fixing that CI file, we can solve this issue permanently by hosting CI on a dedicated machine. @JunoRavin has volunteered to host this on his super server at PPPL. Hosting CI ourselves gives us more control over the build environment and over which tests are run on this machine.
CI is expanded to ensure all unit tests pass, including those that trigger compiler warnings and errors.
Both CPU and GPU builds are tested.
Affected Areas
.github/workflows and the files within it, for modifying the CI.
Failing unit tests are commented out so we can establish a baseline. I am not going to spend time fixing these unit tests or identifying why they are failing. That work is left to the individuals to whom these unit tests are relevant. There is an open issue about this (#845).
Failing unit tests are:
test_1x2v_p1
test_2x2v_p1
test_2x2v_p1_cu
test_1x1v_ho
test_2x2v_ho
and device versions_gk_fail.
_Li1 have erroneous print statements and warnings that they are not set up correctly.
test_gr_schwarzschild
test_gr_kerr
test_gr_mhd_waves_schwarzschild
test_gr_mhd_tetrad_waves_schwarzschild
test_gr_ultra_rel_euler_waves_schwarzschild
test_gr_ultra_rel_euler_tetrad_waves_schwarzschild
Additional Notes
To keep CI fast, I have decided not to run regression tests for now.
To enhance CI, a few regression tests could be run and compared against main; however, these should be a selective sample. Rather than building main for each CI run, it would be more efficient to have a cron job initialize the runregression system after each push to main (or every day, but that seems excessive).
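One way a scheduled job could avoid rebuilding the baseline when main has not changed is to record the last commit it processed and compare against it on each run. This is only a sketch under that assumption. The helper names and the stamp-file mechanism are hypothetical and are not part of the runregression system:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the given commit differs from
# the one recorded in the stamp file, i.e. the baseline needs refreshing.
baseline_is_stale() {
  local sha="$1" stamp="$2"
  [ ! -f "$stamp" ] || [ "$(cat "$stamp")" != "$sha" ]
}

# Hypothetical helper: record the commit whose baseline was just generated.
mark_baseline() {
  printf '%s\n' "$1" > "$2"
}
```

A cron-driven script could obtain the current commit with `git rev-parse origin/main`, regenerate the baseline only when `baseline_is_stale` succeeds, and then call `mark_baseline` with the same commit and stamp file.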
We can add Valgrind testing to the CPU build and/or memory sanitizer checks to the GPU build.
This work moves toward using @JunoRavin's super server for more robust Gkeyll testing. Future work will include nightly runregression testing, driven by cron jobs.
Relevant issues:
#913
#784
fixes #116 (I just found out that using the keywords of "fixes ###" adds this issue to the "development" tab and that issue will be closed when the PR is merged)
Checklist