@edmundnoble
Contributor

Pass ParStrat to GHC when we build extra sources, and batch up file paths into GHC invocations, so that GHC can build them in parallel.

This doesn't actually make GHC build these sources in parallel yet: GHC in one-shot mode currently ignores -j, although passing it doesn't cause an error. Hopefully it will soon: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12388

An alternative would be to make GHC's --make understand that it has to build foreign sources after Haskell modules. I didn't investigate that possibility too thoroughly.

QA Notes

GHC invocations when building projects with foreign sources should have a) all of those sources in a single GHC invocation and b) -jsem if --semaphore was passed to cabal.
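
To make the expected shape of those invocations concrete, here is a minimal Haskell sketch of the batching idea. It is not the code in this PR: SourceKind, batchByKind and oneShotArgs are hypothetical names, and the argument layout (-c, then -jsem with a semaphore name, then the paths) is an assumption based on the description above.

```haskell
import qualified Data.Map.Strict as Map

-- | Hypothetical classification of extra sources; cabal's real types differ.
data SourceKind = CSource | CxxSource | AsmSource
  deriving (Eq, Ord, Show)

-- | Group sources by kind, keeping the order in which they were listed,
-- so that each kind can be handed to GHC in a single invocation.
batchByKind :: [(SourceKind, FilePath)] -> Map.Map SourceKind [FilePath]
batchByKind srcs =
  -- fromListWith calls the combining function as (f new old);
  -- flipping (++) appends the new path after the ones already seen.
  Map.fromListWith (flip (++)) [(kind, [path]) | (kind, path) <- srcs]

-- | Arguments for one one-shot invocation (assumed layout).
-- The semaphore name is only present when --semaphore was requested.
oneShotArgs :: Maybe String -> [FilePath] -> [String]
oneShotArgs sem paths =
  ["-c"] ++ maybe [] (\name -> ["-jsem", name]) sem ++ paths
```

With this sketch, oneShotArgs (Just "sem0") ["a.c", "b.c"] evaluates to ["-c", "-jsem", "sem0", "a.c", "b.c"], which is the kind of single command line the QA note asks reviewers to look for.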

Template Α: This PR modifies cabal behaviour

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary.
  • Manual QA notes have been included.
  • Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)

@alt-romes alt-romes self-requested a review April 9, 2024 05:44
Collaborator

@alt-romes alt-romes left a comment

A very welcome improvement.
Great work both here and in GHC @edmundnoble.

Regarding the checklist:

  • Writing this down in the changelog is nice to advertise the performance improvements gained here
  • No documentation needs updating I think
  • I think you could write an automated test.

To test this automatically I suggest you write a simple cabal.test.hs PackageTest for a package with two source files which calls cabal with -v2 and -jsem and checks that the extra-sources ghc invocation line contains both -jsem and the two source files at once. Take a look at other cabal.test.hs files (they're pretty easy to write).
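
The check at the heart of such a test could look roughly like the Haskell sketch below. It is only the predicate, not a cabal.test.hs file: the function names are hypothetical, and it assumes the -v2 output prints each GHC command on a single line with -jsem as its own whitespace-separated word.

```haskell
import Data.List (isInfixOf)

-- | True when one logged command line mentions -jsem and every expected source.
-- Sources are matched as substrings because the log may print longer paths.
isBatchedInvocation :: [FilePath] -> String -> Bool
isBatchedInvocation sources line =
  "-jsem" `elem` words line && all (`isInfixOf` line) sources

-- | True when at least one line of the verbose output is such an invocation.
outputHasBatchedInvocation :: [FilePath] -> String -> Bool
outputHasBatchedInvocation sources = any (isBatchedInvocation sources) . lines
```

Checking per line matters here: if the two sources showed up in separate GHC invocations, the predicate would reject the output even though both file names appear somewhere in the log.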

@alt-romes
Collaborator

One limitation of this approach is that if you have very many C sources in deeply nested folders, you may run into command-line argument size limits (e.g. see https://www.in-ulm.de/~mascheck/various/argmax/). That said, we aren't that careful elsewhere either, and may already hit this limit when linking all Haskell modules together using the path to every .o file (see GHC.Build.Link's linkLibrary...).

Perhaps we should use a response file if the list of sources is too large, but it is a bit hard to tell because it also depends on the length of the path to the source file.
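
One way to make that decision is to estimate the size of the argument vector instead of counting sources, which accounts for path length too. The Haskell sketch below is an illustration under assumptions: the names are hypothetical, the limit is a parameter because the real value is platform-dependent, and the response file contents ignore quoting of paths that contain whitespace.

```haskell
-- | How a batch of arguments would be handed to the compiler.
data Invocation
  = Direct [String]          -- ^ pass everything on the command line
  | ViaResponseFile String   -- ^ contents of a response file to pass as @file
  deriving (Eq, Show)

-- | Rough size of the argument vector in bytes: each argument plus a terminator.
-- Assumes one byte per character, so it undercounts for non-ASCII paths.
argvBytes :: [String] -> Int
argvBytes = sum . map ((+ 1) . length)

-- | Use the command line while the estimate fits under the given limit,
-- otherwise fall back to a response file with one argument per line.
planInvocation :: Int -> [String] -> [FilePath] -> Invocation
planInvocation limit flags sources
  | argvBytes args <= limit = Direct args
  | otherwise               = ViaResponseFile (unlines args)
  where
    args = flags ++ sources
```

A real implementation would also have to leave headroom for the environment, which counts against the same limit on many systems.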

@mpickering
Collaborator

mpickering commented Apr 10, 2024

What has been the testing strategy for this patch? Are we very sure that removing the order dependence doesn't break some packages?

It would also be better to guard passing -j against only compiler versions which support that for one shot mode. Otherwise it will be quite confusing seeing that on the command line.
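
A guard of that kind could be sketched in Haskell as follows. The function name is hypothetical, and the minimum supporting version is deliberately a parameter: which GHC release handles -jsem in one-shot mode depends on the status of the GHC merge request and is not assumed here.

```haskell
import Data.Version (Version, makeVersion)

-- | Emit -jsem only when a semaphore was requested and the compiler is at
-- least the (caller-supplied) first version that honours it in one-shot mode.
jsemArgs :: Version -> Version -> Maybe String -> [String]
jsemArgs minSupported ghcVersion sem
  | ghcVersion >= minSupported, Just name <- sem = ["-jsem", name]
  | otherwise = []
```

For example, with a placeholder threshold of makeVersion [9, 99], a 9.10.2 compiler gets no extra arguments, so nothing unexpected shows up on its command line.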

@edmundnoble
Contributor Author

I can come back to this now. FYI I tested this together with the corresponding GHC MR by compiling the librocksdb C++ library with cabal (rocksdb-haskell, see https://github.com/kadena-io/rocksdb-haskell) and it took the compilation time from ~20-30min down to less than 5min.

@edmundnoble
Contributor Author

edmundnoble commented Nov 16, 2024

As bgamari pointed out (https://gitlab.haskell.org/ghc/ghc/-/issues/24642#note_557676), fully parallel compilation doesn't work for C++ code that uses C++ modules: compilation units must be compiled in topological order with respect to module dependencies. As an extra note, it looks to me like it might be impossible to compile C++ modules with clang using cabal's native sources feature at all, because the output file for each module must end in the .pcm suffix, and GHC won't do this for you.

I think detecting this topological order and respecting it during compilation is feasible, because clang, gcc and Visual C++ all support dumping the module dependencies of C++ files, which we can collect to generate a compilation order. We could hypothetically compute this in cabal and then pass it to ghc.
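
Assuming the dependency information has already been collected from the compiler (how to obtain it is out of scope here), computing the order itself is small. The Haskell sketch below uses Data.Graph from the containers package; compilationOrder and the triple layout (source file, module it provides, modules it imports) are hypothetical choices for illustration.

```haskell
import Data.Graph (SCC (AcyclicSCC), flattenSCC, stronglyConnComp)

-- | Order compilation units so that each unit comes after the units whose
-- modules it imports. Returns Left with the offending files when the
-- imports form a cycle. Imports of modules not in the list are ignored.
compilationOrder :: [(FilePath, String, [String])] -> Either [FilePath] [FilePath]
compilationOrder units = traverse acyclic (stronglyConnComp units)
  where
    -- stronglyConnComp returns components in reverse topological order,
    -- which for "imports" edges means dependencies come first.
    acyclic (AcyclicSCC file) = Right file
    acyclic scc               = Left (flattenSCC scc)
```

A flat list like this serialises everything; to keep some parallelism, the same graph could instead be cut into levels whose members do not depend on each other.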

If it isn't feasible to do this in cabal, perhaps I can at least add a cabal flag that lets the user disable parallel compilation for a particular type of native source (or for all of them), or set a different -j flag for them. Performant compilation of C++ modules may be best done with special build system support, either inside GHC by adding it to ghc --make, or by invoking an external build system and bundling the results with the extra-bundled-libraries cabal feature. I will note that a project large enough to benefit from C++ modules for compilation performance is probably also large enough that giving up parallelism would be a serious blow, so I don't know how practical this really is.

Edit: I just noticed that #9938 also exists.

@edmundnoble edmundnoble force-pushed the push-syzkpkzvmomx branch 3 times, most recently from 29362f2 to 393f49b Compare January 9, 2026 17:24
@edmundnoble
Contributor Author

edmundnoble commented Jan 9, 2026

@alt-romes @mpickering I was able to pick this back up. New changes:

  • automated test
  • only pass -jsem if supported in oneshot mode by GHC
  • extra-sources compiled as part of foreign-libraries now have their object filenames suffixed, as they would be for non-foreign-libraries; I was in the area and this was easier than keeping the existing behavior, which seems wrong.
  • added changelog entries

So I believe it's ready for review.

Edit: also, I don't think we can really compile C++ modules in cabal anyway. If you look at this example of the command line options involved you can see that you need to both compile each module with special options and produce a pcm file (which I don't think GHC can do at all) and tell each dependent file where to find the modules it uses (which would require different compiler options for each file, which is not really a thing you can do with cabal, and computing this manually would be extremely tedious anyway). If we're going to do this, it's probably got to be done by --make in future or something.

@edmundnoble edmundnoble marked this pull request as ready for review January 9, 2026 17:25
@edmundnoble edmundnoble force-pushed the push-syzkpkzvmomx branch 2 times, most recently from 6776b25 to 79599d9 Compare January 9, 2026 17:39
@edmundnoble
Contributor Author

What has been the testing strategy for this patch? Are we very sure that removing the order dependence doesn't break some packages?

It would also be better to guard passing -j against only compiler versions which support that for one shot mode. Otherwise it will be quite confusing seeing that on the command line.

I made the latter change. For the former: is building acme-everything sufficient? What's the usual way I'd do this?

@geekosaur
Collaborator

acme-everything seems kinda dubious to me, due to potential conflicts. haskell/clc-stackage is a good start.

@edmundnoble
Contributor Author

acme-everything seems kinda dubious to me, due to potential conflicts. haskell/clc-stackage is a good start.

My test plan: I cherry-picked my GHC changes onto 9.10.2, built it with hadrian, added with-compiler: /home/edmundn/workspace/ghc/_build/stage1/bin/ghc to clc-stackage/generated/cabal.project, then ran ./bin/clc-stackage --cabal-path=$CABAL --write-logs=current --cabal-options="--semaphore" --batch 1, where:

  • --cabal-path was set to my custom cabal build
  • --semaphore added concurrency to give us a chance of seeing any out-of-order compilation issues
  • --batch 1 reduced the package-level concurrency to maximize the file-level concurrency

And all packages built successfully except the 5 termbox packages, whose failure is seemingly caused by awkward-squad/termbox#5.

I know some concurrency is still happening for extra-sources because of the automated test in this PR.

@mpickering @geekosaur I hope that's sufficient.

@edmundnoble edmundnoble changed the title from "wip: feature: pass ParStrat to build extra sources" to "feature: pass ParStrat to build extra sources" Jan 10, 2026
@edmundnoble edmundnoble changed the title from "feature: pass ParStrat to build extra sources" to "feature: batch build extra sources and use ParStrat if supported" Jan 10, 2026

All C sources for a component are now compiled with a single one-shot
(the `-c` option) GHC invocation, rather than one-per-source, and ditto
for each other type of extra-source. In addition, `--semaphore` now
propagates `-jsem` to these GHC invocations if working with a GHC that
supports this (see
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12388/ for the
status of that support).

4 participants