-
Notifications
You must be signed in to change notification settings - Fork 725
feature: batch build extra sources and use ParStrat if supported #9872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A very welcome improvement.
Great work both here and in GHC @edmundnoble.
Regarding the checklist:
- Writing this down in the changelog is nice to advertise the performance improvements gained here
- No documentation needs updating I think
- I think you could write an automated test.
To test this automatically I suggest you write a simple cabal.test.hs PackageTest for a package with two source files which calls cabal with -v2 and -jsem and checks that the extra-sources ghc invocation line contains both -jsem and the two source files at once. Take a look at other cabal.test.hs files (they're pretty easy to write).
|
One limitation of this approach is that if you have way too many C sources within many nested folders you may run into command argument size limitations (e.g. see https://www.in-ulm.de/~mascheck/various/argmax/). Though we aren't that careful and may e.g. already reach this limitation when linking together all Haskell modules using the path to every .o file (see GHC.Build.Link's Perhaps we should use a response file if the list of sources is too large, but it is a bit hard to tell because it also depends on the length of the path to the source file. |
|
What has been the testing strategy for this patch? Are we very sure that removing the order dependence doesn't break some packages? It would also be better to guard passing |
|
I can come back to this now. FYI I tested this together with the corresponding GHC MR by compiling the librocksdb C++ library with cabal (rocksdb-haskell, see https://github.com/kadena-io/rocksdb-haskell) and it took the compilation time from ~20-30min down to less than 5min. |
|
As bgamari pointed out (https://gitlab.haskell.org/ghc/ghc/-/issues/24642#note_557676) totally parallel compilation doesn't work for C++ code using C++ modules; when using C++ modules, compilation units must be compiled in topological order with respect to module dependencies. As an extra note, it looks to me that it might be impossible to compile C++ modules with clang using cabal's native sources feature, because the output files for each module must end in the I suggest that detecting this topological order and compiling respecting it is feasible, because clang, gcc and Visual C++ all support dumping the module dependencies of C++ files, which we can collect to generate a compilation order. We could hypothetically compute this in cabal and then pass that to ghc. If this isn't feasible to do this in cabal, perhaps I can at least add a cabal flag allowing the user to disable parallel compilation of any particular type of native sources, or maybe in general, set a different Edit: I just noticed that #9938 also exists. |
29362f2 to
393f49b
Compare
|
@alt-romes @mpickering was able to pick this back up, new changes:
So I believe it's ready for review. Edit: also, I don't think we can really compile C++ modules in cabal anyway. If you look at this example of the command line options involved you can see that you need to both compile each module with special options and produce a |
6776b25 to
79599d9
Compare
I made the latter change. For the former: is building acme-everything sufficient? What's the usual way I'd do this? |
|
|
And all packages successfully built except the 5 I know some concurrency is still happening for extra-sources because of the automated test in this PR. @mpickering @geekosaur I hope that's sufficient. |
79599d9 to
9ca7020
Compare
All C sources for a component are now compiled with a single one-shot (the `-c` option) GHC invocation, rather than one-per-source, and ditto for each other type of extra-source. In addition, `--semaphore` now propagates `-jsem` to these GHC invocations if working with a GHC that supports this (see https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12388/ for the status of that support).
9ca7020 to
a9fa0f1
Compare
Pass
ParStratto GHC when we build extra sources, and batch up file paths into GHC invocations, so that GHC can build them in parallel.This doesn't actually make GHC build these sources in parallel right now, because GHC in one-shot mode doesn't even listen to
-j, though it doesn't cause an error or anything. But it hopefully will soon! https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12388An alternative would be to make GHC's
--makeunderstand that it has to build foreign sources after Haskell modules. I didn't investigate that possibility too thoroughly.QA Notes
GHC invocations when building projects with foreign sources should have a) all of those sources in a single GHC invocation and b)
-jsemif--semaphorewas passed to cabal.Template Α: This PR modifies
cabalbehaviourInclude the following checklist in your PR: