
Use LLVM to target *-pc-windows #120

Draft: MisterDA wants to merge 31 commits into trunk from cross-compile-clang-cl

@MisterDA commented Feb 1, 2025

TL;DR:

With OCaml 5.4 we’ll have support for building cross-compilers. To target native Windows, one can already use the mingw-w64 project. It turns out that LLVM has a full re-implementation of the Microsoft Visual Studio toolchain. The OCaml build system needs a few tweaks to work with it, enabling compilation with LLVM either natively on Windows or with LLVM as a cross-compiler, to support the *-pc-windows target. This PR provides those tweaks.

The full story:

There are two incompatible ABIs for native Windows code, embodied by the *-pc-windows and the *-w64-mingw32 targets. The first one is provided by the Microsoft Visual Studio (MSVC) toolchain, and the second one by the mingw-w64 project, which brings the GCC toolchain to native Windows. In recent years, the LLVM project has re-implemented the whole Visual Studio toolchain. This LLVM toolchain is used, for instance, to build Chrome and Firefox.

During the OCaml 5.3 development cycle, we restored the MSVC port of OCaml, and I added support for building the port with clang-cl, a clang driver that is ABI-compatible with MSVC and accepts the cl.exe command-line syntax. During the OCaml 5.4 development cycle, support for building OCaml cross-compilers was greatly improved. Building on these two efforts, I propose we extend OCaml’s build system to fully support and recognize the LLVM tools that replace the Microsoft toolchain. This change has several positive side-effects:

  • use of a free-software toolchain for the *-pc-windows target;
  • cross-compilation for the *-pc-windows target, bringing OCaml on par with, e.g., Rust;
  • the ability to select the linker at configure time, allowing the use of alternative, more efficient linkers;
  • a more robust OCaml build system that recognizes alternative tools.

Supporting *-pc-windows as a cross-compilation target also lets users target it from their preferred development environment or from CI, which can be easier than setting up a Windows development environment. All the required LLVM tools are already packaged by the major distributions.

A minor drawback, shared with the Rust ecosystem, is that users have to install the Visual Studio toolchain to get the Windows system headers and libraries. Fortunately, this can be automated, even when cross-compiling (provided Wine is available on the build system). For our CI, I've used the mstorsjo/msvc-wine project from Martin Storsjö, a mingw-w64 and LLVM maintainer.

Another minor drawback is that, as of LLVM 20, the llvm-ml64 tool, which replaces Microsoft’s assembler ml64.exe for assembling the MASM code in the OCaml runtime, doesn’t support all the features required by the existing assembly code. I’ve opened bug reports and sparked the interest of Eric Astor, a Google engineer who authored and maintains the tool for Chrome. There’s a good chance that all the fixes will land in LLVM 21. In the meantime, when cross-compiling, the ml64.exe tool can be downloaded automatically, run with Wine, and used instead of the LLVM assembler. For native (non-cross) compilation, the Microsoft assembler is already included in the Visual Studio installation. In both cases, it’s possible to validate the whole process without llvm-ml64. I believe this is not a blocker for reviewing this work, and I wrote the commits so that the workaround can easily be reverted once llvm-ml64 is fixed.

I'd like to express my deepest thanks to Eric for his enthusiasm in extending llvm-ml64, and for the impressive amount of reverse engineering, digging, hacking, and reviewing that was needed to make llvm-ml64 handle the 710 lines of MASM we have in the runtime!

For the curious, here are the relevant issues and pull requests on the LLVM project:

This adventure also led to patches to FlexDLL, all of which have landed as of FlexDLL 0.44. For the curious:

Here are a few PRs that I've split off from this patch series, but that came directly out of working with clang and the new CI scripts:

I believe that this PR is best reviewed commit by commit. The first commits fix cross-compilation issues. The next ones add configure variables (whose names are dictated by libtool) and let the configure script discover the LLVM tools. The chosen tools are passed on to FlexDLL too. Then come two patches improving support for clang and lld-link, and finally the CI scripts.
The hardest parts are definitely the assembler and linker selection patches. The CI is all green, so I hope they're correct.

Happy reviewing!

MisterDA force-pushed the cross-compile-clang-cl branch 3 times, most recently from 034e162 to 6da5dfc, February 3, 2025 10:43
MisterDA force-pushed the cross-compile-clang-cl branch 11 times, most recently from 87ca6df to d0929d5, February 13, 2025 08:47
MisterDA force-pushed the cross-compile-clang-cl branch 14 times, most recently from 683864f to 9284774, February 19, 2025 14:11
MisterDA and others added 28 commits September 18, 2025 01:37

This allows overriding the default resource compiler, used during
FlexDLL's bootstrap. The variable name is defined by libtool.

Pass the variable to the flexlink Makefile.

FlexDLL's Makefile has its own mechanism to discover the C
compiler. Override it with the selected C compiler.

This allows using `llvm-mt` instead of `mt.exe`. The MANIFEST_TOOL
variable is defined by libtool.

The intermediate sak.obj file is always produced; clean it up, as it
doesn't appear in the Makefile targets.

clang-cl always produces a PDB file, unless explicitly asked not to.

Add more dummy files to avoid this error when running 'make
installcross' with a *-pc-windows target on a Unix build/host.

    /usr/bin/install: cannot stat 'driver/main.obj': No such file or directory
    /usr/bin/install: cannot stat 'driver/optmain.obj': No such file or directory
    /usr/bin/install: cannot stat 'toplevel/topstart.obj': No such file or directory

- don't check if .gitmodules contains a copyright header;
- AppVeyor files may contain very long lines (for RDP access).

- Use Autoconf macros to discover the correct target prefix for
  `as`. Results are cached.
- Call `as` directly, bypassing the C compiler driver, on all
  platforms that expose `as`.
- Allow using `llvm-ml`/`llvm-ml64` when building with clang-cl. Note
  that LLVM 21 is required to assemble the current amd64nt.asm file.
- Remove the ASPP variable by imitating GNU Make's implicit rules
  (the built-in rules below are as printed by `make -p`). Add support
  for ASFLAGS.

      # default
      COMPILE.S = $(CC) $(ASFLAGS) $(CPPFLAGS) -c
      COMPILE.s = $(AS) $(ASFLAGS)
      # default
      PREPROCESS.S = $(CPP) $(CPPFLAGS)

      %.o: %.s
      #  recipe to execute (built-in):
              $(COMPILE.s) -o $@ $<
      %.o: %.S
      #  recipe to execute (built-in):
              $(COMPILE.S) -o $@ $<
      %.s: %.S
      #  recipe to execute (built-in):
              $(PREPROCESS.S) $< > $@
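
The target-prefix discovery mentioned in the first bullet follows the usual Autoconf convention, as implemented by `AC_CHECK_TOOL`: look for `<prefix>-as` first, then fall back to the unprefixed name. The shell sketch below only mimics that lookup order; `find_tool` is a hypothetical helper, not part of OCaml's configure script, and it leaves out Autoconf's result caching and its warning about unprefixed cross tools.

```sh
#!/bin/sh
# Hypothetical helper mimicking the lookup order of Autoconf's AC_CHECK_TOOL:
# try the target-prefixed program name first, then the bare name.
# Prints the first candidate found (via `command -v`, which also matches
# shell builtins and functions) and returns 1 if none is found.
find_tool() {
  prefix=$1
  prog=$2
  for candidate in "$prefix-$prog" "$prog"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

For instance, `find_tool x86_64-w64-mingw32 as` prints `x86_64-w64-mingw32-as` when a mingw-w64 cross binutils is on the `PATH`, and plain `as` otherwise (if one is installed).
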
In some cases the linker is called directly instead of through the C
compiler driver. Pass the LD variable to configure to set the linker.

LLVM provides the LLD linker, which has three front-ends:
- lld-link, a drop-in replacement for Microsoft's link.exe;
- ld64.lld, a drop-in replacement for Apple's ld64 linker;
- ld.lld, a drop-in replacement for GNU ld.
LLD executables don't use the target prefix.
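
Because LLD selects its front-end from the executable name rather than from a target prefix, choosing one comes down to matching on the target triplet. A minimal sketch, assuming the triplet shapes below; `lld_flavor` is a hypothetical helper. As far as I know, this mirrors what clang does for `-fuse-ld=lld`, where MinGW targets go through `ld.lld` and its MinGW emulation rather than `lld-link`.

```sh
#!/bin/sh
# Hypothetical helper: map a target triplet to the LLD front-end to invoke.
# LLD binaries carry no target prefix: the executable name alone selects
# the flavor (COFF/MSVC, Mach-O, or ELF/MinGW).
lld_flavor() {
  case $1 in
    *-windows-gnu*)                echo ld.lld ;;    # MinGW ABI via ld.lld
    *-pc-windows*|*-windows-msvc*) echo lld-link ;;  # MSVC ABI, link.exe-compatible
    *-apple-*)                     echo ld64.lld ;;  # Mach-O, ld64-compatible
    *)                             echo ld.lld ;;    # ELF, and *-w64-mingw32
  esac
}
```
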
GCC, clang, and clang-cl accept the `-fuse-ld=<ld>` option to call a
specific linker. This can be passed through the LDFLAGS variable when
the linker is invoked through the compiler driver.

When calling the linker through the cl.exe or clang-cl.exe compiler
drivers, linker flags (LDFLAGS) have to be passed after the `/link`
flag. Unfortunately, the `-fuse-ld=<ld>` option of clang-cl has to be
passed to clang-cl *before* `/link`, as this option is consumed by the
compiler driver rather than by the linker. This requires a few hacks.
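
The ordering constraint can be illustrated with a small shell sketch that splits `LDFLAGS` into driver options (here only `-fuse-ld=...`) and linker options, then prints the resulting clang-cl command line. `clang_cl_link_cmd` is hypothetical and is not the workaround used in OCaml's build system; it also assumes that individual flags contain no spaces or glob characters.

```sh
#!/bin/sh
# Hypothetical helper: print a clang-cl link command built from $LDFLAGS.
# -fuse-ld=... is read by the clang-cl driver, so it must come before /link;
# every other flag is forwarded to the linker after /link.
# Usage: clang_cl_link_cmd OUTPUT.exe OBJ...
clang_cl_link_cmd() {
  out=$1
  shift
  driver_flags=
  linker_flags=
  # Relies on word splitting of $LDFLAGS (no quoting support).
  for flag in $LDFLAGS; do
    case $flag in
      -fuse-ld=*) driver_flags="$driver_flags $flag" ;;
      *)          linker_flags="$linker_flags $flag" ;;
    esac
  done
  printf 'clang-cl%s -Fe%s %s /link%s\n' "$driver_flags" "$out" "$*" "$linker_flags"
}
```

With `LDFLAGS='-fuse-ld=lld /DEBUG'`, running `clang_cl_link_cmd prog.exe main.obj` prints `clang-cl -fuse-ld=lld -Feprog.exe main.obj /link /DEBUG`.
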

LLVM's SHT_LLVM_ADDRSIG section (address-significance table) supports
a linker optimization. It's a list of indexes corresponding to entries
in the symbol table whose addresses are significant, which lets the
linker fold identical code safely.

Unfortunately, the list can easily get out of sync during objcopy-like
operations. FlexDLL doesn't handle addrsig sections, which makes
lld-link choke on object files processed by FlexDLL. Don't emit them.

https://llvm.org/docs/Extensions.html#sht-llvm-addrsig-section-address-significance-table

See `llvm-readobj --addrsig xxx.obj` to decode and inspect the
table.

clang-cl implements GNU C extensions but warns about their use. We
notably use "Labels as Values" (so-called "computed gotos") [1] and
arithmetic on void pointers [2] in the interpreter.
See the list of GNU-related warnings [3].

[1]: https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
[2]: https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html
[3]: https://clang.llvm.org/docs/DiagnosticsReference.html#wgnu
lld-link-21 doesn't support it.

As of LLVM 20, llvm-ml (the LLVM MASM assembler) isn't able to
assemble runtime/amd64nt.asm.
Use LLVM 21 snapshot builds.

It seems that flexlink doesn't produce object files supporting
SafeSEH. If Microsoft link.exe encounters an object file without a
SafeSEH section, it silently falls back to linking an executable
without SafeSEH, whereas lld-link rejects the whole executable.
Explicitly disable SafeSEH when the linker is called through flexlink
for now.
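
The workaround amounts to forwarding `/SAFESEH:NO` to the underlying linker whenever flexlink drives the link. A minimal sketch, assuming flexlink's `-link` option (which passes its next argument verbatim to the linker) is the channel used; `flexlink_exe_cmd` is hypothetical, only prints a command line, and does not reproduce the exact flexlink invocation of OCaml's build system. Like the commit message, it adds the flag unconditionally.

```sh
#!/bin/sh
# Hypothetical helper: print a flexlink command that links an executable
# and forwards /SAFESEH:NO to the real linker (link.exe or lld-link).
# Usage: flexlink_exe_cmd OUTPUT.exe OBJ...
flexlink_exe_cmd() {
  out=$1
  shift
  printf 'flexlink -exe -o %s %s -link /SAFESEH:NO\n' "$out" "$*"
}
```
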
MisterDA force-pushed the cross-compile-clang-cl branch from 7151bfa to d0c64bc, September 18, 2025 00:23

    AC_CHECK_PROGS([AS], [$search_as], [AC_MSG_ERROR([missing assembler])])
    AS_IF([$AS -nologo -help | $FGREP 'LLVM' >/dev/null],
    [: ${ASFLAGS:='-c -Fo'}],
    [: ${ASFLAGS:='-nologo -quiet -Cp -c -Fo'}])

Review comment (Owner, Author):

missing -coff for i686.
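
The configure snippet distinguishes the two MASM-compatible assemblers by probing their help output. The same decision as a plain shell function, assuming (as the snippet does) that LLVM's `llvm-ml`/`llvm-ml64` mention `LLVM` in their `-help` output while Microsoft's `ml64.exe` does not; `masm_asflags` is hypothetical. It does not address the review note above: for i686, Microsoft's `ml.exe` would additionally need `-coff`.

```sh
#!/bin/sh
# Hypothetical helper mirroring the configure snippet: pick default ASFLAGS
# for a MASM-style assembler. As with `: ${ASFLAGS:=...}`, a non-empty
# user-provided ASFLAGS is kept unchanged.
# Usage: masm_asflags ASSEMBLER-COMMAND
masm_asflags() {
  as_cmd=$1
  if [ -n "${ASFLAGS:-}" ]; then
    printf '%s\n' "$ASFLAGS"
  elif $as_cmd -nologo -help 2>/dev/null | grep -F 'LLVM' >/dev/null; then
    # LLVM assembler, detected through its help text.
    printf '%s\n' '-c -Fo'
  else
    # Microsoft assembler: the longer flag set (-Cp preserves identifier case).
    printf '%s\n' '-nologo -quiet -Cp -c -Fo'
  fi
}
```
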
