StatWhy is a software tool for automatically verifying the correctness of statistical hypothesis testing programs. In this file, we present the structure of this repository, the resource requirements for the tool, and show how to verify the programs that implement the hypothesis testing examples addressed in our paper:
StatWhy: Formal Verification Tool for Statistical Hypothesis Testing Programs.
Yusuke Kawamoto, Kentaro Kobayashi, and Kohei Suenaga. [alphabetical authorship]
Proc. of 37th International Conference on Computer Aided Verification (CAV 2025),
Part II, Lecture Notes in Computer Science, Vol.15932, pp.216-230, July 2025.
root/
├ README.md this file
├ install.sh installation script
├ cameleer source code of *cameleer*
│ └ statwhy source code of *statwhy*
│ ├ lib core library of statwhy, including the specification
│ │ of many hypothesis testing methods
│ └ ...
├ doc
│ └ Statwhy_User_Documentation.pdf user documentation of statwhy
├ examples example codes for replicating the results
│ ├ mlw WhyML codes not explained in this file
│ └ executable_examples OCaml codes executable using pyml and scipy
└ ...
StatWhy has been tested on the following environments:
- MacBook Pro -- M2 Max CPU, 96 GB RAM, macOS 15.3.2
- DELL Precision 5490 -- Intel Core Ultra 9 185H, 64 GB RAM, Windows 11 and Ubuntu 24.04 (via VirtualBox)
We have tested the tool StatWhy by (i) checking that StatWhy can correctly verify various major hypothesis testing methods in a short time in a way that is consistent with standard textbooks on statistics and (ii) proving various lemmas on the properties of epistemic logic (statELHT.mlw) and on those of hypothesis tests and belief Hoare logic (statBHL.mlw).
On Ubuntu 24.04, StatWhy can be installed by running the following commands:
$ chmod +x install.sh
$ ./install.shDuring installation, you will be prompted to enter y or n multiple times.
Press y when prompted.
After installation, restart the machine or log in again to apply the changes made to ~/.profile.
- Install opam
On Ubuntu:
$ sudo apt-get update
$ sudo apt-get install opamOn macOS, install opam with Homebrew:
$ brew install opamAlternatively, if you use MacPorts:
$ port install opam- Install OCaml 5.0.0
To install OCaml 5.0.0, execute the following command:
$ opam switch create 5.0.0
$ eval $(opam env)- Install StatWhy and Cameleer
Download the source code of StatWhy, including an extension of Cameleer. Install \StatWhy{} and \Cameleer{} by running:
$ unzip statwhy-1.3.0.zip
$ cd cameleer
$ opam pin add .This will install Cameleer, StatWhy (included in cameleer/statwhy), and their dependencies.
In the installation of Cameleer, the Why3 platform is automatically installed.
- Install CVC5
On Ubuntu 24.04 LTS, you can install CVC5 by:
$ sudo apt install cvc5Alternatively, you can directly download a binary from the GitHub repository:
$ wget https://github.com/cvc5/cvc5/releases/download/cvc5-1.2.0/cvc5-Linux-x86_64-static.zip
$ unzip cvc5-Linux-x86_64-static.zip
$ sudo cp ./cvc5-Linux-x86_64-static/bin/cvc5 /usr/local/binOn macOS, a Homebrew Tap for CVC5 is also available:
$ brew tap cvc5/homebrew-cvc5
$ brew install cvc5After installing the solver, run the following command to let Why3 detect it:
$ why3 config detectAs an initial test, try running the one-sample t-test program by following the instructions in [Example 1](### Example 1: One-sample t-test (Section 3.1 in our paper)) below.
Note: If you update from an older version of StatWhy or have updated CVC5 since your last StatWhy installation, we recommend deleting the old ~/.statwhy.conf file.
To execute .ml files in examples/executable_examples, you need to install
- pyml (OCaml package)
- scipy (Python library)
Pyml can be installed via opam:
opam install pymlNumpy can be installed via pip:
pip install scipyYou can verify an OCaml code via Cameleer by running the following:
$ statwhy <file-to-be-verified>.mlIf you want to verify a WhyML code:
$ statwhy <file-to-be-verified>.mlwNote that in both cases, you need our extension of Cameleer to load StatWhy.
statwhy <file>.(ml|mlw)
Verify (OCaml|WhyML) program
-L add <dir> to the search path
--genconf generate config file
--debug print debug information
--batch activate batch mode
--extract activate extraction mode
--prover set prover for batch mode
--version print version information
We demonstrate how to verify a program that conducts a one-sample t-test, which is used to compare a population mean with a specified value.
To verify the OCaml program examples/example1.ml, run the following command:
statwhy example1.mlThis will launch the Why3 IDE as follows:
There are two verification conditions (VCs) to be discharged: example_1'vc (the VC for example1) and example1''vc (the VC for example1').
Right-click on the first goal and select 'StatWhy' (Not 'CVC5' or the other items), or press '4' after selecting the goal to make StatWhy try to discharge the goal.
If the prover successfully verifies the goal, a check mark will appear as follows.
The second goal, example1''vc, is similar to the first except that the expression sampled d t_n is missing from the requires-clause, i.e., the specification does not describe the requirement that the data set d is sampled from a normally distributed population.
If this requirement is not satisfied, it is not appropriate for us to use the t-test.
By running StatWhy, you can automatically detect that this requirement is missing in the annnotation of the program. In fact, if you press '4' on example_1samp_t_test_fail'vc, StatWhy will fail to discharge the goal, as shown below. At this point, StatWhy generates sub-goals by applying transformations and attempts to prove them. By examining the results of the sub-goals, you can see which conditions are missing in the annotation of the program.
We emphasize that the above screenshot is the intended result, in which several verification conditions are not discharged due to the lack of the precondition.
In the above screenshot, the condition sampled d (NormalD ...) fails to discharge, which is equivalent to the missing precondition sampled d t_n.
- Approximate execution time: 1.5 min
We have implemented various hypothesis testing methods not presented in the paper. For details, see the User Documentation.


