fowler-lab · DylanAdlard · Jan 22, 2026
diff --git a/.DS_Store b/.DS_Store
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -1,44 +1,36 @@
-name: Conda CI
+name: CI
 
 on: [push, pull_request]
 
 jobs:
-  build:
+  test:
     runs-on: ubuntu-latest
 
     steps:
-      - name: Check out repository code
-        uses: actions/checkout@v2
+      - name: Checkout repository
+        uses: actions/checkout@v4
 
-      - name: Set up Conda
-        uses: conda-incubator/setup-miniconda@v2
+      - name: Set up Python
+        uses: actions/setup-python@v5
         with:
-          auto-activate-base: false
+          python-version: "3.10"
 
-      - name: Create Conda environment
-        run: conda env create --file env.yml
-
-      - name: Activate Conda environment and install dependencies
+      - name: Upgrade pip
         run: |
-          source $CONDA/bin/activate catomatic
-          pip install -e .
+          python -m pip install --upgrade pip
 
-      - name: Verify Conda environment
+      - name: Install package + dev dependencies
         run: |
-          source $CONDA/bin/activate catomatic
-          conda info --all
-          conda list
-
-      - name: Set PYTHONPATH
-        run: echo "PYTHONPATH=$PYTHONPATH:$(pwd)/src" >> $GITHUB_ENV
+          pip install .[dev]
 
-      - name: Run Pytest and Coverage
+      - name: Run tests with coverage
         run: |
-          source $CONDA/bin/activate catomatic
-          pytest --cov=catomatic src/tests/ --cov-report=xml
+          pytest src/tests/ \
+            --cov=catomatic \
+            --cov-report=xml
 
-      - name: Upload Coverage to Codecov
-        uses: codecov/codecov-action@v2
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v4
         with:
-          files: ./coverage.xml
+          files: coverage.xml
           token: ${{ secrets.CODECOV_TOKEN }}
diff --git a/.github/workflows/mypy.yml b/.github/workflows/mypy.yml
@@ -0,0 +1,24 @@
+name: mypy
+
+on: [push, pull_request]
+
+jobs:
+  type-check:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repo
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+
+      - name: Install package + dev deps
+        run: |
+          pip install .[dev]
+
+      - name: Run MyPy
+        run: |
+          mypy src/catomatic/. --pretty
diff --git a/README.md b/README.md
@@ -1,9 +1,12 @@
 [![codecov](https://codecov.io/gh/fowler-lab/catomatic/branch/ecoff/graph/badge.svg?token=8fnOy6rHCd)](https://codecov.io/gh/fowler-lab/catomatic) [![DOI](https://zenodo.org/badge/801462003.svg)](https://doi.org/10.5281/zenodo.14917920)
 
-
 # catomatic
 
-Python code that algorithmically builds antimicrobial resistance catalogues of mutations.
+catomatic is a Python toolkit for algorithmically constructing antimicrobial resistance (AMR) mutation catalogues directly from variant calls generated by read mapping. Rather than relying on alignment-level pattern matching or predefined resistance motifs, the tool infers resistance associations statistically from observed genotype–phenotype relationships, supporting both binary frequentist and regression-based modelling approaches.
+
+This design is particularly well suited to Mycobacterium species, where resistance is primarily driven by chromosomal point mutations, indels, and complex multi-locus interactions, and where horizontal gene transfer is rare. By operating on mapped mutation data rather than alignment outputs, the framework enables transparent evidence tracking, flexible statistical testing, and reproducible catalogue construction tailored to the evolutionary and genomic characteristics of mycobacteria.
+
+For alignment-based approaches, see AMRverse.
 
 ## Introduction
 
@@ -12,7 +15,7 @@ This repo contains 2 approaches to build resistance catalogues:
 1. **Definite defectives (solo-based approach)**
 2. **Interval regression**
 
-The first is used in [https://doi.org/10.1101/2025.01.30.635633](https://doi.org/10.1101/2025.01.30.635633), and the second is a Python translation of the method used in [https://doi.org/10.1038/s41467-023-44325-5](https://doi.org/10.1038/s41467-023-44325-5), but is still under development.
+The first is used in [https://doi.org/10.1101/2025.01.30.635633](https://doi.org/10.1101/2025.01.30.635633), and the second is a Python translation of the method used in [https://doi.org/10.1038/s41467-023-44325-5](https://doi.org/10.1038/s41467-023-44325-5).
 
 ---
 
@@ -52,20 +55,28 @@ Contingency tables, proportions, p-values, and Wilson confidence intervals are s
 
 ## Regression Builder
 
-This method is under development and will be released soon with accompanying documentation.
+The Regression Builder implements a mixed-effects interval regression approach to catalogue construction that generates predicted mean MICs. It is suitable when the phenotypes are censored or uncensored MICs.
+
+If whole-genome SNPs are provided, agglomerative clustering can be used to define random effects that control for population structure. Any number of fixed effects (such as lineage and lab) can also be defined by supplying additional input columns.
+
+As with the BinaryBuilder, catalogues can be exported as JSON objects or Piezo-compatible tables.
 
 ---
 
 ## Installation
 
-### Using Conda
+### Installation from source
 
-We recommend using Conda for environment and dependency management.
+From the project directory (after cloning the repository):
 
 ```bash
-conda env create -f env.yml
-conda activate catomatic
-pip install .
+pip install -e .
+```
+
+### Installation from PyPI
+
+```bash
+pip install catomatic
 ```
 
 ## Running catomatic's Binary Builder
@@ -75,7 +86,7 @@ You need two input DataFrames:
 - **Samples**: one row per sample, with 'R' or 'S' phenotypes (`UNIQUEID`, `PHENOTYPE`)
 - **Mutations**: one row per mutation per sample (`UNIQUEID`, `MUTATION`)
 
-If exporting to Piezo format:
+If exporting to Piezo format (`--to_piezo`):
 
 - The `MUTATION` column must follow GARC1 grammar (`gene@mutation`)
 - A path to a `wildcards.json` file (containing mutation rules) must be provided
@@ -118,7 +129,7 @@ After installation, the simplest way to run the catomatic catalogue builder is v
 #### Export to JSON
 
 ```bash
-python -m catomatic binary \
+catomatic binary \
   --samples path/to/samples.csv \
   --mutations path/to/mutations.csv \
   --to_json \
@@ -128,7 +139,7 @@ python -m catomatic binary \
 #### Export to Piezo format
 
 ```bash
-python -m catomatic binary \
+catomatic binary \
   --samples path/to/samples.csv \
   --mutations path/to/mutations.csv \
   --to_piezo \
@@ -160,14 +171,128 @@ python -m catomatic binary \
 | `--tails`          | `str`   | Tail type for statistical test. One of: `one`, `two`. Optional. Defaults to `two`.             |
 | `--strict_unlock`  | `flag`  | If set, disables classification of susceptible (`S`) mutations unless statistically confident. |
 
+## Running catomatic's Regression Builder
+
+You need two input DataFrames:
+
+- **Samples**: one row per sample, with an MIC column (`UNIQUEID`, `MIC`)
+- **Mutations**: one row per mutation per sample (`UNIQUEID`, `MUTATION`)
+
+If exporting to Piezo format (`--to_piezo`):
+
+- The `MUTATION` column must follow GARC1 grammar (`gene@mutation`)
+- A path to a `wildcards.json` file (containing mutation rules) must be provided
+
+### Python/Jupyter Example
+
+```python
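+import pandas as pd
+from catomatic.RegressionCatalogue import RegressionBuilder
+
+# load the two input tables described above
+samples_df = pd.read_csv('path/to/samples.csv')
+mutations_df = pd.read_csv('path/to/mutations.csv')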
+
+# fit the model to generate mutation effects
+model, effects = RegressionBuilder(samples=samples_df, mutations=mutations_df).predict_effects()
+
+# classify effects and generate a catalogue (requires an ecoff)
+catalogue = RegressionBuilder(samples=samples_df, mutations=mutations_df).build(ecoff=1.0)
+
+# View dictionary version
+cat_dict = catalogue.return_catalogue()
+
+# Convert to Piezo-compatible format
+catalogue_df = catalogue.build_piezo(
+    genbank_ref='...',
+    catalogue_name='...',
+    version='...',
+    drug='...',
+    wildcards='path/to/wildcards.json'
+)
+
+# Optionally export to CSV
+catalogue.to_piezo(
+    genbank_ref='...',
+    catalogue_name='...',
+    version='...',
+    drug='...',
+    wildcards='path/to/wildcards.json',
+    outfile='path/to/output.csv'
+)
+```
+
+### CLI
+
+As with the BinaryBuilder, the RegressionBuilder can be run from the command line:
+
+#### Export to JSON
+
+```bash
+catomatic regression \
+  --samples path/to/samples.csv \
+  --mutations path/to/mutations.csv \
+  --ecoff 1.0 \
+  --to_json \
+  --outfile path/to/output/catalogue.json
+```
+
+#### Export to Piezo format
+
+```bash
+catomatic regression \
+  --samples path/to/samples.csv \
+  --mutations path/to/mutations.csv \
+  --ecoff 1.0 \
+  --to_piezo \
+  --outfile path/to/output/catalogue.csv \
+  --genbank_ref '...' \
+  --catalogue_name '...' \
+  --version '...' \
+  --drug '...' \
+  --wildcards path/to/wildcards.json
+```
+
+### CLI Parameters (Regression Builder)
+
+| Parameter            | Type          | Description & default                                                                                                             |
+| -------------------- | ------------- | --------------------------------------------------------------------------------------------------------------------------------- |
+| `--samples`          | `str`         | Path to the samples file (CSV). **Required**.                                                                                     |
+| `--mutations`        | `str`         | Path to the mutations file (CSV). **Required**.                                                                                   |
+| `--genes`            | `str[]`       | List of RAV genes. Required when non-RAV genes appear in the mutations table (e.g. when clustering SNP distances). Default: `[]`. |
+| `--dilution_factor`  | `int`         | Dilution factor used in processing. Default: `2`.                                                                                 |
+| `--censored`         | `flag`        | Treat phenotype data as censored. Default: `False`.                                                                               |
+| `--tail_dilutions`   | `int`         | Tail dilutions to use for uncensored data. Default: `1`.                                                                          |
+| `--frs`              | `float`       | Fraction Read Support threshold. Default: `None`.                                                                                 |
+| `--ecoff`            | `float`       | Epidemiological cutoff value for classification. If `None`, it will be computed. Default: `None`.                                 |
+| `--b_bounds`         | `float,float` | Bounds for beta (fixed-effect) coefficients. Two floats: `(min max)`. Default: `(None, None)`.                                    |
+| `--u_bounds`         | `float,float` | Bounds for random-effect coefficients. Two floats: `(min max)`. Default: `(None, None)`.                                          |
+| `--s_bounds`         | `float,float` | Bounds for sigma (residual variance). Two floats: `(min max)`. Default: `(None, None)`.                                           |
+| `--p`                | `float`       | Significance / confidence level. Default: `0.95`.                                                                                 |
+| `--fixed_effects`    | `str[]`       | Column names to include as fixed effects. Default: `None`.                                                                        |
+| `--random_effects`   | `flag`        | Perform SNP clustering and include cluster as a random effect. Default: `False`.                                                  |
+| `--cluster_distance` | `float`       | Distance threshold for SNP clustering. Default: `1`.                                                                              |
+| `--outfile`          | `str`         | Path to save output JSON or Piezo file. Required with `--to_json` or `--to_piezo`.                                                |
+| `--options`          | `dict`        | Options passed to `scipy.optimize.minimize`. Default: `None`.                                                                     |
+| `--L2_penalties`     | `dict`        | Regularisation penalties for fixed and random effects. Default: `None`.                                                           |
+| `--to_json`          | `flag`        | Export the resulting catalogue to JSON format.                                                                                    |
+| `--to_piezo`         | `flag`        | Export the resulting catalogue to Piezo-compatible CSV format.                                                                    |
+| `--genbank_ref`      | `str`         | GenBank reference string for Piezo export. Required with `--to_piezo`.                                                            |
+| `--catalogue_name`   | `str`         | Name of the catalogue. Required with `--to_piezo`.                                                                                |
+| `--version`          | `str`         | Catalogue version. Required with `--to_piezo`.                                                                                    |
+| `--drug`             | `str`         | Drug associated with the mutations. Required with `--to_piezo`.                                                                   |
+| `--wildcards`        | `str`         | Path to JSON file containing wildcard mutation rules. Required with `--to_piezo`.                                                 |
+| `--grammar`          | `str`         | Grammar used in the catalogue. Default: `GARC1`.                                                                                  |
+| `--values`           | `str`         | Values used for predictions in the catalogue. Default: `RUS`.                                                                     |
+| `--for_piezo`        | `flag`        | If set, enables Piezo-specific placeholder rows. Omit if not exporting to Piezo. Default: `False`.                                |
+
 ### Notes
 
 - When using post-hoc rule updates via .update(), you must provide wildcards and set replace=True if you intend to override existing entries.
 - For Piezo export, placeholder entries are inserted automatically if needed to satisfy parser requirements (R, S, and U must be represented).
 - The EVIDENCE column includes contingency tables, proportions, confidence intervals, and p-values, and may optionally include sample IDs if `record_ids=True`.
+- As currently implemented, building a catalogue with the Regression Builder requires an ECOFF, because predicted effects are compared against the background to assign an R/S/U label.
+  - To calculate predicted effects only, call `RegressionBuilder.predict_effects()` in Python.
 
 ## Citation
 
 If you use catomatic in your research, please cite:
 
-- https://doi.org/10.1101/2025.01.30.635633
+- https://doi.org/10.1099/mgen.0.001429
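
Reviewer note on the Regression Builder inputs: the README changes above describe fitting an interval regression to censored or uncensored MICs with a configurable `--dilution_factor`. The following is a minimal sketch of how an MIC reading can be expressed as an interval on the log-dilution scale, which is the general idea behind interval regression. The function name `mic_to_log_interval` and the `<=` / `>` string conventions are assumptions made for illustration only; this is not catomatic's implementation, and it does not model `--tail_dilutions`.

```python
import math
from typing import Tuple


def mic_to_log_interval(mic: str, dilution_factor: float = 2.0) -> Tuple[float, float]:
    """Map an MIC string to a (lower, upper) interval in log-dilution units.

    Hypothetical helper for illustration; not part of catomatic.
    """

    def to_log(value: float) -> float:
        # Logarithm in base `dilution_factor`, built from log2 so that
        # powers of two stay exact for the common doubling series.
        return math.log2(value) / math.log2(dilution_factor)

    text = mic.strip()
    if text.startswith("<="):
        # Left-censored: the true MIC is at or below the lowest tested value.
        return (-math.inf, to_log(float(text[2:])))
    if text.startswith(">"):
        # Right-censored: the true MIC is above the highest tested value.
        return (to_log(float(text[1:])), math.inf)

    # Exact reading: growth was inhibited at this concentration but not at
    # the previous dilution step, so the true MIC lies one step below to here.
    upper = to_log(float(text))
    return (upper - 1.0, upper)
```

With a doubling dilution series, a reading of `1.0` maps to `(-1.0, 0.0)`, `<=0.25` maps to `(-inf, -2.0)`, and `>8` maps to `(3.0, inf)`. Other prefixes such as `<` or `>=` are not handled by this sketch and raise a `ValueError`.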