Deterministic impact analysis for clinical data mappings
Diff, verify, and explain SDTM-style mapping changes across specs and registries - deterministically, before submission.
Cheshbon is a kernel-grade engine that answers a single question reliably:
Given mapping_spec v1 and v2, determine exactly which derived outputs are impacted, and why, without executing any transforms.
It operates purely on artifacts (specs, bindings, registries) and fails loudly when invariants are violated.
Given two versions of a mapping specification, Cheshbon determines:
- What changed - structural changes, not textual diffs
- What's impacted - direct vs transitive effects on derived variables
- Why - explicit dependency paths explaining each impact
- What's unaffected - outputs that remain valid and do not require regeneration
All analysis is:
- Deterministic - same inputs produce identical outputs
- Execution-free - no transform execution or data access required
- Artifact-only - operates on specs, bindings, registries
- Reproducible - verifiable hash chains and canonical JSON
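Reproducibility of this kind typically depends on canonical serialization: the same logical artifact must always produce the same bytes, and therefore the same digest. The following is a minimal sketch of that idea, assuming SHA-256 over key-sorted, compact JSON. The helper names `canonical_json` and `digest` are illustrative only and are not part of Cheshbon's API; Cheshbon's actual canonicalization rules may differ.

```python
import hashlib
import json


def canonical_json(obj) -> str:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def digest(obj) -> str:
    """Return the SHA-256 hex digest of the canonical form."""
    return hashlib.sha256(canonical_json(obj).encode("utf-8")).hexdigest()


# Same content, different key order (hypothetical artifact shape).
a = {"id": "d:AGE", "inputs": ["s:BRTHDT", "s:RFSTDT"]}
b = {"inputs": ["s:BRTHDT", "s:RFSTDT"], "id": "d:AGE"}

# Key order does not affect the canonical form or the digest.
same_digest = digest(a) == digest(b)
```

Because the digest depends only on content, two parties can compare artifacts by exchanging hashes rather than files.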
Cheshbon is designed for clinical programmers, data engineers, and platform teams who need deterministic answers about mapping changes in regulated pipelines.
Install (from source or wheel):
pip install cheshbon
For development:
pip install -e .
Run impact analysis between two specs:
cheshbon diff --from spec_v1.json --to spec_v2.json
Report modes:
- full: human-facing markdown + JSON (default).
- core: minimal, machine-first JSON for automation/perf.
- all-details: machine-first JSON evidence (analysis only).
All-details reports assert only analytical impact semantics under the Cheshbon kernel contract; they make no claims about code execution, data correctness, or regulatory acceptance.
Verify an all-details report against artifacts:
cheshbon verify report impact.all-details.json --from spec_v1.json --to spec_v2.json
Report verification validates all-details artifacts (digests + witness invariants). It is not an execution validator.
Verify kernel-native artifacts (schema + integrity checks):
cheshbon verify spec spec.json
cheshbon verify registry registry.json
cheshbon verify bindings bindings.json
Verify and ingest SANS run bundles:
# Verify bundle integrity
cheshbon verify bundle <bundle_dir>
# Ingest bundle and materialize artifacts
cheshbon ingest sans --bundle <bundle_dir> --out <out_dir>
Run kernel diff directly on two SANS bundles (adapter-only, deterministic):
cheshbon run-diff --bundle-a <bundle_dir_a> --bundle-b <bundle_dir_b> --out <out_dir>
Reports are written to reports/ in machine-readable (JSON) and human-readable (Markdown) formats.
When available, SANS vars.graph.json edge kind values (flow, derivation, rename) are preserved: impact.json paths are emitted as typed edge hops, and the markdown dependency paths include hop kinds.
Value-level change annotations appear only when runtime evidence provides column stats (preferred) or when small output tables can be scanned; otherwise value_evidence is absent or marked unavailable and no values are guessed.
Cheshbon provides a minimal public API for programmatic use:
from cheshbon import diff, validate, DiffResult, ValidationResult
# Diff analysis between two specs
result: DiffResult = diff(
    from_spec="spec_v1.json",  # Path or dict
    to_spec="spec_v2.json",  # Path or dict
    registry="registry.json",  # Optional: Path or dict
)
# Access results
print(f"Impacted: {result.impacted_ids}")
print(f"Reasons: {result.reasons}")
print(f"Paths: {result.paths}")
# Validation/preflight checks
validation_result: ValidationResult = validate(
    spec="spec_v1.json",
    registry="registry.json",  # Optional
)
The public API exposes only high-level functions that return complete, structured results. Internal implementation details (_internal modules) are not accessible from the public namespace.
Stable API (v1.0 contract):
- cheshbon.api - Core functions and result types:
  - diff() - Diff analysis between two specs
  - validate() - Validation/preflight checks
  - DiffResult - Result model for diff analysis
  - ValidationResult - Result model for validation
- cheshbon.contracts - Compatibility models:
  - CompatibilityIssue - Compatibility issue model
  - CompatibilityReport - Compatibility report model
Root exports (convenience aliases, may change):
- Functions and types are also exported from the cheshbon root for convenience
- Root exports are not part of the v1.0 contract and may be removed or changed in future versions
- For stable code, import from cheshbon.api and cheshbon.contracts explicitly
Version:
- __version__ - Package version (available from root)
Given mapping_spec v1 and v2, the system can determine exactly which derived outputs are impacted by the change, and explain why, without re-executing the transform.
This is proven through:
- Explicit dependency graphs
- Structural change events (not raw diffs)
- Transitive closure computation
- Binding-aware validation
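To illustrate what "structural change events (not raw diffs)" can mean in practice, here is a small sketch that compares two specs keyed by stable ID and emits typed events instead of line-level text differences. The spec shape and the event names (`ADDED`, `REMOVED`, `RENAMED`, `INPUTS_CHANGED`) are hypothetical placeholders chosen for this example; Cheshbon's real event types are defined in its Change Events Ontology.

```python
def structural_diff(old, new):
    """Compare two specs keyed by stable ID; return change events in sorted ID order."""
    events = []
    for var_id in sorted(set(old) | set(new)):
        before, after = old.get(var_id), new.get(var_id)
        if before is None:
            events.append(("ADDED", var_id))
        elif after is None:
            events.append(("REMOVED", var_id))
        else:
            if before["name"] != after["name"]:
                events.append(("RENAMED", var_id))  # display name only
            if before["inputs"] != after["inputs"]:
                events.append(("INPUTS_CHANGED", var_id))
    return events


# Hypothetical spec fragments: stable ID -> display name and input IDs.
v1 = {
    "d:AGE": {"name": "AGE", "inputs": ["s:BRTHDT", "s:RFSTDTC"]},
    "d:AGEGRP": {"name": "AGEGRP", "inputs": ["d:AGE"]},
}
v2 = {
    "d:AGE": {"name": "AGE", "inputs": ["s:BRTHDT", "s:RFSTDT"]},
    "d:AGEGRP": {"name": "AGE_GROUP", "inputs": ["d:AGE"]},
}

events = structural_diff(v1, v2)
# A rename is treated as metadata-only here, so it does not seed impact analysis.
seeds = [var_id for kind, var_id in events if kind != "RENAMED"]
```

Keying on stable IDs rather than display names is what lets a rename surface as its own event instead of looking like a removal plus an addition.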
============================================================
MAPPING SPEC CHANGE IMPACT ANALYSIS
============================================================
## Changes Detected
- Source column added: `RFSTDT` (ID: s:RFSTDT)
- Derived variable `AGE` inputs changed (ID: d:AGE)
  - Old: ['s:BRTHDT', 's:RFSTDTC']
  - New: ['s:BRTHDT', 's:RFSTDT']
## Impact Analysis
### Impacted Variables (2)
- **AGE** (ID: d:AGE)
  - Dependency path: AGE
  - Reason: DIRECT_CHANGE
- **AGEGRP** (ID: d:AGEGRP)
  - Dependency path: AGE -> AGEGRP
  - Reason: TRANSITIVE_DEPENDENCY
### Unaffected Variables (2)
- SEX_CDISC (ID: d:SEX_CDISC)
- USUBJID (ID: d:USUBJID)
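The classification in the report above (direct vs transitive, with a dependency path for each impacted variable) can be sketched as a breadth-first walk over an inverted dependency graph. This is an illustration under stated assumptions, not Cheshbon's implementation: the function `impacted` is a hypothetical name, and the inputs listed for SEX_CDISC and USUBJID are invented so the example is self-contained.

```python
from collections import deque

# Hypothetical dependency graph: derived ID -> input IDs.
deps = {
    "d:AGE": ["s:BRTHDT", "s:RFSTDT"],
    "d:AGEGRP": ["d:AGE"],
    "d:SEX_CDISC": ["s:SEX"],  # invented input
    "d:USUBJID": ["s:STUDYID", "s:SUBJID"],  # invented inputs
}


def impacted(deps, changed):
    """Return {id: (reason, path)} for every derived ID reachable from `changed`."""
    # Invert the graph: input ID -> derived IDs that consume it.
    consumers = {}
    for node, inputs in deps.items():
        for inp in inputs:
            consumers.setdefault(inp, []).append(node)

    # Changed nodes are direct; everything reached from them is transitive.
    result = {c: ("DIRECT_CHANGE", [c]) for c in sorted(changed)}
    queue = deque(sorted(changed))
    while queue:
        current = queue.popleft()
        for nxt in sorted(consumers.get(current, [])):  # sorted for determinism
            if nxt not in result:
                path = result[current][1] + [nxt]
                result[nxt] = ("TRANSITIVE_DEPENDENCY", path)
                queue.append(nxt)
    return result


impact = impacted(deps, {"d:AGE"})
unaffected = sorted(set(deps) - set(impact))
```

Sorting at each step keeps the traversal order, and therefore the reported paths, independent of dictionary or set iteration order.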
- Stable IDs: Identity separate from display names (renames are metadata-only)
- Precise Impact: Distinguishes DIRECT_CHANGE vs TRANSITIVE_DEPENDENCY
- Unresolved References: Explicitly tracks MISSING_INPUT, MISSING_BINDING, MISSING_TRANSFORM_REF
- Binding Layer: Handles raw schema drift without contaminating core ontology
- Transform Registry: First-class transform artifacts with fingerprinting (impl_hash, params_hash)
- Control Plane: Detects registry-level changes (transform impl changed) even when spec is unchanged
- No Heuristics: Everything is explicit and deterministic
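The control-plane behavior listed above, where a registry change is detected even though the spec is unchanged, can be pictured as a comparison of transform fingerprints. The sketch below assumes each registry entry carries `impl_hash` and `params_hash` fields, as named above; the registry layout, the transform IDs, and the helper names are hypothetical and only meant to show the idea.

```python
def changed_fingerprints(old_registry, new_registry):
    """Return {transform_id: [changed fields]} for transforms present in both registries."""
    changed = {}
    for tid in sorted(set(old_registry) & set(new_registry)):
        fields = [
            field
            for field in ("impl_hash", "params_hash")
            if old_registry[tid][field] != new_registry[tid][field]
        ]
        if fields:
            changed[tid] = fields
    return changed


def variables_using(spec, transform_ids):
    """Return derived IDs whose transform reference is in `transform_ids`."""
    return sorted(var for var, tid in spec.items() if tid in transform_ids)


# Hypothetical artifacts: the spec is identical across versions, the registry is not.
spec = {"d:AGE": "t:age_calc", "d:AGEGRP": "t:bucket"}
old_registry = {
    "t:age_calc": {"impl_hash": "aaa111", "params_hash": "ppp000"},
    "t:bucket": {"impl_hash": "bbb222", "params_hash": "qqq000"},
}
new_registry = {
    "t:age_calc": {"impl_hash": "ccc333", "params_hash": "ppp000"},
    "t:bucket": {"impl_hash": "bbb222", "params_hash": "qqq000"},
}

changed = changed_fingerprints(old_registry, new_registry)
seeds = variables_using(spec, set(changed))
```

The resulting seeds would then feed the same dependency-graph propagation used for spec-level changes.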
.
|-- src/
| `-- cheshbon/
| |-- kernel/ # Core kernel modules
| | |-- all_details_builders.py # Shared all-details builders
| | |-- bindings.py # Binding layer (raw schema -> stable IDs)
| | |-- binding_impact.py # Binding-aware impact propagation
| | |-- diff.py # Structural diff -> change events
| | |-- explain.py # Structured explanations (no rendering)
| | |-- graph.py # Dependency graph builder
| | |-- hash_utils.py # Canonicalization and hashing
| | |-- impact.py # Impact analysis
| | |-- spec.py # Mapping spec models (Pydantic)
| | |-- transform_registry.py # Transform registry models
| | `-- witness.py # Witness generation for all-details
| |-- _internal/ # Internal verification tools
| | |-- io/ # Artifact I/O helpers
| | |-- reporting/ # Report rendering helpers
| | |-- benchmarks.py # Perf sentinel benchmarks
| | |-- verify_artifacts.py # CLI verify helpers
| | `-- report_doctor.py # All-details report doctor
| |-- api.py # Public API
| |-- cli.py # Main CLI entry point
| |-- diff.py # Diff wrapper + report generation
| |-- report_all_details.py # All-details report builder
| `-- contracts.py # Compatibility models
|-- fixtures/ # Golden scenario examples and test fixtures
|-- tests/ # Pytest test suite
`-- docs/ # Documentation
pytest tests/ -v
All tests pass (129+ tests covering kernel, CLI, and golden scenarios).
- Quick Start Guide - Step-by-step workflow
- Architecture Reference - Core design and principles
- Graph Diff Contract - Canonical graph-diff/impact semantics
- Context and Glossary - Project context and terminology
- SANS Ingestion - SANS bundle ingestion and verification
- Graph Diff - Bundle graph diff + impact outputs
- Change Events Ontology - Change event types and impact reasons
- Binding Layer - Binding layer design and usage
- What Cheshbon Will Never Do - Explicit non-goals and scope boundaries
- v1.0 Contract - Complete v1.0 API contract
- Kernel Contract - Core kernel guarantees and exclusions
- Implementation Notes - Transform registry implementation details
- Key Questions - Important design decisions explained
- Performance Sentinels - Frozen perf sentinels and cap audit
- Golden Scenarios - Golden scenario examples for cheshbon diff
- No LLM calls
- No execution engine
- No UI
- No database
- No SDTM semantics beyond toy naming
- No multi-domain orchestration
This is a kernel, not a product. It proves the invariant or fails loudly.
Cheshbon OSS v1.0 is intentionally artifact-centric.
Workspace management, authoring tools, and orchestration layers live outside the kernel and may appear in future releases or commercial offerings.
Bundle tooling (zip export/import and bundle doctor) is intentionally out of kernel scope and belongs to higher-level tooling outside the kernel.
cheshbon(TM) is a trademark of Joe Norton. Use of the name in commercial offerings requires permission.
See TRADEMARK for more information.
This project is licensed under the Apache License, Version 2.0. See NOTICE for copyright information.