Small, deterministic compiler and executor for a strict SAS‑like batch subset.
sans compiles SAS‑like scripts into a machine-readable IR (plan.ir.json), executes them against tabular data, and emits a detailed execution report (report.json). It is built for auditability, reproducibility, and strict safety.
- Strict by Default: Unsupported constructs refuse the entire script with stable error codes.
- Deterministic: Bit‑identical outputs (CSV/XPT) across Windows and Linux.
- Audit‑Ready: Every run generates a signed manifest (SHA‑256) of all inputs and outputs.
- Portable: No SAS installation required; zero‑dependency runtime (except pydantic for schema).
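The audit claim above rests on content hashing. As a rough illustration only (this is not sans's own manifest code; the function names and the manifest layout are hypothetical), a SHA-256 manifest over a set of files could be built like this:

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: list[Path]) -> str:
    """Hypothetical manifest: a JSON object mapping file name -> digest."""
    entries = {p.name: sha256_file(p) for p in paths}
    # Sorted keys and fixed separators keep the serialized bytes stable.
    return json.dumps(entries, sort_keys=True, separators=(",", ":"))
```

Because the digest depends only on file bytes, the same inputs yield the same manifest text on any platform. The actual report fields are described in REPORT_CONTRACT.md.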
pip install -e .
This installs the sans CLI command. You can also use python -m sans.
sans supports a modern .sans DSL or a strict SAS‑like subset.
# example.sans
# sans 0.1
datasource in = inline_csv do
a,b
6,7
3,2
end
table t = from(in) do
derive(base2 = a * 2)
filter(base2 > 10)
select a, base2
end
save t to "out.csv"
Verify a script without executing it. Emits the execution plan (plan.ir.json) and a refusal/ok report.
sans check example.sans --out out
Generate deterministic canonical sans.ir directly from a script using the same compile/check path as sans check.
sans emit-ir example.sans --out out/example.sans.ir
sans ir-validate --strict out/example.sans.ir
Use --cwd to set the compilation working directory (for relative paths in script context):
sans emit-ir script.expanded.sans --out out/script.sans.ir --cwd fixtures/inputs
Compile, validate, and run. Emits output tables (CSV/XPT) and the final signed manifest.
sans run example.sans --out out
Verify that a previously generated report matches the current state of files on disk.
sans verify out/report.json
Canonicalize .sans formatting (presentation only).
sans fmt example.sans
sans fmt example.sans --check
sans fmt example.sans --in-place
sans fmt is a pure formatter: it changes presentation only and guarantees parse‑equivalence and idempotence.
Modes
- canonical (default): applies the canonical v0 style.
- identity: preserves bytes (except newline normalization to \n).
Flags
- --check: exit non‑zero if formatting would change the file.
- --in-place: rewrite the file atomically (writes a temp file, then replaces).
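The temp-file-then-replace pattern behind --in-place can be sketched as follows. This is a generic illustration of the technique, not the formatter's own code; atomic_write_text is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Write text to path by writing a temp file, then replacing the target."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="\n" writes LF as-is, with no platform translation.
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as fh:
            fh.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original file untouched if anything fails.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader of the file sees either the old content or the new content, never a partially written file.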
Examples:
sans fmt script.sans
sans fmt script.sans --mode identity
sans fmt script.sans --check
sans fmt script.sans --in-place
The native DSL provides a clean, linear syntax for data pipelines. It is safer than SAS, with strict rules for column creation and overwrites.
- Additive by default: Use derive(col = expr) to create new columns only (error if column exists).
- Explicit overwrites: Use update!(col = expr) to modify existing columns only (error if missing).
- Explicit output: Outputs are defined only via save; there is no implicit "last table wins."
- Explicit cast: Use cast(col -> type [on_error=null] [trim=true], ...) for deterministic type conversion; target types: str, int, decimal, bool, date, datetime. Evidence (cast_failures, nulled) is emitted in runtime.evidence.json.
- Stable ties: Sorting is stable; nodupkey preserves the first encountered row.
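To make the column and tie rules concrete, here is a small Python model of them. It is a sketch of the semantics described in the list above, not the sans runtime; the function names and the use of KeyError are assumptions made for illustration.

```python
def derive(row: dict, col: str, value) -> dict:
    """Add a new column; refuse if it already exists (additive by default)."""
    if col in row:
        raise KeyError(f"derive: column {col!r} already exists")
    return {**row, col: value}


def update(row: dict, col: str, value) -> dict:
    """Overwrite an existing column; refuse if it is missing."""
    if col not in row:
        raise KeyError(f"update!: column {col!r} does not exist")
    return {**row, col: value}


def sort_nodupkey(rows: list[dict], key: str) -> list[dict]:
    """Stable sort by key, then keep the first encountered row per key value."""
    seen = set()
    out = []
    for row in sorted(rows, key=lambda r: r[key]):  # sorted() is stable
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out
```

With rows whose id values are 2, 1, 1 in input order, sort_nodupkey keeps the first of the two id 1 rows, because a stable sort preserves their input order.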
expanded.sans is the canonical human-readable form (fully explicit, no blocks, kernel vocabulary only); scripts are sugar that lower to the same IR. Compiling expanded.sans must reproduce the same plan.ir.json (byte-identical aside from quarantined metadata).
# process.sans
# sans 0.1
datasource raw = csv("raw.csv")
table enriched = from(raw) do
derive(base_val = a + 1)
filter(base_val > 0)
update!(base_val = base_val * 10)
derive(risk = if(base_val > 100, "HIGH", "LOW"))
cast(base_val -> str)
select(subjid, base_val, risk)
end
save enriched to "enriched.csv"
- DATA Step: set, merge (within=), by (first./last.), retain, if/then/else, keep/drop/rename.
- Dataset Options: (keep= drop= rename= where=).
- Procs:
  - proc sort (nodupkey)
  - proc transpose (by, id, var)
  - proc sql (Inner/Left joins, where, group by, aggregates)
  - proc format (Value mappings + put() lookups)
  - proc summary (Class means with autoname)
- Macro‑lite: %let, %include, &var, single‑line %if/%then/%else.
sans guarantees stability through strict runtime rules:
- Missing Values: Nulls sort before all data and satisfy null < [value].
- Numeric Precision: Uses Decimal to prevent float precision loss.
- I/O Normalization: Enforces LF (\n) and deterministic CSV quoting.
- Stable Hashes: Artifact hashes are invariant across OS platforms.
See DETERMINISM.md for the sacred v1 invariants.
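Two of these rules are easy to demonstrate in plain Python. The snippet below is an illustrative sketch only: it shows a nulls-first sort key and the float versus Decimal difference, with None standing in for a missing value. null_first_key is a hypothetical helper, not part of sans.

```python
from decimal import Decimal


def null_first_key(value):
    """Sort key that places None before every non-null value."""
    return (value is not None, value if value is not None else 0)


values = [Decimal("3"), None, Decimal("1.5")]
ordered = sorted(values, key=null_first_key)

# Binary floats cannot represent 0.1 exactly, so the float sum drifts;
# Decimal arithmetic on these decimal literals is exact.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(ordered)                        # nulls first, then ascending values
print(float_sum == 0.3)               # False
print(decimal_sum == Decimal("0.3"))  # True
```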
Typed CSV ingestion without hand-typing every column: a run must have either typed pinning in the datasource (e.g. columns(a:int, b:decimal)) or a schema lock file.
- Generate a lock
  Recommended: use the dedicated subcommand (no --out required; lock written next to the script by default):
  sans schema-lock script.sans
  This writes <script_dir>/<script_stem>.schema.lock.json (e.g. demo_high.schema.lock.json next to demo_high.sans). Use -o PATH or --write PATH to override; relative paths are resolved against the script directory. Optionally add --out DIR to also write report.json and stage inputs under DIR/inputs. No pipeline execution.
  Alternatively, after a successful run or from untyped CSVs via run:
  sans run script.sans --out out --emit-schema-lock schema.lock.json
  The lock is written under --out when the path is relative. With untyped datasources the tool runs in lock-only mode; otherwise it runs normally and emits the lock after success. Stdout shows (lock-only) or (after run).
- Run with a lock
  Omit column types in the script and pass the lock:
  sans run script.sans --out out --schema-lock schema.lock.json
  The lock is copied into out/schema.lock.json so the bundle is self-contained. Extra columns in input are ignored; missing columns or type mismatches fail with clear codes. The report includes schema_lock_sha256, schema_lock_used_path, and schema_lock_copied_path; when a lock was emitted, also schema_lock_mode and schema_lock_path.
- Lock-only via run
  To generate a lock under --out without running (e.g. for CI bundles):
  sans run script.sans --out out --emit-schema-lock schema.lock.json --lock-only
  Prefer sans schema-lock script.sans when you only need the lock file.
See SCHEMA_LOCK_V0.md for the full contract, path resolution, report fields, inference rules, and error codes.
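As a mental model of what running with a lock enforces, the sketch below coerces one CSV row against a column-to-type mapping. Everything in it is an assumption made for illustration: the mapping shape, the parser table, and the exception types are hypothetical, and the real lock format and error codes are defined in SCHEMA_LOCK_V0.md.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical parsers per lock type; the real type set is defined by the spec.
PARSERS = {"int": int, "decimal": Decimal, "str": str}


def apply_lock(row: dict[str, str], lock_columns: dict[str, str]) -> dict:
    """Coerce one CSV row to the locked types.

    Extra input columns are dropped; missing columns and type mismatches raise.
    """
    typed = {}
    for name, type_name in lock_columns.items():
        if name not in row:
            raise KeyError(f"missing column: {name}")
        try:
            typed[name] = PARSERS[type_name](row[name])
        except (ValueError, InvalidOperation) as exc:
            raise TypeError(
                f"column {name!r} is not a valid {type_name}: {row[name]!r}"
            ) from exc
    return typed
```

For example, apply_lock({"a": "6", "b": "7.5", "extra": "z"}, {"a": "int", "b": "decimal"}) keeps a and b with typed values and silently drops extra.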
- Specs: SUBSET_SPEC.md | REPORT_CONTRACT.md | IR_CANONICAL_PARAMS.md | SCHEMA_LOCK_V0.md
- Internals: ARCHITECTURE.md | IR_SPEC.md
- Guidance: ERROR_CODES.md | BIG_PIC.md