feat! Introduce Regorus type analysis pipeline with CLI support #492
Open: anakrish wants to merge 43 commits into microsoft:main from anakrish:type-inference
Extracts the parser/runtime adjustments that unblock constant folding. Includes field lookup improvements in the AST, tolerant parsing hooks in the lexer, and interpreter flags and inference helpers for constant folding mode. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Adds the new src/utils/path.rs module and updates src/utils.rs to route reference parsing through it. The helpers normalize rule paths, match wildcard entrypoints, and will be reused by the upcoming type-analysis wiring. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
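To illustrate the kind of wildcard entrypoint matching described above, here is a minimal sketch. It assumes dotted rule paths and treats `*` as matching exactly one path segment; normalize_path and matches_entrypoint are hypothetical names and do not reflect the actual helpers in src/utils/path.rs.

```rust
/// Hypothetical helper: split a dotted rule path such as "data.authz.allow"
/// into trimmed segments, dropping empty components produced by stray dots.
pub fn normalize_path(path: &str) -> Vec<&str> {
    path.split('.')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect()
}

/// Hypothetical matcher (assumed semantics): `*` matches exactly one segment,
/// every other segment must match literally, and lengths must agree.
pub fn matches_entrypoint(pattern: &str, rule_path: &str) -> bool {
    let pat = normalize_path(pattern);
    let path = normalize_path(rule_path);
    pat.len() == path.len()
        && pat.iter().zip(path.iter()).all(|(p, s)| *p == "*" || p == s)
}
```

With these assumptions, "data.authz.*" selects data.authz.allow but not a deeper path such as data.authz.allow.extra.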
Introduces src/type_analysis/model.rs with the structural types, provenance descriptors, and constant tracking primitives that future analyzer modules reuse. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduces ConstantFact and ConstantStore so later type-analysis passes can track constant fold results alongside inferred types. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduces helpers that convert runtime Value instances into TypeFact records and infer structural types, providing the bridge used when constant folding feeds the analyzer. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
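As a rough picture of the Value-to-type bridge, the sketch below infers a structural type from a concrete value. The Value and StructuralType enums here are simplified stand-ins written for this example, not the crate's real types, and the rule that whole numbers map to Integer is an assumption.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a runtime value (the real crate has its own Value).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical structural type lattice used only in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum StructuralType {
    Null,
    Boolean,
    Integer,
    Number,
    String,
    Array(Box<StructuralType>),
    Object(BTreeMap<String, StructuralType>),
    Any,
}

/// Infer a structural type from a concrete value.
pub fn infer_type(value: &Value) -> StructuralType {
    match value {
        Value::Null => StructuralType::Null,
        Value::Bool(_) => StructuralType::Boolean,
        // Whole numbers are reported as Integer so arithmetic can stay integral.
        Value::Number(n) if n.fract() == 0.0 => StructuralType::Integer,
        Value::Number(_) => StructuralType::Number,
        Value::String(_) => StructuralType::String,
        Value::Array(items) => {
            // Keep the element type when all elements agree; otherwise use Any.
            let mut elems = items.iter().map(infer_type);
            let elem = match elems.next() {
                None => StructuralType::Any,
                Some(first) => {
                    if elems.all(|t| t == first) {
                        first
                    } else {
                        StructuralType::Any
                    }
                }
            };
            StructuralType::Array(Box::new(elem))
        }
        // Objects keep one structural type per field.
        Value::Object(fields) => StructuralType::Object(
            fields.iter().map(|(k, v)| (k.clone(), infer_type(v))).collect(),
        ),
    }
}
```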
Provides LookupContext and ScopedBindings to track inferred facts, reachable rules, and scope overlays during analysis. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
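The scope-overlay idea can be sketched as a stack of maps where lookups search from the innermost scope outwards. ScopeStack is a hypothetical name chosen for this illustration; it is not the PR's ScopedBindings type, whose API may differ.

```rust
use std::collections::HashMap;

/// Hypothetical scope stack: each pushed scope overlays the ones below it.
pub struct ScopeStack<T> {
    scopes: Vec<HashMap<String, T>>,
}

impl<T> ScopeStack<T> {
    pub fn new() -> Self {
        // Start with a single root scope for rule-level bindings.
        Self { scopes: vec![HashMap::new()] }
    }

    /// Enter a nested scope (e.g. a comprehension body).
    pub fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    /// Leave the current scope; the root scope is never popped.
    pub fn pop_scope(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    /// Bind a name in the innermost scope, shadowing outer bindings.
    pub fn bind(&mut self, name: &str, fact: T) {
        if let Some(top) = self.scopes.last_mut() {
            top.insert(name.to_string(), fact);
        }
    }

    /// Look up a name from the innermost scope outwards.
    pub fn lookup(&self, name: &str) -> Option<&T> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name))
    }
}
```

Popping a scope discards its bindings, which is one way query-local variables can be kept from leaking into the enclosing rule body.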
Add summary structures for expressions, rules, dependencies, entrypoints, and diagnostics, translating the propagation pipeline’s analysis state into a public result. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduce TypeAnalysisOptions to capture input/data schemas, hoisted loop lookups, entrypoint filters, and function specialization toggles for the propagation analyzer. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduce AnalysisState and related helpers for lookup caches, diagnostics, body truth tracking, and function specializations, along with the pipeline glue exposing propagation types. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduce the TypeAnalyzer core scaffolding and supporting helpers.
* analyzer/mod.rs stores module references, optional schedule, hoisted-loop lookup, and constant-folding engine state. It exposes TypeAnalyzer::new, from_engine, and analyze_modules, wiring up rule traversal before rule-analysis code lands.
* analyzer/entrypoints.rs resolves globbed entrypoint patterns, normalizes paths, and tracks default rules pulled in when entrypoint filtering is enabled.
* analyzer/rule_index.rs builds per-module and global indexes of rule heads so the analyzer can quickly discover definitions while recursing through the program.
* analyzer/validation.rs performs pre-flight checks on the collected rule heads (duplicate definitions, conflicting defaults, etc.) and seeds the analysis stack used by later passes.
This commit establishes the shared state and orchestration utilities that the remaining propagation components (rule analysis, facts, diagnostics, expression walkers) plug into.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Bring in the TypeAnalyzer rule_analysis layer that drives propagation through rule heads and bodies.
* mod.rs wires submodules for heads, bodies, queries, and orchestration.
* orchestration.rs coordinates rule traversal, manages recursion/stack tracking, and records constant folding results.
* heads.rs handles rule head fact recording and merge rules for constants and provenance.
* bodies.rs evaluates each rule body, applies parameter specialization, and stores body query truth metadata.
* queries.rs delegates query evaluation into the expression walker bindings that land later.
This commit focuses strictly on the control-flow machinery; supporting loop binding and expression inference follow in subsequent commits.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
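The recursion/stack tracking mentioned for orchestration.rs can be pictured as a depth-first walk that flags any rule re-entered while it is still on the analysis stack. This is a minimal sketch under assumed inputs: RuleGraph and find_recursive_rules are hypothetical, and the real orchestration also records facts and constants along the way.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical rule graph: rule path -> rule paths referenced in its body.
pub type RuleGraph = HashMap<&'static str, Vec<&'static str>>;

/// Visit rules depth-first from `entry` and return every rule that was
/// reached again while still being analyzed (i.e. part of a cycle).
pub fn find_recursive_rules(graph: &RuleGraph, entry: &'static str) -> HashSet<&'static str> {
    fn visit(
        graph: &RuleGraph,
        rule: &'static str,
        stack: &mut Vec<&'static str>,
        done: &mut HashSet<&'static str>,
        cycles: &mut HashSet<&'static str>,
    ) {
        if stack.contains(&rule) {
            // Re-entered a rule that is still on the stack: record and stop.
            cycles.insert(rule);
            return;
        }
        if done.contains(rule) {
            // Already fully analyzed; reuse instead of recursing again.
            return;
        }
        stack.push(rule);
        for &dep in graph.get(rule).into_iter().flatten() {
            visit(graph, dep, stack, done, cycles);
        }
        stack.pop();
        done.insert(rule);
    }

    let mut cycles = HashSet::new();
    visit(graph, entry, &mut Vec::new(), &mut HashSet::new(), &mut cycles);
    cycles
}
```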
Augment Parser with tolerant mode, expression position tracking, and richer reference handling so the type analyzer can map spans back to expression indices. Export read-only module slots from Lookup and strip UTF-8 BOMs when reading schemas to keep upstream ingestion resilient. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Add the TypeAnalyzer diagnostics layer that reports schema and type issues uncovered during propagation.
* mod.rs wires the diagnostics submodules into the propagation tree.
* categories.rs defines coarse-grained structural categories for quickly reasoning about disjoint types.
* helpers.rs builds display labels, numeric capability checks, and structural comparisons used by diagnostics.
* checks.rs implements the concrete validation routines (equality, arithmetic, set ops, membership, indexing, builtin arity/type checks), emitting TypeDiagnostic records into the analysis state.
These helpers plug into the analyzer traversal so type mismatches and schema violations surface alongside the inferred facts.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
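A minimal sketch of how coarse categories might drive an equality diagnostic: a warning is emitted only when both sides are known and fall into different categories. The Category enum, its variants, and the message text are assumptions for illustration; the real categories.rs and checks.rs may be organized differently.

```rust
/// Hypothetical coarse categories (arrays, sets, and objects are lumped together).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Null,
    Boolean,
    Numeric,
    String,
    Collection,
    Unknown,
}

/// Two categories are treated as disjoint when both are known and differ.
/// Unknown is never disjoint, so no diagnostic is raised without evidence.
pub fn categories_disjoint(a: Category, b: Category) -> bool {
    a != Category::Unknown && b != Category::Unknown && a != b
}

/// Sketch of an equality check: return a diagnostic message for disjoint operands.
pub fn check_equality(left: Category, right: Category) -> Option<String> {
    if categories_disjoint(left, right) {
        Some(format!(
            "comparison between {:?} and {:?} is always false",
            left, right
        ))
    } else {
        None
    }
}
```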
Add the fact utilities used by propagation to derive iteration items, extend provenance, and inspect schemas.
* mod.rs exports the iteration/origin/schema helpers for the rest of propagation.
* iteration.rs builds TypeFact pairs for array/set/object traversals, preserving provenance segments.
* origins.rs contains helpers to mark origins as derived and append path segments.
* schema.rs provides helper accessors for properties, array items, constants, and enumerations.
These utilities underpin the loop binding and expression walkers that follow.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduce the loop/destructuring helpers used by TypeAnalyzer rule bodies.
* mod.rs wires binding, destructuring, and seeding helpers under propagation::loops.
* binding.rs applies binding plans to assign facts to variables and handles SomeIn iterations.
* destructuring.rs derives field/element facts from TypeFact inputs, integrating schema and provenance information.
* seeding.rs seeds loop iterations with initial facts, bridging hoisted loop metadata into the analyzer.
These utilities connect the rule traversal layer to the upcoming expression inference walkers.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Handle unary numeric negation by propagating number facts through the analyzer, preserving constants when the operand is a literal and marking provenance. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Add TypeAnalyzer support for set union, intersection, and membership.
* Union merges element structural types and combines constants when both operands are literal sets.
* Intersection narrows to the left-hand element type while preserving provenance.
* Membership returns a boolean fact and, when both sides are constants, folds to true/false (handling arrays, sets, objects, and strings).
Diagnostics remain wired through the existing set-operation and membership checks, so reviewers can trace how type mismatches surface alongside these results.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
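The constant-folding rule for these operators (fold only when both operands are known) can be sketched with Option, where None stands for "not a known constant". The function names and the use of i64 elements are simplifications for this example, not the analyzer's API.

```rust
use std::collections::BTreeSet;

/// Hypothetical fold for set union: Some only when both operands are constants.
pub fn fold_union(a: Option<&BTreeSet<i64>>, b: Option<&BTreeSet<i64>>) -> Option<BTreeSet<i64>> {
    Some(a?.union(b?).copied().collect())
}

/// Hypothetical fold for set intersection, with the same "both known" rule.
pub fn fold_intersection(
    a: Option<&BTreeSet<i64>>,
    b: Option<&BTreeSet<i64>>,
) -> Option<BTreeSet<i64>> {
    Some(a?.intersection(b?).copied().collect())
}

/// Membership folds to a constant boolean only when both sides are known.
pub fn fold_membership(item: Option<i64>, set: Option<&BTreeSet<i64>>) -> Option<bool> {
    Some(set?.contains(&item?))
}
```

When either side is unknown, the analyzer would still report the structural result type (a set or a boolean) but no constant.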
Add the boolean comparison inference used for comparison expressions. We merge operand origins, fold constants when both operands are known, propagate where appropriate, and continue to surface mismatches via the existing equality diagnostics. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Teach the analyzer how to type arithmetic expressions. The implementation:
* propagates integer vs number results when both operands are known to be integers,
* falls back to set semantics for subtraction (set difference) when both operands are sets,
* folds constants for every operator when literal operands are provided, and
* records merged provenance while relying on existing arithmetic diagnostics for type warnings.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
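A minimal sketch of the integer-vs-number propagation and constant folding described above. NumTy, ArithOp, and both functions are hypothetical; treating division as always producing Number and leaving division by zero unfolded are assumptions of this sketch, not statements about the PR.

```rust
/// Hypothetical numeric result types used only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumTy {
    Integer,
    Number,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArithOp {
    Add,
    Sub,
    Mul,
    Div,
}

/// Integer only when both operands are integers and the operator cannot
/// produce a fraction (division is assumed to yield Number).
pub fn arith_result_type(op: ArithOp, l: NumTy, r: NumTy) -> NumTy {
    match (op, l, r) {
        (ArithOp::Div, _, _) => NumTy::Number,
        (_, NumTy::Integer, NumTy::Integer) => NumTy::Integer,
        _ => NumTy::Number,
    }
}

/// Fold when both operands are known constants; division by zero stays unfolded.
pub fn fold_arith(op: ArithOp, l: Option<f64>, r: Option<f64>) -> Option<f64> {
    let (l, r) = (l?, r?);
    match op {
        ArithOp::Add => Some(l + r),
        ArithOp::Sub => Some(l - r),
        ArithOp::Mul => Some(l * r),
        ArithOp::Div if r != 0.0 => Some(l / r),
        ArithOp::Div => None,
    }
}
```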
Add the analyzer path for assignment expressions inside rule bodies. The RHS is fully inferred, plain identifier LHS targets are bound into the current scope, and a Boolean fact tagged with assignment provenance is emitted that keeps the RHS origins. This keeps assignment semantics aligned with the existing binding plans handled earlier in the pipeline. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduce literal inference helpers so literal expressions produce TypeFacts with the correct structural type, constant value, and literal provenance. These helpers underpin the dispatch code that caches facts for literal expressions. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduce structural helpers on TypeAnalyzer so expression inference can inspect unions, arrays/sets, and schema metadata. This adds a join_structural_types routine that computes the least upper bound (LUB) for collections, flattens unions, and deduplicates variants, plus normalization utilities for arrays, sets, and objects. It also includes field and element accessors that propagate through unions, along with missing-property diagnostics that consult schema definitions and additionalProperties to choose severity. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
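The join described above (least upper bound, union flattening, deduplication) might look roughly like this on a toy type lattice. Ty and join are hypothetical stand-ins for join_structural_types, and widening Integer to Number is an assumed rule of this sketch.

```rust
/// Hypothetical structural types for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Integer,
    Number,
    String,
    Array(Box<Ty>),
    Union(Vec<Ty>),
}

/// Flatten nested unions and drop duplicate variants, keeping first-seen order.
fn push_flat(out: &mut Vec<Ty>, t: Ty) {
    match t {
        Ty::Union(variants) => {
            for v in variants {
                push_flat(out, v);
            }
        }
        other => {
            if !out.contains(&other) {
                out.push(other);
            }
        }
    }
}

/// Least upper bound: identical types join to themselves, Integer widens to
/// Number, arrays join element-wise, and anything else becomes a union.
pub fn join(a: Ty, b: Ty) -> Ty {
    match (a, b) {
        (x, y) if x == y => x,
        (Ty::Integer, Ty::Number) | (Ty::Number, Ty::Integer) => Ty::Number,
        (Ty::Array(x), Ty::Array(y)) => Ty::Array(Box::new(join(*x, *y))),
        (x, y) => {
            let mut variants = Vec::new();
            push_flat(&mut variants, x);
            push_flat(&mut variants, y);
            if variants.len() == 1 {
                variants.pop().unwrap()
            } else {
                Ty::Union(variants)
            }
        }
    }
}
```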
Teach the expression propagator how to derive types for object property lookups coming from either schema-backed or structural facts.
Rego:
some_string := input.user.name
some_any := data.accounts[id]
When the base fact is schema-backed we reuse schema_property and schema_additional_properties_schema to plumb the child schema, surface constants when available, and emit diagnostics when the property is neither defined nor allowed. On structural facts we fall back to structural_field_type and structural_missing_property_message so unions and literals flow through.
Origin tracking mirrors the base fact by appending a PathSegment::Field so later diagnostics can trace both successes and failures.
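A simplified sketch of the schema-backed branch: look up a declared property, fall back to additionalProperties, and otherwise produce a diagnostic. ObjectSchema, Lookup, and property_access are hypothetical; here None for additional models additionalProperties: false, which simplifies JSON Schema's default of allowing extra properties.

```rust
use std::collections::BTreeMap;

/// Simplified object schema: declared properties plus an additionalProperties policy.
pub struct ObjectSchema {
    /// Property name -> schema type name.
    pub properties: BTreeMap<String, String>,
    /// Schema type for undeclared properties; None models additionalProperties: false.
    pub additional: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum Lookup {
    Found(String),
    FromAdditional(String),
    Missing { diagnostic: String },
}

/// Resolve `base.field` against the schema, falling back to additionalProperties
/// and emitting a diagnostic when the property is neither defined nor allowed.
pub fn property_access(schema: &ObjectSchema, field: &str) -> Lookup {
    if let Some(ty) = schema.properties.get(field) {
        Lookup::Found(ty.clone())
    } else if let Some(extra) = &schema.additional {
        Lookup::FromAdditional(extra.clone())
    } else {
        Lookup::Missing {
            diagnostic: format!("property '{}' is not defined by the schema", field),
        }
    }
}
```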
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Let the expression propagator produce structural facts for array, set, and object literals so downstream access can see precise shapes instead of any/unknown.
Rego:
user_ids := [input.user.id, data.accounts[id].id]
status_codes := {"ok", "fail"}
user := {"name": input.user.name, "roles": data.accounts[id].roles}
Array literals walk each element, join their structural types, and preserve per-index origins on the resulting array fact. Set literals work the same way but mark origins as derived to reflect the unordered nature of set construction. Object literals synthesize a StructuralObjectShape keyed by any string literal property and thread PathSegment::Field into the child origins.
All three literal forms attach ConstantValue::known when every child is constant so rule bodies that inline literals continue to carry exact runtime values.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Add the parent operators.rs and structures.rs modules. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Add the lightweight helpers the rest of the expression pipeline depends on. helpers.rs exposes an is_numeric_type guard that arithmetic/assignment inference reuses, and rules/helpers.rs provides merge_rule_facts to join multiple rule branches while carrying over origins and specialization hits. Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Bring in the full infer_expr dispatcher that drives every expression node through the type-propagation pipeline.
infer_expr now:
- consults the lookup cache (except for vars) before recomputing and records freshly-derived facts back into the per-module tables.
- handles every Expr variant by delegating to the specialized helpers (literals, comprehensions, operators, rules, etc.), feeding them the module index, expression id, and scoped bindings.
- threads HybridType provenance through property (RefDot) and index (RefBrack) access: it evaluates the base first, invokes infer_property_access for schema/structural objects, extends origins with PathSegment::Field/Index, and consults schema_additional_properties_schema when explicit fields are missing.
- performs constant folding on dot and bracket lookups when both the base and index/field are known, producing Value::Undefined and downgrading the descriptor to StructuralType::Unknown when we read past the available data.
- applies set-membership semantics when indexing into a set with a constant value, yielding a boolean fact that still preserves any derived origins.
- records constants and origin trails in AnalysisState so later rules can emit diagnostics and build specialization signatures, and invokes loop binding plans when the planner indicates the expression participates in a hoisted loop.
Rego (illustrative):
user := input.users[i]
user_name := user.name
group := data.groups[id]
allowed := group.members[user.name]
Each step leverages the dispatcher to propagate structural types, schema-derived facts, and constant information so subsequent statements operate on precise facts.
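The constant folding of bracket lookups can be sketched on a toy constant type: a known index into a known array or object yields the element, reading past the available data yields Undefined, and indexing a set with a member yields true. Const and fold_index are hypothetical, and returning Undefined for a non-member set index is an assumption of this sketch rather than documented analyzer behavior.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified constant values for this sketch.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Const {
    Undefined,
    Bool(bool),
    Int(i64),
    Str(String),
    Array(Vec<Const>),
    Set(BTreeSet<Const>),
    Object(BTreeMap<String, Const>),
}

/// Fold `base[index]` when both sides are known constants.
pub fn fold_index(base: &Const, index: &Const) -> Const {
    match (base, index) {
        // In-range array index returns the element; out of range is Undefined.
        (Const::Array(items), Const::Int(i)) if *i >= 0 => {
            items.get(*i as usize).cloned().unwrap_or(Const::Undefined)
        }
        // Missing object keys read past the available data.
        (Const::Object(fields), Const::Str(key)) => {
            fields.get(key).cloned().unwrap_or(Const::Undefined)
        }
        // Set indexing behaves like a membership test (assumed: true for members).
        (Const::Set(members), value) if members.contains(value) => Const::Bool(true),
        // Everything else (negative index, non-member, mismatched kinds) is Undefined.
        _ => Const::Undefined,
    }
}
```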
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Teach the dispatcher how to type array, set, and object comprehensions by pushing a new scope, analyzing the generator query, and reusing infer_expr on the comprehension body.
Rego:
user_ids := [u.id | some u in input.users]
enabled_users := {u | u := data.users[_]; u.enabled}
users_map := {u.id: u | u := data.users[_]}
Key points:
- Each comprehension temporarily extends the ScopedBindings stack so query-local variables do not leak and array/set elements inherit the correct provenance.
- analyze_query is invoked with its flag set to true so it reuses the rule-analysis logic for binding statements; the term/value expression is then inferred in that context.
- Array/set comprehensions wrap the inferred element StructuralType in array[...] or set[...] respectively, tagging any collected origins as derived to reflect comprehension semantics.
- Object comprehensions return an empty StructuralObjectShape shell for now (future commits will enrich it) while still carrying derived origins from the value expression.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Implement analyze_stmt so literal statements in queries flow through the expression dispatcher and emit truth hints for the planner.
Rego:
some user in input.users
user.enabled
not data.disabled[user.id]
Details:
- SomeIn statements evaluate the collection first; if the collection constant is an empty array/set/object, the statement is classified as AlwaysFalse and the constant is still propagated into the lookup cache.
- Any key/value expressions on the comprehension are inferred afterwards so bindings stay consistent, though their results do not influence the truth value.
- Positive and negated literal expressions reuse infer_expr and translate constant booleans into AlwaysTrue/AlwaysFalse via small helper functions; assignments stay Unknown because unification can succeed even when RHS is falsey.
- relies on hoisted loop binding plans and remains Unknown, while pushes a transient scope, analyzes the nested query with , and conservatively reports Unknown.
- recognizes empty collections across Value shapes, and truth helper functions encapsulate the boolean flip logic for positive/negated cases.
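The truth classification above can be sketched as a few small helpers. The Truth enum mirrors the AlwaysTrue/AlwaysFalse/Unknown names used in the text, but the functions and their signatures are hypothetical; in particular, modelling the collection as a known length is a simplification.

```rust
/// Hypothetical truth classification for query statements.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Truth {
    AlwaysTrue,
    AlwaysFalse,
    Unknown,
}

/// Positive literal: a known boolean constant decides the truth value.
pub fn truth_of(constant: Option<bool>) -> Truth {
    match constant {
        Some(true) => Truth::AlwaysTrue,
        Some(false) => Truth::AlwaysFalse,
        None => Truth::Unknown,
    }
}

/// Negated literal (`not expr`): flip a known result, keep Unknown as is.
pub fn negated_truth_of(constant: Option<bool>) -> Truth {
    match truth_of(constant) {
        Truth::AlwaysTrue => Truth::AlwaysFalse,
        Truth::AlwaysFalse => Truth::AlwaysTrue,
        Truth::Unknown => Truth::Unknown,
    }
}

/// `some x in coll` can never succeed when the collection is a known empty constant.
pub fn some_in_truth(known_len: Option<usize>) -> Truth {
    match known_len {
        Some(0) => Truth::AlwaysFalse,
        _ => Truth::Unknown,
    }
}
```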
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduce the rules module scaffold and variable inference so rule bodies can resolve references through the analyzer.
Rego:
default allow = false
allow if {
input.user = data.users[id]
}
Highlights:
- The rules module wires in the submodules (variables, lookup, calls) needed for rule evaluation.
- Variable inference now handles the built-in roots input/data via schema facts, consults scoped bindings, and aggregates rule head facts for other names while tracking provenance/origins.
- Helper routines collect object-shaped rule facts by walking matching rule prefixes, merging structural types, and deduplicating origin paths so downstream property access sees concrete fields.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Teach the analyzer to resolve property reads through existing rule heads before falling back to structural inference.
Rego:
some rule in data.policies
decision := data.authz.allow[user]
Highlights:
- Rule lookup follows the static portion of a reference, normalizes rule paths, and records dependencies so we avoid infinite recursion.
- When matching rule heads are found we merge their facts via merge_rule_facts, ensuring unioned structural types and provenance survive.
- If no facts are available we defer to the structural path so dispatch can continue without hard failures.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Wire up rule calls so infer_call_expr can dispatch to builtins or rule definitions, enforce parameter templates, and record specializations.
Rego:
result := count(input.items)
allow if my_rule(user)
Details:
- resolve_call_path walks the expression tree to produce a canonical call name (for example, data.authz.allow) while resolve_rule_call_targets maps that name to rule head indices in the current package or data namespace.
- infer_call_expr evaluates arguments first, validates builtin calls (including type templates and arity), combines argument origins for pure builtins, and constant-folds when every argument is known.
- apply_rule_call_effects runs matching rule definitions in a scratch analyzer to harvest head facts and constant results, records RuleSpecialization entries, and ensures dependencies are tracked so recursion cycles are avoided while provenance is preserved.
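A minimal sketch of the two resolution steps described above: building a canonical dotted name from a reference chain, then mapping it to rule head indices by trying the current package first and the absolute path second. The Expr enum and both functions are simplified, hypothetical stand-ins for resolve_call_path and resolve_rule_call_targets.

```rust
/// Simplified expression tree for call targets.
pub enum Expr {
    Var(String),
    RefDot { refr: Box<Expr>, field: String },
    /// Any non-static expression (e.g. a bracket lookup with a dynamic key).
    Other,
}

/// Walk Var/RefDot chains to build a canonical name such as "data.authz.allow".
pub fn resolve_call_path(expr: &Expr) -> Option<String> {
    match expr {
        Expr::Var(name) => Some(name.clone()),
        Expr::RefDot { refr, field } => {
            let base = resolve_call_path(refr)?;
            Some(format!("{}.{}", base, field))
        }
        Expr::Other => None,
    }
}

/// Map a call name to rule head indices: current package first, then the
/// name treated as an absolute path.
pub fn resolve_rule_targets(name: &str, package: &str, heads: &[(String, usize)]) -> Vec<usize> {
    let find = |path: &str| -> Vec<usize> {
        heads.iter().filter(|(p, _)| p == path).map(|(_, i)| *i).collect()
    };
    let local = find(&format!("{}.{}", package, name));
    if local.is_empty() {
        find(name)
    } else {
        local
    }
}
```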
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduce the builtin metadata layer that lets the analyzer understand parameter templates, purity, and return descriptors for core functions.
Rego:
item_count := count(input.items)
name_lower := lower(input.user.name)
Details:
- catalog.rs parses builtins.json into feature-gated groups so we can share the same table across front ends.
- spec.rs models BuiltinSpec, validates template indices, and exposes helpers to derive return descriptors or fallback specs.
- matching.rs checks whether call arguments satisfy parameter templates and aggregates argument origins for diagnostics.
- table.rs loads the default catalog at startup, supports override/reset for tests, and resolves builtin specs while falling back to the runtime registry when needed.
- builtins.json seeds the catalog with the default builtin definitions including purity flags, caching hints, and parameter templates.
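The template check in matching.rs can be pictured as an arity check followed by a per-argument comparison, accepting unknown facts so they do not trigger false positives. BuiltinSig, ArgTy, and check_call are hypothetical names; the real BuiltinSpec and its templates are richer than this.

```rust
/// Hypothetical argument categories for template matching.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArgTy {
    Any,
    String,
    Number,
    Collection,
}

/// Hypothetical builtin signature: name, parameter templates, and purity flag.
pub struct BuiltinSig {
    pub name: &'static str,
    pub params: &'static [ArgTy],
    pub pure: bool,
}

/// Check arity first, then each argument against its template; Any on either
/// side is accepted so unknown facts do not produce spurious diagnostics.
pub fn check_call(sig: &BuiltinSig, args: &[ArgTy]) -> Result<(), String> {
    if args.len() != sig.params.len() {
        return Err(format!(
            "{} expects {} argument(s), got {}",
            sig.name,
            sig.params.len(),
            args.len()
        ));
    }
    for (i, (want, got)) in sig.params.iter().zip(args).enumerate() {
        let ok = *want == ArgTy::Any || *got == ArgTy::Any || want == got;
        if !ok {
            return Err(format!(
                "{} argument {}: expected {:?}, got {:?}",
                sig.name,
                i + 1,
                want,
                got
            ));
        }
    }
    Ok(())
}
```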
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Introduce the crate-level type analysis module and high-level TypeChecker wrapper so embedders can drive the new analyzer.
Highlights:
- src/type_analysis.rs documents the propagation pipeline and re-exports the core building blocks (TypeAnalyzer, TypeAnalysisOptions, LookupContext, builtin metadata, etc.) for external consumers.
- src/type_checker.rs provides a convenience API that hoists loops, wires schemas and entrypoints into the analyzer, caches results, and exposes getters for diagnostics and hoisted loop metadata.
- Adds doc examples and sanity tests covering schema configuration, invalidation, and the basic check workflow.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Teach Engine about the optional TypeChecker so type analysis runs during preparation and diagnostics are exposed to callers.
Highlights:
- store an optional TypeChecker inside Engine, mirror module mutations, offer enable/disable/getters, and invoke check() after loop hoisting completes.
- add IDE-friendly helpers (tolerant parsing toggle, new_with_modules, get_type_analysis_context, try_eval_rule_constant) that share the engine environment with the analyzer.
- export type_analysis, TypeChecker, Schema, Span, and AST aliases from lib.rs so embedders can reach the new APIs.
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
## Summary
- add src/tests/type_analysis/mod.rs to parse rego modules, run the TypeAnalyzer, and validate rule/expr facts plus diagnostics from YAML fixtures
- seed coverage suites under tests/type_analysis spanning literals, constants, operators, references, diagnostics, and placeholders for future areas
- register the new module in src/tests/mod.rs, add ACI schema fixtures, and relax the parser test helper to allow refdot dynamic-field cases
## Testing
- cargo test type_analysis::run
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
## Summary
- add examples/helpers/type_analysis.rs for detailed CLI visualization of type analysis results
- expose the Analyze subcommand in the regorus example and export the helper module entry point
- tidy the server example policy formatting and include a matching JSON schema fixture
## Testing
- cargo check --examples
Signed-off-by: Anand Krishnamoorthi <anakrish@microsoft.com>
Force-pushed from 90be740 to 5c36ce8.
Force-pushed from 5c36ce8 to c2a342b.
Overview
This PR implements the Regorus type-analysis system end-to-end. Starting from foundation layers (type models, constant/value bridges, lookup contexts, result surfaces, propagation state) through expression and rule inference, it now infers precise types, tracks provenance, records diagnostics, and exposes rich metadata to callers.
Key Components
Type Model Foundations: Added structural and schema-aware type descriptors, constant fact store, value-to-type conversion, diagnostic data structures, and rule-path helpers to support precise inference and reporting.
Analysis Contexts & Result Surface: Introduced lookup contexts for per-expression facts, propagation state for in-flight inference, entrypoint tracking, and a public TypeAnalysisResult exposing expression/rule facts, dependency graphs, and diagnostics.
Analyzer Pipeline: Implemented traversal scaffolding, parser metadata extensions, rule/function traversal, loop binding helpers, builtin catalog, expression dispatcher, comprehension/statement handling, rule call resolution, and specialization hooks.
Engine Integration: Exposed analyzer configuration on Engine, allowing callers to enable type checking, set schemas, run analysis, and consume results programmatically.
Tests & Tooling: Added a YAML-driven regression harness under src/tests/type_analysis/**, including broad coverage of literals, comprehensions, rule dependencies, schema interactions, and diagnostics.
Regorus Binary: Extended the examples/regorus binary with an analyze subcommand for human-readable or JSON summaries.
Testing
cargo test type_analysis::run