
Proposal: Introduce stable Symbol IDs #96

@hardbyte


To evolve our code generation capabilities beyond simple templating, we can aim to build a more robust, multi-stage, compiler-style pipeline.
The current system relies heavily on string-based name lookups, which makes circular-reference detection and dependency analysis fragile.

This proposal introduces a foundational change to reflectapi-schema: the integration of a stable and unique SymbolId for every type, field, variant, and function in the schema. This would be the first step towards enabling a reliable pipeline architecture.

Why

Compiler Architecture: A modern compiler pipeline requires a reliable way to track symbols (types, functions, etc.) across different stages of transformation (e.g., parsing -> normalization -> semantic analysis -> code generation). String names are insufficient because they can be ambiguous or change during normalization. SymbolId provides a canonical reference that remains stable throughout the entire process.

Enabling Determinism and Stability: By using identifiers with a stable Ord implementation (ordering by kind, then path, with the disambiguator breaking ties), we can process schema components in a deterministic order. This makes code generation reproducible: the same schema always produces the same output, which eliminates spurious diffs in generated files and keeps version-control history clean and predictable.
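As a minimal sketch of the determinism point, assuming the SymbolId shape proposed under How plus a hypothetical SymbolKind enum trimmed to three variants: Rust's derived Ord compares struct fields in declaration order and enum variants in declaration order, so keying generated output by SymbolId in a BTreeMap fixes the emission order no matter how the map was filled. The id and render helpers are illustrative only, not existing reflectapi-schema APIs.

```rust
use std::collections::BTreeMap;

// Hypothetical kind enum; derived Ord orders variants by declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum SymbolKind {
    Struct,
    Enum,
    Field,
}

// Derived Ord compares `kind`, then `path` (lexicographically), then `disambiguator`.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SymbolId {
    pub kind: SymbolKind,
    pub path: Vec<String>,
    pub disambiguator: u32,
}

// Illustrative constructor for IDs without a disambiguator.
pub fn id(kind: SymbolKind, path: &[&str]) -> SymbolId {
    SymbolId {
        kind,
        path: path.iter().map(|s| s.to_string()).collect(),
        disambiguator: 0,
    }
}

/// Emits one line per symbol. BTreeMap iterates in key order, so the
/// output is identical regardless of insertion order.
pub fn render(symbols: &BTreeMap<SymbolId, String>) -> String {
    symbols.values().map(|code| format!("{code}\n")).collect()
}
```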

The real value of this change lies in the features it makes possible:

  • Accurate Dependency Tracking: We can build a dependency graph between types, which is essential for detecting and resolving circular references.
  • Transformations: Features like generic monomorphization (generating concrete types like List<User> from a generic List<T>) become feasible because we can reliably track and substitute type symbols.
  • Improved Error Reporting: When a type reference is invalid, the normalization pipeline can report exactly which symbol failed to resolve, providing much clearer and more actionable error messages to the user.
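A rough sketch of the dependency tracking, assuming a hypothetical DepGraph type that maps each type's SymbolId to the IDs its fields reference. It finds a cycle with a standard depth-first search that marks nodes as in progress or done. Because the graph is a BTreeMap, the reported cycle is also deterministic. All names here are illustrative, not existing reflectapi-schema APIs.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum SymbolKind { Struct, Enum }

#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SymbolId { pub kind: SymbolKind, pub path: Vec<String>, pub disambiguator: u32 }

/// Hypothetical graph: edges point from a type to the types its fields reference.
pub type DepGraph = BTreeMap<SymbolId, BTreeSet<SymbolId>>;

#[derive(Clone, Copy)]
enum Mark { Visiting, Done }

/// Returns the first cycle found, as the IDs along it, or None if the graph is acyclic.
pub fn find_cycle(graph: &DepGraph) -> Option<Vec<SymbolId>> {
    fn visit(
        node: &SymbolId,
        graph: &DepGraph,
        marks: &mut BTreeMap<SymbolId, Mark>,
        stack: &mut Vec<SymbolId>,
    ) -> Option<Vec<SymbolId>> {
        match marks.get(node) {
            Some(Mark::Done) => return None,
            Some(Mark::Visiting) => {
                // `node` is on the current DFS path: the cycle starts where it was first pushed.
                let start = stack.iter().position(|n| n == node).unwrap();
                return Some(stack[start..].to_vec());
            }
            None => {}
        }
        marks.insert(node.clone(), Mark::Visiting);
        stack.push(node.clone());
        if let Some(deps) = graph.get(node) {
            for dep in deps {
                if let Some(cycle) = visit(dep, graph, marks, stack) {
                    return Some(cycle);
                }
            }
        }
        stack.pop();
        marks.insert(node.clone(), Mark::Done);
        None
    }

    let mut marks = BTreeMap::new();
    let mut stack = Vec::new();
    for node in graph.keys() {
        if let Some(cycle) = visit(node, graph, &mut marks, &mut stack) {
            return Some(cycle);
        }
    }
    None
}
```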

Better and more consistent support across languages: A stable IR with unique symbol identifiers is a prerequisite for a scalable multi-language code generator. The front-end of the pipeline (parsing and normalization) can be shared across all target languages, while the back-ends (e.g., TypeScript, Rust, Python generators) can all rely on the same stable symbol information.
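To illustrate the front-end/back-end split, here is a sketch assuming a hypothetical Ir type (normalized structs keyed by SymbolId) and a hypothetical Backend trait. The shared driver in generate walks symbols in ID order, and each language supplies only its type mapping and rendering. The TypeScript and Python output shapes are placeholders, not what the real generators emit.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum SymbolKind { Struct }

#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SymbolId { pub kind: SymbolKind, pub path: Vec<String>, pub disambiguator: u32 }

#[derive(Debug, Clone, Copy)]
pub enum Primitive { Int, Str }

/// Hypothetical output of the shared front-end: each struct keyed by its stable ID,
/// with its fields as (name, type) in declaration order.
pub type Ir = BTreeMap<SymbolId, Vec<(String, Primitive)>>;

/// Each target language implements only the language-specific steps.
pub trait Backend {
    fn primitive(&self, p: Primitive) -> &'static str;
    fn emit_struct(&self, name: &str, fields: &[(String, &'static str)]) -> String;

    /// Shared driver: walks the IR in deterministic SymbolId order.
    fn generate(&self, ir: &Ir) -> String {
        ir.iter()
            .map(|(id, fields)| {
                let name = id.path.last().map(String::as_str).unwrap_or("Unnamed");
                let typed: Vec<(String, &'static str)> =
                    fields.iter().map(|(f, p)| (f.clone(), self.primitive(*p))).collect();
                self.emit_struct(name, &typed)
            })
            .collect()
    }
}

pub struct TypeScript;
pub struct Python;

impl Backend for TypeScript {
    fn primitive(&self, p: Primitive) -> &'static str {
        match p { Primitive::Int => "number", Primitive::Str => "string" }
    }
    fn emit_struct(&self, name: &str, fields: &[(String, &'static str)]) -> String {
        let body: String = fields.iter().map(|(f, t)| format!("  {f}: {t};\n")).collect();
        format!("export interface {name} {{\n{body}}}\n")
    }
}

impl Backend for Python {
    fn primitive(&self, p: Primitive) -> &'static str {
        match p { Primitive::Int => "int", Primitive::Str => "str" }
    }
    fn emit_struct(&self, name: &str, fields: &[(String, &'static str)]) -> String {
        let body: String = fields.iter().map(|(f, t)| format!("    {f}: {t}\n")).collect();
        format!("class {name}:\n{body}")
    }
}
```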

How

A new module, reflectapi-schema/src/symbol.rs, will define the identifier:

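#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SymbolId {
    pub kind: SymbolKind,   // e.g., Struct, Enum, Field
    pub path: Vec<String>,  // e.g., ["api", "User", "id"]
    pub disambiguator: u32, // tells apart symbols that share kind and path
}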
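One way the path and disambiguator could surface in error messages is a Display impl. This sketch assumes a SymbolKind enum built from the kinds named in this proposal and a hypothetical rendering format: the kind, then the path joined with ::, then #n only when the disambiguator is nonzero.

```rust
use std::fmt;

// Hypothetical variants, taken from the element kinds listed in this proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum SymbolKind { Schema, Struct, Enum, Field, Variant, Function }

#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SymbolId {
    pub kind: SymbolKind,
    pub path: Vec<String>,
    pub disambiguator: u32,
}

impl SymbolId {
    /// Illustrative constructor; the disambiguator starts at 0.
    pub fn new(kind: SymbolKind, path: &[&str]) -> Self {
        Self { kind, path: path.iter().map(|s| s.to_string()).collect(), disambiguator: 0 }
    }
}

impl fmt::Display for SymbolId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // e.g. "field api::User::id", or "struct api::User#1" for a duplicate
        let kind = format!("{:?}", self.kind).to_lowercase();
        write!(f, "{} {}", kind, self.path.join("::"))?;
        if self.disambiguator != 0 {
            write!(f, "#{}", self.disambiguator)?;
        }
        Ok(())
    }
}
```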

The id: SymbolId field will be added to all core schema structs (Schema, Struct, Enum, Field, Function, etc.). The derive macros (#[derive(ReflectAPI)]) will have to be updated to automatically generate a preliminary, simple ID at compile time.
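For the derive side, here is a hand-written sketch of what the macro might expand to, assuming a hypothetical HasSymbolId trait and preliminary_id helper (neither exists in reflectapi today). It relies only on the standard module_path!() and stringify!() macros, both expanded at compile time, so the preliminary path is the defining module followed by the type name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum SymbolKind { Struct, Enum }

#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SymbolId { pub kind: SymbolKind, pub path: Vec<String>, pub disambiguator: u32 }

/// Hypothetical trait a derive could implement for each annotated type.
pub trait HasSymbolId {
    fn symbol_id() -> SymbolId;
}

/// Builds a preliminary ID: module path segments plus the type name.
pub fn preliminary_id(kind: SymbolKind, module_path: &str, name: &str) -> SymbolId {
    let mut path: Vec<String> = module_path.split("::").map(String::from).collect();
    path.push(name.to_string());
    SymbolId { kind, path, disambiguator: 0 }
}

pub struct User {
    pub id: u64,
}

// Roughly what a derive on `User` might generate (a sketch, not real macro output).
impl HasSymbolId for User {
    fn symbol_id() -> SymbolId {
        preliminary_id(SymbolKind::Struct, module_path!(), stringify!(User))
    }
}
```

Such IDs are only preliminary: schemas assembled from several sources can still collide, which is the case ensure_symbol_ids() is meant to resolve.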

For schemas that are not created via the derive macro (e.g., deserialized from JSON), the compile-time IDs may be missing or conflicting. A new function, ensure_symbol_ids(), will be introduced in reflectapi-schema/src/ids.rs. This function traverses a schema and assigns a canonical, fully-qualified, and unique SymbolId to every element, guaranteeing the integrity of the symbol table before it is consumed by the code generator.
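A minimal sketch of what ensure_symbol_ids() could do, assuming a heavily simplified schema (a Schema holding Structs holding Fields) rather than the real reflectapi-schema types, and a hypothetical Unknown kind for IDs that were never set. Each element gets a fully-qualified path built during traversal, and a counter per (kind, path) pair hands out disambiguators so duplicates stay unique.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum SymbolKind { Unknown, Schema, Struct, Field }

#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SymbolId { pub kind: SymbolKind, pub path: Vec<String>, pub disambiguator: u32 }

impl SymbolId {
    /// Placeholder for IDs that are missing, e.g. after loading old JSON.
    pub fn unknown() -> Self {
        SymbolId { kind: SymbolKind::Unknown, path: Vec::new(), disambiguator: 0 }
    }
}

#[derive(Debug)]
pub struct Field { pub id: SymbolId, pub name: String }
#[derive(Debug)]
pub struct Struct { pub id: SymbolId, pub name: String, pub fields: Vec<Field> }
#[derive(Debug)]
pub struct Schema { pub id: SymbolId, pub name: String, pub structs: Vec<Struct> }

/// Returns an ID for `(kind, path)` whose disambiguator counts how many
/// times that pair was already seen, so duplicates stay unique.
fn assign(
    seen: &mut HashMap<(SymbolKind, Vec<String>), u32>,
    kind: SymbolKind,
    path: Vec<String>,
) -> SymbolId {
    let counter = seen.entry((kind, path.clone())).or_insert(0);
    let id = SymbolId { kind, path, disambiguator: *counter };
    *counter += 1;
    id
}

/// Overwrites every ID with a canonical, fully-qualified one.
pub fn ensure_symbol_ids(schema: &mut Schema) {
    let mut seen = HashMap::new();
    let root = vec![schema.name.clone()];
    schema.id = assign(&mut seen, SymbolKind::Schema, root.clone());
    for s in &mut schema.structs {
        let mut struct_path = root.clone();
        struct_path.push(s.name.clone());
        s.id = assign(&mut seen, SymbolKind::Struct, struct_path.clone());
        for f in &mut s.fields {
            let mut field_path = struct_path.clone();
            field_path.push(f.name.clone());
            f.id = assign(&mut seen, SymbolKind::Field, field_path);
        }
    }
}
```

For brevity, field paths here do not include the parent struct's disambiguator; a fuller implementation would likely carry it along so that fields of two colliding structs can also be told apart by path.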

Is this a breaking change?

No. The new id field on every schema struct will be marked with #[serde(default)], so existing schemas serialized to JSON or other formats (which do not contain the id field) will deserialize without errors. The field will simply be initialized with a default "unknown" ID.
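A sketch of the "unknown" default, assuming a hypothetical Unknown variant on SymbolKind (marked #[default]) and a hypothetical is_unknown helper. With serde, a field declared as `#[serde(default)] pub id: SymbolId` uses SymbolId::default() when the key is absent; legacy_field mimics that path with struct-update syntax so the sketch needs no external crates.

```rust
/// Hypothetical kind enum; `Unknown` is the default used for missing IDs.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum SymbolKind {
    #[default]
    Unknown,
    Struct,
    Field,
}

#[derive(Debug, Clone, Default, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SymbolId {
    pub kind: SymbolKind,
    pub path: Vec<String>,
    pub disambiguator: u32,
}

impl SymbolId {
    /// True for the placeholder produced by `Default`, i.e. an ID that
    /// ensure_symbol_ids() still has to fill in.
    pub fn is_unknown(&self) -> bool {
        self.kind == SymbolKind::Unknown
    }
}

/// Simplified schema field. In the real struct, `id` would carry
/// `#[serde(default)]`, so a missing `id` key becomes `SymbolId::default()`.
#[derive(Debug, Default)]
pub struct Field {
    pub id: SymbolId,
    pub name: String,
}

/// Mimics deserializing an old schema whose JSON has no `id` key:
/// every field except `name` falls back to its Default.
pub fn legacy_field(name: &str) -> Field {
    Field { name: name.to_string(), ..Default::default() }
}
```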

For any schema loaded from an external source, users should call the new ensure_symbol_ids(&mut schema) function immediately after loading. This will populate the schema with the correct identifiers, making it fully compatible with any new tooling built on this system. Existing code that does not use the new codegen pipeline will continue to function as before without this step.

Will it help or hinder the existing Rust & TypeScript clients?
It should help them. While the immediate benefit is for the new Python generator, this change provides a long-term advantage for all clients and tooling built around reflectapi-schema. At some point, the TypeScript and Rust client generators can be migrated to use this stable symbol system, making them more robust and easier to maintain.
