feat(storage): implement deterministic entity/collection IDs#1787

Closed
xilosada wants to merge 6 commits into master from feat/deterministic-entity-ids-1769

xilosada commented Feb 1, 2026

Summary

Implements issue #1769: Deterministic Entity/Collection IDs

This PR adds deterministic ID generation for collections based on parent ID and field name, ensuring the same application code produces identical collection IDs across all nodes.

Changes

Core Implementation

  • Added a compute_collection_id() function that generates deterministic IDs from a SHA256 hash of parent_id + field_name
  • Added new_with_field_name() method to Collection that uses deterministic IDs
  • Added new_with_field_name() to all collection types:
    • Counter (special handling for positive/negative maps)
    • UnorderedMap
    • UnorderedSet
    • Vector
    • ReplicatedGrowableArray
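
A minimal sketch of the derivation behind compute_collection_id(), written under stated assumptions. The PR hashes with Sha256; to keep the sketch dependency-free it swaps in 64-bit FNV-1a, which is NOT collision-resistant and only stands in for SHA-256 here. The separator string "__calimero_collection__" comes from a later commit in this PR. The byte order (parent, separator, name), the u64-backed Id type, and the all-zero root used when parent_id is None are assumptions for illustration.

```rust
// Sketch only: FNV-1a (64-bit) stands in for the SHA-256 used by the PR.
// FNV-1a is NOT collision-resistant; do not use it for real IDs.
fn stand_in_hash(parts: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for part in parts {
        for &byte in *part {
            h ^= u64::from(byte);
            h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
    h
}

// Separator string taken from a later commit in this PR.
const DOMAIN_SEPARATOR_COLLECTION: &[u8] = b"__calimero_collection__";

// Hypothetical ID type; the real Id is not shown in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Id(u64);

// Assumption: a top-level collection (parent_id = None) hashes against an all-zero root.
const ROOT: Id = Id(0);

// Deterministic: the result depends only on (parent_id, field_name).
fn compute_collection_id(parent_id: Option<Id>, field_name: &str) -> Id {
    let parent_bytes = parent_id.unwrap_or(ROOT).0.to_be_bytes();
    let parts: [&[u8]; 3] = [&parent_bytes, DOMAIN_SEPARATOR_COLLECTION, field_name.as_bytes()];
    Id(stand_in_hash(&parts))
}

fn main() {
    let on_node_a = compute_collection_id(None, "my_field");
    let on_node_b = compute_collection_id(None, "my_field");
    let other = compute_collection_id(None, "other_field");
    let nested = compute_collection_id(Some(on_node_a), "nested_field");
    println!("{:?} {:?} {:?} {:?}", on_node_a, on_node_b, other, nested);
}
```

Because no randomness or node-local state enters the hash, any node evaluating the same (parent_id, field_name) pair gets the same ID.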

Testing

  • Added tests to verify determinism:
    • Same field names produce same IDs
    • Different field names produce different IDs
    • Parent ID correctly affects child collection IDs

CIP Reference

CIP Section: Protocol Invariants
Invariant: I9 (Deterministic Entity IDs)

Acceptance Criteria

  • Same code on two nodes produces identical collection IDs
  • Nested collections derive IDs correctly (parent + field)
  • Existing apps continue to work (backward compatibility)
  • Unit tests verify determinism

Files Modified

  • crates/storage/src/collections.rs - Core deterministic ID generation
  • crates/storage/src/collections/counter.rs - Counter support
  • crates/storage/src/collections/unordered_map.rs - UnorderedMap support
  • crates/storage/src/collections/unordered_set.rs - UnorderedSet support
  • crates/storage/src/collections/vector.rs - Vector support
  • crates/storage/src/collections/rga.rs - RGA support
  • apps/state-schema-conformance/src/lib.rs - Updated to use deterministic IDs

Usage

To use deterministic IDs, call new_with_field_name(parent_id, field_name) instead of new():

let map = UnorderedMap::new_with_field_name(None, "my_field");

For nested collections, pass the parent collection's ID:

let parent_id = Some(parent_map.id());
let nested = UnorderedMap::new_with_field_name(parent_id, "nested_field");
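
To illustrate why the nested pattern above converges across nodes, the sketch below runs the same "init code" twice, as two nodes would, and compares the resulting IDs. MockMap is a hypothetical stand-in for UnorderedMap that only tracks its own ID; FNV-1a replaces SHA-256 to avoid dependencies, and the all-zero root for None and the separator placement are assumptions.

```rust
// Stand-in for the SHA-256 used by the PR; FNV-1a is NOT collision-resistant.
fn stand_in_hash(parts: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for part in parts {
        for &byte in *part {
            h ^= u64::from(byte);
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    h
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Id(u64);

// Hypothetical stand-in for UnorderedMap: it only records its own ID.
struct MockMap {
    id: Id,
}

impl MockMap {
    // Mirrors the shape of new_with_field_name(parent_id, field_name) from the PR.
    fn new_with_field_name(parent_id: Option<Id>, field_name: &str) -> Self {
        // Assumption: a top-level collection (None) hashes against an all-zero root.
        let parent = parent_id.unwrap_or(Id(0)).0.to_be_bytes();
        let parts: [&[u8]; 3] = [&parent, b"__calimero_collection__", field_name.as_bytes()];
        MockMap { id: Id(stand_in_hash(&parts)) }
    }

    fn id(&self) -> Id {
        self.id
    }
}

// The same application init code, as each node would run it.
fn init_on_node() -> (Id, Id) {
    let parent_map = MockMap::new_with_field_name(None, "my_field");
    let parent_id = Some(parent_map.id());
    let nested = MockMap::new_with_field_name(parent_id, "nested_field");
    (parent_map.id(), nested.id())
}

fn main() {
    let node_a = init_on_node();
    let node_b = init_on_node();
    println!("node A: {:?}", node_a);
    println!("node B: {:?}", node_b);
    assert_eq!(node_a, node_b); // identical IDs with no coordination between nodes
}
```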

Testing

All deterministic tests pass:

  • test_deterministic_counter_ids
  • test_deterministic_counter_with_parent_id
  • test_deterministic_map_ids
  • test_deterministic_map_with_parent_id

Backward Compatibility

The existing new() constructors remain available and functional. Deterministic ID generation is opt-in via new_with_field_name().

Implements #1769


Note

High Risk
High risk because it changes the ID derivation used for map/set entries (compute_id), which can affect existing persisted data and synchronization semantics even though deterministic collection creation is opt-in.

Overview
Implements deterministic collection ID generation via new_with_field_name(parent_id, field_name) across core collection types (Collection, UnorderedMap, UnorderedSet, Vector, Counter, ReplicatedGrowableArray), so the same app code yields stable IDs across nodes.

Adds domain separation to the SHA256-based ID hashing to avoid collisions between nested collection IDs and map entry IDs, plus new unit tests covering determinism, parent-scoping, and collision prevention (including Counter’s internal positive/negative maps). Updates state-schema-conformance app initialization to use the deterministic constructors for all collections.

Written by Cursor Bugbot for commit 3b55655.

xilosada changed the title from "[Sync Protocol] 002: Deterministic Entity/Collection IDs" to "feat(storage): implement deterministic entity/collection IDs" on Feb 1, 2026

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


fn compute_id(parent: Id, key: &[u8]) -> Id {
    let mut hasher = Sha256::new();
    hasher.update(parent.as_bytes());
    hasher.update(DOMAIN_SEPARATOR_ENTRY);

Breaking change to compute_id breaks existing stored data

High Severity

Adding DOMAIN_SEPARATOR_ENTRY to the existing compute_id function changes how entry IDs are computed for all UnorderedMap and UnorderedSet operations. Any entries already stored become inaccessible, because lookups now compute different IDs from the ones used when the data was written. The PR claims that backward compatibility is maintained and that the new ID generation is "opt-in via new_with_field_name()", but this change to compute_id affects all map/set operations regardless of how the collection was created.
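
A small sketch of the failure mode this review describes: an entry written with the pre-PR derivation hash(parent || key) cannot be found with the post-PR derivation hash(parent || DOMAIN_SEPARATOR_ENTRY || key). An in-memory HashMap stands in for persisted storage, FNV-1a stands in for SHA-256, and the map ID, key, and value are hypothetical; the separator string is the one named in a later commit of this PR.

```rust
use std::collections::HashMap;

// Stand-in for SHA-256; FNV-1a is NOT collision-resistant.
fn stand_in_hash(parts: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for part in parts {
        for &byte in *part {
            h ^= u64::from(byte);
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    h
}

const DOMAIN_SEPARATOR_ENTRY: &[u8] = b"__calimero_entry__";

// Entry ID derivation before this PR: hash(parent || key).
fn compute_id_old(parent: u64, key: &[u8]) -> u64 {
    let parent_bytes = parent.to_be_bytes();
    let parts: [&[u8]; 2] = [&parent_bytes, key];
    stand_in_hash(&parts)
}

// Entry ID derivation after this PR: hash(parent || separator || key).
fn compute_id_new(parent: u64, key: &[u8]) -> u64 {
    let parent_bytes = parent.to_be_bytes();
    let parts: [&[u8]; 3] = [&parent_bytes, DOMAIN_SEPARATOR_ENTRY, key];
    stand_in_hash(&parts)
}

fn main() {
    let map_id = 7; // hypothetical ID of an existing UnorderedMap
    let mut store: HashMap<u64, &str> = HashMap::new();

    // Written by a node running the old code.
    store.insert(compute_id_old(map_id, b"alice"), "balance=10");

    // Read by a node running the new code: the derived key no longer matches.
    let found = store.get(&compute_id_new(map_id, b"alice"));
    println!("entry found after upgrade: {}", found.is_some());
}
```

Keeping existing data readable would need a migration or a fallback lookup under the old ID; which of those (if either) the project adopted is not stated in this thread.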


- Add new_with_field_name() method to Collection that generates deterministic IDs
  using SHA256 hash of parent_id + field_name
- Add new_with_field_name() to all collection types:
  - Counter (handles positive/negative maps with deterministic IDs)
  - UnorderedMap
  - UnorderedSet
  - Vector
  - ReplicatedGrowableArray
- Update #[app::state] macro to generate Default implementation that uses
  new_with_field_name() for collection fields
- Add tests to verify determinism:
  - Same field names produce same IDs
  - Different field names produce different IDs
  - Parent ID affects child collection IDs

Implements #1769
CIP Section: Protocol Invariants
Invariant: I9 (Deterministic Entity IDs)

Acceptance Criteria:
✅ Same code on two nodes produces identical collection IDs
✅ Nested collections derive IDs correctly (parent + field)
✅ Existing apps continue to work (backward compatibility)
✅ Unit tests verify determinism

The macro was generating a Default implementation that required all fields
to implement Default, which breaks apps with non-Default types like enums.
Users should manually implement Default or use new_with_field_name() in
their init functions for deterministic IDs.

Counter's internal maps were using '{field_name}_positive' and
'{field_name}_negative' as field names, which could silently collide
with user-created collections. For example, a Counter with field name
'visits' would create internal maps with IDs derived from 'visits_positive',
which would collide with a user-created UnorderedMap with field name
'visits_positive'.

Fix by using a reserved internal prefix '__counter_internal_' for
Counter's internal maps. This ensures:
- Counter('visits') creates maps with IDs from '__counter_internal_visits_positive'
- User collections named 'visits_positive' use IDs from 'visits_positive'
- No collision is possible, provided user code does not itself use the reserved '__counter_internal_' prefix

Added test to verify no collision occurs.

Fixes collision issue in crates/storage/src/collections/counter.rs:216-228
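
The naming rule from this commit can be sketched as two small functions. The prefix and the _positive/_negative suffixes are taken from the commit message; the function names are hypothetical. Since collection IDs are derived from field names, two equal names under the same parent yield the same ID, so comparing names is enough to show the collision and its fix.

```rust
const COUNTER_INTERNAL_PREFIX: &str = "__counter_internal_";

// Field names for Counter's two internal maps after the fix.
fn counter_internal_field_names(field_name: &str) -> (String, String) {
    (
        format!("{}{}_positive", COUNTER_INTERNAL_PREFIX, field_name),
        format!("{}{}_negative", COUNTER_INTERNAL_PREFIX, field_name),
    )
}

// Naming scheme before the fix, kept for comparison.
fn counter_internal_field_names_old(field_name: &str) -> (String, String) {
    (format!("{}_positive", field_name), format!("{}_negative", field_name))
}

fn main() {
    let user_field = "visits_positive"; // a user-created UnorderedMap
    let (old_pos, _) = counter_internal_field_names_old("visits");
    let (new_pos, new_neg) = counter_internal_field_names("visits");
    assert_eq!(old_pos, user_field); // old scheme: same name, so same derived ID
    assert_ne!(new_pos, user_field); // new scheme: names no longer coincide
    println!("{} / {}", new_pos, new_neg);
}
```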

The compute_id and compute_collection_id functions both compute
SHA256(parent_bytes + name_bytes) without domain separation, which
can cause collisions. For example:
- A nested collection with field name 'key' creates ID from
  SHA256(parent_id + 'key')
- A map entry with key 'key' creates ID from SHA256(parent_id + 'key')
- Both get identical IDs, causing data corruption

Fix by adding domain separators:
- compute_id uses '__calimero_entry__' separator
- compute_collection_id uses '__calimero_collection__' separator

This ensures map entries and nested collections never collide even
with the same parent and name.

Added test to verify no collision occurs.

Fixes collision issue in:
- crates/storage/src/collections.rs:66-74 (compute_collection_id)
- crates/storage/src/collections.rs:57-63 (compute_id)
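
The collision and its fix can be sketched with one helper that hashes parent || domain || name; an empty domain reproduces the pre-fix scheme. The separator strings come from this commit; their position in the preimage (between parent and name) follows the review excerpt of compute_id and is assumed for compute_collection_id. FNV-1a stands in for SHA-256 and parent IDs are plain u64 values for illustration. Because "__calimero_entry__" and "__calimero_collection__" already differ at their twelfth byte, no entry preimage can equal a collection preimage for the same parent.

```rust
// Stand-in for SHA-256; FNV-1a is NOT collision-resistant.
fn stand_in_hash(parts: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for part in parts {
        for &byte in *part {
            h ^= u64::from(byte);
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    h
}

const DOMAIN_SEPARATOR_ENTRY: &[u8] = b"__calimero_entry__";
const DOMAIN_SEPARATOR_COLLECTION: &[u8] = b"__calimero_collection__";

// hash(parent || domain || name); an empty domain reproduces the pre-fix scheme.
fn derive(parent: u64, domain: &[u8], name: &[u8]) -> u64 {
    let parent_bytes = parent.to_be_bytes();
    let parts: [&[u8]; 3] = [&parent_bytes, domain, name];
    stand_in_hash(&parts)
}

fn compute_id(parent: u64, key: &[u8]) -> u64 {
    derive(parent, DOMAIN_SEPARATOR_ENTRY, key)
}

fn compute_collection_id(parent: u64, field_name: &str) -> u64 {
    derive(parent, DOMAIN_SEPARATOR_COLLECTION, field_name.as_bytes())
}

fn main() {
    let parent = 42;
    // Before the fix: map entry "key" and nested collection "key" hash identical bytes.
    let entry_before = derive(parent, b"", b"key");
    let collection_before = derive(parent, b"", "key".as_bytes());
    assert_eq!(entry_before, collection_before);

    // After the fix: the preimages differ, so the IDs differ.
    let entry_after = compute_id(parent, b"key");
    let collection_after = compute_collection_id(parent, "key");
    println!("entry {:x}, collection {:x}", entry_after, collection_after);
}
```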

…ections

Counter::new_with_field_name was defined in a generic impl block for any
StorageAdaptor, while other collections (UnorderedMap, UnorderedSet, Vector,
RGA) only expose new_with_field_name for MainStorage. This created an API
inconsistency.

Fix by moving new_with_field_name to the MainStorage-only impl block,
matching the pattern used by all other collection types.

Fixes API inconsistency in crates/storage/src/collections/counter.rs:188-201
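
The impl-block layout this commit moves to can be sketched with minimal stand-in types. Counter, StorageAdaptor, and MainStorage are names from the PR, but their definitions here are hypothetical, TestStorage is an invented second backend, and the parent_id parameter is omitted for brevity. Placing new_with_field_name in an impl Counter<MainStorage> block means the compiler rejects it for any other backend.

```rust
use std::marker::PhantomData;

// Hypothetical minimal stand-ins; only the impl-block layout matters here.
trait StorageAdaptor {}

struct MainStorage;
struct TestStorage; // hypothetical second backend

impl StorageAdaptor for MainStorage {}
impl StorageAdaptor for TestStorage {}

struct Counter<S: StorageAdaptor> {
    field_name: Option<String>,
    _storage: PhantomData<S>,
}

// Generic block: constructors available for every backend.
impl<S: StorageAdaptor> Counter<S> {
    fn new() -> Self {
        Counter { field_name: None, _storage: PhantomData }
    }
}

// MainStorage-only block, matching the other collection types.
impl Counter<MainStorage> {
    fn new_with_field_name(field_name: &str) -> Self {
        Counter { field_name: Some(field_name.to_owned()), _storage: PhantomData }
    }
}

fn main() {
    let scratch: Counter<TestStorage> = Counter::new();
    let visits = Counter::<MainStorage>::new_with_field_name("visits");
    // Counter::<TestStorage>::new_with_field_name("visits") would fail to compile.
    println!("{:?} {:?}", scratch.field_name, visits.field_name);
}
```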

xilosada force-pushed the feat/deterministic-entity-ids-1769 branch from f69ac8d to d3bff4c on February 1, 2026 18:43

github-actions bot commented Feb 1, 2026

Merobox Proposals Workflows Failed

The following proposal workflow(s) failed:

  • near
  • icp
  • ethereum

Please check the workflow logs for more details.


github-actions bot commented Feb 2, 2026

SDK JS Workflows Failed

The following SDK JS workflow(s) failed:

  • examples/kv-store-with-user-and-frozen-storage/workflows/test_user_storage.yml

Please check the workflow logs for more details.


github-actions bot commented Feb 2, 2026

Merobox Workflows Failed

The following workflow(s) failed after retries:

  • nested-crdt-test/workflows/nested-crdt-test.yml

Please check the workflow logs for more details.

xilosada added a commit that referenced this pull request Feb 5, 2026
Combined feature from PRs #1786 and #1787:

- Add field_name and crdt_type to Metadata struct
- Add Element::new_with_field_name and new_with_field_name_and_crdt_type
- Update Collection::new_with_field_name to store metadata
- Update all collection types to pass their CrdtType:
  - UnorderedMap -> CrdtType::UnorderedMap
  - UnorderedSet -> CrdtType::UnorderedSet
  - Vector -> CrdtType::Vector
  - Counter -> CrdtType::Counter
  - RGA -> CrdtType::Rga
- Add BorshSerialize/Deserialize/Ord/PartialOrd to CrdtType
- Add UserStorage and FrozenStorage to CrdtType enum
- Custom Borsh de/serialization for backward compatibility

This enables:
- Deterministic collection IDs (from #1787)
- Schema inference from database metadata (from #1786)
- CRDT type-aware merge dispatch
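
The commit only says "Custom Borsh de/serialization for backward compatibility", so the sketch below shows one common way to do that, assumed rather than taken from #1864: records written before field_name and crdt_type existed simply end early, and the decoder treats the missing fields as None. The layout follows Borsh conventions (little-endian integers, Option as a 0/1 tag byte, String as a u32 length plus UTF-8 bytes, enums as a u8 variant index). The created_at field and the CrdtType variant order are hypothetical.

```rust
use std::convert::TryInto;

// Hypothetical CrdtType: the variant order (and thus each tag byte) is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum CrdtType {
    UnorderedMap,
    UnorderedSet,
    Vector,
    Counter,
    Rga,
    UserStorage,
    FrozenStorage,
}

impl CrdtType {
    fn from_tag(tag: u8) -> Option<Self> {
        const ALL: [CrdtType; 7] = [
            CrdtType::UnorderedMap,
            CrdtType::UnorderedSet,
            CrdtType::Vector,
            CrdtType::Counter,
            CrdtType::Rga,
            CrdtType::UserStorage,
            CrdtType::FrozenStorage,
        ];
        ALL.get(usize::from(tag)).copied()
    }
}

// Hypothetical Metadata: created_at stands in for whatever fields existed before.
#[derive(Debug, PartialEq)]
struct Metadata {
    created_at: u64,
    field_name: Option<String>,
    crdt_type: Option<CrdtType>,
}

// Borsh-style layout: u64 LE, Option<String>, Option<u8 variant tag>.
fn encode(m: &Metadata) -> Vec<u8> {
    let mut out = m.created_at.to_le_bytes().to_vec();
    match &m.field_name {
        None => out.push(0),
        Some(name) => {
            out.push(1);
            out.extend_from_slice(&(name.len() as u32).to_le_bytes());
            out.extend_from_slice(name.as_bytes());
        }
    }
    match m.crdt_type {
        None => out.push(0),
        Some(t) => out.extend_from_slice(&[1, t as u8]),
    }
    out
}

// Legacy records end after created_at; the new fields then decode as None.
fn decode(bytes: &[u8]) -> Option<Metadata> {
    let created_at = u64::from_le_bytes(bytes.get(0..8)?.try_into().ok()?);
    let mut rest = &bytes[8..];
    if rest.is_empty() {
        return Some(Metadata { created_at, field_name: None, crdt_type: None });
    }
    let field_name = match *rest.first()? {
        0 => {
            rest = &rest[1..];
            None
        }
        1 => {
            let len = u32::from_le_bytes(rest.get(1..5)?.try_into().ok()?) as usize;
            let name = std::str::from_utf8(rest.get(5..5 + len)?).ok()?.to_owned();
            rest = &rest[5 + len..];
            Some(name)
        }
        _ => return None,
    };
    let crdt_type = match rest {
        [0] => None,
        [1, tag] => Some(CrdtType::from_tag(*tag)?),
        _ => return None,
    };
    Some(Metadata { created_at, field_name, crdt_type })
}

fn main() {
    let record = Metadata {
        created_at: 7,
        field_name: Some("visits".to_owned()),
        crdt_type: Some(CrdtType::Counter),
    };
    println!("{:?}", decode(&encode(&record)));
    println!("{:?}", decode(&7u64.to_le_bytes())); // legacy record
}
```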

xilosada commented Feb 5, 2026

Superseded by #1864, which combines this PR with the metadata storage features from #1786.

xilosada closed this on Feb 5, 2026