Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion apps/kv-store/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ impl KvStore {
#[app::init]
pub fn init() -> KvStore {
KvStore {
items: UnorderedMap::new(),
items: UnorderedMap::new_with_field_name("items"),
}
}

Expand Down
10 changes: 5 additions & 5 deletions apps/nested-crdt-test/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -68,11 +68,11 @@ impl NestedCrdtTest {
#[app::init]
pub fn init() -> NestedCrdtTest {
NestedCrdtTest {
counters: UnorderedMap::new(),
registers: UnorderedMap::new(),
metadata: UnorderedMap::new(),
metrics: Vector::new(),
tags: UnorderedMap::new(),
counters: UnorderedMap::new_with_field_name("counters"),
registers: UnorderedMap::new_with_field_name("registers"),
metadata: UnorderedMap::new_with_field_name("metadata"),
metrics: Vector::new_with_field_name("metrics"),
tags: UnorderedMap::new_with_field_name("tags"),
}
}

Expand Down
194 changes: 194 additions & 0 deletions tools/merodb/src/abi.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
use std::fs;
use std::path::Path;

use calimero_storage::collections::CrdtType;
use calimero_wasm_abi::schema::Manifest;
use eyre::Result;

Expand Down Expand Up @@ -67,3 +68,196 @@ pub fn load_state_schema_from_json(schema_path: &Path) -> Result<Manifest> {

load_state_schema_from_json_value(&schema_value)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Schema inference silently skips non-root fields and unrecognized entries

The infer_schema_from_database function only processes entries where is_root_field is true and silently skips all other entries. If the root ID detection logic is incorrect (e.g., context_id doesn't match expected format), legitimate fields may be silently excluded from the inferred schema with no warning or indication to the user. Additionally, entries that fail borsh::from_slice::<EntityIndex> deserialization are silently skipped.

Suggested fix:

Add logging or return diagnostic information about skipped entries. Consider returning a count of total entries scanned vs. matched, or track deserialization failures separately so users can diagnose incomplete schema inference.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Schema inference defaults all types to String which may cause deserialization failures

The infer_schema_from_database function defaults value types to String (e.g., TypeRef::string() for map values and list items) when inferring schema from CRDT types. This means the inferred schema won't match the actual data types stored, potentially causing issues when trying to deserialize values using the inferred schema. Users may get incorrect type information or deserialization errors.

Suggested fix:

Document clearly that inferred schemas provide structural information only (field names and CRDT types) but not value type information. Consider adding a warning in the output or requiring explicit type hints for accurate schema inference.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

/// Infer state schema from database by reading field names and CRDT types from metadata
///
/// This function scans the State column for EntityIndex entries and builds a schema
/// based on field_name and crdt_type found in metadata. This enables schema-free
/// database inspection when field names are stored in metadata.
///
/// # Arguments
/// * `db` - The database to scan
/// * `context_id` - Optional context ID to filter by. If None, scans all contexts (may find fields from multiple contexts)
pub fn infer_schema_from_database(
db: &rocksdb::DBWithThreadMode<rocksdb::SingleThreaded>,
context_id: Option<&[u8]>,
) -> Result<Manifest> {
use calimero_wasm_abi::schema::{
CollectionType, CrdtCollectionType, Field, ScalarType, TypeDef, TypeRef,
};
use std::collections::BTreeMap;

let state_cf = db
.cf_handle("State")
.ok_or_else(|| eyre::eyre!("State column family not found"))?;

let mut fields = Vec::new();
let mut seen_field_names = std::collections::HashSet::new();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Schema inference with no context_id uses arbitrary fallback

When context_id is None, the function uses [0u8; 32] as the root ID. This is documented with a warning, but the root ID comparison logic at line 137-141 checks if parent_id equals root_id_bytes. If the actual context has a different ID, legitimate root fields will be incorrectly filtered out, resulting in an empty or incomplete inferred schema.

Suggested fix:

Consider requiring `context_id` as a non-optional parameter, or scan all EntityIndex entries first to discover available context IDs and warn the user to specify one.


// Root ID depends on context:
// - If context_id is provided, root ID is that context_id (Id::root() returns context_id())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Schema inference with no context_id uses arbitrary fallback

When context_id is None, the function uses [0u8; 32] as the root ID and continues processing. The comment acknowledges 'we can't determine root fields reliably', but the function still returns a potentially incorrect schema. This could lead to confusing results when used without a context_id.

Suggested fix:

Consider returning an error or a clearly-marked 'unreliable' schema when context_id is None, or scan for all unique context IDs first and warn the user about multiple contexts. At minimum, add a warning log when using the fallback.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

// - If no context_id, we can't determine root fields reliably, so use all zeros as fallback
let root_id_bytes: [u8; 32] = match context_id {
Some(ctx_id) => ctx_id.try_into().map_err(|_| {
eyre::eyre!(
"context_id must be exactly 32 bytes, got {} bytes",
ctx_id.len()
)
})?,
None => {
eprintln!(
"[WARNING] No context_id provided for schema inference. \
Using [0; 32] as fallback root ID. This may produce incorrect or incomplete \
schema if the database contains multiple contexts. Consider providing a \
specific context_id for accurate schema inference."
);
[0u8; 32]
}
};

// Scan State column for EntityIndex entries
let iter = db.iterator_cf(&state_cf, rocksdb::IteratorMode::Start);
for item in iter {
let (key, value) = item?;

// Filter by context_id if provided (key format: context_id (32 bytes) + state_key (32 bytes))
if let Some(expected_context_id) = context_id {
if key.len() < 32 || &key[..32] != expected_context_id {
continue;
}
}

// Try to deserialize as EntityIndex
if let Ok(index) = borsh::from_slice::<crate::export::EntityIndex>(&value) {
// Check if this is a root-level field (parent_id is None or equals root/context_id)
let is_root_field = index.parent_id.is_none()
|| index
.parent_id
.as_ref()
.map(|id| id.as_bytes() == &root_id_bytes)
.unwrap_or(false);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Schema inference type defaults may produce incorrect type information

The schema inference defaults all UnorderedMap to Map<String, String>, Vector to List<String>, etc. These default type parameters may not match the actual data types stored in the database. When users rely on this inferred schema for inspection, they may get incorrect deserialization results for values that are not actually strings.

Suggested fix:

Document clearly that inferred schemas use default type parameters and may need manual refinement. Consider adding a comment in the generated schema indicating it was auto-inferred with default types, or attempt to sample actual entries to infer more accurate type information.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - This issue has been addressed in the latest changes.

if is_root_field {
// Check if we have field_name in metadata
if let Some(ref field_name) = index.metadata.field_name {
if !seen_field_names.contains(field_name) {
seen_field_names.insert(field_name.clone());

// Infer type from crdt_type
let type_ref = if let Some(crdt_type) = index.metadata.crdt_type {
match crdt_type {
CrdtType::UnorderedMap => {
// Default to Map<String, String> - can be refined later
TypeRef::Collection {
collection: CollectionType::Map {
key: Box::new(TypeRef::string()),
value: Box::new(TypeRef::string()),
},
crdt_type: Some(CrdtCollectionType::UnorderedMap),
inner_type: None,
}
}
CrdtType::Vector => TypeRef::Collection {
collection: CollectionType::List {
items: Box::new(TypeRef::string()),
},
crdt_type: Some(CrdtCollectionType::Vector),
inner_type: None,
},
CrdtType::UnorderedSet => TypeRef::Collection {
collection: CollectionType::List {
items: Box::new(TypeRef::string()),
},
crdt_type: Some(CrdtCollectionType::UnorderedSet),
inner_type: None,
},
CrdtType::Counter => TypeRef::Collection {
// Counter is stored as Map<String, u64> internally
collection: CollectionType::Map {
key: Box::new(TypeRef::string()),
value: Box::new(TypeRef::Scalar(ScalarType::U64)),
},
crdt_type: Some(CrdtCollectionType::Counter),
inner_type: None,
},
CrdtType::Rga => TypeRef::Collection {
collection: CollectionType::Record { fields: Vec::new() },
crdt_type: Some(CrdtCollectionType::ReplicatedGrowableArray),
inner_type: None,
},
CrdtType::LwwRegister => TypeRef::Collection {
collection: CollectionType::Record { fields: Vec::new() },
crdt_type: Some(CrdtCollectionType::LwwRegister),
inner_type: Some(Box::new(TypeRef::string())),
},
CrdtType::UserStorage => TypeRef::Collection {
collection: CollectionType::Map {
key: Box::new(TypeRef::string()),
value: Box::new(TypeRef::string()),
},
crdt_type: Some(CrdtCollectionType::UnorderedMap),
inner_type: None,
},
CrdtType::FrozenStorage => TypeRef::Collection {
collection: CollectionType::Map {
key: Box::new(TypeRef::string()),
value: Box::new(TypeRef::string()),
},
crdt_type: Some(CrdtCollectionType::UnorderedMap),
inner_type: None,
},
CrdtType::Record => {
// Record type - would need to inspect children to infer fields
TypeRef::Collection {
collection: CollectionType::Record { fields: Vec::new() },
crdt_type: None,
inner_type: None,
}
}
CrdtType::Custom(_) => {
// Custom type - can't infer without schema
TypeRef::Collection {
collection: CollectionType::Record { fields: Vec::new() },
crdt_type: None,
inner_type: None,
}
}
}
} else {
// No CRDT type - default to LWW register
TypeRef::Collection {
collection: CollectionType::Record { fields: Vec::new() },
crdt_type: Some(CrdtCollectionType::LwwRegister),
inner_type: Some(Box::new(TypeRef::string())),
}
};

fields.push(Field {
name: field_name.clone(),
type_: type_ref,
nullable: None,
});
}
}
}
}
}

// Create a record type with all inferred fields
let state_root_type = "InferredStateRoot".to_string();
let mut types = BTreeMap::new();
types.insert(
state_root_type.clone(),
TypeDef::Record {
fields: fields.clone(),
},
);

Ok(Manifest {
schema_version: "wasm-abi/1".to_string(),
types,
methods: Vec::new(),
events: Vec::new(),
state_root: Some(state_root_type),
})
}
Loading
Loading