Add Export Methods for StructuredModel Configuration #56
vawsgit wants to merge 8 commits into awslabs:dev
adiadd
left a comment
Looks great, just a couple things!
    }

    # Add match_threshold if available (check both attribute names for compatibility)
    threshold = getattr(cls, "match_threshold", None) or getattr(cls, "_match_threshold", None)
The or operator treats 0.0 as falsy. If a user sets match_threshold = 0.0, this will incorrectly fall back to _match_threshold. Use explicit None checking instead.
Suggested Fix:
    threshold = getattr(cls, "match_threshold", None)
    if threshold is None:
        threshold = getattr(cls, "_match_threshold", None)
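To make the falsy-zero issue concrete, here is a minimal, self-contained sketch. The `Model` class is a hypothetical stand-in with both attributes set, not Stickler's `StructuredModel`; only the two lookup patterns are taken from this thread.

```python
class Model:
    """Hypothetical stand-in class; not Stickler's StructuredModel."""
    match_threshold = 0.0    # an explicit, valid threshold chosen by the user
    _match_threshold = 0.7   # the legacy attribute name


def threshold_with_or(cls):
    # 0.0 is falsy, so `or` discards it and evaluates the right-hand side.
    return getattr(cls, "match_threshold", None) or getattr(cls, "_match_threshold", None)


def threshold_with_none_check(cls):
    # Only fall back when the primary attribute is actually missing (None).
    threshold = getattr(cls, "match_threshold", None)
    if threshold is None:
        threshold = getattr(cls, "_match_threshold", None)
    return threshold


buggy = threshold_with_or(Model)          # falls back to the legacy value
fixed = threshold_with_none_check(Model)  # keeps the explicit 0.0
```

With these stand-in values, `buggy` is `0.7` while `fixed` is `0.0`, which is the difference the suggested fix is meant to restore.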
    - [StructuredModel Dynamic Creation](StructuredModel_Dynamic_Creation.md) - Import methods
    - [StructuredModel Advanced Functionality](StructuredModel_Advanced_Functionality.md) - Comparison features
    - [JSON Schema Extensions](../../index.md) - Full extension documentation in main README
This link is broken - ../../index.md resolves to a redirect file. The JSON Schema Extensions reference is in the root README.md.
Maybe update to: JSON Schema Extensions Reference (matching the pattern in StructuredModel_Dynamic_Creation.md)
    field_config = {
        "type": "structured_model",
        # Recursively export nested model's fields
        "fields": field_type.to_stickler_config()["fields"]
    }
Nested models lose model_name and match_threshold when exported via to_stickler_config(). Consider preserving full config for round-trip fidelity:
    nested_config = field_type.to_stickler_config()
    field_config = {
        "type": "structured_model",
        "model_name": nested_config.get("model_name"),
        "match_threshold": nested_config.get("match_threshold"),
        "fields": nested_config["fields"]
    }
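One possible variation on the suggestion above, sketched on plain dictionaries: copy `model_name` and `match_threshold` only when they are present, so the exported field entry never carries `None` values. The helper name `nested_field_config` is hypothetical, and whether the importer tolerates missing keys is an assumption that would need checking against `model_from_json()`.

```python
from typing import Any, Dict


def nested_field_config(nested_config: Dict[str, Any]) -> Dict[str, Any]:
    """Build a nested-model field entry from the nested model's full exported config."""
    field_config: Dict[str, Any] = {
        "type": "structured_model",
        "fields": nested_config["fields"],
    }
    # Copy optional metadata only when it exists; `is not None` keeps a 0.0 threshold.
    for key in ("model_name", "match_threshold"):
        if nested_config.get(key) is not None:
            field_config[key] = nested_config[key]
    return field_config


# Illustrative input shaped like an exported config (values are made up).
address_config = {
    "model_name": "Address",
    "match_threshold": 0.0,
    "fields": {"city": {"type": "str"}},
}
address_field = nested_field_config(address_config)
```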
Consider adding tests for Optional[str], Optional[StructuredModel], and Optional[List[StructuredModel]] fields to verify the unwrapping logic works correctly during export
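As a rough illustration of what such tests would exercise, here is a standalone sketch of Optional unwrapping built on the standard `typing` helpers. `unwrap_optional` and `Address` are hypothetical names; this is not the PR's `_unwrap_optional`, whose exact behavior is not shown in this thread.

```python
import types
from typing import Any, List, Optional, Tuple, Union, get_args, get_origin

# typing.Union covers Optional[X]; types.UnionType (Python 3.10+) covers `X | None`.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def unwrap_optional(tp: Any) -> Tuple[Any, bool]:
    """Return (inner_type, True) for Optional[X]; otherwise (tp, False)."""
    if get_origin(tp) in _UNION_ORIGINS:
        non_none = [arg for arg in get_args(tp) if arg is not type(None)]
        # Exactly one non-None member means the annotation is Optional[X].
        if len(non_none) == 1:
            return non_none[0], True
    return tp, False


class Address:
    """Hypothetical stand-in for a nested StructuredModel subclass."""


# The three shapes named in the review comment, plus a non-optional control.
cases = {
    "scalar": unwrap_optional(Optional[str]),
    "model": unwrap_optional(Optional[Address]),
    "list_of_models": unwrap_optional(Optional[List[Address]]),
    "plain": unwrap_optional(str),
}
```

Real tests would instead define models with these field annotations and assert on the exported configuration.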
ditto - consider adding tests for Optional[str], Optional[StructuredModel], and Optional[List[StructuredModel]] fields to verify the unwrapping logic works correctly during export
It would be good to have corresponding test cases for lines 1156-1157 and 1168-1172 in structured_model.py using pytest.raises(ValueError)
Would be good to add a test that creates a model with NumericComparator(absolute_tolerance=0.5), exports it, reimports it, and verifies the tolerance config is preserved
    # Unwrap Optional before type checking
    field_type, _ = cls._unwrap_optional(field_type)
Minor: Optional types are unwrapped twice, once at line 1160 and again inside _is_structured_model_type() at line 1234. Consider removing the internal unwrap since the caller already does it; it is harmless but redundant.
    def _build_comparison_extensions(
        self,
        metadata: Dict[str, Any],
        format: str = "json_schema"
    ) -> Dict[str, Any]:
Minor, but consider adding validation:
    if format not in ("json_schema", "stickler_config"): raise ValueError(...)
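A minimal sketch of that guard as a standalone helper, assuming the two format names above are the only supported values. The names `SUPPORTED_FORMATS` and `validate_export_format` are hypothetical and do not appear in the PR.

```python
SUPPORTED_FORMATS = ("json_schema", "stickler_config")


def validate_export_format(format: str) -> str:
    """Return the format unchanged, or raise ValueError for an unsupported value."""
    if format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported format {format!r}; expected one of {SUPPORTED_FORMATS}"
        )
    return format


# Valid values pass through; anything else fails fast with a descriptive message.
chosen = validate_export_format("json_schema")
try:
    validate_export_format("yaml")
except ValueError as exc:
    error_message = str(exc)
```

Failing early here means a typo in the format name surfaces as a clear error instead of silently producing an empty or wrong set of extensions.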
Add Export Methods for StructuredModel Configuration
🎯 What Was Implemented
This PR adds bidirectional serialization support to StructuredModel, enabling users to export model configurations in two formats:
- to_json_schema() - Exports models as JSON Schema with x-aws-stickler-* extensions
- to_stickler_config() - Exports models as custom Stickler JSON configuration

Both methods support full round-trip serialization with their corresponding import methods (from_json_schema() and model_from_json()).
💡 Why This Matters
Previously, users could only create StructuredModels by:
This created a chicken-and-egg problem: How do you get the JSON configuration in the first place?
With export methods, users can now:
🚀 Benefits for Developers
1. Configuration-Driven Development 🔧
2. Team Collaboration 👥
Share model configurations as JSON files in version control, making it easy for teams to review and iterate on comparison logic without touching Python code.
3. Environment-Specific Configs 🌍
Export a base configuration and maintain environment-specific variants (dev, staging, prod) with different thresholds.
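A rough sketch of how environment variants could be derived from one base configuration. The dictionary shape follows the "Before" example in this description; in practice the base would presumably come from `to_stickler_config()`. The helper `with_thresholds` and the threshold values are hypothetical.

```python
import copy
import json
from typing import Any, Dict

# Base config written out by hand here so the sketch runs without Stickler installed.
base_config: Dict[str, Any] = {
    "model_name": "Product",
    "fields": {
        "name": {"type": "str", "threshold": 0.8, "comparator": "LevenshteinComparator"},
        "price": {"type": "float", "threshold": 0.95, "comparator": "NumericComparator"},
    },
}


def with_thresholds(config: Dict[str, Any], overrides: Dict[str, float]) -> Dict[str, Any]:
    """Return a deep copy of config with per-field thresholds replaced."""
    variant = copy.deepcopy(config)  # never mutate the shared base
    for field_name, threshold in overrides.items():
        if field_name not in variant["fields"]:
            raise KeyError(f"Unknown field: {field_name}")
        variant["fields"][field_name]["threshold"] = threshold
    return variant


# Looser matching for dev, stricter matching for prod (illustrative values).
dev_config = with_thresholds(base_config, {"name": 0.6})
prod_config = with_thresholds(base_config, {"name": 0.9, "price": 0.99})

# Each variant can be serialized and committed as its own JSON file.
prod_json = json.dumps(prod_config, indent=2)
```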
4. Documentation & Transparency 📚
Export configurations to document exactly how models compare data, making ML pipelines more transparent and auditable.
5. Interoperability 🔗
JSON Schema export works with OpenAPI, AsyncAPI, and standard JSON Schema validators, enabling integration with existing tooling.
🏗️ Architecture
The implementation follows the KISS principle by extending the existing JsonSchemaFieldConverter to support bidirectional conversion:
This approach:
📖 Documentation
Comprehensive documentation added at docs/docs/Guides/StructuredModel_Export.md covering:
✅ Testing
14 comprehensive tests added:
🎓 Example Use Case
Before (manual JSON writing):
    {
      "model_name": "Product",
      "fields": {
        "name": {"type": "str", "threshold": 0.8, "comparator": "LevenshteinComparator"},
        "price": {"type": "float", "threshold": 0.95, "comparator": "NumericComparator"}
      }
    }

❌ Error-prone, requires knowing exact format
After (export from Python):
✅ Type-safe, auto-generated, easy to customize
🔍 What's Next
This PR enables future enhancements:
📝 Summary
This PR completes the configuration lifecycle for StructuredModel, making it easy to:
The result is a more flexible, collaborative, and transparent workflow for managing comparison logic in Stickler.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.