Add Export Methods for StructuredModel Configuration by vawsgit · Pull Request #56 · awslabs/stickler

vawsgit · 2026-01-07T20:53:31Z

Add Export Methods for StructuredModel Configuration

🎯 What Was Implemented

This PR adds bidirectional serialization support to StructuredModel, enabling users to export model configurations in two formats:

to_json_schema() - Exports models as JSON Schema with x-aws-stickler-* extensions
to_stickler_config() - Exports models as custom Stickler JSON configuration

Both methods support full round-trip serialization with their corresponding import methods (from_json_schema() and model_from_json()).

💡 Why This Matters

Previously, users could only create StructuredModels by:

Defining them in Python code
Importing from pre-written JSON configurations

This created a chicken-and-egg problem: How do you get the JSON configuration in the first place?

With export methods, users can now:

✅ Start with Python models and export to JSON
✅ Export default configurations as a starting point
✅ Customize thresholds and weights in JSON
✅ Share configurations across teams
✅ Version control comparison logic
✅ A/B test different configurations

🚀 Benefits for Developers

1. Configuration-Driven Development 🔧

# Export defaults, customize, re-import
config = MyModel.to_stickler_config()
config["fields"]["name"]["threshold"] = 0.9  # Fine-tune
CustomModel = StructuredModel.model_from_json(config)

2. Team Collaboration 👥

Share model configurations as JSON files in version control, making it easy for teams to review and iterate on comparison logic without touching Python code.
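One way to keep shared configurations easy to review is to write them with sorted keys and fixed indentation, so version-control diffs show only real changes. The sketch below assumes the exported config is a plain JSON-serializable dict shaped like the JSON in the Example Use Case section. `save_config` and `load_config` are hypothetical helpers, not part of Stickler.

```python
import json
import tempfile
from pathlib import Path


def save_config(config: dict, path: Path) -> None:
    """Write a config with sorted keys so diffs stay stable between exports."""
    text = json.dumps(config, indent=2, sort_keys=True)
    path.write_text(text + "\n", encoding="utf-8")


def load_config(path: Path) -> dict:
    """Read a config previously written by save_config."""
    return json.loads(path.read_text(encoding="utf-8"))


# Stand-in for the dict returned by an export call (shape is an assumption).
config = {
    "model_name": "Product",
    "fields": {
        "price": {"type": "float", "threshold": 0.95},
        "name": {"type": "str", "threshold": 0.8},
    },
}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "product.json"
    save_config(config, target)
    restored = load_config(target)
```

Sorting keys means two exports of the same model produce byte-identical files regardless of field declaration order.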

3. Environment-Specific Configs 🌍

Export a base configuration and maintain environment-specific variants (dev, staging, prod) with different thresholds.
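A possible way to maintain such variants is to store only the per-environment differences and merge them into the exported base. This is a sketch under the assumption that the config is a nested dict like the one in the Example Use Case section. `apply_overrides` is a hypothetical helper, not a Stickler API.

```python
import copy


def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge one level deeper.
            merged[key] = apply_overrides(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the base value.
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for an exported base config (shape is an assumption).
base_config = {
    "model_name": "Product",
    "fields": {
        "name": {"type": "str", "threshold": 0.8, "comparator": "LevenshteinComparator"},
        "price": {"type": "float", "threshold": 0.95, "comparator": "NumericComparator"},
    },
}

# Production only tightens the name threshold; everything else is inherited.
prod_overrides = {"fields": {"name": {"threshold": 0.9}}}
prod_config = apply_overrides(base_config, prod_overrides)
```

Keeping overrides small makes it obvious in review which thresholds differ between environments.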

4. Documentation & Transparency 📚

Export configurations to document exactly how models compare data, making ML pipelines more transparent and auditable.

5. Interoperability 🔗

JSON Schema export works with OpenAPI, AsyncAPI, and standard JSON Schema validators, enabling integration with existing tooling.
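Validators generally ignore keywords they do not recognize, but some tools run in strict modes that reject them. In that case the Stickler extensions could be stripped before handing the schema over. The sketch below relies only on the `x-aws-stickler-` prefix mentioned in this PR. The specific extension names in the sample schema are illustrative assumptions, and `strip_extensions` is a hypothetical helper.

```python
PREFIX = "x-aws-stickler-"


def strip_extensions(node):
    """Recursively drop every dict key that starts with the Stickler prefix."""
    if isinstance(node, dict):
        return {
            key: strip_extensions(value)
            for key, value in node.items()
            if not key.startswith(PREFIX)
        }
    if isinstance(node, list):
        return [strip_extensions(item) for item in node]
    return node


# Sample schema; the extension key names are assumed for illustration.
schema = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "x-aws-stickler-threshold": 0.8,
            "x-aws-stickler-comparator": "LevenshteinComparator",
        },
        "price": {"type": "number", "x-aws-stickler-threshold": 0.95},
    },
    "required": ["name", "price"],
}

plain_schema = strip_extensions(schema)
```

Note that this simple version would also drop a property whose own name starts with the prefix, which is unlikely but worth knowing.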

🏗️ Architecture

The implementation follows the KISS principle by extending the existing JsonSchemaFieldConverter to support bidirectional conversion:

Import: JSON Schema → Pydantic Field (existing)
Export: Pydantic Field → JSON Schema (new)

This approach:

✅ Reuses existing type mappings (single source of truth)
✅ Avoids code duplication
✅ Maintains symmetric import/export logic
✅ No new helper classes needed
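The "single source of truth" idea can be illustrated with one mapping table that serves both directions. This is a simplified sketch, not the actual JsonSchemaFieldConverter code. The table contents and function names are assumptions.

```python
# One table defines the import direction; the export direction is derived from it.
JSON_TO_PYTHON = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
}
PYTHON_TO_JSON = {py_type: json_type for json_type, py_type in JSON_TO_PYTHON.items()}


def to_python_type(json_type: str) -> type:
    """Import direction: JSON Schema type name to Python type."""
    try:
        return JSON_TO_PYTHON[json_type]
    except KeyError:
        raise ValueError(f"Unsupported JSON Schema type: {json_type!r}") from None


def to_json_type(py_type: type) -> str:
    """Export direction: Python type to JSON Schema type name."""
    try:
        return PYTHON_TO_JSON[py_type]
    except KeyError:
        raise ValueError(f"Unsupported Python type: {py_type!r}") from None
```

Because the export table is computed from the import table, adding a new type in one place keeps both directions in sync.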

📖 Documentation

Comprehensive documentation added at docs/docs/Guides/StructuredModel_Export.md covering:

When to use each export format
Common use cases and workflows
Round-trip examples
Best practices for configuration management
Comparison of all three schema methods

✅ Testing

14 comprehensive tests added:

7 export tests (basic, nested, lists, metadata preservation)
7 round-trip tests (JSON Schema, Stickler config, comparison behavior)
All tests passing ✓

🎓 Example Use Case

Before (manual JSON writing):

{
  "model_name": "Product",
  "fields": {
    "name": {"type": "str", "threshold": 0.8, "comparator": "LevenshteinComparator"},
    "price": {"type": "float", "threshold": 0.95, "comparator": "NumericComparator"}
  }
}

❌ Error-prone, requires knowing exact format

After (export from Python):

class Product(StructuredModel):
    name: str = ComparableField(default=...)
    price: float = ComparableField(default=...)

# Export with defaults, customize as needed
config = Product.to_stickler_config()

✅ Type-safe, auto-generated, easy to customize

🔍 What's Next

This PR enables future enhancements:

CLI tools for exporting models
Configuration validation utilities
Schema migration tools
Configuration diff/merge tools

📝 Summary

This PR completes the configuration lifecycle for StructuredModel, making it easy to:

Define models in Python
Export configurations to JSON
Customize thresholds and weights
Import back to Python
Share and version control

The result is a more flexible, collaborative, and transparent workflow for managing comparison logic in Stickler.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

src/stickler/structured_object_evaluator/models/structured_model.py

src/stickler/structured_object_evaluator/models/json_schema_field_converter.py

docs/docs/Guides/StructuredModel_Export.md

adiadd

Looks great, just a couple things!

adiadd · 2026-02-02T18:11:27Z

src/stickler/structured_object_evaluator/models/structured_model.py

+        }
+
+        # Add match_threshold if available (check both attribute names for compatibility)
+        threshold = getattr(cls, "match_threshold", None) or getattr(cls, "_match_threshold", None)


The or operator treats 0.0 as falsy. If a user sets match_threshold = 0.0, this will incorrectly fall back to _match_threshold. Use explicit None checking instead.

Suggested Fix:

threshold = getattr(cls, "match_threshold", None)
if threshold is None:
    threshold = getattr(cls, "_match_threshold", None)
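The difference between the `or` form and the explicit None check can be reproduced in isolation. The class below is a stand-in with made-up values, not a real StructuredModel.

```python
class Model:
    """Stand-in class: a public threshold of 0.0 and a different private one."""
    match_threshold = 0.0
    _match_threshold = 0.7


def threshold_with_or(cls):
    # 0.0 is falsy, so `or` skips it and returns the fallback attribute.
    return getattr(cls, "match_threshold", None) or getattr(cls, "_match_threshold", None)


def threshold_with_none_check(cls):
    # Only a missing (None) value triggers the fallback.
    threshold = getattr(cls, "match_threshold", None)
    if threshold is None:
        threshold = getattr(cls, "_match_threshold", None)
    return threshold


buggy = threshold_with_or(Model)
fixed = threshold_with_none_check(Model)
```

Here `buggy` ends up as the fallback value while `fixed` keeps the explicitly configured 0.0.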

adiadd · 2026-02-02T18:12:44Z

docs/docs/Guides/StructuredModel_Export.md

+
+- [StructuredModel Dynamic Creation](StructuredModel_Dynamic_Creation.md) - Import methods
+- [StructuredModel Advanced Functionality](StructuredModel_Advanced_Functionality.md) - Comparison features
+- [JSON Schema Extensions](../../index.md) - Full extension documentation in main README


This link is broken - ../../index.md resolves to a redirect file. The JSON Schema Extensions reference is in the root README.md.

Maybe update to: JSON Schema Extensions Reference (matching the pattern in StructuredModel_Dynamic_Creation.md)

adiadd · 2026-02-02T18:14:32Z

src/stickler/structured_object_evaluator/models/structured_model.py

+                field_config = {
+                    "type": "structured_model",
+                    # Recursively export nested model's fields
+                    "fields": field_type.to_stickler_config()["fields"]
+                }


Nested models lose model_name and match_threshold when exported via to_stickler_config(). Consider preserving full config for round-trip fidelity:

nested_config = field_type.to_stickler_config()
field_config = {
    "type": "structured_model",
    "model_name": nested_config.get("model_name"),
    "match_threshold": nested_config.get("match_threshold"),
    "fields": nested_config["fields"]
}

adiadd · 2026-02-02T18:15:25Z

tests/structured_object_evaluator/test_export_roundtrip.py

Consider adding tests for Optional[str], Optional[StructuredModel], and Optional[List[StructuredModel]] fields to verify the unwrapping logic works correctly during export
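For reference, the kind of unwrapping such tests would exercise can be sketched with the standard `typing` helpers. This is not the project's `_unwrap_optional`. The function below and its boolean second return value are assumptions, and `Address` is a plain stand-in class rather than a StructuredModel.

```python
from typing import List, Optional, Union, get_args, get_origin


class Address:
    """Stand-in for a nested model type."""


def unwrap_optional(tp):
    """Return (inner_type, True) for Optional[X]; otherwise (tp, False)."""
    args = get_args(tp)
    if get_origin(tp) is Union and len(args) == 2 and type(None) in args:
        inner = args[0] if args[1] is type(None) else args[1]
        return inner, True
    return tp, False


# The three shapes suggested in the review, plus a non-optional control.
plain = unwrap_optional(str)
opt_str = unwrap_optional(Optional[str])
opt_model = unwrap_optional(Optional[Address])
opt_list = unwrap_optional(Optional[List[Address]])
```

A sketch like this covers `Optional[X]` written with `typing`; wider unions such as `Union[int, str, None]` are deliberately left untouched.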

adiadd · 2026-02-02T18:15:49Z

tests/structured_object_evaluator/test_model_export.py

Ditto: consider adding tests for Optional[str], Optional[StructuredModel], and Optional[List[StructuredModel]] fields to verify the unwrapping logic works correctly during export.

adiadd · 2026-02-02T18:17:25Z

tests/structured_object_evaluator/test_model_export.py

It would be good to have corresponding test cases for lines 1156-1157 and 1168-1172 in structured_model.py using pytest.raises(ValueError)

adiadd · 2026-02-02T18:17:53Z

tests/structured_object_evaluator/test_export_roundtrip.py

Would be good to add a test that creates a model with NumericComparator(absolute_tolerance=0.5), exports it, reimports it, and verifies the tolerance config is preserved

adiadd · 2026-02-02T18:20:05Z

src/stickler/structured_object_evaluator/models/structured_model.py

+            # Unwrap Optional before type checking
+            field_type, _ = cls._unwrap_optional(field_type)


Minor: Optional types are unwrapped twice, once at line 1160 and again inside _is_structured_model_type() at line 1234. Consider removing the internal unwrap since the caller already does it. Harmless, but redundant.

adiadd · 2026-02-02T18:20:59Z

src/stickler/structured_object_evaluator/models/json_schema_field_converter.py

+    def _build_comparison_extensions(
+        self, 
+        metadata: Dict[str, Any], 
+        format: str = "json_schema"
+    ) -> Dict[str, Any]:


Minor but consider adding validation:

if format not in ("json_schema", "stickler_config"):
    raise ValueError(...)
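A fuller version of that guard might look like the sketch below. The two format names come from the signature above. `check_format` and the error wording are hypothetical.

```python
VALID_FORMATS = ("json_schema", "stickler_config")


def check_format(format: str) -> str:
    """Return format unchanged if supported; raise ValueError otherwise."""
    if format not in VALID_FORMATS:
        raise ValueError(
            f"Unsupported format {format!r}; expected one of {VALID_FORMATS}"
        )
    return format


accepted = check_format("stickler_config")

try:
    check_format("yaml")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Failing fast here would turn a silently wrong export into an immediate, descriptive error for a mistyped format name.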

vawsgit added 2 commits January 7, 2026 14:50

Update: add 4 file(s), modify 4 file(s)

31dff14

Update: modify 1 file(s)

6ff1b9f

vawsgit requested review from adiadd and sromoam January 7, 2026 20:53

vawsgit added 3 commits January 7, 2026 15:59

reverted .gitignore changes

fc54319

Update: modify 165 file(s)

fec37f4

Merge branch 'dev' into feature/vincilb/json-export

f0af639

vawsgit requested a review from ayushi1208 January 13, 2026 20:30

adiadd reviewed Jan 21, 2026


vawsgit added 3 commits January 22, 2026 11:07

Update: modify 4 file(s)

4c0aa13

Update: modify 1 file(s)

f01878f

Update: modify 3 file(s)

41f5983

vawsgit requested a review from adiadd January 22, 2026 17:52

adiadd reviewed Feb 2, 2026

2 participants