Add Export Methods for StructuredModel Configuration #56

Open
vawsgit wants to merge 8 commits into awslabs:dev from vawsgit:feature/vincilb/json-export

@vawsgit vawsgit commented Jan 7, 2026

Add Export Methods for StructuredModel Configuration

🎯 What Was Implemented

This PR adds bidirectional serialization support to StructuredModel, enabling users to export model configurations in two formats:

  • to_json_schema() - Exports models as JSON Schema with x-aws-stickler-* extensions
  • to_stickler_config() - Exports models as custom Stickler JSON configuration

Both methods support full round-trip serialization with their corresponding import methods (from_json_schema() and model_from_json()).

💡 Why This Matters

Previously, users could only create StructuredModels by:

  1. Defining them in Python code
  2. Importing from pre-written JSON configurations

This created a chicken-and-egg problem: How do you get the JSON configuration in the first place?

With export methods, users can now:

  • ✅ Start with Python models and export to JSON
  • ✅ Export default configurations as a starting point
  • ✅ Customize thresholds and weights in JSON
  • ✅ Share configurations across teams
  • ✅ Version control comparison logic
  • ✅ A/B test different configurations

🚀 Benefits for Developers

1. Configuration-Driven Development 🔧

# Export defaults, customize, re-import
config = MyModel.to_stickler_config()
config["fields"]["name"]["threshold"] = 0.9  # Fine-tune
CustomModel = StructuredModel.model_from_json(config)

2. Team Collaboration 👥

Share model configurations as JSON files in version control, making it easy for teams to review and iterate on comparison logic without touching Python code.

3. Environment-Specific Configs 🌍

Export a base configuration and maintain environment-specific variants (dev, staging, prod) with different thresholds.
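As a rough illustration of the environment-specific workflow, the sketch below derives per-environment variants from one base configuration. It assumes the exported config has the shape shown in the Product example in this PR description; the `overrides` table and the `build_variant` helper are hypothetical names used only for illustration and are not part of Stickler.

```python
import copy
import json

# Hypothetical base config, shaped like the Product example in this PR.
base_config = {
    "model_name": "Product",
    "fields": {
        "name": {"type": "str", "threshold": 0.8, "comparator": "LevenshteinComparator"},
        "price": {"type": "float", "threshold": 0.95, "comparator": "NumericComparator"},
    },
}

# Hypothetical per-environment threshold overrides, keyed by field name.
overrides = {
    "dev": {"name": 0.6},
    "prod": {"name": 0.9, "price": 0.99},
}


def build_variant(base, field_thresholds):
    """Return a deep copy of base with the given field thresholds replaced."""
    variant = copy.deepcopy(base)  # never mutate the shared base config
    for field_name, threshold in field_thresholds.items():
        variant["fields"][field_name]["threshold"] = threshold
    return variant


variants = {env: build_variant(base_config, t) for env, t in overrides.items()}

# Stable key order keeps version-control diffs between environments small.
prod_json = json.dumps(variants["prod"], indent=2, sort_keys=True)
```

Each variant could then be passed to the import method in place of the base config. The deep copy matters here: a shallow copy would share the nested field dictionaries, so a prod override would silently change dev as well.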

4. Documentation & Transparency 📚

Export configurations to document exactly how models compare data, making ML pipelines more transparent and auditable.

5. Interoperability 🔗

JSON Schema export works with OpenAPI, AsyncAPI, and standard JSON Schema validators, enabling integration with existing tooling.

🏗️ Architecture

The implementation follows the KISS principle by extending the existing JsonSchemaFieldConverter to support bidirectional conversion:

  • Import: JSON Schema → Pydantic Field (existing)
  • Export: Pydantic Field → JSON Schema (new)

This approach:

  • ✅ Reuses existing type mappings (single source of truth)
  • ✅ Avoids code duplication
  • ✅ Maintains symmetric import/export logic
  • ✅ No new helper classes needed
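The "single source of truth" idea can be sketched as follows. This is not the actual JsonSchemaFieldConverter code: the type table and both function names are assumptions made for illustration, and the real mappings may differ. The point is only that the export direction is derived from the import table instead of being maintained separately.

```python
# Hypothetical type table; the real converter's mappings may differ.
JSON_TO_PYTHON = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
}

# Derive the export direction from the import table so the two cannot drift apart.
PYTHON_TO_JSON = {py_type: json_type for json_type, py_type in JSON_TO_PYTHON.items()}


def import_type(json_type: str) -> type:
    """JSON Schema type name -> Python type (import direction)."""
    return JSON_TO_PYTHON[json_type]


def export_type(python_type: type) -> str:
    """Python type -> JSON Schema type name (export direction)."""
    return PYTHON_TO_JSON[python_type]
```

Inverting the table only works while the mapping is one-to-one. If two JSON types ever mapped to the same Python type, the derived export table would keep only one of them, so that case would need an explicit rule.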

📖 Documentation

Comprehensive documentation added at docs/docs/Guides/StructuredModel_Export.md covering:

  • When to use each export format
  • Common use cases and workflows
  • Round-trip examples
  • Best practices for configuration management
  • Comparison of all three schema methods

✅ Testing

14 comprehensive tests added:

  • 7 export tests (basic, nested, lists, metadata preservation)
  • 7 round-trip tests (JSON Schema, Stickler config, comparison behavior)
  • All tests passing ✓

🎓 Example Use Case

Before (manual JSON writing):

{
  "model_name": "Product",
  "fields": {
    "name": {"type": "str", "threshold": 0.8, "comparator": "LevenshteinComparator"},
    "price": {"type": "float", "threshold": 0.95, "comparator": "NumericComparator"}
  }
}

❌ Error-prone, requires knowing exact format

After (export from Python):

class Product(StructuredModel):
    name: str = ComparableField(default=...)
    price: float = ComparableField(default=...)

# Export with defaults, customize as needed
config = Product.to_stickler_config()

✅ Type-safe, auto-generated, easy to customize

🔍 What's Next

This PR enables future enhancements:

  • CLI tools for exporting models
  • Configuration validation utilities
  • Schema migration tools
  • Configuration diff/merge tools

📝 Summary

This PR completes the configuration lifecycle for StructuredModel, making it easy to:

  1. Define models in Python
  2. Export configurations to JSON
  3. Customize thresholds and weights
  4. Import back to Python
  5. Share and version control

The result is a more flexible, collaborative, and transparent workflow for managing comparison logic in Stickler.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@vawsgit vawsgit requested review from adiadd and sromoam January 7, 2026 20:53
@vawsgit vawsgit requested a review from ayushi1208 January 13, 2026 20:30
@vawsgit vawsgit requested a review from adiadd January 22, 2026 17:52

@adiadd adiadd left a comment

Looks great, just a couple things!

}

# Add match_threshold if available (check both attribute names for compatibility)
threshold = getattr(cls, "match_threshold", None) or getattr(cls, "_match_threshold", None)

The or operator treats 0.0 as falsy. If a user sets match_threshold = 0.0, this will incorrectly fall back to _match_threshold. Use explicit None checking instead.

Suggested Fix:

  threshold = getattr(cls, "match_threshold", None)
  if threshold is None:
      threshold = getattr(cls, "_match_threshold", None)
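A small self-contained reproduction of the difference, assuming a class that sets both attributes. The `Model` class and the two helper functions are hypothetical and exist only to contrast the two lookups.

```python
class Model:
    """Hypothetical class with both attribute names set."""
    match_threshold = 0.0    # explicit, valid value chosen by the user
    _match_threshold = 0.7   # legacy attribute kept for compatibility


def threshold_with_or(cls):
    # 0.0 is falsy, so `or` skips it and falls through to the legacy attribute.
    return getattr(cls, "match_threshold", None) or getattr(cls, "_match_threshold", None)


def threshold_with_none_check(cls):
    # Only a missing attribute (None) triggers the fallback.
    threshold = getattr(cls, "match_threshold", None)
    if threshold is None:
        threshold = getattr(cls, "_match_threshold", None)
    return threshold
```

With this class, `threshold_with_or(Model)` returns the legacy 0.7, while `threshold_with_none_check(Model)` returns the intended 0.0.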


- [StructuredModel Dynamic Creation](StructuredModel_Dynamic_Creation.md) - Import methods
- [StructuredModel Advanced Functionality](StructuredModel_Advanced_Functionality.md) - Comparison features
- [JSON Schema Extensions](../../index.md) - Full extension documentation in main README

This link is broken - ../../index.md resolves to a redirect file. The JSON Schema Extensions reference is in the root README.md.

Maybe update to: JSON Schema Extensions Reference (matching the pattern in StructuredModel_Dynamic_Creation.md)

Comment on lines +1291 to +1295
field_config = {
    "type": "structured_model",
    # Recursively export nested model's fields
    "fields": field_type.to_stickler_config()["fields"]
}

Nested models lose model_name and match_threshold when exported via to_stickler_config(). Consider preserving full config for round-trip fidelity:

nested_config = field_type.to_stickler_config()
field_config = {
    "type": "structured_model",
    "model_name": nested_config.get("model_name"),
    "match_threshold": nested_config.get("match_threshold"),
    "fields": nested_config["fields"]
}

Consider adding tests for Optional[str], Optional[StructuredModel], and Optional[List[StructuredModel]] fields to verify the unwrapping logic works correctly during export
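For reference, the kind of unwrapping such tests would exercise can be sketched with the standard `typing` helpers. The `unwrap_optional` function below is a hypothetical standalone stand-in; the actual `_unwrap_optional` in structured_model.py is not shown in this review and may behave differently.

```python
import types
from typing import List, Optional, Union, get_args, get_origin

# `X | None` produces types.UnionType on Python 3.10+; fall back to Union on older versions.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def unwrap_optional(tp):
    """Return (inner_type, was_optional). Non-Optional types pass through unchanged."""
    if get_origin(tp) in _UNION_ORIGINS:
        args = get_args(tp)
        non_none = [a for a in args if a is not type(None)]
        # Optional[X] is exactly Union[X, None]: two members, one of them NoneType.
        if len(args) == 2 and len(non_none) == 1:
            return non_none[0], True
    return tp, False
```

Tests along these lines would check that `Optional[str]` yields `str`, that `Optional[List[X]]` yields `List[X]` with the list wrapper intact, and that unwrapping an already unwrapped type is a no-op.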

Ditto: consider adding tests for Optional[str], Optional[StructuredModel], and Optional[List[StructuredModel]] fields to verify that the unwrapping logic works correctly during export.

It would be good to have corresponding test cases for lines 1156-1157 and 1168-1172 in structured_model.py using pytest.raises(ValueError)

It would be good to add a test that creates a model with NumericComparator(absolute_tolerance=0.5), exports it, reimports it, and verifies that the tolerance config is preserved.

Comment on lines +1159 to +1160
# Unwrap Optional before type checking
field_type, _ = cls._unwrap_optional(field_type)

Minor: Optional types are unwrapped twice, once at line 1160 and again inside _is_structured_model_type() at line 1234. Consider removing the internal unwrap since the caller already does it. It is harmless but redundant.

Comment on lines +520 to +524
def _build_comparison_extensions(
    self,
    metadata: Dict[str, Any],
    format: str = "json_schema"
) -> Dict[str, Any]:

Minor, but consider validating the format argument:

if format not in ("json_schema", "stickler_config"): raise ValueError(...)
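Expanded into a runnable form, the check might look like the sketch below. The function is written as a standalone stand-in for the method, `SUPPORTED_FORMATS` is a hypothetical constant, and the body after the check is a placeholder because the real per-format key naming is not shown in this review.

```python
from typing import Any, Dict

# Hypothetical constant listing the two formats named in the suggestion above.
SUPPORTED_FORMATS = ("json_schema", "stickler_config")


def build_comparison_extensions(
    metadata: Dict[str, Any],
    format: str = "json_schema",
) -> Dict[str, Any]:
    # Fail fast on typos such as "jsonschema" instead of silently picking a branch.
    if format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported format {format!r}; expected one of {SUPPORTED_FORMATS}"
        )
    # Placeholder body: the real method would rename keys depending on the format.
    return dict(metadata)
```

Raising early keeps a misspelled format from producing output in the wrong shape, which would otherwise only surface later as a failed round trip.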
