-
Notifications
You must be signed in to change notification settings - Fork 48
Extract Pydantic schema utilities to dedicated module #4309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
| if not has_default and field.default_factory is not None: | ||
| try: | ||
| default_val = field.default_factory() | ||
| has_default = True | ||
| except Exception: | ||
| default_val = "<factory>" | ||
| has_default = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling field.default_factory() without arguments can fail for factories that require parameters. Additionally, executing factory functions during schema extraction can trigger unintended side effects (file I/O, network calls, resource allocation, etc.).
# Current code executes the factory:
if not has_default and field.default_factory is not None:
try:
default_val = field.default_factory() # ⚠️ Executes factory
has_default = True
except Exception:
default_val = "<factory>"
has_default = True
# Should avoid execution:
if not has_default and field.default_factory is not None:
default_val = "<factory>"
has_default = TrueThe serialize_default() function already handles callable values by returning "<factory>" (line 78-79), so the factory should not be executed here. This prevents side effects and makes the behavior consistent with how get_pydantic_field_info() handles factories (line 130, 140).
| if not has_default and field.default_factory is not None: | |
| try: | |
| default_val = field.default_factory() | |
| has_default = True | |
| except Exception: | |
| default_val = "<factory>" | |
| has_default = True | |
| if not has_default and field.default_factory is not None: | |
| default_val = "<factory>" | |
| has_default = True |
Spotted by Graphite Agent
Is this helpful? React 👍 or 👎 to let us know.
6450469 to
b501e73
Compare
b501e73 to
3f5a2da
Compare
4aea879 to
d0a68f9
Compare
3f5a2da to
91dfa7f
Compare
nishu-builder
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very exciting! one thing that's coming to mind is that pydantic has some built-in support for schema serialization. e.g. try this out:
"""Extract JSON schema from a Pydantic model using built-in Pydantic schema export."""
import argparse
import importlib
import json
import sys
from pydantic import BaseModel
def load_model_class(spec: str) -> type[BaseModel]:
if "." not in spec:
raise ValueError(f"Invalid spec: {spec}. Use module.path.ClassName format")
module_path, class_name = spec.rsplit(".", 1)
module = importlib.import_module(module_path)
cls = getattr(module, class_name)
if not isinstance(cls, type) or not issubclass(cls, BaseModel):
raise ValueError(f"{spec} is not a Pydantic BaseModel")
return cls
def main():
parser = argparse.ArgumentParser(description="Extract JSON schema from Pydantic model (built-in)")
parser.add_argument("model", help="Model spec: module.path.ClassName")
parser.add_argument("--compact", action="store_true", help="Compact JSON output")
args = parser.parse_args()
try:
model_class = load_model_class(args.model)
except Exception as e:
print(f"Error loading model: {e}", file=sys.stderr)
sys.exit(1)
schema = model_class.model_json_schema() # could replace this with `extract_schema` to compare to yours
indent = None if args.compact else 2
print(json.dumps(schema, indent=indent))
if __name__ == "__main__":
main()
it has a few advantages:
- handles a few types yours may not, like sets, Path, UUID (i think we use all of these)
- field constraints: keeps
ge, etc - handles it gets model-level docstrings (i see yours extracts descriptions, which could be cool to show up when forming the command in skydeck, and maybe you want some analagous thing for objects themselves?)
yours has the advantage of:
- flat dot-path keys. maybe this makes handling in skydeck easier, or you've already written it with dot-path keys in mind
- when there's an inner other pydantic object, you print
<PydanticClassName>, and don't have to do any$refjsonschema-type resolution, which the builtin one requires
I'm happy with either, so approving this. If we go with your impl, may want to add special type handling for sets, Path, UUID
d0a68f9 to
82c1007
Compare
91dfa7f to
012fa28
Compare
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
012fa28 to
2afd2b3
Compare
d### TL;DR Refactored Pydantic schema extraction into a dedicated module with enhanced functionality for JSON schema export. ### What changed? - Moved `get_pydantic_field_info()` from `run_tool.py` to a new dedicated module `schema.py` - Added new utility functions in `schema.py`: - `get_type_str()` - Converts type annotations to readable strings - `serialize_default()` - Serializes default values to JSON-compatible format - `extract_schema()` - Enhanced schema extraction with richer metadata - Created a new tool `pydantic_config_schema.py` that can extract schema from any Pydantic model and output as JSON ### How to test? Run the new schema extraction tool with a Pydantic model: ```bash uv run ./tools/pydantic_config_schema.py metta.rl.trainer_config.TrainerConfig ``` Or with a tool name: ```bash uv run ./tools/pydantic_config_schema.py arena.train ``` ### Why make this change? This refactoring improves code organization by moving schema extraction logic to a dedicated module. The enhanced schema extraction capabilities enable better documentation generation and configuration validation. The new JSON schema export tool allows for programmatic access to configuration schemas, which can be used for generating documentation, UI forms, or validation rules.

d### TL;DR
Refactored Pydantic schema extraction into a dedicated module with enhanced functionality for JSON schema export.
What changed?
get_pydantic_field_info()fromrun_tool.pyto a new dedicated moduleschema.pyschema.py:get_type_str()- Converts type annotations to readable stringsserialize_default()- Serializes default values to JSON-compatible formatextract_schema()- Enhanced schema extraction with richer metadatapydantic_config_schema.pythat can extract schema from any Pydantic model and output as JSONHow to test?
Run the new schema extraction tool with a Pydantic model:
Or with a tool name:
Why make this change?
This refactoring improves code organization by moving schema extraction logic to a dedicated module. The enhanced schema extraction capabilities enable better documentation generation and configuration validation. The new JSON schema export tool allows for programmatic access to configuration schemas, which can be used for generating documentation, UI forms, or validation rules.