Open
Description
ParseResponse serialization produces both class and class_ fields when using the parse method and dumping the result to a file:
response.model_dump_json(indent=2)
produces JSON like this:
{
  "chunks": [...],
  "class_": "full", # <<<<<<<
  "identifier": "full",
  "markdown": "<string>",
  "pages": [0],
  "class": "full" # <<<<<<<
}
Furthermore, downloading the blob from S3 using the job's output_url and then trying to load the model like this:
import json
from io import BytesIO
import httpx

async def download_blob(presigned_url: str):
    # Stream the blob into an in-memory buffer
    async with httpx.AsyncClient() as client:
        async with client.stream("GET", presigned_url) as response:
            response.raise_for_status()
            buffer = BytesIO()
            async for chunk in response.aiter_bytes():
                buffer.write(chunk)
            return buffer

# `response` here is the result of the earlier parse call
buf = await download_blob(response.output_url)
parsed = ParseResponse.model_validate_json(buf.getvalue())
fails with:
ValidationError: 1 validation error for ParseResponse
splits.0.class
  Field required [type=missing, input_value={'class_': 'full', 'ident...a94-9681-e3c8860228dd']}, input_type=dict]
Using ParseResponse.model_validate_json(buf.getvalue(), by_name=True) succeeds, but this is not mentioned in the documentation.
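For anyone hitting the same error, another possible workaround is to normalize the downloaded JSON before validating it. This is only a sketch using the standard library, not part of the SDK: the helper name normalize_class_keys is hypothetical, and it assumes the model expects the key "class" (as the error message suggests) while the stored JSON contains "class_".

```python
import json
from typing import Any


def normalize_class_keys(node: Any, old: str = "class_", new: str = "class") -> Any:
    """Recursively rename `old` keys to `new` in decoded JSON data.

    If a dict already contains `new`, the `old` entry is dropped so that
    only one of the two keys survives.
    """
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == old:
                if new in node:
                    continue  # duplicate of an existing `new` key: drop it
                key = new
            out[key] = normalize_class_keys(value, old, new)
        return out
    if isinstance(node, list):
        return [normalize_class_keys(item, old, new) for item in node]
    return node


# Hypothetical payload shaped like the input shown in the error message.
raw = '{"splits": [{"class_": "full", "identifier": "full", "pages": [0]}]}'
cleaned = json.dumps(normalize_class_keys(json.loads(raw)))
```

The cleaned string could then be passed to ParseResponse.model_validate_json(cleaned) in place of buf.getvalue(). I have not checked this against every response shape, so treat it as a stopgap.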
Expected behavior:
model_dump_json() should produce only 'class' or 'class_', not both
JSON from S3 should deserialize correctly