**JSON File Ingestion – Handling Metadata and Chunking**
Description
When uploading a JSON file, I need Verba to ingest the structured metadata properly while still generating chunks automatically. Currently the behavior is unclear: the "chunks" field seems to have to be predefined, even though Verba can generate chunks for PDFs automatically.
Expected Behavior:
- Verba should recognize metadata fields without requiring predefined chunks.
- The "content" field should be processed as document text.
- Chunking should be handled automatically based on Verba’s settings.
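To make the expected behavior concrete, here is a minimal sketch of what I have in mind. This is not Verba's actual code or API: `split_document`, `chunk_text`, and `ingest` are hypothetical names, and the fixed-size character chunker is only a stand-in for whatever chunker is selected in Verba's settings.

```python
import json


def split_document(raw: dict, content_key: str = "content") -> tuple[str, dict]:
    """Treat `content_key` as the document text and every other field as metadata."""
    metadata = {k: v for k, v in raw.items() if k != content_key}
    return raw.get(content_key, ""), metadata


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking with overlap (stand-in for a real chunker)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def ingest(json_text: str) -> list[dict]:
    """Parse one JSON document and attach its metadata to every generated chunk."""
    text, metadata = split_document(json.loads(json_text))
    return [
        {"chunk_id": i, "text": chunk, "metadata": metadata}
        for i, chunk in enumerate(chunk_text(text))
    ]
```

With something like this, a file that only has "content" plus arbitrary metadata fields (and no "chunks" field) would still produce chunks, each carrying the title, year, url, and so on.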
Actual Behavior:
- The "chunks" field appears necessary, even though I want Verba to generate them dynamically.
- The expected metadata structure is unclear: which fields should be included for proper indexing?
Example JSON File:
{
  "year": 1995,
  "number": "50",
  "title": "Circular Nº 50, del 13 de Diciembre de 1995 (modificada) (aclarada / complementada)",
  "materia": "Crédito Tributario por inversiones en provincias de Arica y Parinacota",
  "url": "https://www.sii.cl/documentos/circulares/1995/circu50.pdf",
  "sin_efecto": false,
  "downloaded_filename": "circu50.pdf",
  "saved_filename": "circular_1995_50_2.pdf",
  "content": "Modificada por Circular Nº 45, del 3 de septiembre de 2008 \n\nModificadas por Circular Nº 64, del 6 de noviembre de 1996 \n\nComplementada por Circular Nº 64, del 6 de noviembre de 1996 \n\nCIRCULAR Nº 50, DEL 13 DE D ETC ETC etc",
  "modificada": true,
  "aclarada_complementada": true
}
Installation
- pip install goldenverba
If you installed via pip, please specify the version:
Weaviate Deployment
- Local Deployment
Steps to Reproduce
- Go to the dashboard.
- Upload a JSON file with structured metadata.
- The metadata doesn't load, nor does the title of the document.
Additional Context
- Do I need to structure metadata differently for proper indexing?
- Should Verba automatically generate chunks even when metadata is present?
- If so, how should metadata fields be formatted?
- Is there a recommended JSON structure for structured documents without manually defining chunks?
@thomashacker Any guidance on this?