JSON File Ingestion – Handling Metadata and Chunking

### **JSON File Ingestion – Handling Metadata and Chunking**

## **Description**  
When uploading a JSON file, I need Verba to properly ingest structured metadata while still generating chunks automatically. Currently, the behavior is unclear, and it seems that the "chunks" field must be predefined, even though Verba can generate chunks for PDFs automatically.  

**Expected Behavior:**  
- Verba should recognize metadata fields without requiring predefined chunks.  
- The "content" field should be processed as document text.  
- Chunking should be handled automatically based on Verba’s settings.  
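
To make the expected behavior concrete, here is a minimal sketch of what I have in mind. It is not Verba's actual code: `split_document` and `chunk_words` are hypothetical names, and the word-based chunker is only a stand-in for whichever chunker is selected in Verba's settings. The assumption is that every top-level field other than "content" is treated as metadata.

```python
import json
from typing import Any


def split_document(raw: str, content_key: str = "content") -> tuple[str, dict[str, Any]]:
    """Separate the document text from every other top-level field (hypothetical helper)."""
    record = json.loads(raw)
    text = record.get(content_key, "")
    # Assumption: everything except the content field is metadata.
    metadata = {k: v for k, v in record.items() if k != content_key}
    return text, metadata


def chunk_words(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    """Naive fixed-size word chunking with overlap; a stand-in, not Verba's chunker."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks
```

With something like this, the uploaded file would only need the metadata fields plus "content", and the chunks would be derived at ingestion time rather than supplied in the JSON.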

**Actual Behavior:**  
- The "chunks" field appears necessary, even though I want Verba to generate them dynamically.  
- Metadata structure is unclear—what should be included for proper indexing?  

**Example JSON File:**  

```json
{
  "year": 1995,
  "number": "50",
  "title": "Circular Nº 50, del 13 de Diciembre de 1995 (modificada) (aclarada / complementada)",
  "materia": "Crédito Tributario por inversiones en provincias de Arica y Parinacota",
  "url": "https://www.sii.cl/documentos/circulares/1995/circu50.pdf",
  "sin_efecto": false,
  "downloaded_filename": "circu50.pdf",
  "saved_filename": "circular_1995_50_2.pdf",
  "content": "Modificada por Circular Nº 45, del 3 de septiembre de 2008 \n\nModificadas por Circular Nº 64, del 6 de noviembre de 1996 \n\nComplementada por Circular Nº 64, del 6 de noviembre de 1996 \n\nCIRCULAR Nº 50, DEL 13 DE D ETC ETC etc",
  "modificada": true,
  "aclarada_complementada": true
}
```

## **Installation**  
  

- [ ] `pip install goldenverba`  


If you installed via pip, please specify the version:  

## **Weaviate Deployment**  
  

- [ ] Local Deployment  



## **Steps to Reproduce**  
1. Go to the dashboard.  
2. Upload a JSON file with structured metadata.  
3. The metadata doesn't load, nor does the document title, etc.

## **Additional Context**  
- Do I need to structure metadata differently for proper indexing?  
- Should Verba automatically generate chunks even when metadata is present?  
- If so, how should metadata fields be formatted?  
- Is there a recommended JSON structure for structured documents without manually defining chunks?  

**@thomashacker Any guidance on this?**

JSON File Ingestion – Handling Metadata and Chunking #369