This project validates AI-generated answers against a financial loan document (PDF).
For every question–answer pair, the system assigns one of three labels:
- ✅ SUPPORTED: the answer fully matches the PDF
- ⚠️ PARTIALLY_SUPPORTED: the answer only partly matches the PDF
- ❌ NOT_SUPPORTED: no relevant match was found
The detection uses semantic embeddings, numeric extraction, and similarity search.
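
As a rough illustration of the similarity-search idea, the sketch below ranks chunks by cosine similarity against a query vector. It uses plain Python and toy 3-dimensional vectors as stand-ins for real MiniLM embeddings and a FAISS index; the helper names `cosine` and `top_k` are hypothetical and not taken from `validator.py`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_u * norm_v)

def top_k(query_vec, chunk_vecs, k=3):
    """Return (chunk_index, score) pairs for the k most similar chunks, best first."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for chunk embeddings (assumption: real ones come from MiniLM)
chunk_vecs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.8, 0.6, 0.0]

results = top_k(query_vec, chunk_vecs, k=2)
for index, score in results:
    print(f"chunk {index}: similarity {score:.2f}")
```

A vector index such as FAISS performs the same kind of nearest-neighbour ranking, only much faster on large collections.
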
| Component | Purpose |
|---|---|
| Python 3 | Core programming language |
| PyPDF2 | PDF text extraction |
| SentenceTransformers (MiniLM) | Embedding generation |
| FAISS | Fast vector similarity search |
| NumPy | Numerical processing |
| JSON | Input/output formats |
Install required libraries:

    pip install PyPDF2 sentence-transformers faiss-cpu numpy

On Windows:

    pip install faiss-cpu-windows

Step 1: Change to the src directory

    cd src

Step 2: Execute the script

    python validator.py --pdf ../input-pdfs/axis_loan1.pdf --qa qa_samples.json --out ../validation_results.json

| Argument | Meaning |
|---|---|
| --pdf | Path to source PDF |
| --qa | JSON file containing questions & answers |
| --out | Output file where validation results are saved |
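
For reference, the three arguments could be parsed with `argparse` roughly as follows. This is a minimal sketch: the function name `build_parser` is hypothetical, and marking every argument as required is an assumption, since the actual `validator.py` may define defaults.

```python
import argparse

def build_parser():
    """Build a parser for the three documented arguments (all assumed required)."""
    parser = argparse.ArgumentParser(
        description="Validate AI-generated answers against a loan PDF."
    )
    parser.add_argument("--pdf", required=True, help="Path to source PDF")
    parser.add_argument("--qa", required=True,
                        help="JSON file containing questions and answers")
    parser.add_argument("--out", required=True,
                        help="Output file where validation results are saved")
    return parser

# Parse the same arguments used in the example command above
args = build_parser().parse_args([
    "--pdf", "../input-pdfs/axis_loan1.pdf",
    "--qa", "qa_samples.json",
    "--out", "../validation_results.json",
])
print(args.pdf, args.qa, args.out)
```
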
📤 Output Format (validation_results.json)

    {
      "question": "What is the sanctioned loan amount?",
      "ai_answer": "The sanctioned loan amount is Rs. 15,00,000.",
      "validation_result": "SUPPORTED",
      "confidence_score": 0.82,
      "supporting_text": "[Page X] ... Facility Amount Rupees: 1,500,000 ..."
    }

Inside /screenshots, the following proof screenshots are available:
🗂 Project folder structure
🖥 Command-line execution of validator.py
These confirm the application works end-to-end as required.
1. Extract text from the PDF
2. Break it into meaningful chunks
3. Convert chunks → embeddings (MiniLM)
4. Convert Q&A → embeddings
5. Compare semantic similarity
6. Perform numeric extraction & matching
7. Generate the decision label:
   - SUPPORTED
   - PARTIALLY_SUPPORTED
   - NOT_SUPPORTED
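
The numeric matching and labeling steps above might look something like the sketch below. It shows why "15,00,000" (Indian digit grouping) and "1,500,000" can be treated as the same amount once commas are stripped. The function names, the 0.6 similarity threshold, and the exact decision rule are all assumptions made for illustration, not the logic of `validator.py`.

```python
import re

# Digits with optional grouping commas and an optional decimal part
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")

def extract_numbers(text):
    """Return the set of numeric values in text, ignoring grouping commas."""
    return {float(token.replace(",", "")) for token in NUMBER_RE.findall(text)}

def decide(similarity, answer, chunk, sim_threshold=0.6):
    """Combine a similarity score with numeric agreement (hypothetical rule)."""
    answer_nums = extract_numbers(answer)
    chunk_nums = extract_numbers(chunk)
    # True when every number in the answer also appears in the chunk
    numbers_ok = answer_nums <= chunk_nums
    if similarity >= sim_threshold and numbers_ok:
        return "SUPPORTED"
    # Either the text is similar or at least one number agrees
    if similarity >= sim_threshold or (answer_nums & chunk_nums):
        return "PARTIALLY_SUPPORTED"
    return "NOT_SUPPORTED"

chunk = "[Page X] ... Facility Amount Rupees: 1,500,000 ..."
label = decide(0.82, "The sanctioned loan amount is Rs. 15,00,000.", chunk)
print(label)
```

Comparing parsed values instead of raw strings keeps formatting differences in the PDF from causing false mismatches.
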
✔ Complete folder structure
✔ Full PDF → Q&A → Validation pipeline
✔ Final output JSON included
✔ Screenshots provided
✔ Easy-to-run instructions documented