A Streamlit application that allows users to chat with pre-built document agents or create their own custom agents by simply uploading a PDF file. Each agent is equipped with an intent classifier, trained in real time, that routes queries efficiently into a Retrieval-Augmented Generation (RAG) pipeline.
- Multi-Agent Architecture: Start with pre-built agents (e.g., "gem5 Expert") and dynamically create new, custom agents for any PDF document.
- Real-time Agent Creation: Uploading a PDF automatically triggers a full machine learning pipeline that chunks the document, creates a vector database, synthetically generates a training dataset, and trains a unique intent classifier for that agent.
- Dynamic Intent Classification: Each custom agent uses its own `LogisticRegression` classifier to distinguish between casual conversation and document-specific questions (`doc_qna`), so the full RAG pipeline runs only when it is needed.
- RAG Pipeline: Leverages `SentenceTransformer` embeddings and a persistent `ChromaDB` vector store to retrieve the most relevant context from a document before passing it to the language model.
- Source Attribution: Responses for custom agents include the source page numbers, enhancing trust and verifiability.
When a user uploads a new PDF, a 5-step process (visible via status toasts in the UI) creates a fully functional agent:
- Chunking: The PDF is parsed and its text is split into manageable chunks. A new `ChromaDB` collection is created to store the embeddings for these chunks.
- Data Generation: Keywords are extracted from the text chunks using `YAKE`. These keywords are used to synthetically generate a list of relevant, document-specific questions, which form the `doc_qna` portion of the training set.
- Dataset Assembly: The generated `doc_qna` questions are combined with a predefined list of `casual` questions from `casual_queries.csv` to form the training dataset.
- Embedding Generation: All training queries are converted into numerical vector embeddings using the `SentenceTransformer` model.
- Classifier Training: A `LogisticRegression` model is trained on the embeddings. The trained classifier and its corresponding `LabelEncoder` are saved as `.joblib` files, ready for use.
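
The pipeline can be pictured with a short Python sketch. Everything below (the `create_agent` function, the `chroma_db` path, the question template, and the `.joblib` file names) is illustrative of the five steps, not the project's actual code:

```python
# Sketch of the five-step agent-creation pipeline; names are assumptions.
import chromadb
import joblib
import yake
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

def create_agent(agent_name, chunks, pages, casual_queries):
    """chunks: text chunks from the PDF; pages: page number of each chunk."""
    embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    # Step 1: store chunk embeddings (with page numbers for source attribution)
    # in a persistent ChromaDB collection dedicated to this agent.
    client = chromadb.PersistentClient(path="chroma_db")
    collection = client.get_or_create_collection(agent_name)
    collection.add(
        ids=[str(i) for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        metadatas=[{"page": page} for page in pages],
    )

    # Step 2: extract keywords with YAKE and turn them into synthetic doc_qna questions.
    extractor = yake.KeywordExtractor(n=2, top=5)
    doc_questions = [
        f"What does the document say about {keyword}?"
        for chunk in chunks
        for keyword, _score in extractor.extract_keywords(chunk)
    ]

    # Step 3: combine the doc_qna questions with the predefined casual queries.
    queries = doc_questions + casual_queries
    labels = ["doc_qna"] * len(doc_questions) + ["casual"] * len(casual_queries)

    # Step 4: embed every training query with the same sentence-transformer model.
    X = embedder.encode(queries)

    # Step 5: train the per-agent classifier and persist it with its label encoder.
    encoder = LabelEncoder()
    clf = LogisticRegression(max_iter=1000).fit(X, encoder.fit_transform(labels))
    joblib.dump(clf, f"{agent_name}_classifier.joblib")
    joblib.dump(encoder, f"{agent_name}_label_encoder.joblib")
    return collection, clf, encoder
```

Training a fresh classifier per agent keeps the routing decision cheap at query time: a single `LogisticRegression.predict` call is far less costly than an unnecessary retrieval-plus-LLM round trip.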
Once an agent exists, each user query is handled as follows:

- Intent Classification: The user's query is first converted into an embedding and passed to the agent's unique, trained `LogisticRegression` model, which classifies the intent as either `doc_qna` or `casual`.
- Conditional Routing:
  - If `casual`, the RAG pipeline is skipped and a polite, pre-defined response is returned.
  - If `doc_qna`, the RAG pipeline is triggered.
- Retrieval (RAG): The query embedding is used to search the agent's `ChromaDB` collection, retrieving the top 5 most relevant text chunks from the source document.
- Generation (RAG): The retrieved chunks are formatted into a detailed prompt along with the original query and sent to the `gemini-1.5-flash` LLM, which generates an answer based strictly on the provided context.
- Display: The final response, including source citations, is displayed in the chat interface.
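
As a rough sketch of this routing (reusing the objects from the previous example; the prompt wording and the canned casual reply are assumptions, not the app's actual strings):

```python
# Sketch of query routing and the RAG answer path; names are assumptions.
# Assumes `embedder`, `clf`, `encoder`, and `collection` come from the
# agent-creation sketch above, and that genai.configure(api_key=...) has
# already been called with the key loaded from .env.
import google.generativeai as genai

def answer_query(query, embedder, clf, encoder, collection):
    # Intent classification: embed the query and ask the agent's classifier.
    query_embedding = embedder.encode([query])
    intent = encoder.inverse_transform(clf.predict(query_embedding))[0]

    # Conditional routing: casual queries never touch the RAG pipeline.
    if intent == "casual":
        return "Happy to chat! Ask me anything about the document.", []

    # Retrieval: top 5 most relevant chunks from the agent's ChromaDB collection.
    results = collection.query(query_embeddings=query_embedding.tolist(), n_results=5)
    chunks = results["documents"][0]
    pages = [meta["page"] for meta in results["metadatas"][0]]

    # Generation: answer strictly from the retrieved context with Gemini 1.5 Flash.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    response = genai.GenerativeModel("gemini-1.5-flash").generate_content(prompt)
    return response.text, sorted(set(pages))
```

The page numbers returned alongside the answer are what the chat UI surfaces as source citations.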
The project is built on the following stack:

- Frontend: Streamlit
- Intent Classifier: Scikit-learn (`LogisticRegression`)
- Embedding Model: `sentence-transformers/all-MiniLM-L6-v2`
- LLM: Google Gemini 1.5 Flash (`gemini-1.5-flash`)
- Vector Database: `ChromaDB`
- Keyword Extraction: `YAKE`
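
The repository's `requirements.txt` is the authoritative dependency list; as a rough guide, the stack above maps to packages along these lines (the PDF parser and the exact package set are assumptions, and version pins are omitted):

```text
streamlit
scikit-learn
sentence-transformers
chromadb
google-generativeai
yake
pypdf            # or whichever PDF parser the project uses (assumption)
python-dotenv    # for loading GOOGLE_API_KEY from .env (assumption)
joblib
```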
To run this application on your local machine, follow these steps.
- Clone the Repository

  ```bash
  git clone https://github.com/Anand-786/multi-doc-agent.git
  cd multi-doc-agent
  ```

- Create and Activate a Virtual Environment

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```

- Install Dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Set Up Environment Variables

  Create a file named `.env` in the project's root directory and add your Google API key:

  ```
  GOOGLE_API_KEY="your_api_key_here"
  ```

- Create the Casual Queries File

  Create a file named `casual_queries.csv` in the root directory with entries like this:

  ```csv
  query,intent
  "Hello there!",casual
  "Hey, how are you?",casual
  "What's up?",casual
  "Thank you!",casual
  "Thanks a lot",casual
  "bye",casual
  ```
- Run the Application

  ```bash
  streamlit run src/app_multi_agent.py
  ```

