EbroBot is an interactive chatbot built on OpenAI's GPT-4o-mini and a Chroma vector database. It delivers context-aware, high-quality responses grounded in uploaded documents, which serve as its knowledge base, and is designed to simplify human-computer interaction through natural language processing and knowledge retrieval.
The main script, `chatbot.py`, serves as the entry point for the chatbot interface. It:
- Configures and initializes the chatbot's language model (`ChatOpenAI`) and vector database (`Chroma`).
- Sets up file upload functionality to extend the chatbot's knowledge base dynamically.
- Implements a Gradio-based user interface for seamless interaction.
The ingestion script, `ingest_database.py`, prepares the knowledge base by:
- Loading documents (e.g., PDFs) from the `data` directory.
- Splitting the documents into manageable chunks for efficient processing.
- Embedding the chunks using `OpenAIEmbeddings` and storing them in the Chroma vector database.
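The chunking step above can be sketched in plain Python. In the actual ingestion script LangChain's text splitters do this work; the fixed-size, overlapping splitter below (`split_into_chunks` is a hypothetical helper, not the project's code) just illustrates the idea:

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks (illustrative stand-in for
    LangChain's text splitters)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# A 1200-character document with a 500-character window and 50-character
# overlap yields three chunks; the overlap preserves context at boundaries.
document = "x" * 1200
chunks = split_into_chunks(document, chunk_size=500, overlap=50)
print(len(chunks))
```

Overlap between consecutive chunks keeps sentences that straddle a boundary retrievable from either side.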
- Streamed Responses: Generates partial answers while processing longer queries for a real-time experience.
- Context-Aware Conversations: Leverages conversation history to provide coherent responses.
- Knowledge-Driven Answers: Responds based solely on the knowledge provided by the uploaded documents.
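A minimal sketch of how streamed, history-aware replies could look, with a toy generator standing in for the actual ChatOpenAI streaming call (`stream_answer` and its reply format are illustrative, not the project's code):

```python
from typing import Iterator

def stream_answer(question: str, history: list[tuple[str, str]]) -> Iterator[str]:
    """Toy stand-in for the model call: yields the reply word by word.
    In the real app, ChatOpenAI's streaming API produces the tokens."""
    reply = f"You asked: {question} (turn {len(history) + 1})"
    partial = ""
    for word in reply.split():
        partial = (partial + " " + word).strip()
        yield partial  # a UI such as Gradio can render each partial string as it arrives

history: list[tuple[str, str]] = []
final = ""
for partial in stream_answer("What is EbroBot?", history):
    final = partial
history.append(("What is EbroBot?", final))  # history feeds the next turn
print(final)
```

The history list is what makes later turns context-aware: each new question is answered with the prior question/answer pairs in hand.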
- Users can upload text-based files to dynamically enhance the chatbot's knowledge base.
- Documents are processed, embedded, and stored in the vector database for retrieval.
- Handles large datasets with efficient chunking and embedding strategies.
- Ensures persistence of the vector database for continuous use.
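Under the hood, retrieval amounts to nearest-neighbour search over embedding vectors. The sketch below uses hand-made three-dimensional vectors in place of OpenAIEmbeddings output and a plain list in place of Chroma, purely to illustrate the mechanism:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy "vector store": (chunk text, embedding) pairs. Real embeddings come
# from OpenAIEmbeddings, and Chroma handles storage and search.
store = [
    ("EbroBot streams partial answers.", [0.9, 0.1, 0.0]),
    ("Documents are split into chunks.", [0.1, 0.8, 0.2]),
    ("The vector database persists to disk.", [0.0, 0.2, 0.9]),
]

def retrieve(query_embedding: list[float], k: int = 1) -> list[str]:
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

print(retrieve([0.0, 0.3, 0.85]))
```

The retrieved chunks are then handed to the language model as context, which is how answers stay grounded in the uploaded documents.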
Requirements:

- Python 3.8+
- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- Ensure that the OpenAI API key is set in a `.env` file:

  ```
  OPENAI_API_KEY=your_openai_api_key
  ```
To process and store documents in the vector database:

```shell
python ingest_database.py
```

Start the Gradio-based chatbot interface:

```shell
python chatbot.py
```

The chatbot will be accessible at the URL provided by Gradio (e.g., http://127.0.0.1:****).
Project structure:

```
.
├── data/               # Directory for documents to be ingested
├── chroma_db/          # Persistent Chroma vector database
├── chatbot.py          # Main chatbot script
├── ingest_database.py  # Script to ingest documents
├── .env                # Environment variables
└── requirements.txt    # Python dependencies
```
Planned improvements:

- Support for additional file formats (e.g., Word, CSV).
- Advanced conversational memory to handle complex dialogs.
- Deployment on cloud platforms for scalability.
Contributions are welcome! Please follow the standard GitHub workflow:
- Fork the repository.
- Create a feature branch.
- Commit your changes.
- Submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments:

- OpenAI for providing the language model API.
- LangChain for the robust integration framework.
- The open-source community for their contributions.