- RAG Engine: FAISS vector database with OpenAI GPT-3.5-turbo
- Sentiment Analysis: RoBERTa-based classification model
- Document Processing: Multi-format ingestion (TXT, MD, PDF, DOCX)
- Session Management: Persistent conversation state
- Escalation System: Automated human handoff triggers
```mermaid
sequenceDiagram
    participant U as User
    participant UI as Streamlit UI
    participant SA as Sentiment Analyzer
    participant RAG as RAG Engine
    participant VS as Vector Store
    participant LLM as OpenAI GPT-3.5
    participant ES as Escalation System

    U->>UI: Submit message
    UI->>SA: Analyze sentiment
    SA-->>UI: Return sentiment score

    alt Negative sentiment > 70%
        UI->>ES: Trigger escalation
        ES-->>UI: Escalation prompt
        UI-->>U: Offer human agent
    else Normal processing
        UI->>RAG: Process query
        RAG->>VS: Retrieve relevant documents
        VS-->>RAG: Return context chunks
        RAG->>LLM: Generate response with context
        LLM-->>RAG: Return AI response
        RAG-->>UI: Formatted response
        UI-->>U: Display response
        alt Low confidence response
            UI->>ES: Trigger escalation
            ES-->>UI: Escalation prompt
            UI-->>U: Offer human agent
        end
    end
```
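For orientation, the routing logic in the diagram can be sketched in Python. This is a minimal illustration, not the exact `process_message()` implementation: `handle_user_message` and the hard-coded limitation phrases are assumptions, while `analyze_sentiment`, `trigger_escalation`, and the chat chain follow the names documented later in this README.

```python
def handle_user_message(message: str, session_id: str, chat_chain) -> str:
    """Route one user message through sentiment analysis, RAG, and escalation."""
    sentiment = analyze_sentiment(message, session_id)

    # Strongly negative sentiment (> 70% confidence) escalates immediately.
    if sentiment["label"] == "negative" and sentiment["score"] > 0.7:
        return trigger_escalation("negative_sentiment", session_id)

    # Otherwise answer via the RAG chain (retriever + prompt + GPT-3.5-turbo).
    response = chat_chain.invoke(
        {"input": message},
        config={"configurable": {"session_id": session_id}},
    )
    answer = response.content if hasattr(response, "content") else str(response)

    # Responses that admit a knowledge gap also trigger escalation.
    if any(phrase in answer.lower() for phrase in ("i don't know", "i'm not sure")):
        return trigger_escalation("low_confidence", session_id)

    return answer
```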
- Python 3.8 or higher
- OpenAI API key
- Git
- Clone repository:
  ```bash
  git clone <repository-url>
  cd grad_project
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Configure environment variables in `.env`:
  ```
  OPENAI_API_KEY=your_api_key_here
  ```
- Create knowledge base directory:
  ```bash
  mkdir -p data/documents
  ```
- Launch application:
  ```bash
  streamlit run clickatell_chatbot_single.py
  ```

| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API authentication key | Yes |
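The key is read at startup with python-dotenv (listed in `requirements.txt`). A minimal sketch of that check, mirroring the error shown under Troubleshooting below:

```python
import os

from dotenv import load_dotenv

# Read variables from the .env file into the process environment.
load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY not found in environment variables")
```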
The system uses the following models (configurable in `clickatell_chatbot_single.py`):

```python
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
CHAT_MODEL = "gpt-3.5-turbo"
CHUNK_SIZE = 600
CHUNK_OVERLAP = 80
SEARCH_RESULTS = 5
```
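As an illustration only (not the exact code in `clickatell_chatbot_single.py`), these constants would typically drive a LangChain text splitter and the FAISS retriever roughly as follows; the helper names `split_into_chunks` and `build_retriever` are illustrative, not functions from the project.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

CHUNK_SIZE = 600
CHUNK_OVERLAP = 80
SEARCH_RESULTS = 5


def split_into_chunks(documents):
    """Split loaded documents into overlapping chunks before indexing."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE,
        chunk_overlap=CHUNK_OVERLAP,
    )
    return splitter.split_documents(documents)


def build_retriever(vector_store):
    """Return the SEARCH_RESULTS most similar chunks per query."""
    return vector_store.as_retriever(search_kwargs={"k": SEARCH_RESULTS})
```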
Sentiment thresholds in `analyze_sentiment()`:

```python
if sentiment["label"] == "negative" and sentiment["score"] > 0.7:
    escalation_reason = "negative_sentiment"
```
```
grad_project/
├── clickatell_chatbot_single.py   # Main application
├── README.md                      # Documentation
├── requirements.txt               # Dependencies
├── .env                           # Environment configuration
├── data/
│   └── documents/                 # Knowledge base files
├── vector_store/                  # FAISS index storage
└── components/
    └── ui/
        └── assets/
            └── logo.png           # Application logo
```
The `load_documents_from_folder()` function handles multi-format document ingestion:

```python
def load_documents_from_folder():
    """Load all supported documents from data/documents folder."""
    # Supports .txt, .md, .pdf, .docx formats
    # Returns list of Document objects with metadata
```
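A hedged sketch of what such a loader can look like, built on the PyPDF2 and docx2txt packages from `requirements.txt`; the actual implementation may use LangChain's document loaders instead, and the exact metadata fields are an assumption.

```python
import os
from typing import List

import docx2txt
from langchain_core.documents import Document
from PyPDF2 import PdfReader

DOCUMENTS_PATH = "data/documents"


def load_documents_from_folder(folder: str = DOCUMENTS_PATH) -> List[Document]:
    """Load .txt, .md, .pdf, and .docx files as LangChain Documents."""
    docs = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()
        if ext in (".txt", ".md"):
            with open(path, encoding="utf-8") as handle:
                text = handle.read()
        elif ext == ".pdf":
            reader = PdfReader(path)
            text = "\n".join(page.extract_text() or "" for page in reader.pages)
        elif ext == ".docx":
            text = docx2txt.process(path)
        else:
            continue  # skip unsupported formats
        docs.append(Document(page_content=text, metadata={"source": name}))
    return docs
```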
The `create_vector_store()` function manages FAISS index creation and loading:

```python
def create_vector_store():
    """Create or load FAISS vector store from documents folder."""
    # Handles index persistence and document chunking
    # Returns configured FAISS store
```
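A minimal create-or-load sketch using LangChain's FAISS wrapper; the paths and embedding model follow the constants above, and the real function may differ in detail. `load_documents_from_folder` and `split_into_chunks` refer to the sketches earlier in this README.

```python
import os

from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

VECTOR_STORE_PATH = "vector_store"
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def create_vector_store():
    """Create the FAISS index from data/documents, or load a persisted one."""
    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)

    if os.path.isdir(VECTOR_STORE_PATH):
        # Reuse the persisted index; recent langchain-community releases
        # require opting in to deserializing the local pickle.
        return FAISS.load_local(
            VECTOR_STORE_PATH, embeddings, allow_dangerous_deserialization=True
        )

    documents = load_documents_from_folder()  # see the loader sketch above
    chunks = split_into_chunks(documents)     # see the chunking sketch above
    store = FAISS.from_documents(chunks, embeddings)
    store.save_local(VECTOR_STORE_PATH)
    return store
```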
The `create_chat_chain()` function builds the RAG pipeline:

```python
def create_chat_chain(vector_store):
    """Create the conversational RAG chain."""
    # Combines retriever, prompt template, and LLM
    # Returns RunnableWithMessageHistory instance
```
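One way to assemble such a chain with LangChain's expression language and `RunnableWithMessageHistory`; the prompt wording and the in-memory history store are assumptions, not the project's exact code.

```python
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

CHAT_MODEL = "gpt-3.5-turbo"
SEARCH_RESULTS = 5

_histories = {}  # session_id -> ChatMessageHistory


def create_chat_chain(vector_store):
    """Combine retriever, prompt template, and LLM into a session-aware chain."""
    retriever = vector_store.as_retriever(search_kwargs={"k": SEARCH_RESULTS})
    llm = ChatOpenAI(model=CHAT_MODEL, temperature=0)

    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer the question using only this context:\n\n{context}"),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}"),
    ])

    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    # Retrieve context for the question, then pass it with the history to the LLM.
    chain = (
        {
            "context": (lambda x: x["input"]) | retriever | format_docs,
            "input": lambda x: x["input"],
            "history": lambda x: x["history"],
        }
        | prompt
        | llm
    )

    return RunnableWithMessageHistory(
        chain,
        lambda session_id: _histories.setdefault(session_id, ChatMessageHistory()),
        input_messages_key="input",
        history_messages_key="history",
    )
```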
The `analyze_sentiment()` function processes user input:

```python
def analyze_sentiment(text, session_id=None):
    """Analyze sentiment using RoBERTa model."""
    # Returns {"label": str, "score": float}
    # Handles preprocessing for social media text
```
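A hedged sketch using the transformers pipeline API. The specific checkpoint (`cardiffnlp/twitter-roberta-base-sentiment-latest`) is an assumption, chosen because it is a RoBERTa model trained on social media text; the preprocessing shown is likewise illustrative.

```python
from transformers import pipeline

# Assumed checkpoint: a RoBERTa sentiment classifier trained on social media text.
SENTIMENT_MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"

_sentiment_pipeline = pipeline("sentiment-analysis", model=SENTIMENT_MODEL)


def analyze_sentiment(text, session_id=None):
    """Return {"label": "positive"|"neutral"|"negative", "score": float}."""
    # Light preprocessing in the style the model was trained on:
    # mask user handles and links before classification.
    cleaned = " ".join(
        "@user" if tok.startswith("@") else "http" if tok.startswith("http") else tok
        for tok in text.split()
    )
    result = _sentiment_pipeline(cleaned[:512])[0]
    return {"label": result["label"].lower(), "score": float(result["score"])}
```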
The `trigger_escalation()` function manages human handoff:

```python
def trigger_escalation(reason, session_id):
    """Generate appropriate escalation message based on trigger reason."""
    # Handles different escalation scenarios
    # Returns formatted escalation prompt
```
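A simple shape for this function, with made-up message wording; the project's actual escalation copy will differ.

```python
ESCALATION_MESSAGES = {
    "negative_sentiment": (
        "I'm sorry this has been frustrating. Would you like me to connect "
        "you with a human agent?"
    ),
    "low_confidence": (
        "I'm not fully confident I can answer that. Would you like to speak "
        "with a human agent?"
    ),
    "processing_error": (
        "Something went wrong on my side. I can hand you over to a human "
        "agent if you'd like."
    ),
}


def trigger_escalation(reason, session_id):
    """Return the escalation prompt matching the trigger reason."""
    # The session_id could be logged here so a human agent can pick up context.
    return ESCALATION_MESSAGES.get(reason, ESCALATION_MESSAGES["processing_error"])
```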
```
streamlit>=1.28.0
langchain>=0.1.0
langchain-community>=0.0.20
langchain-openai>=0.0.5
langchain-huggingface>=0.0.1
faiss-cpu>=1.7.4
transformers>=4.35.0
torch>=2.0.0
python-dotenv>=1.0.0
PyPDF2>=3.0.1
docx2txt>=0.8
```

- Start the application using `streamlit run clickatell_chatbot_single.py`
- Access the interface at `http://localhost:8501`
- Add knowledge base documents to `data/documents/`
- Interact through the chat interface
Supported document formats:
- Text files (.txt, .md)
- PDF documents (.pdf)
- Word documents (.docx)
Documents are automatically processed and indexed on application startup.
The system triggers escalation under these conditions:
- Negative sentiment with confidence > 70%
- AI response contains knowledge limitation indicators
- Processing errors occur
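The first condition is the threshold shown earlier. For the second, a check of roughly this shape would work; the phrase list below is hypothetical, and the real indicators are defined in `clickatell_chatbot_single.py`.

```python
# Hypothetical indicator phrases; the real list lives in the application code.
KNOWLEDGE_LIMITATION_PHRASES = (
    "i don't know",
    "i'm not sure",
    "i don't have information",
    "not covered in the provided context",
)


def has_knowledge_limitation(answer: str) -> bool:
    """Return True when the AI response admits it cannot answer."""
    lowered = answer.lower()
    return any(phrase in lowered for phrase in KNOWLEDGE_LIMITATION_PHRASES)
```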
API Key Error
```
ValueError: OPENAI_API_KEY not found in environment variables
```
Solution: Verify that the `.env` file exists and contains a valid API key.

Document Loading Error
```
No documents found in data/documents folder
```
Solution: Create the `data/documents/` directory and add files in a supported format.

Vector Store Error
```
Failed to load vector store
```
Solution: Delete the `vector_store/` directory to force a rebuild.
- Limit document size for faster processing
- Adjust `CHUNK_SIZE` based on content complexity
- Monitor OpenAI API rate limits and usage
The application follows a modular architecture:
- Configuration: Constants and environment setup
- AI Components: RAG pipeline and sentiment analysis
- UI Components: Streamlit interface elements
- Main Application: Orchestration and message processing
- `initialize_embeddings()`: HuggingFace embedding model setup
- `initialize_sentiment_analyzer()`: RoBERTa sentiment model initialization
- `process_message()`: Main message processing pipeline
- `main()`: Application entry point
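As a hedged sketch of the two initializers listed above, assuming Streamlit's `st.cache_resource` is used so the models load only once per process; the sentiment checkpoint is the same assumption as in the sentiment sketch earlier.

```python
import streamlit as st
from langchain_huggingface import HuggingFaceEmbeddings
from transformers import pipeline

EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
SENTIMENT_MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"  # assumed


@st.cache_resource
def initialize_embeddings():
    """Load the HuggingFace embedding model once per Streamlit process."""
    return HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)


@st.cache_resource
def initialize_sentiment_analyzer():
    """Load the RoBERTa sentiment pipeline once per Streamlit process."""
    return pipeline("sentiment-analysis", model=SENTIMENT_MODEL)
```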


