DoctorG transforms a static prediction tool into an intelligent Medical AI Assistant with a fine-tuned LLM, RAG memory, and real-time streaming.
- Fine-tuned Medical LLM (Mistral-7B + LoRA)
- RAG Memory Engine (FAISS + PostgreSQL)
- Real-time SSE Streaming
- Subscription Logic (Free/Premium tiers)
- Modern Dark UI (ChatGPT-style)
- Feedback Learning System
- Production-Ready (Docker + GPU support)
```
User → Next.js Frontend (React + Zustand)
                 ↓
       FastAPI Backend (Python + Async)
                 ↓
Medical LLM (Mistral-7B-LoRA) + RAG (FAISS)
                 ↓
          PostgreSQL + Redis
```
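For orientation, here is a minimal sketch of the streaming path through that stack, assuming a FastAPI SSE endpoint. All names below are illustrative, not the repository's actual code; the real handler streams tokens from the fine-tuned LLM instead of a stub generator.

```python
# Illustrative SSE streaming endpoint; names and payloads are assumptions.
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def fake_llm_tokens(symptoms: list[str]):
    # Stub generator standing in for the fine-tuned Mistral-7B model.
    for token in ["Possible", " causes", " include", " ..."]:
        await asyncio.sleep(0.05)
        yield token

@app.post("/api/v1/chat/predict")
async def predict(payload: dict):
    async def event_stream():
        async for token in fake_llm_tokens(payload.get("symptoms", [])):
            yield f"data: {token}\n\n"  # one SSE frame per chunk
        yield "data: [DONE]\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```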
- Python 3.10+
- Node.js 18+
- Docker & Docker Compose
- NVIDIA GPU (for training, optional for inference)
- CUDA 11.8+ (for GPU training)
- 16GB+ RAM (32GB recommended)
- 50GB+ Disk Space
```bash
git clone <your-repo>
cd doctorg

# Copy environment file
cp .env.example .env

# Edit .env with your API keys
nano .env
```

Edit the `.env` file:
```env
# Required - Add your OpenAI API key
OPENAI_API_KEY=sk-proj-your_key_here

# Database (auto-configured in Docker)
POSTGRES_PASSWORD=your_secure_password_here
JWT_SECRET=your_jwt_secret_min_32_chars

# Optional - for dataset augmentation
GOOGLE_API_KEY=your_google_key_here
PUBMED_EMAIL=your_email@example.com
```
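The backend presumably reads these through a typed settings object; a minimal sketch using pydantic-settings (field names mirror the keys above; the actual config module may differ):

```python
# Sketch only: how the .env keys above could be loaded with pydantic-settings.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    openai_api_key: str
    postgres_password: str
    jwt_secret: str
    google_api_key: str | None = None  # optional augmentation keys
    pubmed_email: str | None = None

settings = Settings()  # fails fast if a required key is missing
```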
```bash
# Build and start all services
docker-compose up --build

# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs
```
```bash
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run database migrations
python -c "from app.db.database import init_db; init_db()"

# Start backend server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
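The migration one-liner above calls `init_db`; a hypothetical SQLAlchemy shape for it (the repo's actual `app/db/database.py` may differ):

```python
# Hypothetical sketch of app/db/database.py, assuming SQLAlchemy.
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()
# Illustrative connection string; the real value would come from .env
engine = create_engine("postgresql://doctorg:password@localhost:5432/doctorg")

def init_db() -> None:
    # Create every table registered on Base.metadata (no-op if already present)
    Base.metadata.create_all(bind=engine)
```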
```bash
cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# Access at http://localhost:3000
```
```bash
cd backend

# Activate virtual environment
source venv/bin/activate

# Run data preparation (converts CSV to instruction format)
python scripts/prepare_training_data.py
```

This creates:

- `backend/data/training/train.jsonl` - Training data
- `backend/data/training/val.jsonl` - Validation data
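The authoritative schema is in `scripts/prepare_training_data.py`; purely for illustration, an instruction-format record could look like this (field names assumed):

```python
# Hypothetical instruction-format record; check scripts/prepare_training_data.py
# for the actual schema used in train.jsonl / val.jsonl.
import json

record = {
    "instruction": "Analyze the symptoms and provide a structured medical assessment.",
    "input": "Symptoms: headache, fever, fatigue",
    "output": '{"possible_conditions": ["viral infection"], "urgency": "low"}',
}

with open("backend/data/training/train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")  # JSONL: one JSON object per line
```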
```bash
# Fetch PubMed abstracts and Clinical QA datasets
python scripts/web_agent.py

# This downloads:
# - PubMed medical abstracts (1000+)
# - MedQA clinical questions
# - PubMedQA dataset
```

Requirements:
- NVIDIA GPU with 16GB+ VRAM (RTX 3090, A100, etc.)
- CUDA 11.8+ installed
- PyTorch with CUDA support
```bash
# Verify GPU is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Start fine-tuning (takes 2-6 hours depending on GPU)
python scripts/train_llm.py
```

Training Configuration:
- Base Model: Mistral-7B-v0.1
- Method: LoRA (Low-Rank Adaptation)
- Epochs: 3
- Batch Size: 4 (adjust based on VRAM)
- Learning Rate: 2e-4
- Quantization: 8-bit (reduces VRAM usage)
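A minimal sketch of that configuration with transformers and peft; the target modules are an assumption (see `scripts/train_llm.py` for the authoritative setup):

```python
# Sketch of the described setup: 8-bit base model + LoRA adapters.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit quantization
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # emits the "trainable params || all params" line
```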
Expected Output:
```
Loading model: mistralai/Mistral-7B-v0.1
Model loaded successfully
LoRA configuration created
trainable params: 4,194,304 || all params: 7,241,732,096 || trainable%: 0.0579
Starting training...
Epoch 1/3: 100%|██████████| 500/500 [1:23:45<00:00]
Saving model to backend/models/doctorg-medical-llm
Training completed successfully!
```
```bash
# Test inference
python -c "
from backend.scripts.train_llm import MedicalLLMTrainer

trainer = MedicalLLMTrainer()
prompt = '''You are a medical AI assistant. Analyze the symptoms and provide a structured medical assessment.

Symptoms: headache, fever, fatigue

Provide your response in JSON format:'''

response = trainer.test_inference(prompt)
print(response)
"
```

If you don't have a local GPU:
Google Colab (Free GPU):
```bash
# Upload your code to Google Drive
# Open a Google Colab notebook
# Mount Drive and run:
!pip install -r requirements.txt
!python scripts/prepare_training_data.py
!python scripts/train_llm.py
```

AWS/GCP/Azure:
- Launch a GPU instance (e.g., g4dn.xlarge on AWS, n1-standard-4 with a T4 on GCP)
- Clone the repository
- Run the training scripts
- Download the trained model
```bash
# Build for production
docker-compose -f docker-compose.yml up --build -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

# Stop and remove volumes (clean slate)
docker-compose down -v
```

Edit `docker-compose.yml` to enable GPU:
```yaml
services:
  backend:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Then run:
```bash
# Requires nvidia-docker2 installed
docker-compose up --build
```
```bash
# Development
docker-compose -f docker-compose.yml up

# Production with GPU
docker-compose -f docker-compose.prod.yml up

# Staging
docker-compose -f docker-compose.staging.yml up
```
-H "Content-Type: application/json" \
-d '{
"email": "user@example.com",
"password": "securepassword123",
"full_name": "John Doe"
}'curl -X POST http://localhost:8000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "user@example.com",
"password": "securepassword123"
}'Response:
```json
{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer",
  "expires_in": 3600
}
```

Via Web UI:
- Open http://localhost:3000
- Log in with your credentials
- Describe your symptoms
- Get a real-time streaming response
Via API:
```bash
curl -X POST http://localhost:8000/api/v1/chat/predict \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "symptoms": ["headache", "fever", "fatigue"]
  }'
```
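The same call from Python, as a minimal sketch using requests (the endpoint paths are the ones shown above; the SSE `data:` framing of the stream is an assumption):

```python
# Sketch: log in, then consume the streaming prediction endpoint.
import requests

BASE = "http://localhost:8000/api/v1"

# 1. Log in and grab the bearer token
resp = requests.post(f"{BASE}/auth/login", json={
    "email": "user@example.com",
    "password": "securepassword123",
})
token = resp.json()["access_token"]

# 2. Stream the prediction, printing each SSE data chunk as it arrives
with requests.post(
    f"{BASE}/chat/predict",
    json={"symptoms": ["headache", "fever", "fatigue"]},
    headers={"Authorization": f"Bearer {token}"},
    stream=True,
) as r:
    for line in r.iter_lines(decode_unicode=True):
        if line.startswith("data: "):
            print(line[len("data: "):], end="", flush=True)
```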
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_TOKEN" \
-d '{
"session_id": "session-id-here",
"rating": 5,
"helpful": true,
"comments": "Very helpful diagnosis!"
}'Free Tier:
- 5 sessions per month
- No memory/history
- Basic medical insights
Premium Tier:
- Unlimited sessions
- Full RAG memory of past consultations (see the FAISS sketch after this list)
- Detailed follow-up questions
- Priority support
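A minimal sketch of the FAISS half of that memory, with a stub `embed()` standing in for a real embedding model (the dimension and storage details are assumptions; the actual engine also persists consultation text in PostgreSQL):

```python
# Sketch: index past consultations, retrieve the closest match for a new query.
import faiss
import numpy as np

DIM = 384  # embedding dimension; depends on the embedding model used

def embed(texts: list[str]) -> np.ndarray:
    # Stub: replace with a real sentence-embedding model.
    rng = np.random.default_rng(0)
    return rng.random((len(texts), DIM), dtype=np.float32)

past_consultations = ["headache and fever last week", "persistent dry cough"]
index = faiss.IndexFlatL2(DIM)
index.add(embed(past_consultations))

# Find the single most similar past consultation to the new complaint
distances, ids = index.search(embed(["fever and fatigue"]), 1)
print(past_consultations[ids[0][0]], distances[0][0])
```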
Edit `backend/app/core/constants.py`:
```python
class SubscriptionLimits:
    FREE_SESSION_LIMIT = 5        # Change to desired limit
    PREMIUM_SESSION_LIMIT = -1    # -1 = unlimited
```
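How these limits get applied is up to the request layer; a hypothetical sketch (`check_session_quota` is an invented name, not the repo's actual code):

```python
# Hypothetical enforcement sketch for SubscriptionLimits.
from fastapi import HTTPException
from app.core.constants import SubscriptionLimits  # the class shown above

def check_session_quota(tier: str, sessions_this_month: int) -> None:
    limit = (SubscriptionLimits.PREMIUM_SESSION_LIMIT if tier == "premium"
             else SubscriptionLimits.FREE_SESSION_LIMIT)
    if limit != -1 and sessions_this_month >= limit:
        raise HTTPException(status_code=402, detail="Monthly session limit reached")
```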
```bash
cd backend
pytest tests/ -v
```
```bash
cd frontend
npm test
```
```bash
# Start all services
docker-compose up -d

# Run E2E tests
npm run test:e2e
```

```bash
curl http://localhost:8000/health
```

Response:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "timestamp": "2026-02-15T10:30:00",
  "services": {
    "database": "connected",
    "llm": "ready",
    "rag": "ready"
  }
}
```
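For reference, a stubbed handler returning that shape might look like this (illustrative; the real endpoint presumably probes the database, LLM, and RAG services rather than hardcoding statuses):

```python
# Stubbed /health handler matching the response shape above.
from datetime import datetime, timezone
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health():
    return {
        "status": "healthy",
        "version": "1.0.0",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "services": {"database": "connected", "llm": "ready", "rag": "ready"},
    }
```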
```bash
# Backend logs
docker-compose logs -f backend

# Frontend logs
docker-compose logs -f frontend

# Database logs
docker-compose logs -f postgres
```

- ✅ No hardcoded secrets (all in `.env`)
- ✅ Bcrypt password hashing
- ✅ JWT authentication with expiration (see the sketch after this list)
- ✅ SQL injection prevention (ORM)
- ✅ XSS protection (React escaping)
- ✅ CORS configured
- ✅ Security headers enabled
- ✅ Rate limiting implemented
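As a concrete example of the JWT item, a minimal token-issuance sketch assuming python-jose and HS256 (the helper name and claims are illustrative; the repo's auth module may differ):

```python
# Illustrative JWT issuance with an expiration claim (python-jose, HS256).
from datetime import datetime, timedelta, timezone
from jose import jwt

JWT_SECRET = "your_jwt_secret_min_32_chars"  # loaded from .env in practice

def create_access_token(email: str, expires_in: int = 3600) -> str:
    claims = {
        "sub": email,
        "exp": datetime.now(timezone.utc) + timedelta(seconds=expires_in),
    }
    return jwt.encode(claims, JWT_SECRET, algorithm="HS256")
```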
- Change default passwords in `.env`
- Use a strong JWT secret (min 32 characters)
- Enable HTTPS in production
- Update dependencies regularly: `pip list --outdated`
- Back up the database regularly
```bash
# Check CUDA installation
nvidia-smi

# Check PyTorch CUDA
python -c "import torch; print(torch.cuda.is_available())"

# Reinstall PyTorch with CUDA
pip install torch --index-url https://download.pytorch.org/whl/cu118
```
```bash
# Clean rebuild
docker-compose down -v
docker-compose build --no-cache
docker-compose up

# Check container logs
docker-compose logs backend
```
```bash
# Reset database
docker-compose down -v
docker-compose up postgres -d
sleep 10
docker-compose up backend
```
```bash
# Find and kill the process on port 8000
# Windows:
netstat -ano | findstr :8000
taskkill /PID <PID> /F

# Linux/Mac:
lsof -ti:8000 | xargs kill -9
```

Interactive API docs available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License.
- Developer: Abhishek Gupta
- GitHub: @cosmos-dx
- LinkedIn: abhishek-gupta
- Mistral AI for the base model
- Hugging Face for transformers library
- OpenAI for API integration
- FastAPI and Next.js communities
For issues and questions:
- GitHub Issues: Create an issue
- Email: support@doctorg.ai
- Discord: Join our community