A secure, high-performance PDF text extraction tool with a cosmic-inspired interface.
DarkHole extracts text from PDFs using multiple fallback methods to ensure maximum accuracy and reliability.
- Lightning Fast - Process PDFs in seconds with optimized extraction engines
- Secure & Private - Files processed locally with session isolation, automatic cleanup
- High Accuracy - Multi-engine approach with OCR fallback for scanned documents
Multi-Engine Extraction:
- PDFMiner for structural text extraction
- PyMuPDF for complex layouts
- OCR (Tesseract) for scanned documents
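
The multi-engine idea above can be sketched as a simple fallback chain. This is a minimal illustration, not DarkHole's actual code: the function names, the `min_chars` threshold, and the order in which engines are tried are all assumptions. The library calls are real (`pdfminer.high_level.extract_text`, PyMuPDF's `page.get_text()`, `pdf2image.convert_from_path`, `pytesseract.image_to_string`), and each is imported lazily so the chain itself has no hard dependencies.

```python
from typing import Callable, Iterable, Tuple

Extractor = Callable[[str], str]


def extract_with_fallback(
    path: str,
    extractors: Iterable[Tuple[str, Extractor]],
    min_chars: int = 20,  # hypothetical threshold for "usable" text
) -> Tuple[str, str]:
    """Try each extractor in order; return (engine_name, text) for the first usable result."""
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # one engine failing should not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if text and len(text.strip()) >= min_chars:
            return name, text
        errors.append(f"{name}: too little text")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


def pdfminer_extract(path: str) -> str:
    from pdfminer.high_level import extract_text
    return extract_text(path)


def pymupdf_extract(path: str) -> str:
    import fitz  # PyMuPDF
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def ocr_extract(path: str) -> str:
    # Rasterize each page, then run Tesseract on the page images.
    from pdf2image import convert_from_path
    import pytesseract
    return "\n".join(pytesseract.image_to_string(img) for img in convert_from_path(path))


# Assumed order: cheap text-layer engines first, OCR last.
DEFAULT_CHAIN = [
    ("pdfminer", pdfminer_extract),
    ("pymupdf", pymupdf_extract),
    ("ocr", ocr_extract),
]
```

Calling `extract_with_fallback("document.pdf", DEFAULT_CHAIN)` would return the name of the engine that succeeded together with its text. A scanned PDF with no text layer would normally fall through to the OCR step.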
Smart Processing:
- Automatic method selection based on PDF type
- Resource limits and timeout protection
- Comprehensive error handling
- Session-based file isolation prevents conflicts
- Automatic cleanup of temporary files
- Mobile-optimized responsive design
- Security hardening with path validation
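
As a rough illustration of the resource limits and timeout protection listed above, the sketch below rejects oversized files and bounds how long a caller waits for an extraction. The constants `MAX_BYTES` and `TIMEOUT_S` and both helper names are hypothetical, and the values DarkHole actually uses are not stated here. A thread-based timeout only stops the caller from waiting and does not kill the worker, so a hard limit would need a separate process or a server-level timeout.

```python
import os
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

MAX_BYTES = 20 * 1024 * 1024  # hypothetical upload limit (20 MiB)
TIMEOUT_S = 30.0              # hypothetical per-file time budget


def check_size(path: str, max_bytes: int = MAX_BYTES) -> None:
    """Raise ValueError if the file is larger than the allowed size."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"file is {size} bytes, limit is {max_bytes}")


def run_with_timeout(func, *args, timeout: float = TIMEOUT_S):
    """Run func(*args) in a worker thread and stop waiting after `timeout` seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(f"extraction exceeded {timeout} s") from None
    finally:
        # Do not block on a stuck worker; it finishes in the background.
        executor.shutdown(wait=False)
```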
# Install dependencies
pip install -r requirements.txt
# Run the application
python app.py

Visit http://localhost:5000 to start extracting text from your PDFs.
Tech Stack:
- Backend: Flask, Python 3.11+
- PDF Processing: PDFMiner, PyMuPDF, pdf2image
- OCR: Tesseract, pytesseract
- Frontend: Vanilla JS, CSS3 with animations
- Deployment: Gunicorn, Render-ready
Security Features:
- Session-based file isolation
- Path traversal protection
- Input validation and sanitization
- Resource limits and timeouts
- Automatic temporary file cleanup
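
To make the session isolation, path traversal protection, and cleanup items above more concrete, here is a minimal sketch using only the standard library. `UPLOAD_ROOT` and the three function names are hypothetical, and treating session IDs as UUIDs is an assumption for illustration, not a description of DarkHole's implementation.

```python
import shutil
import tempfile
import uuid
from pathlib import Path

# Hypothetical storage location for uploaded files.
UPLOAD_ROOT = Path(tempfile.gettempdir()) / "darkhole_uploads"


def session_dir(session_id: str, root: Path = UPLOAD_ROOT) -> Path:
    """Return a per-session directory; the ID must parse as a UUID."""
    sid = uuid.UUID(session_id).hex  # raises ValueError on anything else
    path = root / sid
    path.mkdir(parents=True, exist_ok=True)
    return path


def safe_upload_path(session_id: str, filename: str, root: Path = UPLOAD_ROOT) -> Path:
    """Map an untrusted filename to a path inside the session directory."""
    base = session_dir(session_id, root).resolve()
    name = Path(filename).name  # drop any directory components
    if not name or name in {".", ".."} or not name.lower().endswith(".pdf"):
        raise ValueError(f"rejected filename: {filename!r}")
    candidate = (base / name).resolve()
    if not candidate.is_relative_to(base):  # final containment check (Python 3.9+)
        raise ValueError(f"path escapes session directory: {filename!r}")
    return candidate


def cleanup_session(session_id: str, root: Path = UPLOAD_ROOT) -> None:
    """Delete everything stored for one session."""
    shutil.rmtree(root / uuid.UUID(session_id).hex, ignore_errors=True)
```

In a Flask app, `werkzeug.utils.secure_filename` is a common alternative for the filename-sanitizing step. The explicit `resolve()` and `is_relative_to()` check is kept here as a second line of defense.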
Fully responsive design optimized for mobile devices with touch-friendly interactions and performance optimizations.



