A Python-based desktop application that combines Optical Character Recognition (OCR) with an interactive table interface for extracting and organizing text from images.
OCRTableApp allows users to extract text from images and organize that text into spreadsheet-like tables through an intuitive point-and-click interface. The application features a sophisticated modular architecture with comprehensive GUI functionality and robust data management capabilities.
- Multiple Input Methods: Load images from files or capture screenshots in real-time
- Advanced Image Processing: Automatic resizing, grayscale conversion, and contrast enhancement (CLAHE)
- Multi-language Support: Powered by EasyOCR with configurable language settings (default: English)
- GPU Acceleration: Optional GPU support for faster text recognition
- Multi-monitor Support: Enhanced screenshot functionality across multiple displays
- Visual Word Selection: Click on detected words in images to insert them into tables
- Dynamic Grid: Customizable table dimensions with automatic expansion
- Multiple Navigation Modes:
- Horizontal (→): Move right after each entry
- Vertical (↓): Move down after each entry
- Wrap-around (⟳): Automatic grid expansion
- Real-time Visual Feedback: Color-coded bounding boxes and cell highlighting
- Undo/Redo System: Full command history with comprehensive state management
- Keyboard Navigation: Arrow key support and smart Tab navigation
- CSV Export: Save table data with integrated file dialog
- Direct Cell Editing: Modify table contents with inline editing
- Scrollable Interface: Handle large tables with smooth scrolling
- Language: Python 100%
- OCR Engine: EasyOCR
- Image Processing: OpenCV (cv2)
- GUI Framework: Tkinter with Matplotlib integration
- Additional Libraries: NumPy, Pillow (PIL), MSS
- Python 3.7+
- Git
- Clone the repository:
git clone https://github.com/MarianBirindzhiev/OCRTableApp.git
cd OCRTableApp- Install dependencies:
pip install -r requirements.txtopencv-python
easyocr
matplotlib
numpy
Pillow
mss
python main.py [image_path]python main.py --help
optional arguments:
image_path Path to the input image (optional)
--scale_percent Resize percent (default: 150)
--output_csv CSV output file (default: selected_words.csv)
--lang OCR language (default: en)python main.py document.jpg --scale_percent 200 --lang en- Load Image: Provide an image file or capture a screenshot
- OCR Processing: The application processes the image and detects text regions
- Visual Interface: View the image with red bounding boxes around detected text
- Word Selection: Click on any detected word to insert it into the table
- Table Navigation: Use navigation modes and keyboard shortcuts to organize data
- Export: Save your organized data as a CSV file
- Ctrl+Z: Undo last action
- Ctrl+Y: Redo last undone action
- Tab: Smart navigation based on current mode
- Arrow Keys: Navigate between table cells
OCRTableApp/
├── app/ # Application management layer
├── ocr/ # OCR processing and image handling
├── table_controller/ # Table interaction handlers
├── table_core/ # Core table logic and state management
├── table_ui/ # User interface components
├── utilities/ # Helper functions and constants
├── assets/ # Application icons and resources
├── .github/workflows/ # CI/CD automation
├── main.py # Application entry point
├── requirements.txt # Python dependencies
└── README.md # This file
The application follows a modular architecture with clear separation of concerns:
- OCR Module: Handles image processing, text recognition, and visual feedback
- Table System: Manages grid state, navigation, and data operations
- UI Layer: Provides interactive components and user interface
- Command System: Implements undo/redo functionality with command pattern
- Utilities: Shared functionality including logging, export, and configuration
The project includes GitHub Actions workflows for automated building:
- Windows: Generates
.exeexecutable - macOS: Creates
.appapplication bundle