Convert your comic book panels into speech using OCR and Text-to-Speech (TTS) technology!
This project automatically detects panels in a comic image, extracts text using EasyOCR, and generates corresponding audio using gTTS. Finally, it merges the per-panel audio clips into a single voiceover file for a seamless listening experience.
- Comic Panel Detection: Automatically splits comic pages into individual panels using image processing.
- Text Extraction: Extracts English text from each panel using EasyOCR.
- Text-to-Speech: Converts extracted text to audio using Google Text-to-Speech (gTTS).
- Audio Compilation: Combines all panel audio files into one, with pauses between them.
Ensure you're using a Python environment like Google Colab or Jupyter Notebook. Then install the dependencies:
    pip install easyocr opencv-python numpy matplotlib gTTS pydub
Also, install FFmpeg for audio processing via pydub. In Google Colab, run:
    !apt install ffmpeg
- Place your comic image in the working directory.
- Update the path in the code:
    image_path = "Comic3.jpg"
Running the code will:
- Detect comic panels
- Extract text from each panel
- Generate TTS for each panel
- Save panel audios
- Merge them into a single audio file
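The steps listed above could be wired together roughly as follows. This is a minimal sketch, not the project's actual code: the function names `panel_text` and `narrate`, the file names `panel_N.mp3` and `voiceover.mp3`, and the 500 ms pause are all assumptions made for illustration. It assumes the panels have already been cropped and are passed in as image paths or NumPy arrays.

```python
def panel_text(fragments):
    """Join the OCR fragments of one panel into a single string to speak."""
    return " ".join(f.strip() for f in fragments if f.strip())


def narrate(panel_images, out_path="voiceover.mp3", pause_ms=500):
    """OCR each panel, synthesize speech, and merge the clips into one file.

    All file names and the pause length are illustrative assumptions.
    """
    # Imported here so panel_text() can be used without these packages.
    import easyocr
    from gtts import gTTS
    from pydub import AudioSegment

    reader = easyocr.Reader(["en"])                  # English text only
    pause = AudioSegment.silent(duration=pause_ms)   # gap between panels
    combined = AudioSegment.empty()
    clips = 0

    for i, image in enumerate(panel_images):
        # detail=0 asks EasyOCR for plain strings without boxes or scores.
        text = panel_text(reader.readtext(image, detail=0))
        if not text:
            continue                                 # gTTS rejects empty text
        clip_path = f"panel_{i + 1}.mp3"
        gTTS(text=text, lang="en").save(clip_path)   # needs network access
        combined += AudioSegment.from_mp3(clip_path) + pause
        clips += 1

    if clips == 0:
        return None                                  # nothing to export
    combined.export(out_path, format="mp3")          # requires ffmpeg
    return out_path
```

Skipping panels with no recognized text avoids passing an empty string to gTTS, and keeping the per-panel MP3 files on disk matches the "save panel audios" step.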
- Converts the comic image to a binary image using thresholding.
- Identifies the white space between panels to segment them.
- Uses EasyOCR to extract text from each panel image.
- Uses Google Text-to-Speech (gTTS) to convert text to MP3 files.
- Uses pydub to concatenate all MP3s with short pauses in between.
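The thresholding and white-space steps can be illustrated with a small NumPy-only sketch. It is one possible approach under stated assumptions, not necessarily how this project does it: the names `find_bands` and `split_panels` and the parameters `thresh`, `white_frac`, and `min_size` are hypothetical, and it assumes panels are laid out in rows separated by nearly white gutters.

```python
import numpy as np


def find_bands(is_gap, min_size):
    """Return (start, end) index pairs for runs where is_gap is False."""
    bands, start = [], None
    for i, gap in enumerate(is_gap):
        if not gap and start is None:
            start = i                        # a content run begins
        elif gap and start is not None:
            if i - start >= min_size:
                bands.append((start, i))     # a content run ends
            start = None
    if start is not None and len(is_gap) - start >= min_size:
        bands.append((start, len(is_gap)))   # run reaches the image edge
    return bands


def split_panels(gray, thresh=200, white_frac=0.98, min_size=20):
    """Split a grayscale page (2-D uint8 array) into panel boxes.

    Returns a list of (top, bottom, left, right) tuples.
    """
    white = gray > thresh                        # binary image: True = white
    row_gap = white.mean(axis=1) >= white_frac   # rows that are almost all white
    boxes = []
    for top, bottom in find_bands(row_gap, min_size):
        strip = white[top:bottom]
        # Within each horizontal strip, look for vertical gutters.
        col_gap = strip.mean(axis=0) >= white_frac
        for left, right in find_bands(col_gap, min_size):
            boxes.append((top, bottom, left, right))
    return boxes
```

A grayscale page could be loaded with `cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)` and each panel cropped as `gray[top:bottom, left:right]`. Pages with irregular or borderless layouts would likely need a different method, such as contour detection.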
- easyocr
- opencv-python
- numpy
- matplotlib
- gTTS
- pydub
System Dependency:
- ffmpeg (external system dependency)
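Because ffmpeg is installed outside of pip, it can help to check for it before running the audio steps. The snippet below is a small sketch using only the standard library; the helper name `ffmpeg_available` is hypothetical.

```python
import shutil


def ffmpeg_available():
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg not found: install it before merging audio with pydub")
```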
This project is licensed under the MIT License.
- Comic images used are for demonstration purposes only.
- OCR by EasyOCR
- TTS powered by gTTS