Mayank471/image2speech-comics

πŸŽ™οΈ Comic-to-Audio Converter

Convert your comic book panels into speech using OCR and Text-to-Speech (TTS) technology!

This project automatically detects panels in a comic image, extracts text from each panel using EasyOCR, and generates the corresponding audio using gTTS. Finally, it merges the per-panel audio clips into a single voiceover file for a seamless listening experience.


📌 Features

  • 📖 Comic Panel Detection
    Automatically splits comic pages into individual panels using image processing.

  • 🔍 Text Extraction
    Extracts English text from each panel using EasyOCR.

  • 🎤 Text-to-Speech
    Converts extracted text to audio using Google Text-to-Speech (gTTS).

  • 🎧 Audio Compilation
    Combines all panel audio files into one, with pauses between them.


🛠️ Installation

Ensure you're using a Python environment like Google Colab or Jupyter Notebook. Then install the dependencies:

pip install easyocr opencv-python numpy matplotlib gTTS pydub

Also, install FFmpeg for audio processing via pydub. In Google Colab, run:

!apt install ffmpeg
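
To confirm that FFmpeg is visible to Python before running the pipeline, a quick check along these lines may help. It is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is made up for illustration and is not part of this project.

```python
import shutil


def ffmpeg_available():
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg not found on PATH; pydub will not be able to read or write MP3 files")
```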

🚀 How to Use

➕ Add Your Comic Image

  • Place your comic image in the working directory.
  • Update the path in the code:
image_path = "Comic3.jpg"

▶️ Run the Main Process

This will:

  • ✅ Detect comic panels
  • ✅ Extract text from each panel
  • ✅ Generate TTS audio for each panel
  • ✅ Save the per-panel audio files
  • ✅ Merge them into a single audio file
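
The order of the steps above can be sketched as a small driver function. This is an illustrative outline only, not the project's actual code: `run_pipeline` and the four callables passed into it are hypothetical stand-ins for the detection, OCR, TTS, and merging stages, and the `panel_<n>.mp3` naming is an assumption.

```python
def run_pipeline(page, detect_panels, extract_text, synthesize, merge):
    """Run the steps in order; every callable is a hypothetical stand-in."""
    clips = []
    for index, panel in enumerate(detect_panels(page)):
        text = extract_text(panel).strip()
        if not text:
            continue  # skip panels with no readable text
        # One audio file per panel, named by panel position (assumed naming)
        clips.append(synthesize(text, f"panel_{index}.mp3"))
    # Merge only if at least one panel produced audio
    return merge(clips) if clips else None
```

Passing the stages in as arguments keeps the sketch independent of any particular OCR or TTS library, so each stage can be swapped or tested on its own.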

🧠 How It Works

🖼️ Panel Detection

  • Converts comic image to binary using thresholding.
  • Identifies white spaces to segment panels.
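
One way to read the two steps above is as a projection-based split: threshold the page to a white/non-white mask, then treat rows and columns that are almost entirely white as gutters. The sketch below follows that interpretation with NumPy only; it may differ from the project's own implementation. The names `content_runs` and `split_panels` and the values `white_thresh=240` and `gutter_frac=0.98` are assumptions chosen for illustration. The grayscale input could come from, for example, `cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)`.

```python
import numpy as np


def content_runs(is_gutter):
    """Return (start, stop) index pairs for runs where is_gutter is False."""
    runs, start = [], None
    for i, gutter in enumerate(is_gutter):
        if not gutter and start is None:
            start = i
        elif gutter and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(is_gutter)))
    return runs


def split_panels(gray, white_thresh=240, gutter_frac=0.98):
    """Split a grayscale page into panel boxes (top, bottom, left, right)."""
    white = gray >= white_thresh  # binary mask: True where a pixel is near-white
    boxes = []
    # Rows that are almost entirely white are horizontal gutters
    row_is_gutter = white.mean(axis=1) >= gutter_frac
    for top, bottom in content_runs(row_is_gutter):
        strip = white[top:bottom]
        # Within each horizontal strip, look for vertical gutters
        col_is_gutter = strip.mean(axis=0) >= gutter_frac
        for left, right in content_runs(col_is_gutter):
            boxes.append((top, bottom, left, right))
    return boxes
```

Each box can then be cropped with `gray[top:bottom, left:right]`. This approach assumes straight, light-colored gutters; pages with borderless or overlapping panels would need a different method.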

πŸ‘οΈ OCR

  • Uses EasyOCR to extract text from each panel image.
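
A minimal sketch of this step, assuming EasyOCR's default output, in which `Reader.readtext` returns a list of `(bounding_box, text, confidence)` tuples. The helper names, the `min_conf=0.3` cutoff, and the simple top-to-bottom ordering are assumptions for illustration, not the project's confirmed logic.

```python
def join_ocr_results(results, min_conf=0.3):
    """Join EasyOCR detections into one string, top-to-bottom then left-to-right."""
    kept = [r for r in results if r[2] >= min_conf]
    # r[0][0] is the top-left corner [x, y] of a detection's bounding box.
    # Sorting on exact y is naive: slightly uneven lines may interleave.
    kept.sort(key=lambda r: (r[0][0][1], r[0][0][0]))
    return " ".join(text.strip() for _, text, _ in kept)


def read_panel_text(panel_image, reader=None):
    """Run EasyOCR on one panel (file path or image array) and return its text."""
    import easyocr  # imported lazily; the first run downloads model weights

    reader = reader or easyocr.Reader(["en"])
    return join_ocr_results(reader.readtext(panel_image))
```

Creating the `Reader` once and passing it to every call avoids reloading the model for each panel.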

🔊 TTS

  • Uses Google Text-to-Speech (gTTS) to convert text to MP3 files.
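
The conversion can be sketched as below, using gTTS's `gTTS(text=..., lang=...)` constructor and its `save` method. The function name `synthesize_panel` and the choice to skip empty panels are assumptions. Note that gTTS calls an online service, so this step needs network access.

```python
def synthesize_panel(text, out_path, lang="en"):
    """Save `text` as an MP3 at `out_path`; return the path, or None if there is nothing to say."""
    text = text.strip()
    if not text:
        return None  # gTTS rejects empty input, so skip silent panels
    from gtts import gTTS  # imported lazily; synthesis needs network access

    gTTS(text=text, lang=lang).save(out_path)
    return out_path
```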

🎚️ Audio Merging

  • Uses pydub to concatenate all MP3s with short pauses in between.
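
A sketch of the merge, assuming pydub's `AudioSegment.from_mp3`, `AudioSegment.silent`, and `export` calls. The helper names and the 500 ms default pause are illustrative guesses; the original pause length is not stated here.

```python
def interleave(items, separator):
    """Return items with separator placed between neighbours (not at the ends)."""
    out = []
    for i, item in enumerate(items):
        if i:
            out.append(separator)
        out.append(item)
    return out


def merge_mp3s(paths, out_path, pause_ms=500):
    """Concatenate MP3 files in order, with `pause_ms` of silence between them."""
    from pydub import AudioSegment  # needs FFmpeg on PATH for MP3 input and output

    clips = [AudioSegment.from_mp3(p) for p in paths]
    pause = AudioSegment.silent(duration=pause_ms)
    merged = AudioSegment.empty()
    for segment in interleave(clips, pause):
        merged += segment
    merged.export(out_path, format="mp3")
    return out_path
```

Placing the pause only between clips avoids trailing silence at the end of the merged file.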

📦 Dependencies

  • easyocr
  • opencv-python
  • numpy
  • matplotlib
  • gTTS
  • pydub

System dependency:

  • ffmpeg (used by pydub to read and write MP3 files)

📄 License

This project is licensed under the MIT License.


🙌 Acknowledgments

  • Comic images used are for demonstration purposes only.
  • OCR by EasyOCR
  • TTS powered by gTTS
