This project is a multi-modal system that interprets crime-related images and textual observations to generate contextual insights, similarity-based reasoning, and a narrated video summarizing the incident. It uses image captioning (BLIP-2), semantic similarity (SBERT), text-to-speech (gTTS), and video generation (MoviePy).
├── main.py # Main script to run the interpreter
├── Facts.csv # CSV file containing factual statements and reasoning
├── crime_story.mp4 # Output video narration (generated)
├── narration.mp3 # Audio file generated from narrative
└── README.md # This file
Install the following Python packages:
pip install transformers sentence-transformers pandas torch gtts moviepy pillow
Models used:
- Salesforce/blip2-opt-2.7b via Hugging Face Transformers
- all-MiniLM-L6-v2 via Sentence Transformers
These will be downloaded automatically the first time you run the script.
You must provide a Facts.csv file with the following columns:
- fact: factual observation
- reasoning: reasoning behind the fact
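A quick check of the CSV layout before running the full pipeline can save a model download. The following is a minimal sketch, not the project's own loader: it uses the standard-library `csv` module instead of pandas so it runs without extra packages, and `load_facts` and the sample rows are hypothetical names and data chosen for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ("fact", "reasoning")  # column names listed in this README


def load_facts(csv_file):
    """Read fact/reasoning rows from an open CSV file, skipping rows with no fact."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Facts.csv is missing column(s): {', '.join(missing)}")
    rows = []
    for row in reader:
        fact = (row["fact"] or "").strip()
        if fact:
            rows.append({"fact": fact, "reasoning": (row["reasoning"] or "").strip()})
    return rows


# Illustrative data only; not taken from the project's real Facts.csv.
sample = io.StringIO(
    "fact,reasoning\n"
    "Broken window near the door,Suggests forced entry\n"
    ",Row with no fact is skipped\n"
)
facts = load_facts(sample)
print(facts)
```

To check a real file, pass an open handle instead of the sample, for example `open("Facts.csv", newline="", encoding="utf-8")`.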
Ensure images are in .jpg, .jpeg, or .png formats.
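Because the script takes image paths and free-text observations in a single comma-separated line, the file extension is one plausible way to tell them apart. The sketch below assumes that approach; `split_inputs` is a hypothetical helper, and main.py may classify its input differently.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # formats listed in this README


def split_inputs(raw):
    """Split a comma-separated line into (image_paths, text_observations)."""
    images, texts = [], []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        if Path(item).suffix.lower() in SUPPORTED_EXTENSIONS:
            images.append(item)
        else:
            texts.append(item)
    return images, texts


images, texts = split_inputs("image1.jpg, image2.JPG, There was blood on the floor")
print(images)  # ['image1.jpg', 'image2.JPG']
print(texts)   # ['There was blood on the floor']
```

One limitation of this approach: an observation that itself contains a comma is split into several observations, and a file in an unsupported format (for example .gif) is treated as text.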
- Download & install ImageMagick
- Ensure the executable path is correct in:
mpy_config.change_settings({"IMAGEMAGICK_BINARY": r"C:\\Program Files\\ImageMagick-7.1.1-Q16-HDRI\\magick.exe"})
os.environ["IMAGEMAGICK_PATH"] = r"C:\\Program Files\\ImageMagick-7.1.1-Q16-HDRI\\magick.exe"
Change it based on your system if necessary.
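Rather than editing a hard-coded path on each machine, the executable can often be located at runtime. This is a sketch under assumptions: `find_imagemagick` is a hypothetical helper that is not part of main.py, and it reuses the `IMAGEMAGICK_PATH` variable name from the snippet above as an optional override.

```python
import os
import shutil


def find_imagemagick(env=None):
    """Return a path to the ImageMagick executable, or None if it is not found."""
    env = os.environ if env is None else env
    # 1. Explicit override, using the same variable name as the snippet above.
    override = env.get("IMAGEMAGICK_PATH")
    if override and os.path.isfile(override):
        return override
    # 2. Search PATH. "magick" is the ImageMagick 7 command; "convert" is the
    #    ImageMagick 6 one, skipped on Windows where an unrelated system tool
    #    has the same name.
    names = ("magick",) if os.name == "nt" else ("magick", "convert")
    for name in names:
        found = shutil.which(name, path=env.get("PATH"))
        if found:
            return found
    return None


binary = find_imagemagick()
print(binary or "ImageMagick not found; set IMAGEMAGICK_PATH")
```

If a path is found, it could be passed to `change_settings` as the `IMAGEMAGICK_BINARY` value in place of the literal Windows path.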
Ensure arial.ttf or any .ttf font file is accessible for PIL. Modify this line if needed:
font = ImageFont.truetype("arial.ttf", 40)
- Run the script:
python main.py
- Input image paths and/or text separated by commas. Example:
image1.jpg, image2.jpg, There was blood on the floor
- The system will:
  - Caption images
  - Find the most similar fact and reasoning
  - Print the analysis
  - Generate a crime story narrative
  - Produce a narrated video: crime_story.mp4
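The "find the most similar fact" step can be sketched as a cosine-similarity search over embeddings. The code below is an illustration, not the project's implementation: `most_similar_fact` and `toy_embed` are hypothetical names, and the keyword-count embedding is a stand-in so the example runs without downloading a model. In the real pipeline the `embed` argument would presumably be the `encode` method of a `SentenceTransformer("all-MiniLM-L6-v2")` instance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_fact(query, facts, embed):
    """Return (best_row, score) for `query` among `facts`.

    `embed` is any callable that maps a string to a sequence of floats.
    """
    if not facts:
        raise ValueError("facts must not be empty")
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(row["fact"])), row) for row in facts]
    best_score, best_row = max(scored, key=lambda pair: pair[0])
    return best_row, best_score


def toy_embed(text):
    """Stand-in embedding for demonstration: counts of a few fixed keywords."""
    words = text.lower().split()
    vocab = ("blood", "window", "knife", "floor", "broken")
    return [float(words.count(w)) for w in vocab]


# Illustrative rows only; not taken from the project's real Facts.csv.
facts = [
    {"fact": "Broken window near the door", "reasoning": "Suggests forced entry"},
    {"fact": "Blood stains on the floor", "reasoning": "Suggests a violent struggle"},
]
row, score = most_similar_fact("There was blood on the floor", facts, toy_embed)
print(row["fact"], "->", row["reasoning"], f"(score={score:.2f})")
```

Passing the embedding function as a parameter keeps the ranking logic independent of the model, so the same function works for both image captions and typed observations.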
- crime_story.mp4: a narrated video constructed from text and image reasoning
- narration.mp3: audio narration of the crime story
- CLI outputs for each step
- Rohan Raghav – Full Stack Developer & Machine Learning Enthusiast
Feel free to modify this project to suit forensic, educational, or AI storytelling needs!