A Python tool that finds the segments of video transcripts that best match a script, using semantic similarity. It works on text input only: the transcript files and a script .txt file.
- Supports WebVTT (.vtt) and custom timestamped text (.txt) formats
- Uses the paraphrase-mpnet-base-v2 sentence transformer for accurate semantic matching
- Finds top matching segments across multiple transcript files
- Handles large transcripts efficiently
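
As a rough illustration of the semantic matching listed above, the sketch below ranks transcript segments by cosine similarity to a query vector. The function names (`cosine`, `top_matches`, `embed`) are hypothetical and are not taken from this tool's code; only the model name comes from this README. The `embed` helper assumes `sentence-transformers` is installed and downloads the model on first use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors to avoid division by zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query_vec, segment_vecs, k=3):
    """Return up to k (segment_index, score) pairs, best score first."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(segment_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def embed(texts):
    """Hypothetical helper: encode texts with the model named in this README."""
    # Imported lazily so the ranking code above works without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("paraphrase-mpnet-base-v2")
    return model.encode(texts)
```

Under these assumptions, a script line and the transcript segments would each be passed through `embed`, and `top_matches` would then pick the highest-scoring segments. The pure-Python similarity loop is kept for readability; a vectorized version would be a better fit for large transcripts.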
- Clone repository:
git clone https://github.com/yourusername/TranscriptMatcher.git
cd TranscriptMatcher
- Install dependencies:
pip install sentence-transformers
pip install webvtt-py
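pip install torch

Transcript Formats

- WebVTT (.vtt): Standard subtitle format
- Custom timestamped text (.txt): each line starts with a timestamp in angle brackets, followed by the segment text

<00:01:23.456> This is a sample transcript
<00:01:45.789> With custom timestamps

Example output:

=== Top Matches ===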
1. video1
00:05:10 → 00:05:15 (score: 0.8723)
"This is the matching segment text"
...
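
For readers who want to see how the custom timestamped text format could be read, here is a minimal sketch. It assumes every segment sits on one line in the `<HH:MM:SS.mmm> text` shape shown in the sample above. The names `parse_custom_transcript` and `format_hms` are hypothetical, and dropping milliseconds when printing is an assumption based on the `00:05:10` style in the example output.

```python
import re

# One transcript line: <HH:MM:SS.mmm> followed by the segment text.
LINE_RE = re.compile(r"^<(\d{2}):(\d{2}):(\d{2})\.(\d{3})>\s*(.*)$")


def parse_custom_transcript(text):
    """Return a list of (start_seconds, segment_text) tuples."""
    segments = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # Skip blank or malformed lines.
        hours, minutes, seconds, millis, body = match.groups()
        start = int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000
        segments.append((start, body))
    return segments


def format_hms(seconds):
    """Render seconds as HH:MM:SS, discarding the fractional part."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"
```

The format carries only one timestamp per line, so a segment's end time is not available directly. One option, again an assumption, is to treat the next line's start time as the end of the current segment.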