Clipy: Shortform Hyper Intelligent Trimming

Clipy converts long form content into BRAINROT

Overview

Clipy takes long-form content and produces several short-form video clips.

Demo

More Demos

TalkNet - Audio Visual Active Speaker Detection Demo

Installation

Requirements

ffmpeg (for rendering the video)
OpenAI API key (for content highlighting); costs roughly $0.035 per hour of video with o3-mini
Python packages listed in requirements.txt
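The requirements above can be checked before a run with a short script. This is a minimal sketch, not part of Clipy: the helper name missing_prerequisites is hypothetical, and it only checks that an ffmpeg executable is on the PATH and that OPENAI_API_KEY is set to a non-empty value.

```python
import os
import shutil


def missing_prerequisites(environ=None):
    """Return the names of prerequisites that appear to be missing.

    Hypothetical helper, not part of Clipy itself.
    """
    environ = os.environ if environ is None else environ
    missing = []
    # ffmpeg must be discoverable on the PATH for rendering.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    # The key is read from the environment, as in the Usage section.
    if not environ.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("ffmpeg and OPENAI_API_KEY look fine")
```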

Installation Steps

git clone https://github.com/rfheise/clipy.git
cd clipy
pip install -r requirements.txt

Usage

export OPENAI_API_KEY=<insert api key>
python -m clipy.main <optional arguments> -i <input file> -o <output directory>
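If you would rather drive Clipy from a Python script than from the shell, the command above can be assembled as an argument list. This is a rough sketch under assumptions: build_clipy_command is a hypothetical helper, the file names are placeholders, and --debug-mode is assumed to be a switch that takes no value.

```python
import sys


def build_clipy_command(input_file, output_dir, num_clips=None,
                        subtitle_model=None, debug=False):
    """Build the argument list for `python -m clipy.main` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "clipy.main"]
    if num_clips is not None:
        cmd += ["--num-clips", str(num_clips)]
    if subtitle_model is not None:
        cmd += ["--subtitle-model", subtitle_model]
    if debug:
        # Assumption: --debug-mode is a bare switch with no value.
        cmd.append("--debug-mode")
    cmd += ["-i", input_file, "-o", output_dir]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; pass the list to subprocess.run(cmd, check=True)
    # from inside the cloned clipy directory with OPENAI_API_KEY exported.
    print(build_clipy_command("talk.mp4", "clips", num_clips=3))
```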

Additional Arguments

Flag	Description	Default Value
--device	Torch device for running models	cuda if CUDA is detected, else cpu
--gpt-highlighting-model	GPT model to use for content highlighting	o3-mini
--subtitle-model	Whisper model for generating subtitles (see OpenAI Whisper for more info)	turbo
--num-clips	Number of clips to output	ceiling(runtime/5)
--debug-mode	Runs in debug mode (runs significantly faster and caches everything, but produces very poor quality output)	N/A
-h	Shows additional configuration options	N/A

See Config.py for more details
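The two computed defaults in the table can be illustrated as follows. This is only a sketch of what the table describes, not the code in Config.py: both function names are hypothetical, and the unit of runtime is not stated in the table, so minutes are assumed here.

```python
import math


def default_num_clips(runtime_minutes):
    """ceiling(runtime/5); ASSUMES runtime is measured in minutes."""
    return math.ceil(runtime_minutes / 5)


def default_device():
    """Pick cuda when torch reports a CUDA device, otherwise cpu."""
    try:
        import torch
    except ImportError:
        # Without torch installed there is nothing to probe; fall back to cpu.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(default_num_clips(60), default_device())
```

Under the minutes assumption, a 60-minute video would default to 12 clips and a 61-minute video to 13.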

Information about running

You need a GPU to run this software efficiently. Right now it takes ~10 minutes to process an hour of content on my 4090 with the turbo subtitle model. On my MacBook, using the CPU with the tiny.en subtitle model, it takes ~1.5 hours to process an hour of content.

You can also try gpt-o4-mini (used in debug mode) instead of o3-mini, since it costs a fraction as much. However, I've found that its results are significantly worse. You can try any other model you like, but I've found that o3-mini gives the best performance for the cost.
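For a rough idea of what a run will cost in time and API spend, the figures quoted in this README can be scaled linearly. This is a back-of-the-envelope sketch: the function name and setup labels are hypothetical, the numbers are the approximate ones reported above for one 4090 and one MacBook, and linear scaling with video length is an assumption.

```python
# Approximate figures quoted in this README.
COST_PER_VIDEO_HOUR_USD = 0.035  # o3-mini content highlighting
PROCESSING_MINUTES_PER_VIDEO_HOUR = {
    "4090-turbo": 10.0,           # RTX 4090, turbo subtitle model
    "macbook-cpu-tiny.en": 90.0,  # MacBook CPU, tiny.en subtitle model
}


def estimate_run(video_hours, setup="4090-turbo"):
    """Scale the quoted per-hour figures linearly (an assumption)."""
    return {
        "api_cost_usd": video_hours * COST_PER_VIDEO_HOUR_USD,
        "processing_minutes": video_hours * PROCESSING_MINUTES_PER_VIDEO_HOUR[setup],
    }


if __name__ == "__main__":
    print(estimate_run(2))
    print(estimate_run(2, "macbook-cpu-tiny.en"))
```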

Features

Automatically highlights the most interesting moments in a video
- Currently uses ChatGPT to highlight the most interesting moments
- This feels like a grift and I plan on developing/finding a model that can run locally
Crops the video around the person speaking
Adds PIZZAZZ to the output video
- Subtitles
- More on the way
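To give a feel for the speaker-cropping feature, here is one way a crop window could be placed around a detected speaker. This is an illustrative sketch only and is not taken from Clipy's code: the function name, the 9:16 portrait aspect ratio, and the choice to keep the full frame height are all assumptions.

```python
def crop_window(frame_w, frame_h, speaker_cx, aspect_w=9, aspect_h=16):
    """Return (left, top, width, height) of a portrait crop.

    Hypothetical helper. The crop keeps the full frame height, uses an
    ASSUMED 9:16 aspect ratio, and is centred horizontally on the
    speaker while staying inside the frame.
    """
    crop_h = frame_h
    crop_w = min(frame_w, frame_h * aspect_w // aspect_h)
    left = speaker_cx - crop_w // 2
    # Clamp so the window never leaves the frame.
    left = max(0, min(left, frame_w - crop_w))
    return left, 0, crop_w, crop_h


if __name__ == "__main__":
    # Speaker centred in a 1920x1080 frame.
    print(crop_window(1920, 1080, 960))
```

Clamping matters when the speaker stands near the edge of the frame: the window slides to the border instead of extending past it.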

How Does It Work/Developer Information

See Dev-info.md for more details

Acknowledgements

The TalkNet & S3FD model weights and preprocessing steps are modified from this repository
