
feat(pipelines): add optional model instance caching to ModelLedger #118

Open
Benjamin Cowen (BenCowen) wants to merge 1 commit into Lightricks:main from BenCowen:features/model-instance-caching


@BenCowen

Summary

  • Extends ModelLedger to optionally cache model instances via cache_models=True
  • Upgrades TI2VidTwoStagesPipeline to thread this option through to both stage ledgers
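
A hypothetical usage sketch of the new flag (the import path, constructor arguments, and call signature below are illustrative assumptions, not copied from the repository):

```python
# Hypothetical usage sketch: import path and constructor/call signatures
# are assumptions for illustration, not copied from the repository.
from ltx_video.pipelines import TI2VidTwoStagesPipeline  # assumed path

pipeline = TI2VidTwoStagesPipeline(
    checkpoint_path="ckpts/ltx-video.safetensors",  # placeholder checkpoint
    cache_models=True,  # new flag from this PR: reuse built model instances
)

# First call pays the full cost: load weights, fuse LoRAs, move to GPU.
video_a = pipeline(prompt="a drone shot over a rocky coastline")

# Subsequent calls on the same worker reuse the cached instances.
video_b = pipeline(prompt="the same coastline at golden hour")

# Explicit memory management when the worker needs to free GPU memory;
# the attribute name below is illustrative.
# pipeline.model_ledger.clear_model_cache()
```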

Motivation

When serving TI2VidTwoStagesPipeline for repeated inference (e.g. behind an API or on a persistent GPU worker), every call to the pipeline rebuilds all models from scratch: loading weights from disk, fusing LoRAs, and moving tensors to GPU. For a 19B-parameter model this can add significant per-request overhead even on warm containers.

Changes

  • ModelLedger: Added a cache_models flag and a _model_cache dict. When enabled, factory methods (.transformer(), .video_encoder(), .text_encoder(), etc.) return cached instances on subsequent calls instead of rebuilding. A clear_model_cache() method is provided for explicit memory management, and the flag propagates through with_loras(). A simplified sketch of the pattern follows below.
  • TI2VidTwoStagesPipeline: Accepts cache_models kwarg and forwards it to ModelLedger.

Default behavior (cache_models=False) is unchanged.
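
A minimal, self-contained sketch of the caching pattern described above, assuming only what this PR states; the builder bodies and signatures are placeholders, not the real ledger code:

```python
from typing import Any, Callable, Dict


class ModelLedger:
    """Simplified sketch of the caching pattern; the real ledger takes
    checkpoint/LoRA configuration and builds actual model objects."""

    def __init__(self, cache_models: bool = False) -> None:
        self.cache_models = cache_models
        self._model_cache: Dict[str, Any] = {}

    def _get_or_build(self, key: str, build: Callable[[], Any]) -> Any:
        # Default path (cache_models=False): rebuild on every call,
        # unchanged from the existing behavior.
        if not self.cache_models:
            return build()
        # Cached path: build once, return the same instance afterwards.
        if key not in self._model_cache:
            self._model_cache[key] = build()
        return self._model_cache[key]

    def transformer(self) -> Any:
        return self._get_or_build("transformer", self._build_transformer)

    def clear_model_cache(self) -> None:
        # Drop references so GPU memory can be reclaimed explicitly.
        self._model_cache.clear()

    def with_loras(self, *loras: str) -> "ModelLedger":
        # The cache_models flag propagates to the derived ledger.
        return ModelLedger(cache_models=self.cache_models)

    def _build_transformer(self) -> Any:
        # Placeholder for the expensive step: load weights from disk,
        # fuse LoRAs, move tensors to GPU.
        return object()
```

One design choice shown in the sketch: with_loras() returns a derived ledger with an empty cache, since LoRA-fused models differ from the parent's cached instances.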

Results

Using Modal to benchmark warm vs. cold container inference: with the original code, warm runs are only about 1.1x faster than cold runs. With caching enabled, warm runs achieve about a 1.3x speedup on high-resolution videos (240 frames at 1536 x 1024) and a 2-2.5x speedup at lower resolutions (e.g. 512 x 512).

