

Mcv binary cache #166

Open
maryamtahhan wants to merge 10 commits into redhat-et:main from maryamtahhan:mcv-binary-cache


@maryamtahhan (Collaborator) commented Feb 9, 2026

Enable vLLM binary cache support for MCV.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
@maryamtahhan force-pushed the mcv-binary-cache branch 2 times, most recently from 0a95604 to c602a2a, February 9, 2026 12:35
@maryamtahhan marked this pull request as ready for review February 9, 2026 13:17
@maryamtahhan (Collaborator, Author) commented:

TODO: add the torch_inductor directory.

@maryamtahhan removed the request for review from Billy99 February 10, 2026 16:07
@maryamtahhan marked this pull request as draft February 10, 2026 16:07
@maryamtahhan marked this pull request as ready for review February 23, 2026 11:13
@maryamtahhan (Collaborator, Author) commented:

Without pre-cache:

(EngineCore_DP0 pid=22) INFO 02-23 01:35:37 [backends.py:812] Using cache directory: /root/.cache/vllm/torch_compile_cache/8d0a361fbc/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=22) INFO 02-23 01:35:37 [backends.py:872] Dynamo bytecode transform time: 28.30 s
(EngineCore_DP0 pid=22) [rank0]:W0223 01:35:45.613000 22 torch/_inductor/utils.py:1613] Not enough SMs to use max_autotune_gemm mode
(EngineCore_DP0 pid=22) INFO 02-23 01:35:55 [backends.py:302] Cache the graph of compile range (1, 2048) for later use
(EngineCore_DP0 pid=22) INFO 02-23 01:36:01 [backends.py:319] Compiling a graph for compile range (1, 2048) takes 18.20 s
(EngineCore_DP0 pid=22) INFO 02-23 01:36:01 [monitor.py:34] torch.compile takes 46.50 s in total
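In this run the 46.50 s total is the sum of the 28.30 s Dynamo bytecode transform and the 18.20 s graph compilation. A minimal Python sketch for pulling those figures out of a log is shown here; `extract_timings` and `PATTERNS` are hypothetical names, and the regexes assume the exact message wording in the log above, which may differ between vLLM versions.

```python
import math
import re

# Timing lines copied from the run without a pre-populated cache.
NO_PRECACHE_LOG = """\
(EngineCore_DP0 pid=22) INFO 02-23 01:35:37 [backends.py:872] Dynamo bytecode transform time: 28.30 s
(EngineCore_DP0 pid=22) INFO 02-23 01:36:01 [backends.py:319] Compiling a graph for compile range (1, 2048) takes 18.20 s
(EngineCore_DP0 pid=22) INFO 02-23 01:36:01 [monitor.py:34] torch.compile takes 46.50 s in total
"""

# Assumption: these message formats match the vLLM version that produced the log.
PATTERNS = {
    "dynamo_transform_s": re.compile(r"Dynamo bytecode transform time: ([\d.]+) s"),
    "graph_compile_s": re.compile(r"Compiling a graph for compile range .* takes ([\d.]+) s"),
    "total_compile_s": re.compile(r"torch\.compile takes ([\d.]+) s in total"),
}


def extract_timings(log: str) -> dict[str, float]:
    """Return the first value matched by each pattern, in seconds; unmatched keys are skipped."""
    timings = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(log)
        if match:
            timings[name] = float(match.group(1))
    return timings


if __name__ == "__main__":
    t = extract_timings(NO_PRECACHE_LOG)
    print(t)
    # Check that the total is accounted for by the two phases (within 0.01 s).
    print(math.isclose(t["total_compile_s"],
                       t["dynamo_transform_s"] + t["graph_compile_s"],
                       abs_tol=0.01))
```

The same helper can be pointed at a pre-cache log; there the graph is loaded rather than compiled, so no `graph_compile_s` entry is returned.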

With pre-cache:

(EngineCore_DP0 pid=22) INFO 02-23 03:12:47 [backends.py:812] Using cache directory: /root/.cache/vllm/torch_compile_cache/8d0a361fbc/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=22) INFO 02-23 03:12:47 [backends.py:872] Dynamo bytecode transform time: 7.85 s
(EngineCore_DP0 pid=22) INFO 02-23 03:12:54 [backends.py:267] Directly load the compiled graph(s) for compile range (1, 2048) from the cache, took 1.273 s
(EngineCore_DP0 pid=22) INFO 02-23 03:12:54 [monitor.py:34] torch.compile takes 9.12 s in total
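With the pre-populated cache, vLLM reports that it loaded the compiled graph directly from the cache (1.273 s) instead of compiling it (18.20 s without the cache), the Dynamo bytecode transform drops from 28.30 s to 7.85 s, and the total torch.compile time drops from 46.50 s to 9.12 s, about 5.1x faster. As a rough illustration, the hypothetical helper `summarize` (not part of vLLM or MCV) detects a cache hit from the "Directly load the compiled graph(s)" message and compares the two totals; it assumes the log wording shown in this thread.

```python
import re

# Relevant lines from the two runs in this thread.
NO_PRECACHE_LOG = """\
(EngineCore_DP0 pid=22) INFO 02-23 01:35:55 [backends.py:302] Cache the graph of compile range (1, 2048) for later use
(EngineCore_DP0 pid=22) INFO 02-23 01:36:01 [monitor.py:34] torch.compile takes 46.50 s in total
"""
WITH_PRECACHE_LOG = """\
(EngineCore_DP0 pid=22) INFO 02-23 03:12:54 [backends.py:267] Directly load the compiled graph(s) for compile range (1, 2048) from the cache, took 1.273 s
(EngineCore_DP0 pid=22) INFO 02-23 03:12:54 [monitor.py:34] torch.compile takes 9.12 s in total
"""

# Assumption: message wording matches the vLLM version used here.
CACHE_HIT = re.compile(r"Directly load the compiled graph\(s\) .* from the cache")
TOTAL = re.compile(r"torch\.compile takes ([\d.]+) s in total")


def summarize(log: str) -> tuple[bool, float]:
    """Return (graph loaded from cache?, total torch.compile seconds) for one run."""
    total = TOTAL.search(log)
    if total is None:
        raise ValueError("no 'torch.compile takes ... s in total' line found")
    return CACHE_HIT.search(log) is not None, float(total.group(1))


if __name__ == "__main__":
    cold_hit, cold_s = summarize(NO_PRECACHE_LOG)
    warm_hit, warm_s = summarize(WITH_PRECACHE_LOG)
    print(f"no pre-cache:   cache hit={cold_hit}, torch.compile {cold_s:.2f} s")
    print(f"with pre-cache: cache hit={warm_hit}, torch.compile {warm_s:.2f} s")
    print(f"saved {cold_s - warm_s:.2f} s ({cold_s / warm_s:.1f}x faster)")
```

Keying on the cache-hit message rather than on timing alone avoids mistaking a fast machine for a warm cache.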

@maryamtahhan requested a review from Billy99 February 23, 2026 11:14