feat: improve engine caching and fix bugs #3932

zewenli98 · 2025-11-24T23:30:39Z

Description

At my request, TensorRT 10.14 added the serialization flag trt.SerializationFlag.INCLUDE_REFIT, which lets a refitted engine remain refittable, so an engine can now be refitted multiple times. Building on this capability, this PR enhances the existing engine caching and refitting features as follows:

To save disk space, engine caching now stores only weight-stripped engines on disk, regardless of compilation_settings.strip_engine_weights. When a cached engine is loaded, it is automatically refitted and remains refittable.
Compiled TRT modules can be refitted multiple times with refit_module_weights(), e.g.:

for _ in range(3):
    trt_gm = refit_module_weights(trt_gm, exp_program)
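The caching policy described above can be pictured with a toy model. This is only a sketch under stated assumptions: ToyEngine, ToyEngineCache, strip_weights and refit are hypothetical names made up for illustration, and nothing here calls TensorRT or Torch-TensorRT. It shows the intended flow: strip weights on save, refit on load, and keep the result refittable so that repeated refits work.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyEngine:
    """Stand-in for a serialized engine (hypothetical, not a real TRT type)."""
    structure: str
    weights: Optional[Dict[str, float]] = None  # None means weight-stripped
    refittable: bool = True


def strip_weights(engine: ToyEngine) -> ToyEngine:
    """Return a copy without weights; this is what the cache stores."""
    return ToyEngine(engine.structure, None, engine.refittable)


def refit(engine: ToyEngine, weights: Dict[str, float]) -> ToyEngine:
    """Fill in weights and keep the engine refittable for later refits."""
    if not engine.refittable:
        raise RuntimeError("engine is not refittable")
    return ToyEngine(engine.structure, dict(weights), refittable=True)


@dataclass
class ToyEngineCache:
    """In-memory stand-in for the on-disk engine cache."""
    store: Dict[str, ToyEngine] = field(default_factory=dict)

    def save(self, key: str, engine: ToyEngine) -> None:
        # Always store the weight-stripped form, whatever the user setting is.
        self.store[key] = strip_weights(engine)

    def load(self, key: str, weights: Dict[str, float]) -> Optional[ToyEngine]:
        # On a hit, refit with the caller's current weights before returning.
        cached = self.store.get(key)
        return None if cached is None else refit(cached, weights)
```

In this model the cache entry never holds weights, so its size depends only on the engine structure, and the engine returned by load() can be passed to refit() any number of times.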

After earlier changes, cached engines were inserted and looked up in different places, which caused 🐛 [Bug] Engine cache failed on torch.compile backend=tensorrt #3909. This PR unifies insertion and lookup in _conversion.py.
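To illustrate why a single code path helps, here is a minimal sketch in which lookup and insertion live in one function that every frontend would call. The names cache_key and convert_with_cache, the string-based graph and settings, and the SHA-256 keying are all assumptions for illustration, not the actual contents of _conversion.py.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical in-memory cache: key -> serialized engine bytes.
EngineCache = Dict[str, bytes]


def cache_key(graph_text: str, settings: str) -> str:
    """Derive a key from the graph and settings (keying scheme is an assumption)."""
    return hashlib.sha256(f"{graph_text}|{settings}".encode()).hexdigest()


def convert_with_cache(
    graph_text: str,
    settings: str,
    cache: EngineCache,
    build: Callable[[str], bytes],
) -> Tuple[bytes, bool]:
    """Single entry point: look up first, build and insert on a miss.

    Returns (engine, cache_hit).
    """
    key = cache_key(graph_text, settings)
    if key in cache:
        return cache[key], True
    engine = build(graph_text)
    cache[key] = engine
    return engine, False
```

Because the same function computes the key for both the lookup and the insert, two callers cannot disagree about where an engine is stored.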

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

py/torch_tensorrt/dynamo/conversion/_conversion.py

narendasan · 2025-12-01T20:00:18Z

@cehongwang please take a pass so we have multiple eyes on this PR

py/torch_tensorrt/dynamo/backend/backends.py

py/torch_tensorrt/dynamo/conversion/_conversion.py

zewenli98 · 2025-12-19T09:31:39Z

JIT's output is not all zeros when strip_engine_weights=True because AOT and JIT generate different GraphModules (GMs) before conversion to a TRT engine. JIT graphs are always weightless because the weights are passed in as inputs.

AOT's weights are stored in the model:

graph():
    %conv1_weight : [num_users=1] = get_attr[target=conv1.weight]
    %bn1_weight : [num_users=1] = get_attr[target=bn1.weight]
    %bn1_bias : [num_users=1] = get_attr[target=bn1.bias]
    %layer1_0_conv1_weight : [num_users=1] = get_attr[target=layer1.0.conv1.weight]
    %layer1_0_bn1_weight : [num_users=1] = get_attr[target=layer1.0.bn1.weight]
    %layer1_0_bn1_bias : [num_users=1] = get_attr[target=layer1.0.bn1.bias]
    %layer1_0_conv2_weight : [num_users=1] = get_attr[target=layer1.0.conv2.weight]
    %layer1_0_bn2_weight : [num_users=1] = get_attr[target=layer1.0.bn2.weight]
    %layer1_0_bn2_bias : [num_users=1] = get_attr[target=layer1.0.bn2.bias]
    ...
    %layer4_1_bn2_running_mean : [num_users=1] = get_attr[target=layer4.1.bn2.running_mean]
    %layer4_1_bn2_running_var : [num_users=1] = get_attr[target=layer4.1.bn2.running_var]
    %x : [num_users=1] = placeholder[target=x]
    %convolution : [num_users=1] = call_function[target=torch.ops.aten.convolution.default](args = (%x, %conv1_weight, None, [2, 2], [3, 3], [1, 1], False, [0, 0], 1), kwargs = {})
    %_native_batch_norm_legit_no_training : [num_users=1] = call_function[target=torch.ops.aten._native_batch_norm_legit_no_training.default](args = (%convolution, %bn1_weight, %bn1_bias, %bn1_running_mean, %bn1_running_var, 0.1, 1e-05), kwargs = {})

JIT, by contrast, receives the weights through placeholder nodes at run time, so there are actually no weights to strip:

graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %arg1_1 : [num_users=1] = placeholder[target=arg1_1]
    %arg2_1 : [num_users=1] = placeholder[target=arg2_1]
    %arg3_1 : [num_users=1] = placeholder[target=arg3_1]
    %arg4_1 : [num_users=1] = placeholder[target=arg4_1]
    %arg5_1 : [num_users=1] = placeholder[target=arg5_1]
    %arg6_1 : [num_users=1] = placeholder[target=arg6_1]
    %arg7_1 : [num_users=1] = placeholder[target=arg7_1]
    %arg8_1 : [num_users=1] = placeholder[target=arg8_1]
    %arg9_1 : [num_users=1] = placeholder[target=arg9_1]
    ...
    %arg101_1 : [num_users=1] = placeholder[target=arg101_1]
    %arg102_1 : [num_users=1] = placeholder[target=arg102_1]
    %convolution : [num_users=1] = call_function[target=torch.ops.aten.convolution.default](args = (%arg1_1, %arg0_1, None, [2, 2], [3, 3], [1, 1], False, [0, 0], 1), kwargs = {})
    %_native_batch_norm_legit_no_training : [num_users=1] = call_function[target=torch.ops.aten._native_batch_norm_legit_no_training.default](args = (%convolution, %arg4_1, %arg5_1, %arg2_1, %arg3_1, 0.1, 1e-05), kwargs = {})
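A quick way to check which kind of graph is being converted is to count node opcodes in a printed dump like the two above. The sketch below is a hypothetical helper (count_node_ops and has_embedded_weights are not part of Torch-TensorRT) and assumes the text format shown in this comment; it only parses strings and does not import torch.

```python
import re
from collections import Counter

# Matches the opcode of a printed FX node line, e.g.
#   %x : [num_users=1] = placeholder[target=x]
_NODE_RE = re.compile(
    r"^\s*%\S+\s*:.*?=\s*"
    r"(get_attr|placeholder|call_function|call_module|call_method)\["
)


def count_node_ops(graph_dump: str) -> Counter:
    """Count nodes per opcode in a printed graph dump."""
    counts: Counter = Counter()
    for line in graph_dump.splitlines():
        match = _NODE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def has_embedded_weights(graph_dump: str) -> bool:
    """True if the dump contains get_attr nodes, i.e. weights live in the graph."""
    return count_node_ops(graph_dump)["get_attr"] > 0
```

Applied to the dumps above, the AOT graph would report get_attr nodes while the JIT graph would report only placeholder and call_function nodes, which matches the explanation of why there is nothing to strip on the JIT path.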

narendasan

LGTM, just make sure tests are passing

zewenli98 requested review from cehongwang, narendasan and peri044 November 24, 2025 23:30

zewenli98 self-assigned this Nov 24, 2025

meta-cla bot added the cla signed label Nov 24, 2025

github-actions bot added the component: tests, component: conversion, component: api [Python], component: dynamo, and component: torch_compile labels Nov 24, 2025

This was referenced Nov 24, 2025

🐛 [Bug] Engine cache failed on torch.compile backend=tensorrt #3909

Open

feat: Enhance capability of engine caching and refitting #3789

Closed

narendasan reviewed Nov 25, 2025

py/torch_tensorrt/dynamo/conversion/_conversion.py (outdated, resolved)

narendasan reviewed Nov 25, 2025

py/torch_tensorrt/dynamo/conversion/_conversion.py (outdated, resolved)

py/torch_tensorrt/dynamo/conversion/_conversion.py (outdated, resolved)

py/torch_tensorrt/dynamo/conversion/_conversion.py (outdated, resolved)

zewenli98 force-pushed the improve_engine_caching branch from a54907e to ea81677 Compare December 4, 2025 18:38

narendasan reviewed Dec 5, 2025

py/torch_tensorrt/dynamo/backend/backends.py (resolved)

py/torch_tensorrt/dynamo/conversion/_conversion.py (outdated, resolved)

narendasan linked an issue Dec 9, 2025 that may be closed by this pull request

📖 [Story] Weightless Engine Building #3924

Open

narendasan reviewed Dec 10, 2025

py/torch_tensorrt/dynamo/conversion/_conversion.py (outdated, resolved)

narendasan reviewed Dec 10, 2025

py/torch_tensorrt/dynamo/conversion/_conversion.py (outdated, resolved)

narendasan reviewed Dec 10, 2025

py/torch_tensorrt/dynamo/conversion/_conversion.py (outdated, resolved)

narendasan reviewed Dec 11, 2025

py/torch_tensorrt/dynamo/conversion/_conversion.py (resolved)

github-actions bot added the component: core label Dec 12, 2025

zewenli98 requested review from narendasan and removed request for peri044 December 19, 2025 09:26

narendasan approved these changes Dec 19, 2025


zewenli98 added 3 commits December 19, 2025 12:01

improve engine caching and fix bugs

d37eac8

reduce dims in tests

e5d4101

fix comments

b10fa31

zewenli98 added 6 commits December 19, 2025 12:01

resolve CUDA OOM issue in tests

47925b2

fix comments

aa683e0

remove unused args in interpret_module_to_result()

c0d6b8b

fix comments

0a51a72

fix comments and add warnings

85d9a0c

warn strip_engine_weights=True for JIT path

d3bbb94

zewenli98 force-pushed the improve_engine_caching branch from d42ef00 to d3bbb94 Compare December 19, 2025 20:01
