[autorevert] Add support for job and test filtering in workflow restarts #7595

izaitsevfb · 2025-12-16T00:27:11Z

Adds support for more granular dispatches (test and job level filters, see pytorch/pytorch#168201) to autorevert.

Job/test filtering: When restarting workflows, only re-run specific failed jobs and tests instead of the entire workflow (uses workflow_dispatch inputs when supported)
Workflow resolver: Parses workflow YAML files to detect which inputs are available for filtering
New CLI subcommand: restart-workflow for manually triggering filtered workflow restarts
Fallback behavior: Workflows without input support (e.g., inductor) fall back to full workflow restart
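
For readers unfamiliar with the resolver step, here is a minimal sketch of how input detection could work on an already-parsed workflow mapping. The input names `jobs-to-include` and `tests-to-include`, the `InputSupport` class, and the `detect_input_support` function are assumptions made for illustration, not the names used in this PR or in pytorch/pytorch#168201.

```python
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical input names; the real ones are defined by the workflow files.
JOBS_INPUT = "jobs-to-include"
TESTS_INPUT = "tests-to-include"


@dataclass(frozen=True)
class InputSupport:
    """Which filter inputs a workflow accepts (illustrative shape only)."""
    jobs: bool = False
    tests: bool = False


def detect_input_support(workflow: Mapping) -> InputSupport:
    """Inspect a parsed workflow mapping for workflow_dispatch inputs."""
    # PyYAML follows YAML 1.1 and loads a bare `on:` key as boolean True,
    # so look the trigger section up under both keys.
    triggers = workflow.get("on", workflow.get(True))
    if not isinstance(triggers, Mapping):
        return InputSupport()
    dispatch = triggers.get("workflow_dispatch")
    if not isinstance(dispatch, Mapping):
        return InputSupport()
    inputs = dispatch.get("inputs")
    if not isinstance(inputs, Mapping):
        return InputSupport()
    return InputSupport(jobs=JOBS_INPUT in inputs, tests=TESTS_INPUT in inputs)
```

A workflow that declares neither input yields `InputSupport()` with both flags false, which corresponds to the full-restart fallback described above.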

Testing

(The links lead to the runs for each commit; look for the runs dispatched from my account during local testing.)

manual dispatch testing:

python -m pytorch_auto_revert restart-workflow pull 4816fd912210162bea4cdf34f7a39d2909477549 --jobs "linux-jammy-py3.10-gcc11" --tests "distributed/test_functional_differentials"

runs: https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2F4816fd912210162bea4cdf34f7a39d2909477549

granular restart on trunk:

python -m pytorch_auto_revert autorevert-checker pull --hours 12 --as-of "2025-12-18 06:25" --hud-html

log: P2090410466

runs (filters by job and test):
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2F9fe21ba6d0583790c1857485ede8e17c89ab9afd
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2F3fc6a055e09174135cd839e723c4f0bdab9589b3

many restarts:

python -m pytorch_auto_revert --dry-run autorevert-checker pull --hours 18 --hud-html --as-of "2025-12-19 22:00"

log: P2090444414

runs:
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2Feafa4f67d2afdca606eebbca50571b0ba1ab922b
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2F96b3e7d78914f5db043e8b9ae3b3f72498abca4e
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2F7d49bd5060925055724d8976794cc1fd328066aa

workflow without input support (inductor):

python -m pytorch_auto_revert autorevert-checker inductor --hours 64 --hud-html --as-of "2025-12-18 17:31"

log: P2090462639

run: https://github.com/pytorch/pytorch/actions/workflows/inductor.yml?query=branch%3Atrunk%2Fa79fbc97065538f756418e6e3bde02a708e893b5

jeanschmidt · 2025-12-17T19:26:56Z

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/signal_actions.py

+            notes_parts.append(f"tests_filter={','.join(tests_to_include)}")
+        if notes:  # Error message from exception
+            notes_parts.append(notes)
+        notes = "; ".join(notes_parts) if notes_parts else ""


You are overwriting notes here, but it may already have been set in other parts of the code. Maybe you want to create notes_parts right at the start of the function and populate it as you go?
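
One way to read this suggestion is sketched below: collect every fragment in a single list from the start and join once at the end, so nothing set earlier is lost. The `build_notes` function, its parameters, and the `jobs_filter=` prefix are assumptions for illustration; only `tests_filter=` and `tests_to_include` appear in the diff.

```python
from typing import Optional, Sequence


def build_notes(
    jobs_to_include: Sequence[str] = (),
    tests_to_include: Sequence[str] = (),
    error: Optional[str] = None,
) -> str:
    """Assemble the notes string from parts instead of reassigning it."""
    notes_parts = []  # created once, at the start of the function
    if jobs_to_include:
        # "jobs_filter" is an assumed prefix, mirroring "tests_filter".
        notes_parts.append(f"jobs_filter={','.join(jobs_to_include)}")
    if tests_to_include:
        notes_parts.append(f"tests_filter={','.join(tests_to_include)}")
    if error:  # error message from an exception, if any
        notes_parts.append(error)
    # Joining an empty list gives "", so no special case is needed.
    return "; ".join(notes_parts)
```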

jeanschmidt · 2025-12-17T19:32:06Z

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/signal_extraction.py

                    )
                )
            deduped.append(
                Signal(


To avoid maintenance hell and simplify code like this, please use dataclasses.replace everywhere you copy an object this way.

I remembered the issue I had with that: Signal, SignalCommit, and SignalEvent are regular classes, not dataclasses.

Potentially a good suggestion, but it should probably be done as a separate BE PR.
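
For context, this is roughly what the suggestion would look like if the classes were converted to dataclasses. The `Signal` class below is a hypothetical stand-in with made-up fields, not the real class from signal_extraction.py.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Signal:  # hypothetical stand-in, fields invented for illustration
    key: str
    workflow_name: str
    commits: tuple = ()
    job_base_name: str = ""


original = Signal(
    key="test_foo", workflow_name="pull", commits=("abc",), job_base_name="linux"
)

# Copy every field and override only the one that changes. New fields added
# to Signal later are carried over without touching this call site.
deduped = replace(original, commits=("abc", "def"))
```

The benefit is that call sites no longer need to list every constructor argument by hand, which is where copies tend to drift when a field is added.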

jeanschmidt · 2025-12-17T19:33:02Z

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/signal_extraction.py

                    filtered.append(e)
                    prev_key = key
                new_commits.append(
                    SignalCommit(


dataclasses.replace

jeanschmidt · 2025-12-17T19:33:34Z

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/signal_extraction.py

                )

            out.append(
                Signal(


dataclasses.replace

jeanschmidt · 2025-12-17T19:42:21Z

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/workflow_resolver.py

                    self._by_display[name] = ref
                    self._by_file[base] = ref
+
+    def get_input_support(self, workflow_name: str) -> WorkflowInputSupport:


Remove the overcomplicated self._input_support_cache and replace it with:

    @lru_cache(maxsize=512)
    def get_input_support(self, workflow_name: str) -> WorkflowInputSupport:
        # move the logic from self._fetch_and_parse_workflow_inputs here

>>> Lint for aws/lambda/pytorch-auto-revert/pytorch_auto_revert/workflow_resolver.py:

   Warning (FLAKE8) B019
    Use of `functools.lru_cache` or `functools.cache` on methods can lead to
    memory leaks. The cache may retain instance references, preventing garbage
    collection. See

         130  |        ref = self.require(workflow_name)
         131  |        return self._fetch_workflow_input_support(ref.file_name)
         132  |
    >>>  133  |    @lru_cache(maxsize=None)
         134  |    def _fetch_workflow_input_support(self, file_name: str) -> WorkflowInputSupport:
         135  |        """Fetch and parse workflow input support with caching.
         136  |
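
A possible middle ground between the hand-rolled cache and the decorated method flagged by B019 is to build the cached callable per instance in `__init__`. The sketch below assumes a simplified, hypothetical `WorkflowResolver` whose fetching is injected as a callable; it is not the class from this PR.

```python
from functools import lru_cache


class WorkflowResolver:  # simplified, hypothetical stand-in
    def __init__(self, fetch):
        # `fetch` is an assumed callable: file_name -> parsed input support.
        self._fetch = fetch
        # Wrap the bound method per instance. The cache then lives on the
        # instance instead of on the class, so it is not shared between
        # resolvers and does not outlive them in a class-level cache.
        self.get_input_support = lru_cache(maxsize=512)(
            self._get_input_support_uncached
        )

    def _get_input_support_uncached(self, file_name: str):
        return self._fetch(file_name)
```

Each resolver gets its own bounded cache, and the decorator no longer sits on a method definition, which is the pattern B019 warns about.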

jeanschmidt · 2025-12-17T19:45:11Z

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/workflow_resolver.py

+        for attempt in RetryWithBackoff():
+            with attempt:
+                contents = self._repository.get_contents(path)
+                yaml_content = contents.decoded_content.decode("utf-8")


Call workflow = yaml.safe_load(yaml_content) here, inside the retry block.

You can't trust that a successful request actually yields a usable response, even if the status is 200.

You can get garbage data, so you might want to retry at this point as well.
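
The idea of retrying on a bad payload, and not only on a failed request, can be sketched as follows. This uses a plain loop and `json.loads` as a standard-library stand-in for `RetryWithBackoff` and `yaml.safe_load`; the `fetch_and_parse` function and its parameters are assumptions for illustration.

```python
import json
import time
from typing import Callable


def fetch_and_parse(
    fetch: Callable[[], str], attempts: int = 3, base_delay: float = 0.0
) -> dict:
    """Fetch and parse inside one retry scope so garbage payloads are retried."""
    last_error = None
    for attempt in range(attempts):
        try:
            text = fetch()
            parsed = json.loads(text)  # stand-in for yaml.safe_load
            if not isinstance(parsed, dict):
                raise ValueError("unexpected document shape")
            return parsed
        except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError("failed to fetch a parseable workflow file") from last_error
```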

jeanschmidt · 2025-12-17T19:50:23Z

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/workflow_checker.py

        wf_ref = self.resolver.require(workflow_name)

+        # Check what inputs this workflow supports
+        input_support = self.resolver.get_input_support(workflow_name)


be resilient here.

Accept that this function could fail (raise an exception, etc).

And if it does, just assume the workflow does not support inputs and move on. It is usually better to be suboptimal when things are unstable than to not do anything at all and fail.
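
A minimal sketch of that fallback, assuming a hypothetical `WorkflowInputSupport` shape with two boolean flags and a hypothetical `safe_input_support` helper (neither is taken from the PR):

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class WorkflowInputSupport:  # hypothetical shape, field names are assumptions
    jobs_to_include: bool = False
    tests_to_include: bool = False


def safe_input_support(resolver, workflow_name: str) -> WorkflowInputSupport:
    """Return the resolver's answer, or 'no inputs supported' on any failure."""
    try:
        return resolver.get_input_support(workflow_name)
    except Exception:
        # Degrade to a full workflow restart rather than failing the restart.
        logger.warning(
            "could not determine input support for %s; assuming none",
            workflow_name,
            exc_info=True,
        )
        return WorkflowInputSupport()
```

Catching `Exception` broadly is a deliberate trade-off here: the caller ends up with a less precise restart but still performs one.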

jeanschmidt · 2025-12-17T19:52:24Z

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/workflow_checker.py

                repo = client.get_repo(f"{self.repo_owner}/{self.repo_name}")
                workflow = repo.get_workflow(wf_ref.file_name)
-                proper_workflow_create_dispatch(workflow, ref=tag_ref, inputs={})
+                proper_workflow_create_dispatch(workflow, ref=tag_ref, inputs=inputs)


I am being pedantic, but there is a problem with handling external APIs this way.

If proper_workflow_create_dispatch fails, you will also retry get_repo and get_workflow. This is not ideal.

You should wrap each request in its own RetryWithBackoff and validate its output independently.
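
The difference can be sketched with a small helper that gives each external call its own retry scope. `with_retry` and `dispatch` below are hypothetical helpers written with a plain loop, not the project's `RetryWithBackoff`; the `client` and `create_dispatch` arguments are assumed to behave like the objects in the diff above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(step: Callable[[], T], attempts: int = 3, base_delay: float = 0.0) -> T:
    """Run one step, retrying only that step with exponential backoff."""
    for attempt in range(attempts):
        try:
            return step()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")


def dispatch(client, repo_full_name, workflow_file, ref, inputs, create_dispatch):
    # Each call has its own retry scope, so a failing dispatch does not
    # trigger another get_repo or get_workflow request.
    repo = with_retry(lambda: client.get_repo(repo_full_name))
    workflow = with_retry(lambda: repo.get_workflow(workflow_file))
    with_retry(lambda: create_dispatch(workflow, ref=ref, inputs=inputs))
```

One caveat with retrying the dispatch itself: if a request succeeded but its response was lost, a retry could create a duplicate run, so the output validation the comment asks for matters most on that step.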

jeanschmidt · 2025-12-17T19:54:23Z

There are a few things in the code that I believe we should go over and fix first (in other PRs) before implementing these changes, to avoid cascading bad standards.

Let me know if you need any help on this.

izaitsevfb · 2025-12-20T00:08:08Z

@jeanschmidt addressed your comments (except some where I left a reply), this PR is tested and ready for review!

izaitsevfb · 2025-12-20T00:10:28Z

consider reviewing commit-by-commit:

the first one you already saw
the second one adds the local CLI to restart jobs + minor fix
the third one addresses review comments

pytorch-bot bot added the ci-no-td label Dec 16, 2025

meta-cla bot added the CLA Signed label Dec 16, 2025

izaitsevfb marked this pull request as draft December 16, 2025 00:27

izaitsevfb force-pushed the granular-restarts branch from 84160ab to e3d3767 Compare December 17, 2025 23:49

izaitsevfb added 3 commits December 19, 2025 13:48

[autorevert] Add support for job and test filtering in workflow restarts

9d36e4d

add a restart-workflow subcommand to manually restart workflow with…

ffe507c

… inputs

address review comments

1a39a8e

add tests for workflow parsing

izaitsevfb force-pushed the granular-restarts branch from e3d3767 to 1a39a8e Compare December 19, 2025 22:39

izaitsevfb requested a review from jeanschmidt December 20, 2025 00:04

izaitsevfb marked this pull request as ready for review December 20, 2025 00:05
