[autorevert] Add support for job and test filtering in workflow restarts #7595
base: main
Conversation
notes_parts.append(f"tests_filter={','.join(tests_to_include)}")
if notes:  # Error message from exception
    notes_parts.append(notes)
notes = "; ".join(notes_parts) if notes_parts else ""
You are overwriting notes here; it can have been set in other parts of the code. Maybe you want to create notes_parts right at the function start and populate it?
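A minimal sketch of that structure, assuming a small helper (the function name and signature here are hypothetical):

```python
def _build_restart_notes(tests_to_include: list[str], notes: str = "") -> str:
    # Hypothetical helper: notes_parts is created at the start and only
    # appended to, so an earlier-set `notes` value is never silently overwritten.
    notes_parts: list[str] = []
    if tests_to_include:
        notes_parts.append(f"tests_filter={','.join(tests_to_include)}")
    if notes:  # error message propagated from an exception
        notes_parts.append(notes)
    return "; ".join(notes_parts)
```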
    )
)
deduped.append(
    Signal(
To avoid maintenance hell and to simplify things like this, please use dataclasses.replace in all the places where you build copies this way.
I remembered the issue I had with that: Signal, SignalCommit, and SignalEvent are regular classes, not dataclasses.
Potentially a good suggestion, but it should probably be done as a separate BE PR.
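For reference, this is what the suggestion would look like if Signal were converted to a dataclass in a follow-up PR (the fields below are illustrative, not the real ones):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Signal:  # illustrative fields only
    workflow_name: str
    key: str
    commits: tuple = ()

sig = Signal(workflow_name="pull", key="linux-jammy-py3 / test", commits=("c1", "c2"))
# Copy with a single field overridden, instead of re-listing every constructor argument.
deduped_sig = replace(sig, commits=("c1",))
```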
filtered.append(e)
prev_key = key
new_commits.append(
    SignalCommit(
dataclasses.replace
)

out.append(
    Signal(
dataclasses.replace
self._by_display[name] = ref
self._by_file[base] = ref

def get_input_support(self, workflow_name: str) -> WorkflowInputSupport:
remove the overcomplicated self._input_support_cache and replace with
@lru_cache(maxsize=512)
def get_input_support(self, workflow_name: str) -> WorkflowInputSupport:
# move the logic from self._fetch_and_parse_workflow_inputs here
>>> Lint for aws/lambda/pytorch-auto-revert/pytorch_auto_revert/workflow_resolver.py:
Warning (FLAKE8) B019: Use of `functools.lru_cache` or `functools.cache` on methods can lead to memory leaks. The cache may retain instance references, preventing garbage collection.
130 | ref = self.require(workflow_name)
131 | return self._fetch_workflow_input_support(ref.file_name)
132 |
>>> 133 | @lru_cache(maxsize=None)
134 | def _fetch_workflow_input_support(self, file_name: str) -> WorkflowInputSupport:
135 | """Fetch and parse workflow input support with caching.
136 |
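One common way to keep the caching while sidestepping B019 is to bind the `lru_cache` wrapper to the instance in `__init__` instead of decorating the method at class level; a sketch, with the surrounding class heavily simplified:

```python
from functools import lru_cache


class WorkflowResolver:  # simplified sketch, not the PR's actual class body
    def __init__(self, repository) -> None:
        self._repository = repository
        # The cache now lives on this instance, so it is collected together
        # with the resolver instead of retaining it for the process lifetime.
        self._fetch_workflow_input_support = lru_cache(maxsize=512)(
            self._fetch_workflow_input_support_uncached
        )

    def _fetch_workflow_input_support_uncached(self, file_name: str):
        # Fetch and decode the workflow file; parsing details elided here.
        contents = self._repository.get_contents(f".github/workflows/{file_name}")
        return contents.decoded_content.decode("utf-8")
```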
for attempt in RetryWithBackoff():
    with attempt:
        contents = self._repository.get_contents(path)
        yaml_content = contents.decoded_content.decode("utf-8")
Call workflow = yaml.safe_load(yaml_content) here.
You can't trust that a request that appears to succeed will actually return a valid response (even if it is 200, etc.).
You can get garbage data, so you might want to retry at this point.
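A sketch of that change, parsing inside the retry block so a malformed payload also triggers a retry (`RetryWithBackoff` and `self._repository` are the names already used in the snippet above):

```python
import yaml

for attempt in RetryWithBackoff():
    with attempt:
        contents = self._repository.get_contents(path)
        yaml_content = contents.decoded_content.decode("utf-8")
        # safe_load raises on malformed data, so garbage responses are retried
        # instead of being treated as successful fetches.
        workflow = yaml.safe_load(yaml_content)
```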
wf_ref = self.resolver.require(workflow_name)

# Check what inputs this workflow supports
input_support = self.resolver.get_input_support(workflow_name)
Be resilient here.
Accept that this function could fail (raise an exception, etc.), and if it does, just assume the workflow does not support inputs and move on. It is usually better to be suboptimal when things are unstable than to fail and do nothing at all.
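A sketch of that fallback (the "no inputs supported" default shown here is hypothetical; the real default would come from WorkflowInputSupport):

```python
import logging

try:
    input_support = self.resolver.get_input_support(workflow_name)
except Exception:
    # If probing input support fails for any reason, degrade gracefully:
    # assume the workflow takes no inputs and dispatch a plain restart.
    logging.warning("Could not determine input support for %s; assuming none", workflow_name)
    input_support = WorkflowInputSupport()  # hypothetical "no inputs" default
```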
repo = client.get_repo(f"{self.repo_owner}/{self.repo_name}")
workflow = repo.get_workflow(wf_ref.file_name)
proper_workflow_create_dispatch(workflow, ref=tag_ref, inputs={})
proper_workflow_create_dispatch(workflow, ref=tag_ref, inputs=inputs)
I am being pedantic, but there is a problem with handling external APIs this way.
If proper_workflow_create_dispatch is failing, you will retry get_repo and get_workflow as well, which is not ideal.
You should use RetryWithBackoff and validate the output of each request independently.
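A sketch of what retrying each external call independently could look like, reusing the names from the snippet above:

```python
# Each call gets its own retry loop, so a failure in the dispatch step does
# not needlessly re-run get_repo/get_workflow (and vice versa).
for attempt in RetryWithBackoff():
    with attempt:
        repo = client.get_repo(f"{self.repo_owner}/{self.repo_name}")

for attempt in RetryWithBackoff():
    with attempt:
        workflow = repo.get_workflow(wf_ref.file_name)

for attempt in RetryWithBackoff():
    with attempt:
        proper_workflow_create_dispatch(workflow, ref=tag_ref, inputs=inputs)
```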
There are a few things in the code I believe we should go over first (in other PRs) and fix before implementing those changes, to avoid cascading bad standards. Let me know if you need any help with this.
@jeanschmidt addressed your comments (except some where I left a reply); this PR is tested and ready for review!
Consider reviewing commit-by-commit.
Adds support for more granular dispatches (test- and job-level filters, see pytorch/pytorch#168201) to autorevert.
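Conceptually, a filtered restart passes the job/test selection as workflow_dispatch inputs instead of rerunning everything; the input keys below are purely hypothetical, and the real ones are defined in pytorch/pytorch#168201:

```python
inputs = {}
if jobs_to_include:
    inputs["job-filter"] = ",".join(jobs_to_include)    # hypothetical input name
if tests_to_include:
    inputs["test-filter"] = ",".join(tests_to_include)  # hypothetical input name
proper_workflow_create_dispatch(workflow, ref=tag_ref, inputs=inputs)
```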
Testing
(links lead to runs per commit; the runs were issued from my account as results of local testing)
runs: https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2F4816fd912210162bea4cdf34f7a39d2909477549
log: P2090410466
runs (filters by job and test):
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2F9fe21ba6d0583790c1857485ede8e17c89ab9afd
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2F3fc6a055e09174135cd839e723c4f0bdab9589b3
log: P2090444414
runs:
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2Feafa4f67d2afdca606eebbca50571b0ba1ab922b
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2F96b3e7d78914f5db043e8b9ae3b3f72498abca4e
https://github.com/pytorch/pytorch/actions/workflows/pull.yml?query=branch%3Atrunk%2F7d49bd5060925055724d8976794cc1fd328066aa
log: P2090462639
run: https://github.com/pytorch/pytorch/actions/workflows/inductor.yml?query=branch%3Atrunk%2Fa79fbc97065538f756418e6e3bde02a708e893b5