
Add retry with exponential backoff for transient GitHub API errors #222

Merged
matias-christensen-skydio merged 1 commit into main from retry-transient-github-errors on Feb 2, 2026
Conversation

@matias-christensen-skydio (Contributor) commented Jan 30, 2026

Summary

GitHub's API occasionally returns transient 5xx errors or RESOURCE_LIMITS_EXCEEDED responses that would succeed on retry. These transient failures are especially frustrating when uploading a stack of PRs, as the entire operation fails and must be restarted.

This PR adds retry with exponential backoff (up to 3 attempts with 1s, 2s, 4s delays) for:

  • Transient HTTP errors (500, 502, 503, 504)
  • GraphQL RESOURCE_LIMITS_EXCEEDED errors
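For illustration, a minimal sketch of that behavior. This is not the PR's actual code: call_with_backoff and the (status, error_types, data) tuple shape are invented here, while the status codes, error type, attempt count, and delays come from the summary above.

```python
import asyncio

# Conditions treated as transient, per the PR summary.
TRANSIENT_HTTP_STATUSES = frozenset({500, 502, 503, 504})
RETRYABLE_GRAPHQL_ERRORS = frozenset({"RESOURCE_LIMITS_EXCEEDED"})


async def call_with_backoff(do_request, max_retries: int = 3, base_delay: float = 1.0):
    """Call do_request() and retry transient failures with exponential backoff.

    do_request is an async callable returning (status, graphql_error_types, data).
    With the defaults, a failing call is retried after 1s, 2s and 4s.
    """
    for attempt in range(max_retries + 1):
        status, error_types, data = await do_request()
        transient = status in TRANSIENT_HTTP_STATUSES or any(
            t in RETRYABLE_GRAPHQL_ERRORS for t in error_types
        )
        if not transient:
            return data
        if attempt == max_retries:
            raise RuntimeError(f"GitHub API call still failing after {max_retries} retries")
        await asyncio.sleep(base_delay * 2**attempt)  # 1s, 2s, 4s with the defaults
```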

@aaron-skydio (Contributor)

flipped a coin between me and brian to review, came up brian

raise RevupRequestException(resp.status, r)

return r
ratelimit_reset = resp.headers.get("x-ratelimit-reset")


Possible future improvement: take this into account when there's a resource limit and either fail immediately (if the timeout is too long) or wait the full limit before retrying instead of using exponential backoff.
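For reference, a sketch of what that could look like. This is hypothetical and not part of this PR (only a TODO was added); MAX_ACCEPTABLE_WAIT and both helper names are invented for illustration.

```python
import time
from typing import Optional

MAX_ACCEPTABLE_WAIT = 120.0  # seconds; illustrative threshold for failing fast


def seconds_until_reset(headers) -> Optional[float]:
    """Seconds until the rate limit resets, from x-ratelimit-reset, or None if absent."""
    reset = headers.get("x-ratelimit-reset")
    if reset is None:
        return None
    return max(0.0, float(reset) - time.time())


def resource_limit_delay(headers) -> float:
    """Pick a delay for RESOURCE_LIMITS_EXCEEDED, or raise if waiting is pointless."""
    wait = seconds_until_reset(headers)
    if wait is None:
        return 1.0  # no reset info: fall back to the normal backoff base delay
    if wait > MAX_ACCEPTABLE_WAIT:
        raise RuntimeError(f"Rate limit resets in {wait:.0f}s; failing immediately")
    return wait  # wait the full limit instead of using exponential backoff
```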

matias-christensen-skydio (Contributor, Author)


Good suggestion. Added a TODO to document this as a future improvement.

GitHub's API occasionally returns transient 5xx errors or
RESOURCE_LIMITS_EXCEEDED responses that would succeed on retry.
These transient failures are especially frustrating when uploading
a stack of PRs, as the entire operation fails and must be restarted.

Retry up to 3 times with exponential backoff (1s, 2s, 4s delays).
@matias-christensen-skydio merged commit bb4b60d into main on Feb 2, 2026
8 checks passed
@matias-christensen-skydio deleted the retry-transient-github-errors branch on February 2, 2026 at 13:43
if self.session:
    await self.session.close()

async def _retry_with_backoff(
jerry-skydio (Collaborator)


name this _should_retry or _should_retry_backoff to indicate what it returns. retry_with_backoff makes it seem like the function does the retrying internally or something, which it doesn't

self.session = ClientSession()

start_time = time.time()
# Retry config: 3 attempts with exponential backoff (1s, 2s, 4s)
jerry-skydio (Collaborator)


make at least max_retries and base_delay kwargs to the function. for transient_statuses and retryable_graphql_errors make them either kwargs or global constants with frozenset
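A sketch of that suggestion (the constant and class names here are placeholders, not necessarily what the follow-up commit used):

```python
# Module-level frozenset constants for the fixed sets; the tunable knobs
# become keyword arguments on graphql() instead of hard-coded locals.
TRANSIENT_STATUSES = frozenset({500, 502, 503, 504})
RETRYABLE_GRAPHQL_ERRORS = frozenset({"RESOURCE_LIMITS_EXCEEDED"})


class GitHubClient:
    async def graphql(self, query: str, *, max_retries: int = 3, base_delay: float = 1.0) -> dict:
        """Retry configuration is passed in rather than buried in the body."""
        ...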

"Ratelimit: {} remaining, resets at {}".format(
resp.headers.get("x-ratelimit-remaining"),
reset_timestamp,
for attempt in range(max_retries):
jerry-skydio (Collaborator)


rather than stack the indentation here i'd rather rename this function _graphql_once and make a new function graphql() that becomes the new entry point and handles only the retrying. this would help make the function less long, and it can pass error info via the exception and catch it if needed
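A sketch of the suggested split (class and exception names are invented; in the real code the exception would carry the status and error details mentioned above):

```python
import asyncio


class TransientGitHubError(Exception):
    """Raised by _graphql_once when a failure looks retryable; carries error info."""


class GitHubClient:
    async def _graphql_once(self, query: str) -> dict:
        """Single request attempt, no retry logic; the existing body would live here."""
        raise NotImplementedError

    async def graphql(self, query: str, *, max_retries: int = 3, base_delay: float = 1.0) -> dict:
        """New entry point that only handles retrying, keeping _graphql_once flat."""
        for attempt in range(max_retries + 1):
            try:
                return await self._graphql_once(query)
            except TransientGitHubError:
                if attempt == max_retries:
                    raise
                await asyncio.sleep(base_delay * 2**attempt)
```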

@jerry-skydio (Collaborator)

a few small comments, can you make a follow-up to address them? otherwise changes look good

matias-christensen-skydio added a commit that referenced this pull request Feb 16, 2026
Address jerry-skydio's review comments:
- Rename _retry_with_backoff to _should_retry to reflect the return value
- Extract transient_statuses and retryable_graphql_errors to module-level
  frozenset constants, make max_retries and base_delay kwargs on graphql()
- Split graphql into _graphql_once (single attempt) and graphql (retry
  wrapper) to reduce indentation and separate concerns