Add retry with exponential backoff for transient GitHub API errors #222
Conversation
force-pushed from ba99348 to 4082cab
force-pushed from 79d2b62 to abc7a2e
flipped a coin between me and brian to review, came up brian
```python
raise RevupRequestException(resp.status, r)
return r
ratelimit_reset = resp.headers.get("x-ratelimit-reset")
```
Possible future improvement: take this into account when there's a resource limit and either fail immediately (if the timeout is too long) or wait the full limit before retrying instead of using exponential backoff.
Good suggestion. Added a TODO to document this as a future improvement.
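The suggested improvement could be sketched as a small helper that decides how long to wait based on the `x-ratelimit-reset` header. This is a hypothetical sketch, not revup's actual code: the function name, the 120-second cap, and the fallback behavior are all assumptions.

```python
import time
from typing import Optional

# Assumed cap on how long we are willing to wait for a ratelimit reset.
MAX_RATELIMIT_WAIT = 120.0  # seconds

def ratelimit_wait(reset_header: Optional[str], now: Optional[float] = None) -> Optional[float]:
    """Seconds to sleep before retrying, or None to fail immediately.

    reset_header is the raw "x-ratelimit-reset" value (epoch seconds).
    """
    if reset_header is None:
        return None  # no reset info; caller falls back to exponential backoff
    now = time.time() if now is None else now
    wait = float(reset_header) - now
    if wait > MAX_RATELIMIT_WAIT:
        return None  # timeout too long; surface the error instead of waiting
    return max(wait, 0.0)  # limit may have already reset; retry right away
```

A caller would sleep for the returned duration when it is not `None`, and raise the original error otherwise, matching the "fail immediately or wait the full limit" behavior described above.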
force-pushed from abc7a2e to 15d821b
```python
if self.session:
    await self.session.close()

async def _retry_with_backoff(
```
name this _should_retry or _should_retry_backoff to indicate what it returns. retry_with_backoff makes it seem like the function does the retrying internally or something, which it doesn't
```python
self.session = ClientSession()

start_time = time.time()
# Retry config: 3 attempts with exponential backoff (1s, 2s, 4s)
```
make at least max_retries and base_delay kwargs to the function. for transient_statuses and retryable_graphql_errors make them either kwargs or global constants with frozenset
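What that suggestion amounts to, sketched with assumed names (the actual identifiers live in revup's module): the retryable sets become module-level `frozenset` constants, and the retry knobs become kwargs with the defaults from the PR description.

```python
# Module-level constants, per the review suggestion.
TRANSIENT_STATUSES = frozenset({500, 502, 503, 504})
RETRYABLE_GRAPHQL_ERRORS = frozenset({"RESOURCE_LIMITS_EXCEEDED"})

def backoff_delays(max_retries: int = 3, base_delay: float = 1.0) -> list:
    """Delays slept between attempts: 1.0, 2.0, 4.0 with the defaults."""
    return [base_delay * 2**attempt for attempt in range(max_retries)]
```

`frozenset` makes the intent explicit (the sets are lookup tables, never mutated), and kwargs let callers tune the policy without touching the function body.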
```python
"Ratelimit: {} remaining, resets at {}".format(
    resp.headers.get("x-ratelimit-remaining"),
    reset_timestamp,
for attempt in range(max_retries):
```
rather than stack the indentation here i'd rather rename this function _graphql_once and make a new function graphql() that becomes the new entry point and handles only the retrying. this would help make the function less long, and it can pass error info via the exception and catch it if needed
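The split being proposed could be sketched like this. Everything here is illustrative (the real `RevupRequestException` and request-building code differ): `_graphql_once` makes one attempt and raises on failure, while `graphql()` owns only the retry policy, keeping indentation flat.

```python
import asyncio

class RevupRequestException(Exception):
    """Stand-in for the project's request exception (carries the status)."""
    def __init__(self, status: int):
        super().__init__(f"request failed with status {status}")
        self.status = status

TRANSIENT_STATUSES = frozenset({500, 502, 503, 504})

class GitHubClient:
    async def _graphql_once(self, query: str) -> dict:
        # Single attempt: build and send the HTTP request, raise
        # RevupRequestException on failure. Body omitted in this sketch.
        raise NotImplementedError

    async def graphql(self, query: str, *, max_retries: int = 3, base_delay: float = 1.0) -> dict:
        # New entry point: handles only the retrying, as suggested above.
        for attempt in range(max_retries):
            try:
                return await self._graphql_once(query)
            except RevupRequestException as e:
                last_attempt = attempt == max_retries - 1
                if last_attempt or e.status not in TRANSIENT_STATUSES:
                    raise  # non-transient or out of attempts: propagate
                await asyncio.sleep(base_delay * 2**attempt)  # 1s, 2s, 4s by default
```

Error details travel via the exception, so `graphql()` can inspect them without `_graphql_once` needing any knowledge of the retry policy.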
a few small comments, can you make a follow up to address them? otherwise changes look good
Address jerry-skydio's review comments:
- Rename _retry_with_backoff to _should_retry to reflect the return value
- Extract transient_statuses and retryable_graphql_errors to module-level frozenset constants; make max_retries and base_delay kwargs on graphql()
- Split graphql into _graphql_once (single attempt) and graphql (retry wrapper) to reduce indentation and separate concerns
Summary
GitHub's API occasionally returns transient 5xx errors or RESOURCE_LIMITS_EXCEEDED responses that would succeed on retry. These transient failures are especially frustrating when uploading a stack of PRs, as the entire operation fails and must be restarted.
This PR adds retry with exponential backoff (up to 3 attempts with 1s, 2s, 4s delays) for: