
Fix: Add circuit breaker and safe exponential retry for OpenRouter #114

Open
navin-oss wants to merge 3 commits into AnkanMisra:main from navin-oss:feature/openrouter-resilience

Conversation


@navin-oss navin-oss commented Feb 4, 2026

Closes #101

Adds resilience to OpenRouter calls:

• Circuit breaker using sony/gobreaker
• Exponential backoff retries (3 attempts)
• Reset request body using req.GetBody() before retries
• Close resp.Body before retry to prevent connection leaks
• Context-aware backoff (no blocking sleep)
• Avoid retrying 4xx responses
• Safe type assertions in main.go to prevent panics

Implements all requested review changes and fixes production issues identified by Greptile and CodeRabbit.

Summary by CodeRabbit

  • New Features

    • Implemented circuit breaker protection for AI service calls, returning 503 Service Unavailable when activated.
    • Added automatic retry logic with exponential backoff (up to 3 retries) for failed requests.
  • Bug Fixes

    • Improved error handling to map service errors to specific HTTP status codes.
  • Chores

    • Updated .gitignore to exclude package-lock.json files.

Greptile Overview

Greptile Summary

This PR adds resilience around OpenRouter calls by introducing a global circuit breaker (sony/gobreaker) and an exponential backoff retry helper that resets request bodies and closes response bodies before retrying. The summarize handler now treats an open circuit as a 503 while keeping existing 504 timeout handling.

Key merge blockers:

  • callOpenRouter performs an unchecked interface{} -> *http.Response assertion on the circuit breaker return value, which can panic and crash the gateway if the callback ever returns an unexpected type / nil.
  • The retry helper discards the underlying transport error and final response context, always returning a generic error after exhausting attempts, which breaks observability and makes failures indistinguishable in logs/metrics.

Confidence Score: 2/5

  • This PR should not be merged until the panic risk and retry error-loss issues are fixed.
  • Score reflects two clear merge blockers: an unchecked type assertion on the circuit breaker result that can crash the gateway at runtime, and retry logic that discards the last transport error / final status context, making failures opaque and hard to operate.
  • gateway/main.go, gateway/openrouter_retry.go

Important Files Changed

  • .gitignore: Adds package-lock.json to ignored files; no functional code impact.
  • gateway/go.mod: Adds the sony/gobreaker dependency for the circuit breaker; otherwise unchanged.
  • gateway/go.sum: Updates module sums for sony/gobreaker; no runtime logic changes.
  • gateway/openrouter_retry.go: Introduces retry with exponential backoff and body reset; currently returns a generic error on final failure and doesn't preserve the last HTTP error or response status context.
  • gateway/openrouter_breaker.go: Adds the global OpenRouter circuit breaker; uses init() to set settings (trip after 3 consecutive failures).
  • gateway/main.go: Wraps the OpenRouter call with circuit breaker + retry and adds 503 handling; introduces an unsafe type assertion on the circuit breaker result and changes error handling behavior for non-2xx responses.

Sequence Diagram

sequenceDiagram
  autonumber
  participant C as Client
  participant G as Gateway (gin)
  participant CB as CircuitBreaker
  participant R as RetryHelper
  participant OR as OpenRouter API

  C->>G: POST /api/ai/summarize (text + payment headers)
  G->>G: verifyPayment()
  alt payment invalid
    G-->>C: 402/403/400
  else payment valid
    G->>CB: Execute(doRequestWithRetry(req))
    alt Circuit open
      CB-->>G: ErrOpenState
      G-->>C: 503 Service Unavailable
    else Circuit closed
      CB->>R: doRequestWithRetry(req)
      loop up to 3 attempts
        R->>R: reset req.Body via req.GetBody()
        R->>OR: HTTP POST /chat/completions
        alt 2xx/4xx
          OR-->>R: resp (Status<500)
          R-->>CB: resp
        else 5xx or network error
          OR-->>R: resp 5xx / error
          R->>R: close resp.Body (if present)
          R->>R: backoff (context-aware)
        end
      end
      CB-->>G: resp
      G->>G: decode JSON choices[0].message.content
      G-->>C: 200 {result} + X-402-Receipt
    end
  end


Context used:

  • Context from dashboard - AGENTS.md (source)


coderabbitai bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

The PR implements circuit breaker and retry mechanisms for OpenRouter AI service calls. A new dependency (gobreaker) is added, circuit breaker logic is initialized with failure thresholds, retry logic with exponential backoff is implemented, and main.go is updated to integrate both patterns and return appropriate HTTP status codes for different failure scenarios.

Changes

  • Dependencies (.gitignore, gateway/go.mod): Added package-lock.json to the ignore patterns and introduced the github.com/sony/gobreaker v1.0.0 dependency.
  • Circuit Breaker Implementation (gateway/openrouter_breaker.go): New module initializing a circuit breaker with a 3-consecutive-failure threshold, 5 max concurrent requests, and a 30s timeout.
  • Retry Logic Implementation (gateway/openrouter_retry.go): New function implementing HTTP request retry with exponential backoff (200ms base, max 3 retries), context-aware sleep, and automatic body reset for retryable requests.
  • Integration (gateway/main.go): Wrapped OpenRouter calls with circuit breaker execution, added a 503 response when the circuit is open, improved error handling to return 504 for timeouts and 500 for generic failures, and introduced the VerifyResponseInternal type.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Gateway
    participant CircuitBreaker
    participant RetryHandler
    participant OpenRouter

    Client->>Gateway: POST /summarize (with request body)
    Gateway->>CircuitBreaker: Check state
    
    alt Circuit Open
        CircuitBreaker-->>Gateway: ErrOpenState
        Gateway-->>Client: 503 Service Unavailable
    else Circuit Closed/HalfOpen
        CircuitBreaker->>RetryHandler: Execute request
        loop Retry Logic (up to 3 attempts)
            RetryHandler->>OpenRouter: HTTP POST
            alt 4xx Response
                OpenRouter-->>RetryHandler: 4xx error
                RetryHandler-->>CircuitBreaker: Return error
            else 5xx Response
                OpenRouter-->>RetryHandler: 5xx error
                RetryHandler->>RetryHandler: Exponential backoff sleep
                RetryHandler->>OpenRouter: Retry request
            else 2xx/3xx Response
                OpenRouter-->>RetryHandler: Success
                RetryHandler-->>CircuitBreaker: Return response
            end
        end
        CircuitBreaker-->>Gateway: Response or error
        alt Success
            Gateway-->>Client: 200 with AI summary
        else Timeout
            Gateway-->>Client: 504 Gateway Timeout
        else Other Failure
            Gateway-->>Client: 500 Internal Server Error
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #67: Modifies same functions (callOpenRouter/handleSummarize) to add X-Correlation-ID propagation alongside the circuit breaker and retry logic integration.
  • PR #36: Modifies AI request/handling code in callOpenRouter and handleSummarize to change timeout and error handling behavior.

Suggested labels

Medium, SWoC26

Suggested reviewers

  • AnkanMisra

Poem

🐰 A circuit breaks when failures stack,
Retries bring the service back!
With exponential grace we wait,
No more spam—just wise fate. ✨
The Gateway now stands tall and strong! 🚀

🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Title check: The title accurately summarizes the main changes: adding circuit breaker and exponential retry mechanisms for OpenRouter API calls.
  • Linked Issues check: The PR implements all acceptance criteria from issue #101: a circuit breaker via the gobreaker dependency, a 503 response on open state, exponential backoff retries up to 3 attempts, and automatic recovery.
  • Out of Scope Changes check: All code changes are directly related to resilience improvements for OpenRouter calls as specified in issue #101; the only ancillary change adds package-lock.json to .gitignore.
  • Description check: Skipped; CodeRabbit's high-level summary is enabled.



@greptile-apps greptile-apps bot left a comment


6 files reviewed, 2 comments


Comment on lines 566 to 569
}

resp := result.(*http.Response)
defer resp.Body.Close()

Unsafe type assertion

openRouterCB.Execute returns interface{}; resp := result.(*http.Response) will panic if the callback ever returns a non-*http.Response (including a nil interface). This is a merge-blocker because it turns transient upstream failures into a gateway crash. Guard the assertion (check ok and non-nil) and return a normal error if the type is unexpected.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
gateway/main.go (1)

80-110: ⚠️ Potential issue | 🟠 Major

Fix code formatting and resolve test failures before merging.

The gateway code requires formatting and has failing tests:

  1. Apply formatting: run cd gateway && go fmt ./... (formatting issues were detected)
  2. Verify linting: confirm go vet ./... passes
  3. Fix failing tests:
    • TestCallOpenRouter_RespectsContextTimeout
    • TestHandleSummarize_AIRequestTimeoutReturns504

Ensure all tests pass with cd gateway && go test -v ./... before submitting.

🤖 Fix all issues with AI agents
In `@gateway/main.go`:
- Line 31: Treat gobreaker.ErrTooManyRequests the same as gobreaker.ErrOpenState and map both to a rejection HTTP status (e.g., 503). Replace direct error equality checks with errors.Is(err, gobreaker.ErrOpenState) and errors.Is(err, gobreaker.ErrTooManyRequests) so wrapped sentinel errors are still handled, and route both cases through the same rejection path (status and message) in callOpenRouter and its caller.

In `@gateway/openrouter_retry.go`:
- Lines 26-40: After the http.DefaultClient.Do(req) call, close the response body whenever Do returns a non-nil resp alongside a non-nil err (if err != nil && resp != nil, call resp.Body.Close()) before handling the error or retrying, to avoid leaking connections. Keep the existing resp.Body.Close() on the 5xx path so the same body is never closed twice.
🧹 Nitpick comments (1)
gateway/go.mod (1)

12-12: Consider upgrading gobreaker to a newer version.

No known security advisories found for v1.0.0; however, this version is significantly outdated. The latest version is v2.4.0 (released January 1, 2026). While v1.0.0 is safe to use, consider upgrading to v2.4.0 for access to recent bug fixes and improvements.

@AnkanMisra
Owner

@navin-oss please take a look at the conflicts and bring your branch up to date with my main branch

Successfully merging this pull request may close these issues.

Add Circuit Breaker and Retries for AI Provider
