1 change: 1 addition & 0 deletions gateway/go.mod
@@ -11,6 +11,7 @@ require (
)

require (
	github.com/ProjectZKM/Ziren/crates/go-runtime/zkvm_runtime v0.0.0-20251001021608-1fe7b43fc4d6 // indirect
This dependency is unrelated to graceful shutdown functionality and is not used anywhere in the codebase. This should be removed from this PR as it:

  1. Adds unnecessary bloat to the project
  2. Could introduce security vulnerabilities or supply chain risks
  3. Makes code review more difficult by mixing unrelated changes

Verified with: `grep -r "zkvm_runtime" --include="*.go" .` (no matches found)


	github.com/bytedance/sonic v1.14.0 // indirect
	github.com/bytedance/sonic/loader v0.3.0 // indirect
	github.com/cloudwego/base64x v0.1.6 // indirect
48 changes: 46 additions & 2 deletions gateway/main.go
@@ -20,6 +20,8 @@ import (
	"strings"
	"sync"
	"time"
    "os/signal"
    "syscall"
Comment on lines +23 to +24 (Contributor):

⚠️ Potential issue | 🟑 Minor

Fix indentation to use tabs instead of spaces.

The import statements use spaces for indentation instead of tabs. Run `gofmt` (or `go fmt`) to ensure consistent formatting with Go standards.

πŸ”Ž Fix formatting

```diff
-    "os/signal"
-    "syscall"
+	"os/signal"
+	"syscall"
```
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"os/signal"
"syscall"
"os/signal"
"syscall"
πŸ€– Prompt for AI Agents
In @gateway/main.go around lines 20-21, The import block containing "os/signal"
and "syscall" in gateway/main.go uses space indentation instead of tabs; run
gofmt (or go fmt) to reformat the file so the import statements and surrounding
code use Go's standard tab indentation, or manually replace the leading spaces
with tabs in the import block so the "os/signal" and "syscall" lines align with
the rest of the imports.

Comment on lines +23 to +24:


Import formatting is incorrect: these imports have leading spaces and should align with the other imports above.

```suggestion
	"os/signal"
	"syscall"
```

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


	"github.com/ethereum/go-ethereum/crypto"
	"github.com/gin-contrib/cors"
@@ -157,6 +159,7 @@ func main() {
	// deadline; the middleware implementation always uses the earliest
	// deadline when nested timeouts are present to avoid surprising behavior.
	r.Use(RequestTimeoutMiddleware(getRequestTimeout()))
	r.Use(TrackInFlightRequests())

	// Health check with shorter timeout (2s)
	r.GET("/healthz", RequestTimeoutMiddleware(getHealthCheckTimeout()), handleHealth)
@@ -187,8 +190,49 @@ func main() {
		port = "3000"
	}

	log.Printf("Go Gateway running on port %s", port)
	r.Run(":" + port)
	addr := ":" + port

	srv := &http.Server{
		Addr:              addr,
		Handler:           r,
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       10 * time.Second,
		WriteTimeout:      60 * time.Second,
		IdleTimeout:       120 * time.Second,
	}

	go func() {
		log.Printf("[INFO] Gateway listening on %s", addr)
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("[FATAL] listen error: %v", err)
		}
	}()
Comment on lines +205 to +210 (Contributor):

⚠️ Potential issue | πŸ”΄ Critical

Critical: `log.Fatalf` in goroutine prevents graceful shutdown.

Using `log.Fatalf` (which calls `os.Exit()`) inside the server goroutine will immediately terminate the program if `ListenAndServe` returns an error, bypassing all graceful shutdown logic on lines 186-201. For example, if the port is already in use, the program exits without cleanup.

πŸ”Ž Proposed fix using error channel

```diff
+	errChan := make(chan error, 1)
+
 	go func() {
 		log.Printf("[INFO] Gateway listening on %s", addr)
-		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
-			log.Fatalf("[FATAL] listen error: %v", err)
-		}
+		errChan <- srv.ListenAndServe()
 	}()

 	// ---- Graceful shutdown ----
 	quit := make(chan os.Signal, 1)
 	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

-	<-quit
-	log.Println("[INFO] Shutdown signal received, draining connections...")
+	select {
+	case err := <-errChan:
+		if err != nil && err != http.ErrServerClosed {
+			log.Fatalf("[FATAL] Server failed to start: %v", err)
+		}
+		return
+	case <-quit:
+		log.Println("[INFO] Shutdown signal received, draining connections...")
+	}

 	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
 	defer cancel()

 	if err := srv.Shutdown(ctx); err != nil {
 		log.Printf("[ERROR] Server forced to shutdown: %v", err)
 	} else {
 		log.Println("[OK] Server shutdown completed")
 	}
```
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
go func() {
log.Printf("[INFO] Gateway listening on %s", addr)
if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
log.Fatalf("[FATAL] listen error: %v", err)
}
}()
errChan := make(chan error, 1)
go func() {
log.Printf("[INFO] Gateway listening on %s", addr)
errChan <- srv.ListenAndServe()
}()
// ---- Graceful shutdown ----
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
select {
case err := <-errChan:
if err != nil && err != http.ErrServerClosed {
log.Fatalf("[FATAL] Server failed to start: %v", err)
}
return
case <-quit:
log.Println("[INFO] Shutdown signal received, draining connections...")
}
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
log.Printf("[ERROR] Server forced to shutdown: %v", err)
} else {
log.Println("[OK] Server shutdown completed")
}
πŸ€– Prompt for AI Agents
In @gateway/main.go around lines 179-184, The goroutine currently calls
log.Fatalf on srv.ListenAndServe errors which exits immediately and skips the
graceful shutdown logic around the main shutdown handling; instead remove
log.Fatalf and propagate the error to the main goroutine via an error channel
(e.g., create errCh before launching the goroutine), have the goroutine send any
non-nil, non-http.ErrServerClosed error into errCh after logging, and let the
main select/receive on errCh alongside the existing shutdown signals so the main
goroutine performs the cleanup and graceful shutdown when ListenAndServe fails
on srv (address `addr`) rather than calling os.Exit inside the goroutine.


	// ---- Graceful shutdown ----
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

	<-quit
	log.Println("[INFO] Shutdown signal received, draining connections...")

	active := GetActiveRequestCount()
	if active > 0 {
		log.Printf("[INFO] Waiting for %d in-flight request(s)...", active)
		WaitForInFlightRequests()
		log.Println("[INFO] All in-flight requests completed")
	}
Comment on lines +219 to +224:

CRITICAL: Redundant and race-prone shutdown logic

This manual request tracking and waiting is both redundant and architecturally incorrect:

  1. http.Server.Shutdown() already waits for active connections - From Go docs: "Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down." So the call on line 229 will wait anyway.

  2. Race condition: New requests can arrive between line 222 (when WaitForInFlightRequests returns) and line 229 (when srv.Shutdown is called). The server is still accepting connections during this window.

  3. Misleading logs: Line 223 logs "All in-flight requests completed" but the server is still running and accepting new requests until line 229.

  4. No timeout: WaitForInFlightRequests() blocks forever if a request hangs. The 30-second timeout on line 226 only applies to srv.Shutdown(), not this manual wait.

Correct approach: Remove lines 219-224 entirely and rely solely on srv.Shutdown() to wait for active connections, OR redesign to stop accepting new connections before waiting (which is what Shutdown already does).

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("[ERROR] Server forced to shutdown: %v", err)
	} else {
		log.Println("[OK] Server shutdown completed")
	}

Comment on lines +213 to +234 (Contributor):

⚠️ Potential issue | 🟠 Major

Race condition: wait for in-flight requests after stopping new connections.

The current sequence waits for in-flight requests (lines 219-223) before calling `srv.Shutdown()` (line 229), which means the server continues accepting new connections during the wait. This creates a race:

1. Line 219 checks active count (e.g., 1 request)
2. A new request arrives and is accepted by the still-running server
3. Lines 220-223 wait for the original request to complete
4. Line 229 calls `srv.Shutdown()`, which must now wait for the new request

The standard graceful shutdown pattern is to call `srv.Shutdown()` first; it stops accepting new connections and waits for existing ones to complete (up to the timeout). The manual `WaitForInFlightRequests()` wait before shutdown undermines this.

πŸ”§ Recommended fix

Either remove the manual wait and rely on `srv.Shutdown()` alone:

```diff
 	<-quit
 	log.Println("[INFO] Shutdown signal received, draining connections...")

-	active := GetActiveRequestCount()
-	if active > 0 {
-		log.Printf("[INFO] Waiting for %d in-flight request(s)...", active)
-		WaitForInFlightRequests()
-		log.Println("[INFO] All in-flight requests completed")
-	}
-
 	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
 	defer cancel()

 	if err := srv.Shutdown(ctx); err != nil {
 		log.Printf("[ERROR] Server forced to shutdown: %v", err)
 	} else {
 		log.Println("[OK] Server shutdown completed")
 	}
```

Or, if logging the active count is important, just log it without the manual wait:

```diff
 	<-quit
 	log.Println("[INFO] Shutdown signal received, draining connections...")

 	active := GetActiveRequestCount()
 	if active > 0 {
 		log.Printf("[INFO] %d in-flight request(s) detected, waiting for completion...", active)
-		WaitForInFlightRequests()
-		log.Println("[INFO] All in-flight requests completed")
 	}

 	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
 	defer cancel()

 	if err := srv.Shutdown(ctx); err != nil {
 		log.Printf("[ERROR] Server forced to shutdown: %v", err)
 	} else {
 		log.Println("[OK] Server shutdown completed")
 	}
```

`srv.Shutdown()` internally waits for connections to idle, making the explicit `WaitForInFlightRequests()` redundant and potentially harmful.

πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
<-quit
log.Println("[INFO] Shutdown signal received, draining connections...")
active := GetActiveRequestCount()
if active > 0 {
log.Printf("[INFO] Waiting for %d in-flight request(s)...", active)
WaitForInFlightRequests()
log.Println("[INFO] All in-flight requests completed")
}
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
log.Printf("[ERROR] Server forced to shutdown: %v", err)
} else {
log.Println("[OK] Server shutdown completed")
}
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
<-quit
log.Println("[INFO] Shutdown signal received, draining connections...")
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
log.Printf("[ERROR] Server forced to shutdown: %v", err)
} else {
log.Println("[OK] Server shutdown completed")
}
πŸ€– Prompt for AI Agents
In @gateway/main.go around lines 213 - 234, The shutdown sequence currently
calls GetActiveRequestCount() and WaitForInFlightRequests() before
srv.Shutdown(), which allows the server to keep accepting new connections and
creates a race; fix by removing the manual wait and invoking srv.Shutdown(ctx)
first (so the server stops accepting new connections and waits for in-flight
requests), and if you need visibility keep the GetActiveRequestCount() log but
do not call WaitForInFlightRequests() prior to srv.Shutdown(); ensure you still
use context.WithTimeout(...) and handle the error from srv.Shutdown(ctx) as
before.


}

// handleSummarize handles POST /api/ai/summarize requests. It validates
38 changes: 38 additions & 0 deletions gateway/request_tracker.go
@@ -0,0 +1,38 @@
package main

import (
	"sync"
	"sync/atomic"

	"github.com/gin-gonic/gin"
)

var (
	activeRequestsWG sync.WaitGroup
	activeRequestCnt int64
)
Comment on lines +10 to +13 (Contributor):

⚠️ Potential issue | 🟠 Major

Package-level state prevents test isolation and concurrent execution.

The package-level `activeRequestsWG` and `activeRequestCnt` variables create shared mutable state that:

1. Cannot be reset between test runs, causing test interference
2. Prevents parallel test execution (`go test -parallel`)
3. Makes the module non-reusable if multiple instances are needed

Consider refactoring to use a struct-based approach:

```go
type RequestTracker struct {
    wg  sync.WaitGroup
    cnt int64
}

func NewRequestTracker() *RequestTracker {
    return &RequestTracker{}
}

func (rt *RequestTracker) Middleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        rt.wg.Add(1)
        atomic.AddInt64(&rt.cnt, 1)
        defer func() {
            atomic.AddInt64(&rt.cnt, -1)
            rt.wg.Done()
        }()
        c.Next()
    }
}

func (rt *RequestTracker) Wait() {
    rt.wg.Wait()
}

func (rt *RequestTracker) Count() int64 {
    return atomic.LoadInt64(&rt.cnt)
}
```

This allows each test to create its own isolated `RequestTracker` instance.

Comment on lines +10 to +13:

Global state with no reset mechanism creates issues in tests and server restarts

These global variables are never reset, which causes problems:

1. Test pollution: If multiple tests create servers (like in shutdown_test.go), the counters persist between tests, causing incorrect counts
2. Server restart issues: If the server is stopped and restarted within the same process, the counts will be wrong
3. The test in shutdown_test.go only works by accident because it's the only test, but running multiple shutdown tests would fail

Better approach: Either make these per-server instance variables, or provide a reset function for tests. Example:

```go
// For tests
func ResetRequestTracking() {
    activeRequestsWG = sync.WaitGroup{}
    atomic.StoreInt64(&activeRequestCnt, 0)
}
```

// TrackInFlightRequests tracks active HTTP requests.
func TrackInFlightRequests() gin.HandlerFunc {
	return func(c *gin.Context) {
		activeRequestsWG.Add(1)
		atomic.AddInt64(&activeRequestCnt, 1)

		defer func() {
			atomic.AddInt64(&activeRequestCnt, -1)
			activeRequestsWG.Done()
		}()

		c.Next()
	}
}

// WaitForInFlightRequests blocks until all active requests finish.
func WaitForInFlightRequests() {
	activeRequestsWG.Wait()
}

// GetActiveRequestCount returns the current number of active requests.
func GetActiveRequestCount() int64 {
	return atomic.LoadInt64(&activeRequestCnt)
}
67 changes: 67 additions & 0 deletions gateway/shutdown_test.go
@@ -0,0 +1,67 @@
package main

import (
	"context"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"

	"github.com/gin-gonic/gin"
)

func TestGracefulShutdown_WaitsForInFlightRequests(t *testing.T) {
	gin.SetMode(gin.TestMode)

	r := gin.New()
	r.Use(TrackInFlightRequests())

	// Simulate slow handler
	r.GET("/slow", func(c *gin.Context) {
		time.Sleep(200 * time.Millisecond)
		c.Status(http.StatusOK)
	})

	srv := &http.Server{
		Handler: r,
	}

	// Start test server
	ln := httptest.NewUnstartedServer(r)
	ln.Config = srv
	ln.Start()
	defer ln.Close()
Comment on lines +24 to +32 (Contributor):

⚠️ Potential issue | πŸ”΄ Critical

Test server setup is incorrect.

The test mixes `httptest.NewUnstartedServer` with a manually created `http.Server`, but this doesn't work as intended:

1. `httptest.NewUnstartedServer(r)` creates its own `http.Server` internally
2. Assigning `ln.Config = srv` replaces the httptest server's config, but `srv` is still not the actively listening server
3. Later on line 54, `srv.Shutdown(ctx)` attempts to shut down `srv`, but `srv` was never started via `srv.ListenAndServe()`; the httptest server `ln` is what's actually running
4. This means `srv.Shutdown()` likely returns immediately without actually shutting down the test server

πŸ› οΈ Proposed fix

Use the httptest server directly without creating a separate `http.Server`:

```diff
-	srv := &http.Server{
-		Handler: r,
-	}
-
 	// Start test server
-	ln := httptest.NewUnstartedServer(r)
-	ln.Config = srv
-	ln.Start()
-	defer ln.Close()
+	ts := httptest.NewServer(r)
+	defer ts.Close()
```

Then update the shutdown logic:

```diff
-	// Shutdown server
-	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
-	defer cancel()
-
-	start := time.Now()
-	if err := srv.Shutdown(ctx); err != nil {
-		t.Fatalf("shutdown failed: %v", err)
-	}
+	// Close the test server (httptest.Server doesn't support graceful shutdown)
+	start := time.Now()
+	ts.Close()
```

Note: `httptest.Server` doesn't expose graceful shutdown. For testing graceful shutdown behavior, you need to start a real `http.Server` with `srv.ListenAndServe()` on a chosen port or use `net.Listen` to get a listener, then pass it to `srv.Serve(listener)`.

	// Make request in background
	done := make(chan struct{})
	go func() {
		resp, err := http.Get(ln.URL + "/slow")
		if err != nil {
			t.Errorf("request failed: %v", err)
			return
		}
		resp.Body.Close()
		close(done)
Comment on lines +36 to +43:

The test should verify that the request completed successfully by checking the response status code, not just that it didn't error. A connection could be closed mid-request and still not return an error, but the response would be incomplete.

Add verification:

```suggestion
go func() {
	resp, err := http.Get(ln.URL + "/slow")
	if err != nil {
		t.Errorf("request failed: %v", err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Errorf("expected status 200, got %d", resp.StatusCode)
	}
	close(done)
}()
```

	}()
Comment on lines +36 to +44 (Contributor):

⚠️ Potential issue | πŸ”΄ Critical

Unsafe use of `t.Errorf` in goroutine.

Line 39 calls `t.Errorf` from within a goroutine. A background goroutine that outlives the test will panic if it calls `t.Errorf` after the test function returns, and failures reported from background goroutines are easy to lose. Capture the error and report it from the main test goroutine instead.

πŸ”’ Proposed fix

Capture the error in the goroutine and check it in the main test flow:

```diff
 	// Make request in background
-	done := make(chan struct{})
+	done := make(chan error)
 	go func() {
 		resp, err := http.Get(ln.URL + "/slow")
 		if err != nil {
-			t.Errorf("request failed: %v", err)
-			return
+			done <- err
+			return
 		}
 		resp.Body.Close()
-		close(done)
+		done <- nil
 	}()

 	// Give request time to start
 	time.Sleep(50 * time.Millisecond)

 	// Shutdown server
 	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
 	defer cancel()

 	start := time.Now()
 	if err := srv.Shutdown(ctx); err != nil {
 		t.Fatalf("shutdown failed: %v", err)
 	}

 	WaitForInFlightRequests()
 	elapsed := time.Since(start)

-	<-done
+	if err := <-done; err != nil {
+		t.Fatalf("request failed: %v", err)
+	}
```

	// Give request time to start
	time.Sleep(50 * time.Millisecond)
Comment on lines +46 to +47 (Contributor):

⚠️ Potential issue | 🟠 Major

Sleep-based synchronization creates a race condition.

The 50ms sleep (line 47) assumes the background request will have started by then, but this is not guaranteed. The request might:

1. Not have started yet (slow goroutine scheduling)
2. Have already completed (if it runs faster than expected)

This makes the test flaky.

⏱️ Proposed fix using proper synchronization

Use a channel to signal when the request has actually started:

```diff
 	// Make request in background
-	done := make(chan struct{})
+	requestStarted := make(chan struct{})
+	requestDone := make(chan struct{})
+
 	go func() {
+		// Signal that we're about to make the request
+		close(requestStarted)
 		resp, err := http.Get(ln.URL + "/slow")
 		if err != nil {
 			t.Errorf("request failed: %v", err)
 			return
 		}
 		resp.Body.Close()
-		close(done)
+		close(requestDone)
 	}()

-	// Give request time to start
-	time.Sleep(50 * time.Millisecond)
+	// Wait for request to actually start
+	<-requestStarted
+	// Give it a moment to enter the handler
+	time.Sleep(10 * time.Millisecond)
```

Or better yet, modify the slow handler to signal when it starts:

```diff
+	handlerStarted := make(chan struct{})
 	r.GET("/slow", func(c *gin.Context) {
+		close(handlerStarted)
 		time.Sleep(200 * time.Millisecond)
 		c.Status(http.StatusOK)
 	})

 	// ... later ...

-	// Give request time to start
-	time.Sleep(50 * time.Millisecond)
+	// Wait for handler to start
+	<-handlerStarted
```

Comment on lines +46 to +47:

Race condition: Sleep-based synchronization is unreliable

Using `time.Sleep(50 * time.Millisecond)` to "give request time to start" is a classic test race condition:

- On slow systems (CI, loaded machines), 50ms may not be enough and the test will fail
- On fast systems, the request might already be complete by then
- This makes tests flaky and unpredictable

Better approach: Use proper synchronization with channels or wait for the request to actually be in-flight. Example:

```go
started := make(chan struct{})
go func() {
    // Signal when request enters the handler
    close(started)
    resp, err := http.Get(ln.URL + "/slow")
    // ...
}()
<-started // Wait for request to actually start
```

Or check `GetActiveRequestCount() > 0` in a loop with timeout instead of sleeping.


	// Shutdown server
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	start := time.Now()
	if err := srv.Shutdown(ctx); err != nil {
		t.Fatalf("shutdown failed: %v", err)
	}

	WaitForInFlightRequests()
	elapsed := time.Since(start)
Comment on lines +53 to +59:

Test doesn't actually verify what it claims to test

This test has a fundamental flaw: srv.Shutdown(ctx) on line 54 already waits for active connections to complete (this is documented behavior of http.Server.Shutdown). So by the time line 58 calls WaitForInFlightRequests(), the request has already finished.

The test is measuring that http.Server.Shutdown() works (which is a given), not that our middleware tracking works correctly.

What should be tested instead:

  1. Verify that GetActiveRequestCount() returns the correct count WHILE requests are in-flight
  2. Test that requests complete successfully (check response status code)
  3. Test the interaction between the tracking middleware and actual shutdown

Current test: Measures Shutdown() waiting for requests βœ“ (not our code)
Should test: Our tracking middleware correctly counts requests βœ— (not tested)


	<-done

	// Assert shutdown waited for request
	if elapsed < 200*time.Millisecond {
		t.Fatalf("shutdown did not wait for in-flight request")
	}
}