Skip to content

Conversation

@Tharsanan1
Copy link
Contributor

@Tharsanan1 Tharsanan1 commented Jan 15, 2026

$subject

Summary by CodeRabbit

Release Notes

  • New Features

    • Exposed metrics endpoints from gateway controller (port 9091) and policy engine (port 9003), providing Prometheus-format metrics for monitoring API counts and gateway performance.
  • Tests

    • Added comprehensive integration tests validating metrics endpoint accessibility, Prometheus format compliance, and API count metric updates.

✏️ Tip: You can customize this high-level summary in your review settings.

… policy-engine (#3)

* Initial plan

* Add metrics integration tests and expose metrics ports

Co-authored-by: Tharsanan1 <25491225+Tharsanan1@users.noreply.github.com>

* Fix response body reading and temporarily use normal gateway-controller image

Co-authored-by: Tharsanan1 <25491225+Tharsanan1@users.noreply.github.com>

* Revert to coverage image and finalize metrics integration tests

Co-authored-by: Tharsanan1 <25491225+Tharsanan1@users.noreply.github.com>

* Refactor metrics steps to extract helper functions and reduce duplication

Co-authored-by: Tharsanan1 <25491225+Tharsanan1@users.noreply.github.com>

* Fix test logic: clear cached response body after API creation and use consistent body reading

Co-authored-by: Tharsanan1 <25491225+Tharsanan1@users.noreply.github.com>

* Fix mutex locking consistency in getResponseBody helper

Co-authored-by: Tharsanan1 <25491225+Tharsanan1@users.noreply.github.com>

* Optimize regex compilation, extract API definition constant, and improve error logging

Co-authored-by: Tharsanan1 <25491225+Tharsanan1@users.noreply.github.com>

* Add integration tests for metrics endpoints in gateway-controller and policy-engine

Co-authored-by: Tharsanan1 <25491225+Tharsanan1@users.noreply.github.com>

* Revert unintended go.mod and go.sum changes - keep only test files

Co-authored-by: Tharsanan1 <25491225+Tharsanan1@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Tharsanan1 <25491225+Tharsanan1@users.noreply.github.com>
@CLAassistant
Copy link

CLAassistant commented Jan 15, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Tharsanan1
❌ Copilot
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 15, 2026

Walkthrough

This pull request adds integration tests for gateway metrics endpoints. Changes include Docker Compose port mappings for metrics, a BDD feature file defining test scenarios, Go-based test step implementations for metrics validation, and suite configuration to execute the new metrics tests.

Changes

Cohort / File(s) Summary
Docker Configuration
gateway/it/docker-compose.test.yaml
Exposed metrics ports: gateway-controller port 9091 and policy-engine port 9003
Test Feature Specification
gateway/it/features/metrics.feature
New BDD feature defining four scenarios: controller/policy-engine metrics endpoint accessibility, Prometheus format validation, and API count metric tracking with updates after API creation
Test Step Implementation
gateway/it/steps_metrics.go
New step definitions implementing metrics HTTP request handling, Prometheus format parsing, API count metric extraction, and cache management for response bodies
Test Suite Configuration
gateway/it/suite_test.go
Registered metrics feature and corresponding step definitions in the test suite

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hoppy hop, metrics now flow,
Through gateways we test, from high up to low,
Prometheus counts dancing in formats so bright,
Controller and engine, both shining so right!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is empty (only contains '$subject' placeholder), missing all required sections from the template such as Purpose, Goals, Approach, Documentation, and test details. Fill in the required description template sections: Purpose (why these tests are needed), Goals, Approach, Documentation impact, and details about the integration tests being added.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main changeset: adding integration tests for metrics endpoints, which matches the file additions and changes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@gateway/it/steps_metrics.go`:
- Around line 200-223: In the function theAPICountMetricShouldHaveIncreased,
avoid holding s.mutex while calling s.GetContextInt (which calls GetContextValue
and acquires s.mutex.RLock), as that causes a deadlock; instead call
s.GetContextInt("initial_api_count") before acquiring s.mutex (remove or move
the s.mutex.Lock()/defer Unlock() so the initialCount is read without holding
the write lock), then re-acquire the lock only for the remaining protected
operations (or drop the explicit lock entirely if not needed) so GetContextInt
does not attempt nested locking with s.mutex.Lock.
- Around line 168-182: The function iExtractCurrentAPICountFromMetrics currently
holds s.mutex while calling s.SetContextValue which also locks the same mutex
and causes a deadlock; fix it by moving the SetContextValue call out of the
critical section: lock only long enough to call getResponseBody and
parseAPICountFromMetrics (or capture any data that requires protection), store
the parsed count in a local variable, unlock the mutex, then call
s.SetContextValue("initial_api_count", count). Alternatively, if you prefer
keeping the work under one lock, add an internal helper like
setContextValueLocked that assumes the mutex is already held and call that
instead of SetContextValue; ensure you use the existing s.mutex,
iExtractCurrentAPICountFromMetrics, getResponseBody, parseAPICountFromMetrics,
and SetContextValue symbols to locate code.
🧹 Nitpick comments (4)
gateway/it/features/metrics.feature (1)

42-47: Consider adding status code verification after the second metrics request.

In "API count metric updates after API creation", the second GET request to the metrics endpoint (line 46) doesn't verify the response status code before checking the metric increase. While the metric comparison would likely fail if the request failed, an explicit status check would provide clearer error messages.

Suggested improvement
   When I create a new API via the gateway controller
   And I send a GET request to the gateway controller metrics endpoint
+  Then the response status code should be 200
-  Then the API count metric should have increased
+  And the API count metric should have increased
gateway/it/steps_metrics.go (3)

140-143: Consider using structured logging instead of fmt.Printf.

Using fmt.Printf for warnings may not integrate well with the test framework's logging. Consider using log.Printf for consistency with the rest of the codebase or the test framework's logging facilities.

Suggested change
 			val, err := strconv.Atoi(match[1])
 			if err != nil {
-				// Log parsing error but continue processing other matches
-				fmt.Printf("Warning: failed to parse metric value '%s': %v\n", match[1], err)
+				// Log parsing error but continue processing other matches
+				log.Printf("Warning: failed to parse metric value '%s': %v", match[1], err)
 				continue
 			}

Add "log" to the imports.


184-198: Test API creation doesn't verify success response.

iCreateTestAPIViaGatewayController sends a POST request but doesn't verify the response status code. If the API creation fails, the test will proceed and fail later with a misleading error about the metric count not increasing.

Suggested improvement
 func (s *TestState) iCreateTestAPIViaGatewayController(httpSteps *steps.HTTPSteps) error {
 	body := &godog.DocString{Content: testAPIDefinition}
 	httpSteps.SetHeader("Content-Type", "application/yaml")
 	if err := httpSteps.SendPOSTToService("gateway-controller", "/apis", body); err != nil {
 		return err
 	}
+
+	// Verify API creation succeeded
+	if httpSteps.LastResponse == nil || httpSteps.LastResponse.StatusCode >= 400 {
+		return fmt.Errorf("failed to create test API: status=%d", httpSteps.LastResponse.StatusCode)
+	}

 	// Clear the cached response body since we'll be reading metrics again
 	s.mutex.Lock()

Note: Adjust based on how httpSteps exposes the last response.


74-84: Hardcoded localhost URLs.

The metrics endpoint URLs are hardcoded to localhost:9091 and localhost:9003. While this works for the current Docker Compose setup, consider making these configurable via TestState.Config for consistency with other service URLs (e.g., GatewayControllerURL).

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d422545 and 1bbf5e1.

📒 Files selected for processing (4)
  • gateway/it/docker-compose.test.yaml
  • gateway/it/features/metrics.feature
  • gateway/it/steps_metrics.go
  • gateway/it/suite_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
gateway/it/suite_test.go (1)
gateway/it/steps_metrics.go (1)
  • RegisterMetricsSteps (52-72)
🔇 Additional comments (4)
gateway/it/docker-compose.test.yaml (1)

33-33: LGTM!

The new port mappings for metrics endpoints are correctly configured and align with the metrics endpoint URLs used in steps_metrics.go (:9091/metrics for gateway-controller and :9003/metrics for policy-engine).

Also applies to: 63-63

gateway/it/features/metrics.feature (1)

19-47: Well-structured BDD feature for metrics testing.

The scenarios cover essential aspects: endpoint accessibility, Prometheus format validation, and metric behavior verification. The progressive complexity from basic access checks to state-change verification is a good testing approach.

gateway/it/steps_metrics.go (1)

32-35: Good practice: Pre-compiled regex.

Pre-compiling the regex at package initialization avoids repeated compilation overhead during test execution.

gateway/it/suite_test.go (1)

75-75: LGTM!

The metrics feature and step definitions are correctly integrated into the test suite, following the same pattern as other feature registrations.

Also applies to: 225-225

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +168 to +182
// iExtractCurrentAPICountFromMetrics extracts the API count metric value
func (s *TestState) iExtractCurrentAPICountFromMetrics() error {
s.mutex.Lock()
defer s.mutex.Unlock()

bodyStr, err := s.getResponseBody()
if err != nil {
return err
}

// Parse the API count from metrics
count := parseAPICountFromMetrics(bodyStr)
s.SetContextValue("initial_api_count", count)
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for SetContextValue implementation in the repository
rg "func.*SetContextValue" -A 10 --type go

Repository: wso2/api-platform

Length of output: 609


Confirmed deadlock risk: SetContextValue acquires the same mutex while it's already held.

iExtractCurrentAPICountFromMetrics locks s.mutex and then calls s.SetContextValue, which also attempts to lock s.mutex. Since the mutex is not recursive, this will deadlock.

The fix is to either:

  • Release the mutex before calling s.SetContextValue, or
  • Move the SetContextValue call outside the critical section, or
  • Create a separate internal method that assumes the mutex is already held
🤖 Prompt for AI Agents
In `@gateway/it/steps_metrics.go` around lines 168 - 182, The function
iExtractCurrentAPICountFromMetrics currently holds s.mutex while calling
s.SetContextValue which also locks the same mutex and causes a deadlock; fix it
by moving the SetContextValue call out of the critical section: lock only long
enough to call getResponseBody and parseAPICountFromMetrics (or capture any data
that requires protection), store the parsed count in a local variable, unlock
the mutex, then call s.SetContextValue("initial_api_count", count).
Alternatively, if you prefer keeping the work under one lock, add an internal
helper like setContextValueLocked that assumes the mutex is already held and
call that instead of SetContextValue; ensure you use the existing s.mutex,
iExtractCurrentAPICountFromMetrics, getResponseBody, parseAPICountFromMetrics,
and SetContextValue symbols to locate code.

Comment on lines +200 to +223
// theAPICountMetricShouldHaveIncreased verifies the API count increased
func (s *TestState) theAPICountMetricShouldHaveIncreased() error {
s.mutex.Lock()
defer s.mutex.Unlock()

initialCount, ok := s.GetContextInt("initial_api_count")
if !ok {
return fmt.Errorf("initial API count not found in context")
}

bodyStr, err := s.getResponseBody()
if err != nil {
return err
}

// Parse the current API count from metrics
currentCount := parseAPICountFromMetrics(bodyStr)

if currentCount <= initialCount {
return fmt.Errorf("API count did not increase: initial=%d, current=%d", initialCount, currentCount)
}

return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for GetContextInt implementation
ast-grep --pattern $'func ($_ *TestState) GetContextInt($_) ($_, $_) {
  $$$
}'

Repository: wso2/api-platform

Length of output: 414


🏁 Script executed:

# Also search for iExtractCurrentAPICountFromMetrics to understand the similar pattern
rg -A 10 "iExtractCurrentAPICountFromMetrics" gateway/it/

Repository: wso2/api-platform

Length of output: 1565


🏁 Script executed:

# Search for TestState struct definition to understand the mutex field
rg "type TestState struct" gateway/it/ -A 10

Repository: wso2/api-platform

Length of output: 575


🏁 Script executed:

#!/bin/bash
# Search for GetContextValue implementation
rg "func.*GetContextValue" gateway/it/ -A 10

Repository: wso2/api-platform

Length of output: 611


Fix deadlock: release mutex before calling GetContextInt.

This method acquires s.mutex with a write lock, then calls GetContextInt, which internally calls GetContextValue that tries to acquire s.mutex.RLock(). Attempting to acquire a read lock while holding a write lock will cause a deadlock in the same goroutine.

Either retrieve the value before locking the mutex, or refactor to avoid nested locking:

// Option 1: Retrieve value before locking
initialCount, ok := s.GetContextInt("initial_api_count")
if !ok {
    return fmt.Errorf("initial API count not found in context")
}

s.mutex.Lock()
defer s.mutex.Unlock()
// ... rest of the logic
🤖 Prompt for AI Agents
In `@gateway/it/steps_metrics.go` around lines 200 - 223, In the function
theAPICountMetricShouldHaveIncreased, avoid holding s.mutex while calling
s.GetContextInt (which calls GetContextValue and acquires s.mutex.RLock), as
that causes a deadlock; instead call s.GetContextInt("initial_api_count") before
acquiring s.mutex (remove or move the s.mutex.Lock()/defer Unlock() so the
initialCount is read without holding the write lock), then re-acquire the lock
only for the remaining protected operations (or drop the explicit lock entirely
if not needed) so GetContextInt does not attempt nested locking with
s.mutex.Lock.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants