
Conversation


@VirajSalaka VirajSalaka commented Jan 7, 2026

Purpose

This pull request introduces multi-replica event synchronization support to the gateway controller, primarily by integrating a new EventHub system and associated event listener infrastructure. The changes enable replica synchronization in multi-replica deployments, add configuration options, and update several components to support this new functionality. Additionally, some test utilities are improved for better maintainability.

Multi-replica event synchronization and infrastructure:

  • Introduced a new EventHub abstraction with a pluggable backend model (defaulting to SQLite), including interface definitions and a concrete implementation (eventhub.go, backend.go). This enables event-based synchronization between controller replicas in multi-replica mode.
  • Integrated EventHub initialization into the main controller startup sequence, gated by the new EnableReplicaSync configuration flag and persistent-storage requirements. The main function now starts and stops both the EventHub and an EventListener for event-driven updates (a minimal wiring sketch follows this list).
  • Added EnableReplicaSync to the server configuration (default: false), allowing users to control event-based multi-replica synchronization.
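
For illustration, a minimal wiring sketch of the gated startup described above; `cfg`, `store`, `logger`, and `eventhub.DefaultConfig` are stand-ins based on this description and the review excerpts below, not the exact code:

```go
// Hypothetical fragment of main(); names follow this PR's review comments.
if cfg.Server.EnableReplicaSync {
	hub := eventhub.New(store.GetDB(), logger, eventhub.DefaultConfig())
	if err := hub.Initialize(ctx); err != nil {
		logger.Fatal("failed to initialize EventHub", zap.Error(err))
	}
	defer hub.Close()
	// The EventListener subscribes through the EventHub adapter and applies
	// incoming API events on this replica.
}
```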

Dependency and API updates:

  • Updated the control plane client and API server constructors to accept and propagate the new EventHub and PolicyManager dependencies, ensuring that event-driven updates and policy management are coordinated throughout the system.

Testing improvements:

  • Refactored test utility creation in policy_ordering_test.go to use the new deployment service and configuration store, aligning tests with the updated API server structure and improving test reliability.

Dependency and import organization:

  • Added and reorganized imports to support the new event hub and event listener components throughout the codebase.

These changes lay the groundwork for robust multi-replica, event-driven synchronization in the gateway controller, while maintaining backward compatibility and improving testability.

Goals

Describe what solutions this feature or fix introduces to address the problems outlined above.

Approach

Describe how you are implementing the solutions. Include an animated GIF or screenshot if the change affects the UI. Include a link to a Markdown file or Google doc if the feature write-up is too long to paste here.

User stories

Summary of user stories addressed by this change.

Documentation

Link(s) to product documentation that addresses the changes of this PR. If there is no doc impact, enter “N/A” plus a brief explanation of why.

Automation tests

  • Unit tests

    Code coverage information

  • Integration tests

    Details about the test cases and coverage

Security checks

Samples

Provide high-level details about the samples related to this feature

Related PRs

List any other related PRs

Test environment

List all JDK versions, operating systems, databases, and browser/versions on which this feature/fix was tested

Summary by CodeRabbit

  • New Features
    • Pluggable EventHub (SQLite default), EventListener, adapter and mock for cross-component publish/subscribe and event-driven API processing.
  • Configuration
    • New EnableReplicaSync toggle to enable multi-replica event synchronization.
  • Database
    • Schema bumped to v6: adds events and per-organization state; storage now exposes DB handle for event wiring.
  • Refactor
    • Policy derivation and deployment moved into deployment service; deployment APIs extended to accept policy manager, event hub and replica mode.
  • Tests
    • Comprehensive EventHub/listener tests and updated deployment/topic tests.



coderabbitai bot commented Jan 7, 2026

Walkthrough

Adds a pluggable EventHub (default SQLite) and SQLite backend, an EventListener with an EventHub→EventSource adapter, shifts policy derivation/update into APIDeploymentService with optional multi-replica EventHub publish, updates controller wiring and config flag, and migrates storage schema to version 6.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **EventHub Types & Core**<br>`gateway/gateway-controller/pkg/eventhub/types.go`, `gateway/gateway-controller/pkg/eventhub/backend.go`, `gateway/gateway-controller/pkg/eventhub/eventhub.go`, `gateway/gateway-controller/pkg/eventhub/topic.go` | Introduces the EventHub public API (types, Config, EventHub interface), backend abstraction, organization registry, and errors. |
| **Default SQLite Backend**<br>`gateway/gateway-controller/pkg/eventhub/sqlite_backend.go` | Adds a full SQLite-backed backend: init, poll loop, transactional publish, subscribe/unsubscribe, per-org state management, cleanup loop, delivery semantics, and lifecycle. |
| **EventHub Tests**<br>`gateway/gateway-controller/pkg/eventhub/eventhub_test.go` | Comprehensive in-memory SQLite tests for registration, publish/subscribe, polling delivery, cleanup, atomicity, and shutdown behavior. |
| **EventListener Subsystem**<br>`gateway/gateway-controller/pkg/eventlistener/event_source.go`, `gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go`, `gateway/gateway-controller/pkg/eventlistener/listener.go`, `gateway/gateway-controller/pkg/eventlistener/api_processor.go`, `gateway/gateway-controller/pkg/eventlistener/mock_event_source.go` | Adds a generic EventSource, an EventHub adapter, the EventListener (Start/Stop, API lifecycle event handling), API event processing logic, and a test mock. |
| **API Server & Deployment Wiring**<br>`gateway/gateway-controller/pkg/api/handlers/handlers.go`, `gateway/gateway-controller/pkg/api/handlers/policy_ordering_test.go`, `gateway/gateway-controller/pkg/utils/api_deployment.go` | Moves policy building/updates into APIDeploymentService (public BuildStoredPolicyFromAPI, UpdateAPIConfiguration); NewAPIServer is extended to accept eventHub and enableMultiReplicaMode; APIDeploymentService now holds policyManager, eventHub, and enableReplicaSync and implements topic lifecycle and conditional publish/xDS flows. |
| **Storage Schema & DB Access**<br>`gateway/gateway-controller/pkg/storage/gateway-controller-db.sql`, `gateway/gateway-controller/pkg/storage/sqlite.go`, `gateway/gateway-controller/pkg/storage/interface.go` | Migrates the schema to v6: adds organization_states and events tables plus indexes; exposes the underlying DB via GetDB() *sql.DB on Storage/SQLiteStorage. |
| **Controller & Config Wiring**<br>`gateway/gateway-controller/cmd/controller/main.go`, `gateway/gateway-controller/pkg/config/config.go`, `gateway/gateway-controller/pkg/controlplane/client.go` | Adds the EnableReplicaSync ServerConfig flag; the controller initializes EventHub and EventListener when enabled; controlplane.NewClient now accepts a PolicyManager and passes it into APIDeploymentService. |
| **Tests & Minor Updates**<br>`gateway/gateway-controller/pkg/utils/websub_topic_registration_test.go`, `gateway/gateway-controller/tests/integration/schema_test.go` | Tests updated to the new APIDeploymentService signature; the integration schema test now expects user_version=6. |
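
For orientation, the EventHub surface as it can be pieced together from the excerpts in this review; this is a sketch, not the authoritative definition in pkg/eventhub/types.go, and return types are inferred:

```go
// Reconstructed from review excerpts for reference only.
type EventHub interface {
	Initialize(ctx context.Context) error
	RegisterOrganization(organizationID string) error
	Subscribe(organizationID string, eventChan chan<- []Event) error
	CleanUpEvents(ctx context.Context, timeFrom, timeEnd time.Time) error
	Close() error
	// A Publish method also exists (transactional in the SQLite backend),
	// but its exact signature does not appear in the excerpts here.
}
```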

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant API as API Handler
    participant DS as APIDeploymentService
    participant EH as EventHub
    participant EL as EventListener
    participant PM as PolicyManager
    participant XDS as XDS Manager

    rect rgb(245,247,255)
    note right of API: Multi-replica flow (enableReplicaSync=true)
    API->>DS: UpdateAPIConfiguration(params)
    DS->>DS: Parse & Persist config
    DS->>EH: PublishEvent(CREATE/UPDATE)
    EH->>EL: Deliver Event(s)
    EL->>DS: handleAPICreateOrUpdate(event)
    DS->>PM: Build & Apply Policy
    DS->>XDS: Trigger Snapshot Update (async)
    end

    rect rgb(245,255,245)
    note right of API: Single-replica flow (enableReplicaSync=false)
    API->>DS: UpdateAPIConfiguration(params)
    DS->>DS: Parse & Persist config
    DS->>PM: Build & Apply Policy
    DS->>XDS: Trigger Snapshot Update (sync)
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 I tunneled through tables, version six in sight,
I nudged up events and set the poller right,
Replicas now whisper, listeners awake,
Policies hop into place, snapshots gently take,
A rabbit applauds the sync — and nibbles a byte.

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)

❌ Failed checks

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 61.11%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description provides a comprehensive Purpose section explaining the multi-replica event synchronization feature and its implementation. However, the Goals, Approach, User stories, Documentation, Automation tests, Security checks, Samples, Related PRs, and Test environment sections are incomplete or contain only template placeholders. | Complete the remaining sections of the PR description: provide Goals, Approach, User stories, Documentation links, test coverage details, security check confirmations, sample information, related PRs, and test environment specifications. |

✅ Passed checks

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main feature being introduced: event-based synchronization between gateway controller replicas. |



@VirajSalaka VirajSalaka changed the title from "Sync 2 temp" to "Add Event based synchronization between gateway controller replicas" on Jan 7, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
gateway/gateway-controller/pkg/utils/api_deployment.go (1)

184-186: Remove debug print statement.

Line 185 contains fmt.Println("APIData: ", apiData) which appears to be a leftover debug statement.

🐛 Remove debug output

```diff
 	case api.RestApi:
 		apiData, err := apiConfig.Spec.AsAPIConfigData()
-		fmt.Println("APIData: ", apiData)
 		if err != nil {
```
🤖 Fix all issues with AI agents
In @gateway/gateway-controller/cmd/controller/main.go:
- Around line 113-128: The call to eventHub.RegisterOrganization("default")
ignores its returned error so failures (other than ErrOrganizationAlreadyExists)
are not handled; capture the error from eventHub.RegisterOrganization, import
the errors package, and if err != nil and !errors.Is(err,
eventhub.ErrOrganizationAlreadyExists) log.Fatal or return the error (consistent
with existing startup failure handling) to prevent continuing with a broken
EventHub; keep the existing success log when registration succeeds or the error
is ErrOrganizationAlreadyExists.
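
A sketch of the suggested handling, assuming the `eventHub` and `log` names referenced in the comment above:

```go
// Treat "already exists" as benign (restarts re-register); fail fast on anything else.
if err := eventHub.RegisterOrganization("default"); err != nil {
	if errors.Is(err, eventhub.ErrOrganizationAlreadyExists) {
		log.Info("default organization already registered")
	} else {
		log.Fatal("failed to register default organization", zap.Error(err))
	}
}
```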

In @gateway/gateway-controller/pkg/eventhub/eventhub.go:
- Around line 11-18: The eventHub struct's initialized field is accessed
concurrently by Initialize and Close causing race conditions; add a mutex (e.g.,
stateMu sync.Mutex) to the eventHub struct and use it to protect reads and
writes of initialized in the Initialize and Close methods: lock before checking
initialized, perform initialization/cleanup and set initialized, then unlock
(use defer where appropriate) so state transitions are atomic and prevent
duplicate init/close operations. Ensure you reference the eventHub struct, the
initialized field, and the Initialize and Close methods when applying the
lock/unlock changes.
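
A sketch of the guarded state transitions; the `backend` and `logger` fields follow the constructor shown later in this review, `stateMu` is the suggested new field, and return types are inferred rather than confirmed:

```go
type eventHub struct {
	backend     EventhubImpl
	logger      *zap.Logger
	stateMu     sync.Mutex // guards initialized across Initialize/Close
	initialized bool
}

func (eh *eventHub) Initialize(ctx context.Context) error {
	eh.stateMu.Lock()
	defer eh.stateMu.Unlock()
	if eh.initialized {
		return nil // already initialized; avoid duplicate init
	}
	if err := eh.backend.Initialize(ctx); err != nil {
		return err
	}
	eh.initialized = true
	return nil
}

func (eh *eventHub) Close() error {
	eh.stateMu.Lock()
	defer eh.stateMu.Unlock()
	if !eh.initialized {
		return nil // idempotent close
	}
	eh.initialized = false
	return eh.backend.Close()
}
```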

In @gateway/gateway-controller/pkg/eventhub/sqlite_backend.go:
- Around line 307-312: deliverEvents can drop events when the delivery channel
is full, but org.updatePollState(state.VersionID, ...) is always called, causing
dropped events to be permanently skipped; change deliverEvents to return a bool
(e.g., success) indicating whether all events were enqueued/delivered and then
only call org.updatePollState(state.VersionID, time.Now()) when that bool is
true; update the call site in the block that currently checks if len(events) > 0
(replace the unconditional b.deliverEvents(org, events) with success :=
b.deliverEvents(org, events) and conditionally call org.updatePollState only on
success), and update the deliverEvents function signature and any other callers
accordingly.
- Around line 45-73: The Initialize method creates poller and cleanup child
contexts from the caller-provided ctx so external cancellation can prematurely
stop background work; change Initialize to create independent cancellable
contexts (e.g., context.WithCancel(context.Background())) for
b.pollerCtx/b.pollerCancel and b.cleanupCtx/b.cleanupCancel so pollLoop and
cleanupLoop run until Close explicitly cancels them; ensure Close still calls
b.pollerCancel and b.cleanupCancel to stop the goroutines.
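
A sketch combining both fixes above, assuming the field and method names quoted in the comments; `deliverEvents` is assumed to be changed to report delivery success:

```go
// In Initialize: background loops get their own lifecycle, detached from the
// caller's ctx, and are stopped only when Close invokes the cancel funcs.
b.pollerCtx, b.pollerCancel = context.WithCancel(context.Background())
b.cleanupCtx, b.cleanupCancel = context.WithCancel(context.Background())

// In the poll loop: advance the per-org poll state only when delivery
// succeeded, so events dropped on a full channel are retried on the next poll.
if len(events) > 0 {
	if delivered := b.deliverEvents(org, events); delivered {
		org.updatePollState(state.VersionID, time.Now())
	}
} else {
	org.updatePollState(state.VersionID, time.Now())
}
```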

In @gateway/gateway-controller/pkg/eventlistener/api_processor.go:
- Around line 81-106: The snapshot update goroutine can run before the new
policy is applied, causing a race; move the policy application before triggering
UpdateSnapshot (i.e., call el.policyManager.AddPolicy(storedPolicy) and handle
errors first when storedPolicy != nil), and only after AddPolicy succeeds spawn
the goroutine that calls el.snapshotManager.UpdateSnapshot (or if you need async
behavior, perform AddPolicy synchronously then start the goroutine); ensure you
reference storedPolicy, el.policyManager.AddPolicy, and
el.snapshotManager.UpdateSnapshot in the adjusted flow so the snapshot reflects
the applied policy.
- Around line 109-147: handleAPIDelete currently spawns the async snapshot
update goroutine before removing policies, which can cause a race where snapshot
is built with stale policies; move the policy removal logic (check
el.policyManager != nil && config != nil and call policyManager.RemovePolicy
with policyID) to occur synchronously before starting the goroutine that calls
el.snapshotManager.UpdateSnapshot, preserving the existing error logging for
RemovePolicy and ensuring you still trigger the async UpdateSnapshot afterward
(leave the context/timeout and logging in the goroutine intact).
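
A sketch of the reordered flow for both handlers, using the names from the comments above; the 30-second timeout is illustrative, not taken from the PR:

```go
// Apply the policy synchronously first...
if storedPolicy != nil {
	if err := el.policyManager.AddPolicy(storedPolicy); err != nil {
		log.Error("failed to apply policy; skipping snapshot update", zap.Error(err))
		return
	}
}

// ...then trigger the async xDS snapshot update, which now sees the new policy.
// (For deletes, call RemovePolicy synchronously here instead.)
go func() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	el.snapshotManager.UpdateSnapshot(ctx, correlationID)
}()
```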

In @gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go:
- Around line 173-177: Close() currently ranges over a.activeSubscriptions while
calling Unsubscribe(orgID), which mutates that map and is unsafe; fix by first
collecting the orgIDs into a separate slice (e.g., iterate over
a.activeSubscriptions to append keys), then iterate over that slice calling
a.Unsubscribe(orgID); ensure you reference the EventHubAdapter.Close method, the
a.activeSubscriptions map, and the Unsubscribe method when making the change.
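
A sketch of the two-pass pattern (snapshot the keys, then mutate):

```go
func (a *EventHubAdapter) Close() error {
	// Unsubscribe mutates activeSubscriptions while we range over it (flagged
	// above), so collect the keys first and mutate in a second pass.
	orgIDs := make([]string, 0, len(a.activeSubscriptions))
	for orgID := range a.activeSubscriptions {
		orgIDs = append(orgIDs, orgID)
	}
	for _, orgID := range orgIDs {
		a.Unsubscribe(orgID)
	}
	return nil
}
```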

In @gateway/gateway-controller/pkg/eventlistener/EXAMPLE_USAGE.md:
- Around line 369-381: The fenced diagram block lacks a language identifier;
update the opening fence for the event flow diagram (the triple backticks before
the lines starting "EventSource → Subscribe(orgID, chan) → EventListener") to
include a language tag such as "text" (i.e., change ``` to ```text) so the
diagram renders and is recognized by tooling.
- Around line 9-30: The fenced ASCII diagram in EXAMPLE_USAGE.md lacks a
language identifier; update the opening fence for the diagram (the triple
backticks surrounding the ASCII art) to include a language identifier such as
"text" (e.g., change ``` to ```text) so tools/renderers treat it as plain text;
the relevant block contains the ASCII diagram with symbols like EventListener,
EventSource, EventHubAdapter, MockEventSource and EventHub—modify that fenced
code block only.

In @gateway/gateway-controller/pkg/storage/gateway-controller-db.sql:
- Around line 165-178: The events table uses a composite primary key
(organization_id, processed_timestamp) which can collide under high throughput
and also has a redundant explicit index idx_events_lookup; fix by altering the
schema: add an auto-incrementing surrogate primary key column (e.g., id INTEGER
PRIMARY KEY AUTOINCREMENT) and keep organization_id and processed_timestamp as
regular columns (or create a separate non-unique index if needed), and remove
the explicit idx_events_lookup creation since the new PK index or a single
targeted index renders it unnecessary; if you prefer to keep the composite PK,
instead remove the CREATE INDEX IF NOT EXISTS idx_events_lookup ON
events(organization_id, processed_timestamp) line to avoid duplicating the
primary key index.
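
A sketch of the surrogate-key variant as a Go migration step, mirroring the column list from the current schema; keeping a separate non-unique lookup index is a design choice, not part of this PR:

```go
// Surrogate PK avoids collisions on (organization_id, processed_timestamp)
// under high throughput; a plain index still serves poll-loop lookups.
if _, err := s.db.Exec(`CREATE TABLE IF NOT EXISTS events (
	id INTEGER PRIMARY KEY AUTOINCREMENT,
	organization_id TEXT NOT NULL,
	processed_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
	originated_timestamp TIMESTAMP NOT NULL,
	event_type TEXT NOT NULL,
	action TEXT NOT NULL CHECK(action IN ('CREATE', 'UPDATE', 'DELETE')),
	entity_id TEXT NOT NULL,
	correlation_id TEXT NOT NULL DEFAULT '',
	event_data TEXT NOT NULL
);`); err != nil {
	return fmt.Errorf("failed to create events table: %w", err)
}
if _, err := s.db.Exec(`CREATE INDEX IF NOT EXISTS idx_events_lookup
	ON events(organization_id, processed_timestamp);`); err != nil {
	return fmt.Errorf("failed to create events index: %w", err)
}
```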

In @gateway/gateway-controller/pkg/utils/api_deployment.go:
- Around line 570-589: When enableReplicaSync is true the code path skips
s.store.Add in DeployAPIConfiguration so the DB write succeeds but the in-memory
store never gets the new config, causing immediate reads (e.g., GetAPIById) on
the same replica to miss; fix by ensuring the in-memory store is updated
immediately after a successful DB write even in multi-replica mode (call
s.store.Add or s.updateExistingConfig as appropriate) or implement a read
fallback in GetAPIById to query s.db when s.store returns not-found; adjust the
logic around enableReplicaSync in DeployAPIConfiguration (and reuse
updateExistingConfig for conflict handling) so the memory store and DB remain
consistent on the originating replica.
- Around line 408-418: UpdateAPIConfiguration currently calls
triggerXDSSnapshotUpdate and updatePolicyConfiguration directly, which misses
replica synchronization when enableReplicaSync is true; change it to follow
DeployAPIConfiguration's pattern: when enableReplicaSync is enabled publish the
update event to EventHub (using the same event type/payload used by
DeployAPIConfiguration) instead of calling
triggerXDSSnapshotUpdate/updatePolicyConfiguration directly, and only call those
local update methods when replica sync is disabled or when the event handling
path delivers the event back to this instance; reference UpdateAPIConfiguration,
DeployAPIConfiguration, triggerXDSSnapshotUpdate, updatePolicyConfiguration,
enableReplicaSync and the EventHub publish method used in DeployAPIConfiguration
to implement the conditional behavior.
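
A sketch of the conditional flow for UpdateAPIConfiguration, mirroring what the comment describes for DeployAPIConfiguration; `publishAPIEvent` is a hypothetical helper standing in for the publish call, whose exact signature is not shown in this review:

```go
if s.enableReplicaSync && s.eventHub != nil {
	// Multi-replica: publish an UPDATE event; every replica (including this
	// one) applies policy/xDS changes when the EventListener delivers it.
	s.publishAPIEvent(existing, "UPDATE", params.CorrelationID) // hypothetical helper
} else {
	// Single-replica: apply locally, as today.
	s.triggerXDSSnapshotUpdate(existing.ID, params.CorrelationID, params.Logger)
	s.updatePolicyConfiguration(existing, params.Logger)
}
```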
🧹 Nitpick comments (13)
gateway/gateway-controller/pkg/config/config.go (1)

120-124: Configuration field looks good with proper defaults.

The EnableReplicaSync field is well-integrated with appropriate default value (false) for backward compatibility. The field correctly uses koanf tags consistent with other configuration fields.

Consider adding a documentation comment to clarify the field's purpose:

```go
// EnableReplicaSync enables event-driven synchronization between controller replicas
// via EventHub. Requires persistent storage (SQLite or PostgreSQL). Default: false
EnableReplicaSync bool `koanf:"enable_replica_sync"`
```
gateway/gateway-controller/pkg/storage/sqlite.go (1)

222-247: Migration tables missing IF NOT EXISTS clause.

Unlike earlier migrations (e.g., lines 99, 125, 154, 177), the new organization_states and events tables are created without IF NOT EXISTS. This could cause migration failures if re-run after a partial failure.

♻️ Proposed fix

```diff
-			if _, err := s.db.Exec(`CREATE TABLE organization_states (
+			if _, err := s.db.Exec(`CREATE TABLE IF NOT EXISTS organization_states (
 				organization TEXT PRIMARY KEY,
 				version_id TEXT NOT NULL DEFAULT '',
 				updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
 			);`); err != nil {
 				return fmt.Errorf("failed to create organization_states table: %w", err)
 			}
-			if _, err := s.db.Exec(`CREATE TABLE events (
+			if _, err := s.db.Exec(`CREATE TABLE IF NOT EXISTS events (
 				organization_id TEXT NOT NULL,
 				processed_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
 				originated_timestamp TIMESTAMP NOT NULL,
 				event_type TEXT NOT NULL,
 				action TEXT NOT NULL CHECK(action IN ('CREATE', 'UPDATE', 'DELETE')),
 				entity_id TEXT NOT NULL,
 				correlation_id TEXT NOT NULL DEFAULT '',
 				event_data TEXT NOT NULL,
 				PRIMARY KEY (organization_id, processed_timestamp)
 			);`); err != nil {
 				return fmt.Errorf("failed to create events table: %w", err)
 			}
```
gateway/gateway-controller/pkg/eventlistener/listener.go (3)

99-100: Address the TODO for panic recovery in production.

The TODO comment is valid. An unhandled panic in processEvents() would cause the goroutine to silently die, leaving the EventListener in a broken state without recovery.

Consider wrapping the goroutine with a recovery mechanism. Would you like me to generate a panic-recovery wrapper implementation?
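
One possible shape for such a wrapper, assuming the `wg` and `logger` fields referenced elsewhere in this file:

```go
el.wg.Add(1)
go func() {
	defer el.wg.Done()
	defer func() {
		if r := recover(); r != nil {
			// Log with a stack trace so a dying processEvents loop is visible
			// instead of silently leaving the listener broken.
			el.logger.Error("processEvents panicked",
				zap.Any("panic", r), zap.Stack("stack"))
		}
	}()
	el.processEvents()
}()
```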


143-154: Potential race condition when closing the event channel.

The Stop() method closes eventChan after wg.Wait(), but there's a subtle issue: if the EventSource continues to send events after cancel() is called but before the internal goroutine exits, those sends to a potentially closed channel could panic.

Since the contract states "The channel should NOT be closed by the EventSource" (per event_source.go line 75), this should be safe. However, the channel close here is unnecessary since processEvents() exits on context cancellation and the channel will be garbage collected.

♻️ Optional: Remove unnecessary channel close

```diff
 func (el *EventListener) Stop() {
 	if el.cancel != nil {
 		el.cancel()
 	}
 	el.wg.Wait()
-
-	if el.eventChan != nil {
-		close(el.eventChan)
-	}

 	el.logger.Info("EventListener stopped")
 }
```

134-139: Use constant instead of magic string for event type.

The string "API" should reference the EventTypeAPI constant from the eventhub package for consistency and to avoid drift between components.

♻️ Proposed fix

Import the eventhub package and use its constant:

```diff
+import (
+	...
+	"github.com/wso2/api-platform/gateway/gateway-controller/pkg/eventhub"
+)

 func (el *EventListener) handleEvent(event Event) {
 	...
 	switch event.EventType {
-	case "API": // EventTypeAPI constant
+	case string(eventhub.EventTypeAPI):
 		el.processAPIEvents(event)
 	default:
 		log.Debug("Ignoring non-API event")
 	}
 }
```

Alternatively, define local constants in the eventlistener package that mirror the eventhub constants.

gateway/gateway-controller/pkg/eventhub/eventhub.go (2)

61-65: Accept context parameter for RegisterOrganization.

Creating a background context ignores any parent cancellation or timeout. If the EventHub was initialized with a context containing deadlines, those constraints won't propagate to organization registration. Consider updating the method signature to accept a context parameter, or at minimum document why background context is required.

♻️ Proposed refactor: Accept context parameter

```diff
-func (eh *eventHub) RegisterOrganization(organizationID string) error {
-	ctx := context.Background()
+func (eh *eventHub) RegisterOrganization(ctx context.Context, organizationID string) error {
 	return eh.backend.RegisterOrganization(ctx, organizationID)
 }
```

Note: This requires updating the EventHub interface in types.go and all call sites.


20-42: Consider adding nil parameter validation.

The constructors don't validate that db, logger, or backend are non-nil. While this may be acceptable if callers are trusted, adding defensive nil checks could prevent panics and provide clearer error messages.

♻️ Optional: Add parameter validation

```diff
 func New(db *sql.DB, logger *zap.Logger, config Config) EventHub {
+	if db == nil {
+		panic("db cannot be nil")
+	}
+	if logger == nil {
+		panic("logger cannot be nil")
+	}
 	sqliteConfig := &SQLiteBackendConfig{
 		PollInterval:    config.PollInterval,
 		CleanupInterval: config.CleanupInterval,
 		RetentionPeriod: config.RetentionPeriod,
 	}
 	backend := NewSQLiteBackend(db, logger, sqliteConfig)
 	return &eventHub{
 		backend: backend,
 		logger:  logger,
 	}
 }
 
 func NewWithBackend(backend EventhubImpl, logger *zap.Logger) EventHub {
+	if backend == nil {
+		panic("backend cannot be nil")
+	}
+	if logger == nil {
+		panic("logger cannot be nil")
+	}
 	return &eventHub{
 		backend: backend,
 		logger:  logger,
 	}
 }
```
gateway/gateway-controller/pkg/eventlistener/api_processor.go (1)

35-52: Consider validating event fields before processing.

The function doesn't validate that event.EntityID (apiID) is non-empty. If an event arrives with an empty entity ID, downstream operations (DB lookups, store operations) may fail with unclear errors. Adding validation here would improve error clarity and early detection of malformed events.

♻️ Optional: Add field validation

```diff
 func (el *EventListener) processAPIEvents(event Event) {
 	log := el.logger.With(
 		zap.String("api_id", event.EntityID),
 		zap.String("action", event.Action),
 		zap.String("correlation_id", event.CorrelationID),
 	)
 
 	apiID := event.EntityID
+	if apiID == "" {
+		log.Error("Event has empty EntityID")
+		return
+	}
+
 	// TODO: (VirajSalaka) Use Context to propogate correlationID
 	switch event.Action {
 	case "CREATE", "UPDATE":
 		el.handleAPICreateOrUpdate(apiID, event.CorrelationID, log)
 	case "DELETE":
 		el.handleAPIDelete(apiID, event.CorrelationID, log)
 	default:
 		log.Warn("Unknown action type")
 	}
 }
```
gateway/gateway-controller/pkg/storage/gateway-controller-db.sql (1)

153-161: Consider using NULL instead of empty string for version_id default.

The version_id field defaults to an empty string (''). Using NULL would more clearly indicate "no version set" and is more idiomatic for SQL. Empty strings can complicate queries and may require special handling in application code.

♻️ Optional: Use NULL default

```diff
 CREATE TABLE IF NOT EXISTS organization_states (
     organization TEXT PRIMARY KEY,
-    version_id TEXT NOT NULL DEFAULT '',
+    version_id TEXT DEFAULT NULL,
     updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
 );
```
gateway/gateway-controller/pkg/eventlistener/mock_event_source.go (1)

152-170: Potential test hang: unbuffered channel send.

Line 167 sends directly to eventChan without a select/timeout. If the test's receiver goroutine isn't ready, this will block indefinitely, causing test hangs.

♻️ Proposed fix: Add non-blocking send or document requirement

```diff
 	// Send events to subscriber
-	eventChan <- events
+	select {
+	case eventChan <- events:
+		// Successfully sent
+	default:
+		// Channel full or not ready - for testing, this might indicate a test setup issue
+		return fmt.Errorf("subscriber channel not ready for organization: %s", organizationID)
+	}

 	return nil
```

Alternatively, document that test channels must be buffered.

gateway/gateway-controller/pkg/api/handlers/handlers.go (1)

711-740: Incomplete EventHub integration in DeleteAPI.

The TODO comments indicate that DeleteAPI doesn't use the EventHub for multi-replica synchronization, unlike CreateAPI which conditionally publishes events when enableReplicaSync is true. This creates inconsistent behavior in multi-replica mode.

Is this intentional for this PR, with DeleteAPI EventHub integration planned for a follow-up? If you'd like, I can help draft the EventHub integration for the delete flow.
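
For reference, the delete flow would presumably mirror the create path's conditional publish; a heavily hedged sketch (the `publishAPIEvent` helper and the payload are assumptions, not code from this PR):

```go
if s.enableReplicaSync && s.eventHub != nil {
	// Publish a DELETE event so other replicas remove the API too; this
	// replica would then handle its own copy via the EventListener path.
	s.publishAPIEvent(apiID, "DELETE", correlationID) // hypothetical helper
} else {
	// Existing synchronous path: remove the policy and update the xDS snapshot locally.
}
```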

gateway/gateway-controller/pkg/eventhub/types.go (2)

18-28: Consider aligning Event structs between packages.

The eventhub.Event struct here and eventlistener.Event in event_source.go (lines 28-49 of that file) have similar but not identical fields. The adapter converts between them. While this separation allows for independent evolution, ensure the field semantics are documented to prevent confusion.

The current approach with explicit conversion in EventHubAdapter.bridgeEvents is acceptable, but consider adding comments explaining why the two Event types exist separately.


37-61: Consider adding Unsubscribe to EventHub interface.

The EventHub interface defines Subscribe but not Unsubscribe. However, SQLiteBackend implements Unsubscribe(orgID string, eventChan chan<- []Event). If unsubscription is a valid operation, it should be part of the interface for consistency.

♻️ Proposed interface addition

```diff
 type EventHub interface {
 	// ... existing methods ...

 	// Subscribe registers a channel to receive events for an organization
 	Subscribe(organizationID string, eventChan chan<- []Event) error

+	// Unsubscribe removes a subscription channel for an organization
+	Unsubscribe(organizationID string, eventChan chan<- []Event) error

 	// CleanUpEvents removes events between the specified time range
 	CleanUpEvents(ctx context.Context, timeFrom, timeEnd time.Time) error
```
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between adfd6fb and 278c8a9.

📒 Files selected for processing (23)
  • gateway/gateway-controller/cmd/controller/main.go
  • gateway/gateway-controller/pkg/api/handlers/handlers.go
  • gateway/gateway-controller/pkg/api/handlers/policy_ordering_test.go
  • gateway/gateway-controller/pkg/config/config.go
  • gateway/gateway-controller/pkg/controlplane/client.go
  • gateway/gateway-controller/pkg/eventhub/backend.go
  • gateway/gateway-controller/pkg/eventhub/eventhub.go
  • gateway/gateway-controller/pkg/eventhub/eventhub_test.go
  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
  • gateway/gateway-controller/pkg/eventhub/topic.go
  • gateway/gateway-controller/pkg/eventhub/types.go
  • gateway/gateway-controller/pkg/eventlistener/EXAMPLE_USAGE.md
  • gateway/gateway-controller/pkg/eventlistener/api_processor.go
  • gateway/gateway-controller/pkg/eventlistener/event_source.go
  • gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go
  • gateway/gateway-controller/pkg/eventlistener/listener.go
  • gateway/gateway-controller/pkg/eventlistener/mock_event_source.go
  • gateway/gateway-controller/pkg/storage/gateway-controller-db.sql
  • gateway/gateway-controller/pkg/storage/interface.go
  • gateway/gateway-controller/pkg/storage/sqlite.go
  • gateway/gateway-controller/pkg/utils/api_deployment.go
  • gateway/gateway-controller/pkg/utils/websub_topic_registration_test.go
  • gateway/gateway-controller/tests/integration/schema_test.go
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-12-06T11:57:08.456Z
Learnt from: BLasan
Repo: wso2/api-platform PR: 140
File: gateway/gateway-controller/pkg/utils/api_deployment.go:88-105
Timestamp: 2025-12-06T11:57:08.456Z
Learning: In the gateway-controller Go codebase (file: gateway/gateway-controller/pkg/utils/api_deployment.go), only the async/websub API kind is currently supported. async/websocket and async/sse API kinds are not yet implemented.

Applied to files:

  • gateway/gateway-controller/pkg/controlplane/client.go
  • gateway/gateway-controller/cmd/controller/main.go
  • gateway/gateway-controller/pkg/eventlistener/api_processor.go
  • gateway/gateway-controller/pkg/api/handlers/handlers.go
  • gateway/gateway-controller/pkg/utils/websub_topic_registration_test.go
  • gateway/gateway-controller/pkg/utils/api_deployment.go
📚 Learning: 2025-12-23T10:41:47.584Z
Learnt from: thivindu
Repo: wso2/api-platform PR: 389
File: gateway/gateway-controller/pkg/utils/api_key.go:257-380
Timestamp: 2025-12-23T10:41:47.584Z
Learning: In gateway/gateway-controller/pkg/utils/api_key.go, the RevokeAPIKey method intentionally proceeds with database deletion and xDS policy engine revocation even when the API key is not found in the memory store (apiKey is nil). This is by design to handle partial failure recovery: if a previous revocation attempt removed the key from memory but failed to clean up the database or policy engine, subsequent retries will still perform the cleanup operations to ensure eventual consistency.

Applied to files:

  • gateway/gateway-controller/cmd/controller/main.go
  • gateway/gateway-controller/pkg/eventlistener/api_processor.go
  • gateway/gateway-controller/pkg/api/handlers/handlers.go
📚 Learning: 2025-11-08T13:06:22.133Z
Learnt from: thivindu
Repo: wso2/api-platform PR: 110
File: platform-api/src/internal/service/api.go:432-476
Timestamp: 2025-11-08T13:06:22.133Z
Learning: In the platform-api Go codebase (file: platform-api/src/internal/service/api.go), the DeployAPIRevision method is designed to automatically create API-gateway associations during deployment if they don't already exist. There is no requirement that associations must pre-exist before deployment; the system checks for existing associations and creates them on-the-fly as needed.

Applied to files:

  • gateway/gateway-controller/pkg/api/handlers/handlers.go
  • gateway/gateway-controller/pkg/utils/api_deployment.go
📚 Learning: 2025-12-22T16:00:12.851Z
Learnt from: O-sura
Repo: wso2/api-platform PR: 569
File: gateway/policy-engine/internal/analytics/constants.go:100-101
Timestamp: 2025-12-22T16:00:12.851Z
Learning: In gateway/policy-engine/internal/analytics/constants.go, the constant APITypeKey is intentionally named with "Type" terminology (instead of "Kind") to maintain consistency with the analytics publisher platform, which uses "type" rather than "kind". The constant resolves to "x-wso2-api-kind" but the name APITypeKey is kept for cross-platform naming consistency.

Applied to files:

  • gateway/gateway-controller/pkg/api/handlers/handlers.go
📚 Learning: 2025-12-18T08:33:51.340Z
Learnt from: O-sura
Repo: wso2/api-platform PR: 485
File: gateway/policy-engine/internal/analytics/dto/unwrappedEvent.go:21-90
Timestamp: 2025-12-18T08:33:51.340Z
Learning: In gateway/policy-engine/internal/analytics/dto/unwrappedEvent.go, the JSON tag casing differences between DefaultFaultEvent (correlationID, applicationID, organizationID, environmentID, regionID with uppercase 'ID') and DefaultResponseEvent (correlationId, applicationId, organizationId, environmentId, regionId with lowercase 'Id') are intentional to maintain consistency with respective event type conventions and config file property standards.

Applied to files:

  • gateway/gateway-controller/pkg/eventhub/types.go
🧬 Code graph analysis (12)
gateway/gateway-controller/pkg/storage/interface.go (1)
platform-api/src/internal/database/connection.go (1)
  • DB (32-34)
gateway/gateway-controller/pkg/eventhub/eventhub.go (4)
gateway/gateway-controller/pkg/eventhub/backend.go (2)
  • EventhubImpl (10-46)
  • SQLiteBackendConfig (76-83)
platform-api/src/internal/database/connection.go (1)
  • DB (32-34)
gateway/gateway-controller/pkg/eventhub/types.go (4)
  • Config (65-72)
  • EventHub (38-61)
  • EventType (9-9)
  • Event (19-28)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (1)
  • NewSQLiteBackend (33-43)
gateway/gateway-controller/pkg/eventhub/backend.go (2)
gateway/gateway-controller/pkg/eventhub/types.go (2)
  • EventType (9-9)
  • Event (19-28)
gateway/gateway-controller/pkg/eventlistener/event_source.go (1)
  • Event (29-50)
gateway/gateway-controller/pkg/eventlistener/event_source.go (1)
gateway/gateway-controller/pkg/eventhub/types.go (2)
  • EventType (9-9)
  • Event (19-28)
gateway/gateway-controller/cmd/controller/main.go (4)
gateway/gateway-controller/pkg/eventhub/types.go (1)
  • EventHub (38-61)
gateway/gateway-controller/pkg/eventhub/eventhub.go (1)
  • New (22-33)
gateway/gateway-controller/pkg/eventlistener/listener.go (2)
  • EventListener (35-48)
  • NewEventListener (63-81)
gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go (1)
  • NewEventHubAdapter (49-55)
gateway/gateway-controller/pkg/eventlistener/mock_event_source.go (1)
gateway/gateway-controller/pkg/eventlistener/event_source.go (1)
  • Event (29-50)
gateway/gateway-controller/pkg/api/handlers/handlers.go (6)
gateway/gateway-controller/pkg/eventhub/types.go (1)
  • EventHub (38-61)
gateway/gateway-controller/pkg/utils/api_deployment.go (8)
  • APIUpdateParams (59-65)
  • NotFoundError (70-72)
  • HandleMismatchError (79-82)
  • APIValidationError (89-91)
  • ParseError (116-118)
  • TopicOperationError (98-100)
  • ConflictError (107-109)
  • DatabaseUnavailableError (125-125)
gateway/gateway-controller/pkg/api/middleware/correlation.go (1)
  • GetCorrelationID (81-88)
gateway/gateway-controller/pkg/api/generated/generated.go (3)
  • ErrorResponse (637-644)
  • Error (80-80)
  • ValidationError (1287-1293)
gateway/gateway-controller/pkg/config/validator.go (1)
  • ValidationError (24-27)
gateway/gateway-controller/pkg/models/stored_config.go (1)
  • StoredConfig (38-48)
gateway/gateway-controller/pkg/eventhub/topic.go (2)
gateway/gateway-controller/pkg/eventhub/eventhub.go (1)
  • New (22-33)
gateway/gateway-controller/pkg/eventhub/types.go (1)
  • Event (19-28)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (4)
platform-api/src/internal/database/connection.go (1)
  • DB (32-34)
gateway/gateway-controller/pkg/eventhub/backend.go (2)
  • SQLiteBackendConfig (76-83)
  • DefaultSQLiteBackendConfig (86-92)
gateway/gateway-controller/pkg/eventhub/types.go (3)
  • EventType (9-9)
  • Event (19-28)
  • OrganizationState (31-35)
gateway/gateway-controller/pkg/eventhub/eventhub.go (1)
  • New (22-33)
gateway/gateway-controller/pkg/eventhub/types.go (1)
gateway/gateway-controller/pkg/eventlistener/event_source.go (1)
  • Event (29-50)
gateway/gateway-controller/pkg/utils/websub_topic_registration_test.go (1)
gateway/gateway-controller/pkg/utils/api_deployment.go (1)
  • NewAPIDeploymentService (146-168)
gateway/gateway-controller/pkg/utils/api_deployment.go (8)
gateway/gateway-controller/pkg/api/generated/generated.go (8)
  • Error (80-80)
  • ValidationError (1287-1293)
  • Asyncwebsub (33-33)
  • APIConfiguration (311-321)
  • Metadata (1150-1153)
  • RestApi (34-34)
  • Upstream (1259-1266)
  • Policy (1171-1183)
gateway/gateway-controller/pkg/config/validator.go (2)
  • ValidationError (24-27)
  • Validator (32-34)
gateway/gateway-controller/pkg/storage/memory.go (1)
  • ConfigStore (31-46)
gateway/gateway-controller/pkg/config/config.go (1)
  • RouterConfig (172-185)
gateway/gateway-controller/pkg/eventhub/types.go (2)
  • EventHub (38-61)
  • EventTypeAPI (13-13)
gateway/gateway-controller/pkg/models/stored_config.go (1)
  • StoredConfig (38-48)
sdk/gateway/policyengine/v1/config.go (2)
  • Configuration (41-47)
  • PolicyInstance (18-37)
gateway/gateway-controller/pkg/xds/translator.go (1)
  • GenerateRouteName (114-117)
🪛 Gitleaks (8.30.0)
gateway/gateway-controller/pkg/eventhub/eventhub_test.go

[high] 309-309: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 markdownlint-cli2 (0.18.1)
gateway/gateway-controller/pkg/eventlistener/EXAMPLE_USAGE.md

9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


369-369: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (36)
gateway/gateway-controller/pkg/eventlistener/EXAMPLE_USAGE.md (1)

1-381: Excellent documentation structure and examples.

The documentation provides clear architectural overview and comprehensive examples covering production usage, testing, and extensibility. The three usage patterns (EventHub, MockEventSource, custom Kafka) effectively demonstrate the pluggable design.

gateway/gateway-controller/tests/integration/schema_test.go (1)

98-98: Schema version update correctly reflects the new database schema.

The version bump from 5 to 6 aligns with the addition of new tables (organization_states, events) and related indexes for EventHub integration.

gateway/gateway-controller/pkg/config/config.go (1)

363-367: Default configuration correctly initializes the new field.

Setting EnableReplicaSync to false by default maintains backward compatibility and follows the opt-in pattern for new features.

gateway/gateway-controller/pkg/utils/websub_topic_registration_test.go (1)

21-21: Test constructor calls correctly updated to match new signature.

The updated calls properly pass nil for policyManager and eventHub, and false for enableReplicaSync, which is appropriate for these WebSub-focused tests that don't require replica synchronization. Both fields are handled gracefully throughout APIDeploymentService: policyManager has an early return guard at line 889, and eventHub is protected by a nil check at line 283 before any usage.

gateway/gateway-controller/pkg/storage/sqlite.go (1)

1643-1647: LGTM!

Clean accessor method enabling EventHub to share the same database connection. The comment clarifies the purpose.

gateway/gateway-controller/pkg/api/handlers/policy_ordering_test.go (2)

39-48: LGTM!

Test setup correctly wires the new APIDeploymentService with nil values for optional dependencies (policyManager, eventHub, etc.), which is appropriate for unit testing policy ordering logic in isolation.


203-203: LGTM!

Correct migration from direct method call to using the deployment service abstraction. The policy building logic is now properly encapsulated.

gateway/gateway-controller/pkg/eventhub/eventhub_test.go (4)

16-47: LGTM!

Well-structured test helper that creates an in-memory SQLite database with the required schema. The schema matches the production migration in sqlite.go.


69-106: LGTM!

Good test for the publish-subscribe workflow with appropriate timeout handling. The 100ms poll interval speeds up the test without sacrificing coverage.


265-285: LGTM!

Good test for idempotent shutdown - verifying that calling Close() multiple times is safe, which is important for graceful shutdown scenarios.


307-318: Static analysis false positive - ignore.

The Gitleaks warning about "generic-api-key" on line 309 is a false positive. The string "test-correlation-id-1" is test data for a correlation ID field, not an actual API key or secret.

gateway/gateway-controller/cmd/controller/main.go (2)

269-287: LGTM!

EventListener is correctly initialized and started only when EventHub is available. The dependencies are properly wired, and the comment noting policyManager can be nil is helpful.


390-393: LGTM!

EventListener is properly stopped during shutdown, maintaining the correct shutdown sequence (after servers, before final log).

gateway/gateway-controller/pkg/controlplane/client.go (1)

117-132: LGTM!

The policyManager parameter is correctly wired to NewAPIDeploymentService. Passing nil for eventHub and false for enableReplicaSync is appropriate since the control plane client handles deployments via WebSocket events rather than the EventHub mechanism.

The TODO comment is a good reminder for future design decisions regarding control plane integration with replica sync.

gateway/gateway-controller/pkg/eventlistener/event_source.go (2)

29-50: LGTM - Good abstraction layer.

The Event struct intentionally duplicates fields from eventhub.Event to provide an agnostic event type that decouples eventlistener from eventhub. This enables plugging in different event sources (Kafka, RabbitMQ, etc.) without changing the listener code.

The single Timestamp field (vs ProcessedTimestamp/OriginatedTimestamp in eventhub) appropriately simplifies the consumer interface.


60-100: LGTM!

Well-documented interface with clear contracts:

  • Batch delivery via channel
  • Explicit ownership rules (channel not closed by EventSource)
  • Idempotent Close() requirement

The design principles align with best practices for testable, pluggable event systems.

gateway/gateway-controller/pkg/eventlistener/api_processor.go (1)

149-303: Well-designed policy merging logic.

The policy building logic correctly implements the documented merging rules:

  • Operation policies define authoritative order when present
  • API-level policies not mentioned in operation policies are appended
  • Falls back to API-level policies when operation has none
  • Handles vhost resolution with sensible defaults
  • Returns nil when no policies exist (avoiding unnecessary objects)

The implementation for both WebSub and REST APIs is consistent and defensive.

gateway/gateway-controller/pkg/eventhub/backend.go (1)

8-92: Well-designed backend abstraction.

The EventhubImpl interface provides a clean abstraction for different message broker implementations:

  • Clear separation between initialization, operations, and cleanup
  • Consistent error handling across all methods
  • Context support for cancellation and timeouts
  • Batch delivery pattern (chan<- []Event) is efficient
  • Backend-specific behavior is well-documented in comments
  • Default SQLite configuration values are sensible

The design enables future integration with NATS, Azure Service Bus, or other brokers without changing the EventHub API.

gateway/gateway-controller/pkg/eventhub/topic.go (4)

38-54: LGTM!

The register method correctly:

  • Uses write lock for safe concurrent access
  • Checks for duplicates before registration
  • Initializes lastPolled to current time to avoid replaying old events
  • Returns clear error when organization already exists

68-82: LGTM!

The addSubscriber method uses proper two-phase locking:

  • Acquires registry read lock to retrieve organization
  • Releases registry lock before acquiring organization lock (prevents deadlock)
  • Safely appends subscriber under write lock

84-105: LGTM!

The removeSubscriber method correctly:

  • Uses two-phase locking to prevent deadlock
  • Safely removes subscriber from slice
  • Returns immediately after finding the match
  • Treats "not found" as success (line 104), which is appropriate for cleanup operations

125-133: LGTM!

The getSubscribers method properly:

  • Uses read lock for concurrent access
  • Returns a copy of the subscribers slice (prevents external mutation)
  • Pre-allocates the copy slice for efficiency
gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go (2)

57-95: LGTM on Subscribe logic flow.

The subscription flow is well-structured: idempotency check, organization registration, bridge channel creation, and goroutine spawn. The error handling for registration being non-fatal aligns with the comment about idempotent operations.


97-150: LGTM on bridgeEvents implementation.

The bridge goroutine properly handles context cancellation, channel closure, and event conversion. The non-blocking forward pattern with context check prevents deadlocks.

gateway/gateway-controller/pkg/eventlistener/mock_event_source.go (2)

43-77: LGTM on MockEventSource structure and constructor.

The mock implementation has proper thread-safety with mutex protection and comprehensive state tracking for test assertions. The constructor initializes all slices to avoid nil issues.


189-211: LGTM on helper methods.

The Reset and GetSubscriptionCount methods properly handle thread-safety and provide useful test utilities.

gateway/gateway-controller/pkg/api/handlers/handlers.go (3)

84-87: LGTM on EventHub integration in NewAPIServer.

The constructor properly accepts and propagates the new eventHub and enableMultiReplicaMode parameters to the deployment service. This maintains clean dependency injection.


480-556: LGTM on UpdateAPI refactoring.

The refactored UpdateAPI properly delegates to deploymentService.UpdateAPIConfiguration with clean error type mapping to appropriate HTTP status codes. The separation of concerns improves maintainability.


988-1000: Policy update errors occur after HTTP response.

The policy management code runs after c.JSON(http.StatusCreated, ...) returns the response to the client. If AddPolicy fails, the client has already received a success response. This pattern appears in multiple endpoints (CreateLLMProvider, UpdateLLMProvider, CreateLLMProxy, etc.).

Is this fire-and-forget pattern intentional? If policy synchronization is critical, consider moving it before the response or using a different error handling strategy.

gateway/gateway-controller/pkg/eventhub/types.go (1)

74-81: LGTM on Config and DefaultConfig.

The default configuration values are reasonable: 5-second poll interval balances responsiveness with load, 10-minute cleanup interval avoids excessive DB operations, and 1-hour retention provides sufficient buffer for catching up.

gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (3)

100-158: LGTM on Publish transaction handling.

The atomic transaction pattern with defer tx.Rollback() (safe no-op after commit) ensures consistency between event insertion and organization state update. The UUID-based versioning provides reliable change detection.


372-398: Event drop on full subscriber channel.

The non-blocking send at lines 385-396 drops events when subscriber channels are full. While this prevents poller deadlock, dropped events could cause replicas to become inconsistent.

Consider whether:

  1. The buffer size on subscriber channels (currently 10 in the adapter) is sufficient for expected load
  2. A metric/counter for dropped events would help with observability
  3. An alternative backpressure mechanism is needed for critical events
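
A sketch for suggestion 2 from the list above: a drop counter plus a warning log, where `droppedEvents` would be a new `atomic.Int64` field on the backend (a hypothetical addition, not in this PR):

```go
select {
case subscriberChan <- events:
	// Delivered.
default:
	// Channel full: record the drop so operators can correlate replica
	// divergence with backpressure instead of losing events silently.
	b.droppedEvents.Add(int64(len(events)))
	b.logger.Warn("subscriber channel full, events dropped",
		zap.String("organization", orgID),
		zap.Int("dropped", len(events)))
}
```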

217-244: LGTM on Close shutdown pattern.

The graceful shutdown properly cancels both context, waits for goroutines via WaitGroup, and resets the initialized flag under lock protection.

gateway/gateway-controller/pkg/utils/api_deployment.go (3)

58-129: LGTM on custom error types.

The custom error types provide clean HTTP status code mapping in handlers. Each type has a clear semantic meaning and proper Error() implementation.


281-316: Multi-replica mode skips local policy update.

In multi-replica mode (lines 281-312), only EventHub publish occurs - updatePolicyConfiguration is not called. The single-replica path (lines 313-316) calls both triggerXDSSnapshotUpdate and updatePolicyConfiguration.

Is this intentional? In multi-replica mode:

  1. Does the EventListener handle policy updates when processing received events?
  2. If the publishing replica is also a subscriber, will it receive its own event and update policies?
  3. Could there be a window where the API is created but policies aren't applied on the originating replica?

718-868: LGTM on BuildStoredPolicyFromAPI implementation.

The policy building logic correctly:

  • Prioritizes operation-level policies over API-level
  • Appends non-overlapping API-level policies
  • Handles both RestApi and Asyncwebsub kinds
  • Uses consistent route naming via xds.GenerateRouteName

The vhost handling for sandbox environments (lines 816-831) properly falls back to defaults when not specified.

Comment on lines +233 to +234

```go
// GetDB exposes the underlying *sql.DB for advanced operations.
GetDB() *sql.DB
```
@coderabbitai coderabbitai bot Jan 7, 2026

⚠️ Potential issue | 🟠 Major

GetDB() breaks the storage abstraction and violates the interface's design philosophy.

The Storage interface documentation explicitly states it should be "database-agnostic" (line 31) and warns "DON'T: Expose database implementation details through return types" (line 47). Adding GetDB() directly violates these principles by coupling consumers to SQL databases and preventing non-SQL implementations.

Consider these alternatives:

  • Option 1 (Preferred): Add specific methods to the Storage interface for EventHub operations (e.g., ExecuteInTransaction(func(*sql.Tx) error) or domain-specific operations).
  • Option 2: Use type assertion in EventHub initialization where SQL access is required: if sqlStorage, ok := storage.(*SQLiteStorage); ok { ... } and document that EventHub requires SQLite/PostgreSQL storage.
  • Option 3: Create a separate SQLStorage interface that extends Storage with GetDB(), allowing opt-in SQL access while maintaining the base abstraction.
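
A sketch of Option 3 for comparison (kept for reference even though the author's clarification below makes GetDB() on Storage acceptable):

```go
// SQLStorage extends Storage with opt-in SQL access; non-SQL backends would
// implement only Storage.
type SQLStorage interface {
	Storage
	GetDB() *sql.DB
}

// Consumers needing raw SQL assert for it:
//   if s, ok := store.(SQLStorage); ok {
//       hub := eventhub.New(s.GetDB(), logger, eventhub.DefaultConfig())
//   }
```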

@VirajSalaka (author) replied:

database agnostic means any SQL database.

coderabbitai bot replied:

@VirajSalaka, thank you for the clarification! That makes perfect sense. Since the Storage interface is scoped specifically to SQL databases (SQLite, PostgreSQL, MySQL, etc.) rather than all storage types, exposing GetDB() *sql.DB is appropriate and doesn't break the abstraction within that scope. The *sql.DB type from Go's standard library works uniformly across all SQL database drivers, maintaining portability between SQL implementations.

Given this design intent, the GetDB() method is a reasonable addition for cases like EventHub initialization where direct database access is needed.


✏️ Learnings added
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/storage/interface.go:233-234
Timestamp: 2026-01-09T08:19:13.830Z
Learning: In gateway/gateway-controller/pkg/storage/interface.go, the Storage interface design philosophy of being "database-agnostic" means it should support any SQL database (SQLite, PostgreSQL, MySQL, etc.), not non-SQL storage backends. Therefore, exposing *sql.DB through GetDB() is appropriate and maintains the intended abstraction.



🧠 Learnings used
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

Comment on lines +408 to +418

```go
	// Update xDS snapshot asynchronously
	s.triggerXDSSnapshotUpdate(existing.ID, params.CorrelationID, params.Logger)

	// Update derived policy configuration
	s.updatePolicyConfiguration(existing, params.Logger)

	return &APIDeploymentResult{
		StoredConfig: existing,
		IsUpdate:     true,
	}, nil
}
```

@coderabbitai coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

UpdateAPIConfiguration doesn't use EventHub in multi-replica mode.

Unlike DeployAPIConfiguration which conditionally uses EventHub when enableReplicaSync is true, UpdateAPIConfiguration always calls triggerXDSSnapshotUpdate and updatePolicyConfiguration directly (lines 409, 412).

This inconsistency means:

  • Creates in multi-replica mode → event published to EventHub
  • Updates → direct xDS/policy update, no event published

Other replicas won't learn about updates. Should this follow the same pattern as DeployAPIConfiguration?

🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/utils/api_deployment.go around lines 408 -
418, UpdateAPIConfiguration currently calls triggerXDSSnapshotUpdate and
updatePolicyConfiguration directly, which misses replica synchronization when
enableReplicaSync is true; change it to follow DeployAPIConfiguration's pattern:
when enableReplicaSync is enabled publish the update event to EventHub (using
the same event type/payload used by DeployAPIConfiguration) instead of calling
triggerXDSSnapshotUpdate/updatePolicyConfiguration directly, and only call those
local update methods when replica sync is disabled or when the event handling
path delivers the event back to this instance; reference UpdateAPIConfiguration,
DeployAPIConfiguration, triggerXDSSnapshotUpdate, updatePolicyConfiguration,
enableReplicaSync and the EventHub publish method used in DeployAPIConfiguration
to implement the conditional behavior.

@VirajSalaka (author) replied:

Will fix later.

coderabbitai bot replied:

@VirajSalaka, understood! I've noted that this will be addressed in a future update.


🧠 Learnings used
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In @gateway/gateway-controller/cmd/controller/main.go:
- Around line 113-130: When registering the default org, treat
ErrOrganizationAlreadyExists as non-fatal so restarts succeed: after calling
eventHub.RegisterOrganization("default") check if err != nil and if
errors.Is(err, eventhub.ErrOrganizationAlreadyExists) then log.Info that the
organization already exists and continue, otherwise call log.Fatal as before;
ensure you import the standard "errors" package and reference
eventhub.ErrOrganizationAlreadyExists and the eventHub.RegisterOrganization call
to locate the change.

In @gateway/gateway-controller/pkg/eventhub/sqlite_backend.go:
- Around line 303-316: Duplicate/malformed conditional block: remove the extra
nested `if len(events) > 0`, fix the inverted error check, and eliminate the
stray unconditional `org.updatePollState` so timestamps are only updated on
successful delivery or when there are no events. Concretely, in the function
that handles `events` (references: `events`, `b.deliverEvents`,
`org.updatePollState`, `state.VersionID`), implement a single `if len(events) >
0 { ... } else { ... }` where inside the true branch you call
`b.deliverEvents(org, events)` and only call
`org.updatePollState(state.VersionID, time.Now())` when `b.deliverEvents`
returns nil (success); in the false branch call
`org.updatePollState(state.VersionID, time.Now())`; remove any extra/duplicate
`if` and the orphaned `org.updatePollState` to restore correct control flow and
compilation.

In @gateway/gateway-controller/pkg/eventlistener/api_processor.go:
- Around line 81-108: The snapshot update goroutine is started before the policy
is applied, causing a race where snapshotManager.UpdateSnapshot may push an xDS
snapshot that doesn't include the new policy; move the policy application so
that el.policyManager.AddPolicy(storedPolicy) is executed and confirmed (handle
its error) before launching the goroutine that calls
el.snapshotManager.UpdateSnapshot(ctx, correlationID); ensure you still build
the policy with el.buildStoredPolicyFromAPI(config) as before, and if AddPolicy
returns an error avoid or abort the snapshot update (or log and return) so the
control plane and data plane remain consistent.
- Around line 111-149: In handleAPIDelete, remove the policy from PolicyManager
before spawning the async xDS snapshot goroutine to avoid the race;
specifically, call el.policyManager.RemovePolicy(apiID + "-policies") (guarded
by el.policyManager != nil and config != nil) and log any errors or success,
then start the goroutine that calls el.snapshotManager.UpdateSnapshot(ctx,
correlationID); keep existing timeouts and error logging but invert the order so
policy removal completes (synchronously) prior to triggering the snapshot.

In @gateway/gateway-controller/pkg/storage/sqlite.go:
- Around line 235-246: The current CREATE TABLE in sqlite.go uses a composite
PRIMARY KEY (organization_id, processed_timestamp) which can collide under high
throughput; update the DDL in the block that executes on s.db.Exec to use a
surrogate auto-increment integer primary key (e.g., add id INTEGER PRIMARY KEY
AUTOINCREMENT) instead of the composite key, keep the processed_timestamp column
as DEFAULT CURRENT_TIMESTAMP, and then create a non-unique index on
(organization_id, processed_timestamp) (via a separate s.db.Exec CREATE INDEX
statement) to preserve query performance and eliminate uniqueness collisions.

In @gateway/gateway-controller/pkg/utils/api_deployment.go:
- Around line 570-589: The code skips calling s.store.Add when
s.enableReplicaSync is true, causing the originating replica to return before
its in-memory store contains the new config and breaking immediate reads; modify
DeployAPIConfiguration to still add the storedCfg to s.store after the database
write succeeds (even when enableReplicaSync is true), handle
storage.IsConflictError by calling s.updateExistingConfig(storedCfg, logger) if
Add returns a conflict, and on any store.Add or update failure rollback the DB
write (s.db.DeleteConfig(storedCfg.ID)) and return an error so callers see a
consistent state; ensure GetAPIById continues to read from the in-memory store
reflecting the new config immediately.
- Around line 408-412: UpdateAPIConfiguration currently calls
triggerXDSSnapshotUpdate and updatePolicyConfiguration unconditionally,
bypassing EventHub and enableReplicaSync and causing desync in multi-replica
setups; change UpdateAPIConfiguration to mirror DeployAPIConfiguration: if
enableReplicaSync is true, publish the same replication event to EventHub (the
same event payload used by DeployAPIConfiguration) and skip direct local calls,
otherwise perform the existing direct calls to
triggerXDSSnapshotUpdate(existing.ID, params.CorrelationID, params.Logger) and
s.updatePolicyConfiguration(existing, params.Logger); ensure you reference and
reuse the same enableReplicaSync check and EventHub publish path used in
DeployAPIConfiguration so behavior is consistent across both functions.
🧹 Nitpick comments (3)
gateway/gateway-controller/cmd/controller/main.go (1)

392-395: Consider logging EventListener stop errors.

The Stop() method may return an error that is currently ignored. Adding error logging would aid troubleshooting during shutdown.

📝 Optional improvement
 // Stop EventListener if it was started
 if evtListener != nil {
-	evtListener.Stop()
+	if err := evtListener.Stop(); err != nil {
+		log.Warn("Failed to stop EventListener cleanly", zap.Error(err))
+	}
 }
gateway/gateway-controller/pkg/eventhub/eventhub.go (1)

46-64: Consider using RLock for read-only initialization checks.

The initialized field uses sync.RWMutex but only calls Lock() in both Initialize() and Close(). Since both methods check the field before modifying it, you could take RLock() for the initial check, then release it and acquire Lock() (with a double-check) only when modification is needed. This would allow concurrent initialization checks without blocking.

♻️ Optional optimization
 func (eh *eventHub) Initialize(ctx context.Context) error {
+	// Fast path: check if already initialized
+	eh.mu.RLock()
+	if eh.initialized {
+		eh.mu.RUnlock()
+		return nil
+	}
+	eh.mu.RUnlock()
+	
+	// Acquire write lock for initialization
 	eh.mu.Lock()
 	defer eh.mu.Unlock()
 
+	// Double-check after acquiring write lock
 	if eh.initialized {
 		return nil
 	}
 
 	eh.logger.Info("Initializing EventHub")
 
 	if err := eh.backend.Initialize(ctx); err != nil {
 		return err
 	}
 
 	eh.initialized = true
 	eh.logger.Info("EventHub initialized successfully")
 	return nil
 }

Similar pattern can be applied to Close().

Also applies to: 88-106

gateway/gateway-controller/pkg/utils/api_deployment.go (1)

887-887: TODO comment flags incomplete multi-replica support.

Line 887 contains a TODO indicating updatePolicyConfiguration doesn't work correctly in multi-replica mode. This aligns with the broader issue that policy updates (like xDS updates) need EventHub integration when enableReplicaSync is enabled.

Would you like me to generate a shell script to identify all code paths that update xDS or policy state without checking enableReplicaSync?

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 278c8a9 and c9f1551.

📒 Files selected for processing (8)
  • gateway/gateway-controller/cmd/controller/main.go
  • gateway/gateway-controller/pkg/eventhub/eventhub.go
  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
  • gateway/gateway-controller/pkg/eventlistener/api_processor.go
  • gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go
  • gateway/gateway-controller/pkg/storage/gateway-controller-db.sql
  • gateway/gateway-controller/pkg/storage/sqlite.go
  • gateway/gateway-controller/pkg/utils/api_deployment.go
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-06T11:57:08.456Z
Learnt from: BLasan
Repo: wso2/api-platform PR: 140
File: gateway/gateway-controller/pkg/utils/api_deployment.go:88-105
Timestamp: 2025-12-06T11:57:08.456Z
Learning: In the gateway-controller Go codebase (file: gateway/gateway-controller/pkg/utils/api_deployment.go), only the async/websub API kind is currently supported. async/websocket and async/sse API kinds are not yet implemented.

Applied to files:

  • gateway/gateway-controller/cmd/controller/main.go
  • gateway/gateway-controller/pkg/utils/api_deployment.go
  • gateway/gateway-controller/pkg/eventlistener/api_processor.go
📚 Learning: 2025-12-23T10:41:47.584Z
Learnt from: thivindu
Repo: wso2/api-platform PR: 389
File: gateway/gateway-controller/pkg/utils/api_key.go:257-380
Timestamp: 2025-12-23T10:41:47.584Z
Learning: In gateway/gateway-controller/pkg/utils/api_key.go, the RevokeAPIKey method intentionally proceeds with database deletion and xDS policy engine revocation even when the API key is not found in the memory store (apiKey is nil). This is by design to handle partial failure recovery: if a previous revocation attempt removed the key from memory but failed to clean up the database or policy engine, subsequent retries will still perform the cleanup operations to ensure eventual consistency.

Applied to files:

  • gateway/gateway-controller/cmd/controller/main.go
  • gateway/gateway-controller/pkg/eventlistener/api_processor.go
📚 Learning: 2025-11-08T13:06:22.133Z
Learnt from: thivindu
Repo: wso2/api-platform PR: 110
File: platform-api/src/internal/service/api.go:432-476
Timestamp: 2025-11-08T13:06:22.133Z
Learning: In the platform-api Go codebase (file: platform-api/src/internal/service/api.go), the DeployAPIRevision method is designed to automatically create API-gateway associations during deployment if they don't already exist. There is no requirement that associations must pre-exist before deployment; the system checks for existing associations and creates them on-the-fly as needed.

Applied to files:

  • gateway/gateway-controller/pkg/utils/api_deployment.go
🧬 Code graph analysis (3)
gateway/gateway-controller/pkg/storage/sqlite.go (1)
platform-api/src/internal/database/connection.go (1)
  • DB (32-34)
gateway/gateway-controller/cmd/controller/main.go (8)
gateway/gateway-controller/pkg/eventhub/types.go (1)
  • EventHub (38-61)
gateway/gateway-controller/pkg/api/generated/generated.go (1)
  • Server (1223-1253)
gateway/gateway-controller/pkg/policyxds/server.go (1)
  • Server (38-46)
platform-api/src/internal/server/server.go (1)
  • Server (48-55)
platform-api/src/internal/dto/openapi_spec.go (1)
  • Info (31-37)
gateway/gateway-controller/pkg/eventhub/eventhub.go (1)
  • New (24-35)
gateway/gateway-controller/pkg/eventlistener/listener.go (2)
  • EventListener (35-48)
  • NewEventListener (63-81)
gateway/gateway-controller/pkg/controlplane/client.go (1)
  • NewClient (111-155)
gateway/gateway-controller/pkg/eventhub/eventhub.go (3)
gateway/gateway-controller/pkg/eventhub/backend.go (2)
  • EventhubImpl (10-46)
  • SQLiteBackendConfig (76-83)
gateway/gateway-controller/pkg/eventhub/types.go (4)
  • Config (65-72)
  • EventHub (38-61)
  • EventType (9-9)
  • Event (19-28)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (1)
  • NewSQLiteBackend (33-43)
🔇 Additional comments (7)
gateway/gateway-controller/pkg/storage/sqlite.go (1)

1635-1639: LGTM: GetDB() accessor enables EventHub initialization.

The accessor provides controlled access to the underlying database handle for EventHub schema operations while maintaining encapsulation for other use cases.

gateway/gateway-controller/cmd/controller/main.go (2)

271-289: LGTM: EventListener initialization is complete.

All required dependencies are properly passed, and the fail-fast error handling is appropriate for the startup phase.


317-317: LGTM: Dependency propagation is correct.

The updated constructor calls properly propagate policyManager, eventHub, and enableReplicaSync to enable multi-replica synchronization and policy coordination.

Also applies to: 346-346

gateway/gateway-controller/pkg/eventhub/eventhub.go (1)

66-86: LGTM: Delegation pattern is clean and consistent.

All methods properly delegate to the backend implementation, maintaining the separation between the EventHub interface and backend specifics.

gateway/gateway-controller/pkg/eventlistener/api_processor.go (2)

151-305: LGTM: Policy merging and route generation logic is sound.

The implementation correctly handles:

  • Policy precedence (operation-level overrides API-level)
  • Multiple API kinds (WebSub and REST)
  • Virtual host resolution with proper fallbacks
  • Route key generation consistent with xDS translator expectations

The nil return when no policies exist prevents unnecessary policy config creation.
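
To make the precedence rule concrete, here is a minimal, self-contained sketch of operation-level policies overriding API-level ones by name. Policy and mergePolicies are illustrative stand-ins, not the actual api_processor.go definitions:

package main

import "fmt"

type Policy struct {
	Name   string
	Params map[string]string
}

// mergePolicies returns the API-level policies with any same-named
// operation-level policy taking precedence.
func mergePolicies(apiLevel, opLevel []Policy) []Policy {
	merged := map[string]Policy{}
	for _, p := range apiLevel {
		merged[p.Name] = p
	}
	for _, p := range opLevel {
		merged[p.Name] = p // operation-level wins on a name collision
	}
	out := make([]Policy, 0, len(merged))
	for _, p := range merged {
		out = append(out, p)
	}
	return out
}

func main() {
	api := []Policy{{Name: "rate-limit", Params: map[string]string{"rps": "100"}}}
	op := []Policy{{Name: "rate-limit", Params: map[string]string{"rps": "10"}}}
	fmt.Println(mergePolicies(api, op)) // rps=10: the operation-level override
}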


307-322: LGTM: Policy conversion is straightforward.

The conversion properly maps all fields and handles nil parameters gracefully. Defaulting Enabled to true is a reasonable choice for newly created policy instances.

gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go (1)

176-176: The project's go.mod specifies Go 1.25.1, which is well after Go 1.23 when sync.Map.Clear() was introduced. No compatibility issue exists—the code is valid and will compile without errors.

Likely an incorrect or invalid review comment.

Comment on lines 113 to 130
// Initialize EventHub if multi-replica mode is enabled
var eventHub eventhub.EventHub
if cfg.GatewayController.Server.EnableReplicaSync {
if cfg.IsPersistentMode() && db != nil {
log.Info("Initializing EventHub for multi-replica mode")
eventHub = eventhub.New(db.GetDB(), log, eventhub.DefaultConfig())
ctx := context.Background()
if err := eventHub.Initialize(ctx); err != nil {
log.Fatal("Failed to initialize EventHub", zap.Error(err))
}
if err := eventHub.RegisterOrganization("default"); err != nil {
log.Fatal("Failed to register default organization", zap.Error(err))
}
log.Info("EventHub initialized successfully")
} else {
log.Fatal("EventHub requires persistent storage. Multi-replica mode will not function correctly in memory-only mode.")
}
}
Contributor

⚠️ Potential issue | 🔴 Critical

Handle ErrOrganizationAlreadyExists to support restarts.

The current code fails with log.Fatal on any error from RegisterOrganization("default"). On controller restart with persistent storage, the "default" organization already exists in the database, causing ErrOrganizationAlreadyExists and preventing startup.

🐛 Proposed fix
+import (
+	"errors"
+	// ... existing imports
+)

 if err := eventHub.RegisterOrganization("default"); err != nil {
-	log.Fatal("Failed to register default organization", zap.Error(err))
+	// ErrOrganizationAlreadyExists is acceptable on restart
+	if !errors.Is(err, eventhub.ErrOrganizationAlreadyExists) {
+		log.Fatal("Failed to register default organization", zap.Error(err))
+	}
+	log.Debug("Default organization already registered")
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @gateway/gateway-controller/cmd/controller/main.go around lines 113 - 130,
When registering the default org, treat ErrOrganizationAlreadyExists as
non-fatal so restarts succeed: after calling
eventHub.RegisterOrganization("default") check if err != nil and if
errors.Is(err, eventhub.ErrOrganizationAlreadyExists) then log.Info that the
organization already exists and continue, otherwise call log.Fatal as before;
ensure you import the standard "errors" package and reference
eventhub.ErrOrganizationAlreadyExists and the eventHub.RegisterOrganization call
to locate the change.

Comment on lines 303 to 316
if len(events) > 0 {
if len(events) > 0 {
if b.deliverEvents(org, events) != nil{
org.updatePollState(state.VersionID, time.Now())
}
// If delivery failed (channel full), don't update timestamp
// so events will be retried on next poll
} else {
org.updatePollState(state.VersionID, time.Now())
}
}

org.updatePollState(state.VersionID, time.Now())
}
Contributor

⚠️ Potential issue | 🔴 Critical

Critical: Malformed code with syntax errors and inverted logic.

The code block at lines 303-316 contains multiple critical issues:

  1. Line 304: Duplicate condition if len(events) > 0 with incorrect indentation
  2. Line 305: Inverted logic — checks != nil (error case) but then updates poll state, when it should update only on success (== nil)
  3. Line 315: Orphaned org.updatePollState call outside any condition block, causing unconditional timestamp updates (reintroducing the dropped-events bug from the past review)
  4. Syntax errors: The block structure is malformed and likely won't compile
🐛 Proposed fix
-		if len(events) > 0 {
-			 		if len(events) > 0 {
-			if b.deliverEvents(org, events) != nil{
-				org.updatePollState(state.VersionID, time.Now())
-			}
-			// If delivery failed (channel full), don't update timestamp
-			// so events will be retried on next poll
-		} else {
-			org.updatePollState(state.VersionID, time.Now())
-		}
-	}
-
-		org.updatePollState(state.VersionID, time.Now())
-	}
+		if len(events) > 0 {
+			if b.deliverEvents(org, events) == nil {
+				// Only update timestamp on successful delivery
+				org.updatePollState(state.VersionID, time.Now())
+			}
+			// If delivery failed (channel full), don't update timestamp
+			// so events will be retried on next poll
+		} else {
+			// No events, update version to avoid re-checking
+			org.updatePollState(state.VersionID, time.Now())
+		}
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if len(events) > 0 {
if len(events) > 0 {
if b.deliverEvents(org, events) != nil{
org.updatePollState(state.VersionID, time.Now())
}
// If delivery failed (channel full), don't update timestamp
// so events will be retried on next poll
} else {
org.updatePollState(state.VersionID, time.Now())
}
}
org.updatePollState(state.VersionID, time.Now())
}
if len(events) > 0 {
if b.deliverEvents(org, events) == nil {
// Only update timestamp on successful delivery
org.updatePollState(state.VersionID, time.Now())
}
// If delivery failed (channel full), don't update timestamp
// so events will be retried on next poll
} else {
// No events, update version to avoid re-checking
org.updatePollState(state.VersionID, time.Now())
}
}
🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/eventhub/sqlite_backend.go around lines 303 -
316, Duplicate/malformed conditional block: remove the extra nested `if
len(events) > 0`, fix the inverted error check, and eliminate the stray
unconditional `org.updatePollState` so timestamps are only updated on successful
delivery or when there are no events. Concretely, in the function that handles
`events` (references: `events`, `b.deliverEvents`, `org.updatePollState`,
`state.VersionID`), implement a single `if len(events) > 0 { ... } else { ... }`
where inside the true branch you call `b.deliverEvents(org, events)` and only
call `org.updatePollState(state.VersionID, time.Now())` when `b.deliverEvents`
returns nil (success); in the false branch call
`org.updatePollState(state.VersionID, time.Now())`; remove any extra/duplicate
`if` and the orphaned `org.updatePollState` to restore correct control flow and
compilation.

Comment on lines +81 to +108
var storedPolicy *models.StoredPolicyConfig

if el.policyManager != nil {
storedPolicy = el.buildStoredPolicyFromAPI(config)
}

// TODO: (VirajSalaka) Handle failures in policy addition properly (rollback)
// TODO: (VirajSalaka) Use ErrGroup to parallelize XDS update and policy addition
// 3. Trigger async XDS snapshot update
go func() {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

if err := el.snapshotManager.UpdateSnapshot(ctx, correlationID); err != nil {
log.Error("Failed to update xDS snapshot", zap.Error(err))
} else {
log.Info("xDS snapshot updated successfully")
}
}()

// 4. Update PolicyManager (if configured)
if storedPolicy != nil {
if err := el.policyManager.AddPolicy(storedPolicy); err != nil {
log.Warn("Failed to add policy to PolicyManager", zap.Error(err))
} else {
log.Info("Added policy to PolicyManager")
}
}
Contributor

@coderabbitai coderabbitai bot Jan 7, 2026

⚠️ Potential issue | 🟠 Major

Policy update and xDS snapshot ordering causes race condition.

The policy is added to PolicyManager (line 103) AFTER the async xDS snapshot update goroutine is spawned (line 90). The xDS snapshot may be built and pushed to the data plane before the policy is applied, resulting in:

  • Stale or missing policies being enforced by Envoy
  • Inconsistent state between control plane policy store and data plane enforcement
🔧 Proposed fix: Apply policy before triggering snapshot
 var storedPolicy *models.StoredPolicyConfig
 
 if el.policyManager != nil {
 	storedPolicy = el.buildStoredPolicyFromAPI(config)
+	// Apply policy BEFORE snapshot to ensure consistency
+	if storedPolicy != nil {
+		if err := el.policyManager.AddPolicy(storedPolicy); err != nil {
+			log.Warn("Failed to add policy to PolicyManager", zap.Error(err))
+		} else {
+			log.Info("Added policy to PolicyManager")
+		}
+	}
 }
 
 // 3. Trigger async XDS snapshot update
 go func() {
 	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
 	defer cancel()
 
 	if err := el.snapshotManager.UpdateSnapshot(ctx, correlationID); err != nil {
 		log.Error("Failed to update xDS snapshot", zap.Error(err))
 	} else {
 		log.Info("xDS snapshot updated successfully")
 	}
 }()
-
-// 4. Update PolicyManager (if configured)
-if storedPolicy != nil {
-	if err := el.policyManager.AddPolicy(storedPolicy); err != nil {
-		log.Warn("Failed to add policy to PolicyManager", zap.Error(err))
-	} else {
-		log.Info("Added policy to PolicyManager")
-	}
-}
🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/eventlistener/api_processor.go around lines
81 - 108, The snapshot update goroutine is started before the policy is applied,
causing a race where snapshotManager.UpdateSnapshot may push an xDS snapshot
that doesn't include the new policy; move the policy application so that
el.policyManager.AddPolicy(storedPolicy) is executed and confirmed (handle its
error) before launching the goroutine that calls
el.snapshotManager.UpdateSnapshot(ctx, correlationID); ensure you still build
the policy with el.buildStoredPolicyFromAPI(config) as before, and if AddPolicy
returns an error avoid or abort the snapshot update (or log and return) so the
control plane and data plane remain consistent.

Contributor Author

Added a TODO statement to handle the issue properly.

Contributor

@VirajSalaka, thank you for acknowledging the race condition. Adding a TODO is a reasonable approach to track this for future work. The issue will be captured for a subsequent fix when you address the ordering and error handling comprehensively.

Just to clarify the risk: until this is resolved, there's a small window where the xDS snapshot could be pushed before the policy is fully applied, which may result in temporary inconsistency between the control plane and data plane.


🧠 Learnings used
Learnt from: thivindu
Repo: wso2/api-platform PR: 389
File: gateway/gateway-controller/pkg/utils/api_key.go:257-380
Timestamp: 2025-12-23T10:41:47.584Z
Learning: In gateway/gateway-controller/pkg/utils/api_key.go, the RevokeAPIKey method intentionally proceeds with database deletion and xDS policy engine revocation even when the API key is not found in the memory store (apiKey is nil). This is by design to handle partial failure recovery: if a previous revocation attempt removed the key from memory but failed to clean up the database or policy engine, subsequent retries will still perform the cleanup operations to ensure eventual consistency.

Learnt from: BLasan
Repo: wso2/api-platform PR: 140
File: gateway/gateway-controller/pkg/utils/api_deployment.go:88-105
Timestamp: 2025-12-06T11:57:08.456Z
Learning: In the gateway-controller Go codebase (file: gateway/gateway-controller/pkg/utils/api_deployment.go), only the async/websub API kind is currently supported. async/websocket and async/sse API kinds are not yet implemented.

Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

Comment on lines +111 to +149
// handleAPIDelete removes the API from in-memory store and updates XDS
func (el *EventListener) handleAPIDelete(apiID string, correlationID string, log *zap.Logger) {
// 1. Check if config exists in store (for logging/policy removal)
config, err := el.store.Get(apiID)
if err != nil {
log.Warn("Config not found in store, may already be deleted")
// Continue anyway to ensure cleanup
}

// 2. Remove from in-memory store
if err := el.store.Delete(apiID); err != nil {
log.Warn("Failed to delete config from store", zap.Error(err))
// Continue - config may not exist
} else {
log.Info("Removed API config from in-memory store")
}

// 3. Trigger async XDS snapshot update
go func() {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

if err := el.snapshotManager.UpdateSnapshot(ctx, correlationID); err != nil {
log.Error("Failed to update xDS snapshot after delete", zap.Error(err))
} else {
log.Info("xDS snapshot updated after delete")
}
}()

// 4. Remove from PolicyManager (if configured)
if el.policyManager != nil && config != nil {
policyID := apiID + "-policies"
if err := el.policyManager.RemovePolicy(policyID); err != nil {
log.Warn("Failed to remove policy from PolicyManager", zap.Error(err))
} else {
log.Info("Removed policy from PolicyManager")
}
}
}
Contributor

⚠️ Potential issue | 🟠 Major

Policy removal and xDS snapshot ordering causes race condition.

Similar to the CREATE/UPDATE flow, the policy is removed from PolicyManager (line 143) AFTER the async xDS snapshot update goroutine is spawned (line 129). The snapshot may be built before the policy is removed, leaving stale policies active in the data plane even after the API is deleted.

🔧 Proposed fix: Remove policy before triggering snapshot
 // 2. Remove from in-memory store
 if err := el.store.Delete(apiID); err != nil {
 	log.Warn("Failed to delete config from store", zap.Error(err))
 	// Continue - config may not exist
 } else {
 	log.Info("Removed API config from in-memory store")
 }
 
+// 3. Remove from PolicyManager (if configured) BEFORE snapshot
+if el.policyManager != nil && config != nil {
+	policyID := apiID + "-policies"
+	if err := el.policyManager.RemovePolicy(policyID); err != nil {
+		log.Warn("Failed to remove policy from PolicyManager", zap.Error(err))
+	} else {
+		log.Info("Removed policy from PolicyManager")
+	}
+}
+
-// 3. Trigger async XDS snapshot update
+// 4. Trigger async XDS snapshot update
 go func() {
 	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
 	defer cancel()
 
 	if err := el.snapshotManager.UpdateSnapshot(ctx, correlationID); err != nil {
 		log.Error("Failed to update xDS snapshot after delete", zap.Error(err))
 	} else {
 		log.Info("xDS snapshot updated after delete")
 	}
 }()
-
-// 4. Remove from PolicyManager (if configured)
-if el.policyManager != nil && config != nil {
-	policyID := apiID + "-policies"
-	if err := el.policyManager.RemovePolicy(policyID); err != nil {
-		log.Warn("Failed to remove policy from PolicyManager", zap.Error(err))
-	} else {
-		log.Info("Removed policy from PolicyManager")
-	}
-}
🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/eventlistener/api_processor.go around lines
111 - 149, In handleAPIDelete, remove the policy from PolicyManager before
spawning the async xDS snapshot goroutine to avoid the race; specifically, call
el.policyManager.RemovePolicy(apiID + "-policies") (guarded by el.policyManager
!= nil and config != nil) and log any errors or success, then start the
goroutine that calls el.snapshotManager.UpdateSnapshot(ctx, correlationID); keep
existing timeouts and error logging but invert the order so policy removal
completes (synchronously) prior to triggering the snapshot.

Comment on lines +408 to +412
// Update xDS snapshot asynchronously
s.triggerXDSSnapshotUpdate(existing.ID, params.CorrelationID, params.Logger)

// Update derived policy configuration
s.updatePolicyConfiguration(existing, params.Logger)
Contributor

@coderabbitai coderabbitai bot Jan 7, 2026

⚠️ Potential issue | 🟠 Major

UpdateAPIConfiguration bypasses EventHub in multi-replica mode.

Lines 408-412 directly call triggerXDSSnapshotUpdate and updatePolicyConfiguration, ignoring enableReplicaSync. This creates an inconsistency:

  • DeployAPIConfiguration (lines 281-316): conditionally publishes to EventHub when enableReplicaSync is true
  • UpdateAPIConfiguration (lines 408-412): always performs direct updates, never publishes to EventHub

In multi-replica deployments, updates won't be synchronized across replicas, leading to stale state on other instances.

♻️ Proposed fix
 	params.Logger.Info("API configuration updated",
 		zap.String("id", existing.ID),
 		zap.String("handle", handle))
 
-	// Update xDS snapshot asynchronously
-	s.triggerXDSSnapshotUpdate(existing.ID, params.CorrelationID, params.Logger)
-
-	// Update derived policy configuration
-	s.updatePolicyConfiguration(existing, params.Logger)
+	// Update xDS snapshot and policy configuration
+	if s.enableReplicaSync {
+		// Multi-replica mode: Publish event to eventhub
+		if s.eventHub != nil {
+			organizationID := "default"
+			ctx := context.Background()
+			if err := s.eventHub.PublishEvent(ctx, organizationID, eventhub.EventTypeAPI, "UPDATE", existing.ID, params.CorrelationID, []byte{}); err != nil {
+				params.Logger.Error("Failed to publish update event to eventhub",
+					zap.Error(err),
+					zap.String("api_id", existing.ID),
+					zap.String("correlation_id", params.CorrelationID))
+			} else {
+				params.Logger.Info("Update event published to eventhub",
+					zap.String("api_id", existing.ID),
+					zap.String("correlation_id", params.CorrelationID))
+			}
+		} else {
+			params.Logger.Warn("Multi-replica mode enabled but eventhub is not initialized",
+				zap.String("api_id", existing.ID))
+		}
+	} else {
+		s.triggerXDSSnapshotUpdate(existing.ID, params.CorrelationID, params.Logger)
+		s.updatePolicyConfiguration(existing, params.Logger)
+	}
🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/utils/api_deployment.go around lines 408 -
412, UpdateAPIConfiguration currently calls triggerXDSSnapshotUpdate and
updatePolicyConfiguration unconditionally, bypassing EventHub and
enableReplicaSync and causing desync in multi-replica setups; change
UpdateAPIConfiguration to mirror DeployAPIConfiguration: if enableReplicaSync is
true, publish the same replication event to EventHub (the same event payload
used by DeployAPIConfiguration) and skip direct local calls, otherwise perform
the existing direct calls to triggerXDSSnapshotUpdate(existing.ID,
params.CorrelationID, params.Logger) and s.updatePolicyConfiguration(existing,
params.Logger); ensure you reference and reuse the same enableReplicaSync check
and EventHub publish path used in DeployAPIConfiguration so behavior is
consistent across both functions.

Contributor Author

This will be fixed later.

Contributor

@VirajSalaka, understood! Would you like me to create a GitHub issue to track this improvement for a future PR? I can open one with the relevant context about ensuring UpdateAPIConfiguration uses the same EventHub-based replication path as DeployAPIConfiguration when enableReplicaSync is enabled.


🧠 Learnings used
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

Learnt from: thivindu
Repo: wso2/api-platform PR: 110
File: platform-api/src/internal/service/api.go:432-476
Timestamp: 2025-11-08T13:06:22.133Z
Learning: In the platform-api Go codebase (file: platform-api/src/internal/service/api.go), the DeployAPIRevision method is designed to automatically create API-gateway associations during deployment if they don't already exist. There is no requirement that associations must pre-exist before deployment; the system checks for existing associations and creates them on-the-fly as needed.

Comment on lines +570 to 589
if !s.enableReplicaSync {
	if err := s.store.Add(storedCfg); err != nil {
		// Check if it's a conflict (API already exists)
		if storage.IsConflictError(err) {
			logger.Info("API configuration already exists in memory, updating instead",
				zap.String("api_id", storedCfg.ID),
				zap.String("displayName", storedCfg.GetDisplayName()),
				zap.String("version", storedCfg.GetVersion()))

			// Try to update instead
			return s.updateExistingConfig(storedCfg, logger)
		} else {
			// Rollback database write (only if persistent mode)
			if s.db != nil {
				_ = s.db.DeleteConfig(storedCfg.ID)
			}
			return false, fmt.Errorf("failed to add config to memory store: %w", err)
		}
	}
}
Contributor

@coderabbitai coderabbitai bot Jan 7, 2026

⚠️ Potential issue | 🟠 Major

In-memory store bypassed in multi-replica mode, breaking immediate reads.

When enableReplicaSync is true (lines 570-589), store.Add is skipped. This creates a consistency gap:

  1. Config is written to database successfully
  2. DeployAPIConfiguration returns
  3. In-memory store on the originating replica does not contain the config
  4. Immediate subsequent reads (e.g., GetAPIById) will fail with "not found" until the EventListener processes the event

This breaks the expectation that a successful deployment makes the config immediately queryable.

Recommendation: Either update the in-memory store immediately (even in multi-replica mode), or ensure read paths fall back to the database when the in-memory store misses.

♻️ Proposed fix: Update in-memory store immediately
 	// Try to add to in-memory store
-	if !s.enableReplicaSync {
-		if err := s.store.Add(storedCfg); err != nil {
-			// Check if it's a conflict (API already exists)
-			if storage.IsConflictError(err) {
-				logger.Info("API configuration already exists in memory, updating instead",
-					zap.String("api_id", storedCfg.ID),
-					zap.String("displayName", storedCfg.GetDisplayName()),
-					zap.String("version", storedCfg.GetVersion()))
-
-				// Try to update instead
-				return s.updateExistingConfig(storedCfg, logger)
-			} else {
-				// Rollback database write (only if persistent mode)
-				if s.db != nil {
-					_ = s.db.DeleteConfig(storedCfg.ID)
-				}
-				return false, fmt.Errorf("failed to add config to memory store: %w", err)
+	if err := s.store.Add(storedCfg); err != nil {
+		// Check if it's a conflict (API already exists)
+		if storage.IsConflictError(err) {
+			logger.Info("API configuration already exists in memory, updating instead",
+				zap.String("api_id", storedCfg.ID),
+				zap.String("displayName", storedCfg.GetDisplayName()),
+				zap.String("version", storedCfg.GetVersion()))
+
+			// Try to update instead
+			return s.updateExistingConfig(storedCfg, logger)
+		} else {
+			// Rollback database write (only if persistent mode)
+			if s.db != nil {
+				_ = s.db.DeleteConfig(storedCfg.ID)
 			}
+			return false, fmt.Errorf("failed to add config to memory store: %w", err)
 		}
 	}
🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/utils/api_deployment.go around lines 570 -
589, The code skips calling s.store.Add when s.enableReplicaSync is true,
causing the originating replica to return before its in-memory store contains
the new config and breaking immediate reads; modify DeployAPIConfiguration to
still add the storedCfg to s.store after the database write succeeds (even when
enableReplicaSync is true), handle storage.IsConflictError by calling
s.updateExistingConfig(storedCfg, logger) if Add returns a conflict, and on any
store.Add or update failure rollback the DB write
(s.db.DeleteConfig(storedCfg.ID)) and return an error so callers see a
consistent state; ensure GetAPIById continues to read from the in-memory store
reflecting the new config immediately.

Contributor Author

This is the expected behavior. In-memory storage is used for syncing xDS snapshots, and the config should be available after the event listener processes the request.

Contributor

@VirajSalaka, thank you for the clarification! I understand now—this is an intentional design for multi-replica consistency. In multi-replica mode, all replicas (including the originating one) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state across the cluster.

This makes sense for distributed synchronization. To help future reviewers understand this design decision, consider adding a brief comment in the code explaining why the in-memory store update is deferred to the EventListener in multi-replica mode:

// Try to add to in-memory store
if !s.enableReplicaSync {
    // Single-replica mode: update in-memory store immediately
    if err := s.store.Add(storedCfg); err != nil {
        // ... error handling
    }
} else {
    // Multi-replica mode: in-memory store will be updated via EventListener
    // after event processing to ensure consistency across all replicas
}

✏️ Learnings added
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:40:54.433Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.



🧠 Learnings used
Learnt from: thivindu
Repo: wso2/api-platform PR: 110
File: platform-api/src/internal/service/api.go:432-476
Timestamp: 2025-11-08T13:06:22.133Z
Learning: In the platform-api Go codebase (file: platform-api/src/internal/service/api.go), the DeployAPIRevision method is designed to automatically create API-gateway associations during deployment if they don't already exist. There is no requirement that associations must pre-exist before deployment; the system checks for existing associations and creates them on-the-fly as needed.

Learnt from: thivindu
Repo: wso2/api-platform PR: 389
File: gateway/gateway-controller/pkg/utils/api_key.go:257-380
Timestamp: 2025-12-23T10:41:47.584Z
Learning: In gateway/gateway-controller/pkg/utils/api_key.go, the RevokeAPIKey method intentionally proceeds with database deletion and xDS policy engine revocation even when the API key is not found in the memory store (apiKey is nil). This is by design to handle partial failure recovery: if a previous revocation attempt removed the key from memory but failed to clean up the database or policy engine, subsequent retries will still perform the cleanup operations to ensure eventual consistency.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @gateway/gateway-controller/pkg/eventhub/sqlite_backend.go:
- Around line 45-71: The Initialize method creates poller and cleanup contexts
from the caller-provided ctx so background workers (pollLoop and cleanupLoop)
can be cancelled unexpectedly; change initialization to derive independent
cancellable contexts from context.Background() (e.g., b.pollerCtx,
b.pollerCancel = context.WithCancel(context.Background()) and same for
cleanupCtx) and ensure Close() calls the stored cancel functions
(b.pollerCancel(), b.cleanupCancel()) and waits on b.wg to perform graceful
shutdown; update any tests/usage that assume caller ctx controls these
goroutines.
- Around line 14-30: The initialized flag on SQLiteBackend is unsafely accessed
from multiple goroutines; add a sync.Mutex (e.g., mu sync.Mutex) to the
SQLiteBackend struct and use it to guard reads/writes of b.initialized in
Initialize and Close (lock at start of Initialize, defer unlock; in Close lock,
check initialized, set to false, unlock, then perform shutdown outside the lock
to avoid deadlock with wg.Wait). Ensure the unique symbols referenced are
SQLiteBackend, mu, Initialize, and Close so the mutex protects the
initialization lifecycle checks and updates. A combined sketch of both sqlite_backend.go fixes follows after this list.

In @gateway/gateway-controller/pkg/storage/gateway-controller-db.sql:
- Around line 165-176: The events table DDL contains a duplicate column
definition for processed_timestamp in the CREATE TABLE statement (table name:
events, column: processed_timestamp); remove the redundant processed_timestamp
line so the table defines processed_timestamp only once (keeping the NOT NULL
DEFAULT CURRENT_TIMESTAMP definition), leaving the rest of the columns
(originated_timestamp, event_type, action, entity_id, correlation_id,
event_data) and the PRIMARY KEY (correlation_id) intact.
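
As referenced above, a combined sketch of the two sqlite_backend.go fixes, assuming the field names used in the prompts (pollerCtx/pollerCancel, cleanupCtx/cleanupCancel, mu, initialized, wg); the real struct and worker loops may differ:

package eventhub

import (
	"context"
	"sync"
)

type SQLiteBackend struct {
	mu            sync.Mutex // guards initialized
	initialized   bool
	wg            sync.WaitGroup
	pollerCtx     context.Context
	pollerCancel  context.CancelFunc
	cleanupCtx    context.Context
	cleanupCancel context.CancelFunc
}

func (b *SQLiteBackend) Initialize(ctx context.Context) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.initialized {
		return nil
	}
	// Derive worker contexts from context.Background(), not the caller's ctx,
	// so cancelling the caller's context cannot stop the background loops.
	b.pollerCtx, b.pollerCancel = context.WithCancel(context.Background())
	b.cleanupCtx, b.cleanupCancel = context.WithCancel(context.Background())
	b.wg.Add(2)
	go func() { defer b.wg.Done(); <-b.pollerCtx.Done() }()  // pollLoop stand-in
	go func() { defer b.wg.Done(); <-b.cleanupCtx.Done() }() // cleanupLoop stand-in
	b.initialized = true
	return nil
}

func (b *SQLiteBackend) Close() error {
	b.mu.Lock()
	if !b.initialized {
		b.mu.Unlock()
		return nil
	}
	b.initialized = false
	b.mu.Unlock() // release before waiting so other callers are not blocked
	b.pollerCancel()
	b.cleanupCancel()
	b.wg.Wait()
	return nil
}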
🧹 Nitpick comments (2)
gateway/gateway-controller/pkg/storage/sqlite.go (1)

1638-1643: Exposing the internal *sql.DB breaks encapsulation.

While necessary for EventHub initialization, GetDB() allows callers to bypass SQLiteStorage methods and directly manipulate the database, risking schema violations or connection pool exhaustion. Consider documenting the intended usage or restricting access.

📝 Add documentation to clarify intended usage
+// GetDB returns the underlying *sql.DB instance.
+// This is intended ONLY for EventHub backend initialization and should not be
+// used for general database operations. Use SQLiteStorage methods instead.
+// WARNING: Direct DB access bypasses storage layer guarantees.
 func (s *SQLiteStorage) GetDB() *sql.DB {
 	return s.db
 }
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (1)

340-371: Consider adding a LIMIT clause to prevent memory exhaustion.

The getEventsSince query (line 342-349) has no LIMIT, meaning a burst of events could load thousands of rows into memory. The TODO acknowledges pagination is needed.

♻️ Proposed fix: Add configurable limit
 func (b *SQLiteBackend) getEventsSince(ctx context.Context, orgID string, since time.Time) ([]Event, error) {
-	// TODO: (VirajSalaka) Implement pagination if large number of events
 	query := `
 		SELECT processed_timestamp, originated_timestamp, event_type,
 		       action, entity_id, correlation_id, event_data
 		FROM events
 		WHERE organization_id = ? AND processed_timestamp > ?
 		ORDER BY processed_timestamp ASC
+		LIMIT 1000
 	`

Or make the limit configurable via SQLiteBackendConfig.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9f1551 and 5ab34ba.

📒 Files selected for processing (4)
  • gateway/gateway-controller/cmd/controller/main.go
  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
  • gateway/gateway-controller/pkg/storage/gateway-controller-db.sql
  • gateway/gateway-controller/pkg/storage/sqlite.go
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:40:54.433Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.
📚 Learning: 2026-01-07T17:40:54.433Z
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:40:54.433Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

Applied to files:

  • gateway/gateway-controller/cmd/controller/main.go
📚 Learning: 2025-12-23T10:41:47.584Z
Learnt from: thivindu
Repo: wso2/api-platform PR: 389
File: gateway/gateway-controller/pkg/utils/api_key.go:257-380
Timestamp: 2025-12-23T10:41:47.584Z
Learning: In gateway/gateway-controller/pkg/utils/api_key.go, the RevokeAPIKey method intentionally proceeds with database deletion and xDS policy engine revocation even when the API key is not found in the memory store (apiKey is nil). This is by design to handle partial failure recovery: if a previous revocation attempt removed the key from memory but failed to clean up the database or policy engine, subsequent retries will still perform the cleanup operations to ensure eventual consistency.

Applied to files:

  • gateway/gateway-controller/cmd/controller/main.go
📚 Learning: 2025-12-06T11:57:08.456Z
Learnt from: BLasan
Repo: wso2/api-platform PR: 140
File: gateway/gateway-controller/pkg/utils/api_deployment.go:88-105
Timestamp: 2025-12-06T11:57:08.456Z
Learning: In the gateway-controller Go codebase (file: gateway/gateway-controller/pkg/utils/api_deployment.go), only the async/websub API kind is currently supported. async/websocket and async/sse API kinds are not yet implemented.

Applied to files:

  • gateway/gateway-controller/cmd/controller/main.go
🧬 Code graph analysis (2)
gateway/gateway-controller/pkg/storage/sqlite.go (1)
platform-api/src/internal/database/connection.go (1)
  • DB (32-34)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (5)
platform-api/src/internal/database/connection.go (1)
  • DB (32-34)
gateway/gateway-controller/pkg/eventhub/backend.go (2)
  • SQLiteBackendConfig (76-83)
  • DefaultSQLiteBackendConfig (86-92)
gateway/gateway-controller/pkg/eventhub/types.go (3)
  • EventType (9-9)
  • Event (19-28)
  • OrganizationState (31-35)
gateway/gateway-controller/pkg/eventhub/eventhub.go (1)
  • New (24-35)
gateway/gateway-controller/pkg/eventlistener/event_source.go (1)
  • Event (29-50)
🔇 Additional comments (13)
gateway/gateway-controller/pkg/storage/sqlite.go (1)

217-259: LGTM! Migration to version 6 with EventHub tables looks correct.

The schema changes address the previous review concerns:

  • Using correlation_id as PRIMARY KEY eliminates the timestamp collision risk
  • The index on (organization_id, processed_timestamp) supports efficient event polling queries
  • The organization_states table properly tracks per-organization versioning
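
For reference, a sketch of what the v6 DDL could look like, based solely on the columns named in this review; the shipped gateway-controller-db.sql is authoritative and may differ in types and names:

package storage

import "database/sql"

// createEventTables sketches the v6 event schema; column types are guesses.
func createEventTables(db *sql.DB) error {
	stmts := []string{
		`CREATE TABLE IF NOT EXISTS events (
			organization_id      TEXT NOT NULL,
			processed_timestamp  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
			originated_timestamp TIMESTAMP NOT NULL,
			event_type           TEXT NOT NULL,
			action               TEXT NOT NULL,
			entity_id            TEXT NOT NULL,
			correlation_id       TEXT NOT NULL,
			event_data           BLOB,
			PRIMARY KEY (correlation_id)
		)`,
		// Non-unique index backing the polling query in getEventsSince.
		`CREATE INDEX IF NOT EXISTS idx_events_org_processed
			ON events (organization_id, processed_timestamp)`,
		`CREATE TABLE IF NOT EXISTS organization_states (
			organization_id TEXT PRIMARY KEY,
			version_id      TEXT NOT NULL,
			updated_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
		)`,
		`CREATE INDEX IF NOT EXISTS idx_org_states_updated_at
			ON organization_states (updated_at)`,
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			return err
		}
	}
	return nil
}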
gateway/gateway-controller/pkg/storage/gateway-controller-db.sql (2)

153-161: LGTM! Organization states table design is appropriate.

The table correctly tracks per-organization versioning with an indexed updated_at column for efficient queries.


178-181: LGTM! Index and version pragma are correctly defined.

The composite index on (organization_id, processed_timestamp) supports the polling query pattern in getEventsSince().

gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (5)

98-156: LGTM! Atomic event publishing with transaction.

The Publish method correctly uses a transaction to atomically insert the event and update the organization version. The deferred rollback pattern ensures cleanup on any error path.
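
As a sketch of that pattern (table and column names follow this review; the version-bump semantics are an assumption, not the actual Publish body):

package eventhub

import (
	"context"
	"database/sql"
	"time"
)

// publishEvent sketches the atomic insert-plus-version-bump described above.
func publishEvent(ctx context.Context, db *sql.DB, orgID, eventType, action,
	entityID, correlationID string, data []byte) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer func() { _ = tx.Rollback() }() // no-op after Commit; covers every error path

	if _, err := tx.ExecContext(ctx,
		`INSERT INTO events (organization_id, originated_timestamp, event_type,
		                     action, entity_id, correlation_id, event_data)
		 VALUES (?, ?, ?, ?, ?, ?, ?)`,
		orgID, time.Now().UTC(), eventType, action, entityID, correlationID, data,
	); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		`UPDATE organization_states SET version_id = ?, updated_at = ?
		 WHERE organization_id = ?`,
		correlationID, time.Now().UTC(), orgID, // version bump: an assumption
	); err != nil {
		return err
	}
	return tx.Commit()
}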


182-213: LGTM! Cleanup methods use parameterized queries.

Both Cleanup and CleanupRange properly use parameterized queries, avoiding SQL injection risks.


215-240: LGTM! Graceful shutdown sequence.

The shutdown correctly cancels both context and waits for goroutines to complete via wg.Wait() before returning.


303-312: LGTM! Event delivery logic correctly handles failures.

The logic properly updates poll state only on successful delivery (or when there are no events), ensuring failed deliveries are retried on the next poll cycle.


373-402: Partial delivery risk with multiple subscribers.

If the first subscriber's channel accepts events but a subsequent subscriber's channel is full, the function returns an error, and the next poll will re-deliver events to all subscribers—including those that already received them. This could cause duplicate processing.

The TODO at line 385 acknowledges this limitation. For single-subscriber scenarios (the current use case per the comment), this is acceptable.

Ensure the EventListener implementation handles potential duplicate events idempotently, or restrict to single subscriber until multi-subscriber support is properly implemented.
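
Until multi-subscriber delivery is reworked, one defensive option on the consumer side is idempotent handling keyed on the event's correlation ID. A minimal sketch, with illustrative types rather than the actual eventlistener definitions:

package eventlistener

import "sync"

type Event struct {
	CorrelationID string
	// other fields elided
}

type dedupingHandler struct {
	mu   sync.Mutex
	seen map[string]struct{}
	next func(Event) // the real processing logic
}

func newDedupingHandler(next func(Event)) *dedupingHandler {
	return &dedupingHandler{seen: make(map[string]struct{}), next: next}
}

// handle processes a batch, skipping correlation IDs already seen, so a
// re-delivered batch from a partial delivery failure is harmless. A real
// implementation would bound or expire the seen set.
func (d *dedupingHandler) handle(events []Event) {
	for _, ev := range events {
		d.mu.Lock()
		_, dup := d.seen[ev.CorrelationID]
		if !dup {
			d.seen[ev.CorrelationID] = struct{}{}
		}
		d.mu.Unlock()
		if dup {
			continue
		}
		d.next(ev)
	}
}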

gateway/gateway-controller/cmd/controller/main.go (5)

113-131: LGTM: EventHub initialization guards are correct.

The initialization correctly gates EventHub creation behind the EnableReplicaSync flag and enforces the persistent storage requirement with a clear fatal error message. The Initialize call with proper context and cleanup defer pattern are appropriate.


272-290: LGTM: EventListener wiring is correct.

The EventListener initialization properly checks for eventHub availability, passes all required dependencies (including nil-safe policyManager as documented), and includes appropriate error handling with fatal on startup failure. The EventHubAdapter abstraction cleanly bridges EventHub to the EventListener's event source interface.


393-396: LGTM: EventListener shutdown is properly sequenced.

The shutdown logic correctly checks for nil before calling Stop() and places EventListener teardown at the end of the shutdown sequence, ensuring event processing continues until other components have been gracefully stopped.


123-127: Improve error handling for organization registration restarts.

The current code logs all registration errors at Error level and continues. However, ErrOrganizationAlreadyExists is expected and acceptable on restart with persistent storage, while other errors should be fatal. The code should distinguish between these cases: log at Debug level for "already exists" and fatal for other errors.

This issue was previously flagged in past review comments.

⛔ Skipped due to learnings
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:40:54.433Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

347-347: The code properly handles the nil eventHub case. The APIDeploymentService uses a defensive two-level check at line 281-283 of api_deployment.go: it first verifies that enableReplicaSync is true, then checks that eventHub != nil before calling any methods on it. This aligns with the initialization logic in main.go where eventHub is only non-nil when both EnableReplicaSync is true and persistent storage is available.

Comment on lines +286 to +288
if err := evtListener.Start(context.Background()); err != nil {
log.Fatal("Failed to start EventListener", zap.Error(err))
}
Contributor

@coderabbitai coderabbitai bot Jan 7, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for EventListener.Start method implementation
ast-grep --pattern $'func ($_ *EventListener) Start($_ context.Context) error {
  $$$
}'

Repository: wso2/api-platform

Length of output: 1940


🏁 Script executed:

# Find the Subscribe method implementation
ast-grep --pattern $'func ($_ *$_) Subscribe($_ context.Context, $_ string, $_ chan $_) error {
  $$$
}'

Repository: wso2/api-platform

Length of output: 43


🏁 Script executed:

# Search for Subscribe method definitions
rg "func.*Subscribe.*context\.Context" -A 15

Repository: wso2/api-platform

Length of output: 3289


🏁 Script executed:

# Get the full Subscribe method from EventHubAdapter
sed -n '/^func (a \*EventHubAdapter) Subscribe(ctx context.Context, organizationID string, eventChan chan<- \[\]Event) error/,/^func/p' gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go | head -100

Repository: wso2/api-platform

Length of output: 1567


🏁 Script executed:

# Search for EventHub interface definition and Subscribe/RegisterOrganization methods
rg "type.*EventHub.*interface|func.*Subscribe\(.*string.*chan|func.*RegisterOrganization" -A 5 gateway/gateway-controller/pkg/eventlistener/

Repository: wso2/api-platform

Length of output: 1376


🏁 Script executed:

# Search for EventHub interface
rg "type EventHub interface" -A 20 gateway/gateway-controller/

Repository: wso2/api-platform

Length of output: 2106


🏁 Script executed:

# Search for RegisterOrganization and Subscribe implementations in eventhub package
rg "func.*RegisterOrganization|func.*Subscribe\(organizationID" gateway/gateway-controller/pkg/eventhub/ -A 15

Repository: wso2/api-platform

Length of output: 5382


🏁 Script executed:

# Search for Subscribe implementation in backend
rg "func \(b \*SQLiteBackend\) Subscribe" gateway/gateway-controller/pkg/eventhub/ -A 20

Repository: wso2/api-platform

Length of output: 1887


EventListener.Start blocks on database operations and should use a timeout or spawn initialization in a goroutine.

The Start method calls eventSource.Subscribe(), which in turn calls eventHub.RegisterOrganization(). This method performs a synchronous database operation (db.ExecContext()) that can block indefinitely on the main goroutine until database I/O completes. Although goroutines are spawned afterward, the initial blocking on RegisterOrganization prevents the main goroutine from proceeding to start the HTTP server and other components. Either wrap the initialization in a goroutine or provide a timeout context to prevent indefinite blocking.

@VirajSalaka (Contributor Author):

This is the startup phase, so I think it is okay: it delays API processing until the event listener is properly configured.

coderabbitai bot (Contributor):

@VirajSalaka, that makes sense. Blocking during the startup phase to ensure all critical components are properly initialized before accepting traffic is a reasonable approach, especially with the fail-fast pattern (using log.Fatal() on failure). This ensures the application doesn't start in a partially initialized state where it could accept requests before the EventListener is ready to process events.

The synchronous initialization here acts as a gate to prevent race conditions, which aligns well with the multi-replica synchronization goals of this PR.


🧠 Learnings used
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @gateway/gateway-controller/cmd/controller/main.go:
- Around line 272-290: EventListener.Start is called synchronously and can block
startup; change the initialization to avoid blocking by either (A) creating a
cancellable timeout context and passing that to evtListener.Start to bound
startup DB ops, or (B) start evtListener in a background goroutine so it does
not block HTTP/server startup (ensure Start errors are logged and propagated via
a channel or health check). Locate the code that constructs evtListener
(eventlistener.NewEventListener) and the call
evtListener.Start(context.Background()), replace context.Background() with a
context with timeout when using option A, or call Start(...) inside a goroutine
and signal failures through a dedicated error channel/health probe when using
option B.

In @gateway/gateway-controller/pkg/eventhub/sqlite_backend.go:
- Around line 45-71: The Initialize method currently derives pollerCtx and
cleanupCtx from the caller-provided ctx causing background workers to be
cancelled when the caller cancels; change Initialize to create independent
cancellable contexts using context.WithCancel(context.Background()) for
pollerCtx/pollerCancel and cleanupCtx/cleanupCancel so pollLoop and cleanupLoop
run independently and are only stopped by Close(), and ensure Close() calls
those cancel functions (pollerCancel and cleanupCancel) to perform graceful
shutdown.
- Around line 14-30: The initialized boolean on SQLiteBackend is racy; add a
mutex (e.g., sync.Mutex or sync.RWMutex) to the SQLiteBackend struct and use it
to guard all reads and writes of the initialized field in Initialize and Close
(and any other methods that check it). Wrap checks (e.g., "if b.initialized")
and assignments (e.g., "b.initialized = true/false") with the mutex (Lock/Unlock
or RLock/RUnlock with defer) so concurrent Initialize/Close calls are serialized
and the race is eliminated.
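
For the initialized-flag race, a minimal sketch of the suggested guard (unrelated fields elided, names taken from the review context):

// Sketch: serialize Initialize/Close and all reads of the initialized flag.
type SQLiteBackend struct {
    mu          sync.Mutex
    initialized bool
    // ... db handle, contexts, wait group, logger ...
}

func (b *SQLiteBackend) Initialize(ctx context.Context) error {
    b.mu.Lock()
    defer b.mu.Unlock()
    if b.initialized {
        return nil
    }
    // ... start pollLoop and cleanupLoop as before ...
    b.initialized = true
    return nil
}

func (b *SQLiteBackend) Close() error {
    b.mu.Lock()
    if !b.initialized {
        b.mu.Unlock()
        return nil
    }
    b.initialized = false
    b.mu.Unlock()
    // Cancel contexts and wait for workers outside the lock, so the wait
    // cannot deadlock with worker code that might also take the mutex.
    // ... pollerCancel(); cleanupCancel(); b.wg.Wait() ...
    return nil
}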
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ab34ba and a4c6a1c.

📒 Files selected for processing (4)
  • gateway/gateway-controller/cmd/controller/main.go
  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
  • gateway/gateway-controller/pkg/storage/gateway-controller-db.sql
  • gateway/gateway-controller/pkg/storage/sqlite.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • gateway/gateway-controller/pkg/storage/gateway-controller-db.sql
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:40:54.433Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.
📚 Learning: 2026-01-07T17:40:54.433Z
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:40:54.433Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

Applied to files:

  • gateway/gateway-controller/cmd/controller/main.go
📚 Learning: 2025-12-23T10:41:47.584Z
Learnt from: thivindu
Repo: wso2/api-platform PR: 389
File: gateway/gateway-controller/pkg/utils/api_key.go:257-380
Timestamp: 2025-12-23T10:41:47.584Z
Learning: In gateway/gateway-controller/pkg/utils/api_key.go, the RevokeAPIKey method intentionally proceeds with database deletion and xDS policy engine revocation even when the API key is not found in the memory store (apiKey is nil). This is by design to handle partial failure recovery: if a previous revocation attempt removed the key from memory but failed to clean up the database or policy engine, subsequent retries will still perform the cleanup operations to ensure eventual consistency.

Applied to files:

  • gateway/gateway-controller/cmd/controller/main.go
📚 Learning: 2025-12-06T11:57:08.456Z
Learnt from: BLasan
Repo: wso2/api-platform PR: 140
File: gateway/gateway-controller/pkg/utils/api_deployment.go:88-105
Timestamp: 2025-12-06T11:57:08.456Z
Learning: In the gateway-controller Go codebase (file: gateway/gateway-controller/pkg/utils/api_deployment.go), only the async/websub API kind is currently supported. async/websocket and async/sse API kinds are not yet implemented.

Applied to files:

  • gateway/gateway-controller/cmd/controller/main.go
🧬 Code graph analysis (2)
gateway/gateway-controller/pkg/storage/sqlite.go (1)
platform-api/src/internal/database/connection.go (1)
  • DB (32-34)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (4)
platform-api/src/internal/database/connection.go (1)
  • DB (32-34)
gateway/gateway-controller/pkg/eventhub/backend.go (2)
  • SQLiteBackendConfig (76-83)
  • DefaultSQLiteBackendConfig (86-92)
gateway/gateway-controller/pkg/eventhub/types.go (3)
  • EventType (9-9)
  • Event (19-28)
  • OrganizationState (31-35)
gateway/gateway-controller/pkg/eventhub/eventhub.go (1)
  • New (24-35)
🔇 Additional comments (9)
gateway/gateway-controller/pkg/storage/sqlite.go (1)

1638-1643: LGTM!

The GetDB() method cleanly exposes the underlying database handle for external EventHub initialization. The implementation is straightforward and appropriate for the use case.
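
The accessor is presumably a one-liner of this shape (receiver and field names assumed):

// GetDB returns the underlying *sql.DB so external components such as the
// EventHub can share the same connection pool.
func (s *SQLiteStorage) GetDB() *sql.DB {
    return s.db
}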

gateway/gateway-controller/cmd/controller/main.go (3)

318-318: LGTM!

The controlplane.NewClient signature update correctly propagates the policyManager dependency to enable coordinated policy-driven updates, consistent with the multi-replica synchronization goals.


346-347: LGTM!

The API server constructor correctly accepts the EventHub and replica sync flag to enable multi-replica behavior. The eventHub parameter can safely be nil when replica sync is disabled, which is handled appropriately.


393-396: LGTM!

EventListener shutdown is properly guarded with a nil check and integrated into the graceful shutdown sequence.

gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (5)

73-96: LGTM!

RegisterOrganization correctly uses ON CONFLICT DO NOTHING to make re-registration idempotent, which supports restart scenarios.
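
A sketch of what such an idempotent registration looks like, using the v6 organization_states schema shown later in this review (method name hypothetical):

// Sketch: re-registration is a no-op thanks to ON CONFLICT DO NOTHING.
const registerQuery = `
INSERT INTO organization_states (organization, version_id, updated_at)
VALUES (?, '', ?)
ON CONFLICT(organization) DO NOTHING
`

func (b *SQLiteBackend) registerOrg(ctx context.Context, orgID string) error {
    _, err := b.db.ExecContext(ctx, registerQuery, orgID, time.Now())
    return err
}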


158-180: LGTM!

Subscribe and Unsubscribe methods correctly delegate to the registry with appropriate logging.


182-213: LGTM!

Cleanup methods properly implement time-based event retention with appropriate error handling and logging.


215-240: Shutdown sequence is correct but lacks synchronization.

The Close method properly cancels contexts and waits for goroutines to finish. However, the initialized flag access (lines 218, 237) is not synchronized, which is already flagged in the earlier comment on the struct definition.


262-313: LGTM! Event delivery logic is correct.

The poll state update logic (lines 303-311) correctly handles event delivery:

  • Updates poll state only on successful delivery (lines 304-305)
  • Skips timestamp update on delivery failure to enable retry (lines 307-308)
  • Updates poll state when no events exist (lines 309-311)

This addresses the malformed conditional block and inverted logic flagged in past reviews.

Comment on lines +272 to +290
// Initialize and start EventListener if EventHub is available
var evtListener *eventlistener.EventListener
if eventHub != nil {
    log.Info("Initializing EventListener")
    eventSource := eventlistener.NewEventHubAdapter(eventHub, log)
    evtListener = eventlistener.NewEventListener(
        eventSource,
        configStore,
        db,
        snapshotManager,
        policyManager, // Can be nil if policy server is disabled
        &cfg.GatewayController.Router,
        log,
    )
    if err := evtListener.Start(context.Background()); err != nil {
        log.Fatal("Failed to start EventListener", zap.Error(err))
    }
    log.Info("EventListener started successfully")
}
@coderabbitai coderabbitai bot (Contributor), Jan 7, 2026:

⚠️ Potential issue | 🟠 Major

EventListener.Start blocks the main goroutine on database operations.

evtListener.Start() is called synchronously on Line 286, which can block indefinitely if the underlying EventHub performs database operations (such as RegisterOrganization). This prevents the HTTP server and other components from starting until initialization completes, potentially causing startup hangs if the database is slow or locked.

🔧 Proposed fix: Add timeout or use goroutine

Option 1: Add timeout context

-	if err := evtListener.Start(context.Background()); err != nil {
-		log.Fatal("Failed to start EventListener", zap.Error(err))
-	}
+	startCtx, startCancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer startCancel()
+	if err := evtListener.Start(startCtx); err != nil {
+		log.Fatal("Failed to start EventListener", zap.Error(err))
+	}

Option 2: Initialize in goroutine (if non-blocking startup is acceptable)

+	go func() {
+		if err := evtListener.Start(context.Background()); err != nil {
+			log.Error("Failed to start EventListener", zap.Error(err))
+		}
+	}()
-	if err := evtListener.Start(context.Background()); err != nil {
-		log.Fatal("Failed to start EventListener", zap.Error(err))
-	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (Option 1: timeout context):

// Initialize and start EventListener if EventHub is available
var evtListener *eventlistener.EventListener
if eventHub != nil {
    log.Info("Initializing EventListener")
    eventSource := eventlistener.NewEventHubAdapter(eventHub, log)
    evtListener = eventlistener.NewEventListener(
        eventSource,
        configStore,
        db,
        snapshotManager,
        policyManager, // Can be nil if policy server is disabled
        &cfg.GatewayController.Router,
        log,
    )
    startCtx, startCancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer startCancel()
    if err := evtListener.Start(startCtx); err != nil {
        log.Fatal("Failed to start EventListener", zap.Error(err))
    }
    log.Info("EventListener started successfully")
}

Suggested change (Option 2: background goroutine):

// Initialize and start EventListener if EventHub is available
var evtListener *eventlistener.EventListener
if eventHub != nil {
    log.Info("Initializing EventListener")
    eventSource := eventlistener.NewEventHubAdapter(eventHub, log)
    evtListener = eventlistener.NewEventListener(
        eventSource,
        configStore,
        db,
        snapshotManager,
        policyManager, // Can be nil if policy server is disabled
        &cfg.GatewayController.Router,
        log,
    )
    go func() {
        if err := evtListener.Start(context.Background()); err != nil {
            log.Error("Failed to start EventListener", zap.Error(err))
        }
    }()
    log.Info("EventListener started successfully")
}
🤖 Prompt for AI Agents
In @gateway/gateway-controller/cmd/controller/main.go around lines 272 - 290,
EventListener.Start is called synchronously and can block startup; change the
initialization to avoid blocking by either (A) creating a cancellable timeout
context and passing that to evtListener.Start to bound startup DB ops, or (B)
start evtListener in a background goroutine so it does not block HTTP/server
startup (ensure Start errors are logged and propagated via a channel or health
check). Locate the code that constructs evtListener
(eventlistener.NewEventListener) and the call
evtListener.Start(context.Background()), replace context.Background() with a
context with timeout when using option A, or call Start(...) inside a goroutine
and signal failures through a dedicated error channel/health probe when using
option B.

@VirajSalaka (Contributor Author):

We need to prevent the REST API server from starting until the event listener has started properly.

coderabbitai bot (Contributor):

@VirajSalaka, understood! The current implementation already ensures that the REST API server won't start until the EventListener is successfully initialized. The evtListener.Start() call on line 286 is synchronous and will call log.Fatal() if it fails, which prevents any subsequent initialization including the REST API server.

The blocking behavior you want is already in place:

if err := evtListener.Start(context.Background()); err != nil {
    log.Fatal("Failed to start EventListener", zap.Error(err))  // ← Stops everything
}
log.Info("EventListener started successfully")  // ← Only reached on success

The REST API server setup happens much later in main(), so it will only start after EventListener has successfully completed its initialization.

Optional improvement: To prevent indefinite hangs during startup (e.g., if database operations are slow), you could add a timeout while maintaining the blocking behavior:

startCtx, startCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer startCancel()
if err := evtListener.Start(startCtx); err != nil {
    log.Fatal("Failed to start EventListener", zap.Error(err))
}

This keeps your desired startup ordering while adding safety against deadlocks.


✏️ Learnings added
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/cmd/controller/main.go:272-290
Timestamp: 2026-01-09T08:31:32.669Z
Learning: In gateway/gateway-controller/cmd/controller/main.go, the EventListener.Start() call must be synchronous (blocking) to ensure the EventListener is fully initialized before the REST API server starts. This blocking behavior is intentional to maintain proper startup ordering in multi-replica mode.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

Comment on lines 45 to 71
// Initialize sets up the SQLite backend and starts background workers
func (b *SQLiteBackend) Initialize(ctx context.Context) error {

    if b.initialized {
        return nil
    }

    b.logger.Info("Initializing SQLite backend")

    // Start poller
    b.pollerCtx, b.pollerCancel = context.WithCancel(ctx)
    b.wg.Add(1)
    go b.pollLoop()

    // Start cleanup goroutine
    b.cleanupCtx, b.cleanupCancel = context.WithCancel(ctx)
    b.wg.Add(1)
    go b.cleanupLoop()

    b.initialized = true
    b.logger.Info("SQLite backend initialized",
        zap.Duration("pollInterval", b.config.PollInterval),
        zap.Duration("cleanupInterval", b.config.CleanupInterval),
        zap.Duration("retentionPeriod", b.config.RetentionPeriod),
    )
    return nil
}
coderabbitai bot (Contributor):

⚠️ Potential issue | 🟠 Major

Background workers use caller's context, risking unexpected cancellation.

Lines 55 and 60 derive pollerCtx and cleanupCtx from the caller-provided ctx. If the caller cancels this context before explicit shutdown, the background polling and cleanup loops will stop unexpectedly. Background workers should use context.Background() and rely on Close() for graceful shutdown.

♻️ Proposed fix: Use independent contexts
-	b.pollerCtx, b.pollerCancel = context.WithCancel(ctx)
+	b.pollerCtx, b.pollerCancel = context.WithCancel(context.Background())
 	b.wg.Add(1)
 	go b.pollLoop()

-	b.cleanupCtx, b.cleanupCancel = context.WithCancel(ctx)
+	b.cleanupCtx, b.cleanupCancel = context.WithCancel(context.Background())
 	b.wg.Add(1)
 	go b.cleanupLoop()

This ensures background workers run independently and are only stopped via Close().

🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/eventhub/sqlite_backend.go around lines 45 -
71, The Initialize method currently derives pollerCtx and cleanupCtx from the
caller-provided ctx causing background workers to be cancelled when the caller
cancels; change Initialize to create independent cancellable contexts using
context.WithCancel(context.Background()) for pollerCtx/pollerCancel and
cleanupCtx/cleanupCancel so pollLoop and cleanupLoop run independently and are
only stopped by Close(), and ensure Close() calls those cancel functions
(pollerCancel and cleanupCancel) to perform graceful shutdown.

Comment on lines 98 to 156
// Publish publishes an event for an organization
func (b *SQLiteBackend) Publish(ctx context.Context, orgID string,
    eventType EventType, action, entityID, correlationID string, eventData []byte) error {

    // Verify organization is registered
    _, err := b.registry.get(orgID)
    if err != nil {
        return err
    }

    // Publish atomically (event + state update in transaction)
    tx, err := b.db.BeginTx(ctx, nil)
    if err != nil {
        return fmt.Errorf("failed to begin transaction: %w", err)
    }
    defer tx.Rollback()

    now := time.Now()

    // Insert event
    insertQuery := `
        INSERT INTO events (organization_id, processed_timestamp, originated_timestamp,
            event_type, action, entity_id, correlation_id, event_data)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    `
    _, err = tx.ExecContext(ctx, insertQuery,
        string(orgID), now, now, string(eventType), action, entityID, correlationID, eventData)
    if err != nil {
        return fmt.Errorf("failed to record event: %w", err)
    }

    // Update organization state version
    newVersion := uuid.New().String()
    updateQuery := `
        INSERT INTO organization_states (organization, version_id, updated_at)
        VALUES (?, ?, ?)
        ON CONFLICT(organization)
        DO UPDATE SET version_id = excluded.version_id, updated_at = excluded.updated_at
    `
    _, err = tx.ExecContext(ctx, updateQuery, string(orgID), newVersion, now)
    if err != nil {
        return fmt.Errorf("failed to update state: %w", err)
    }

    if err := tx.Commit(); err != nil {
        return fmt.Errorf("failed to commit transaction: %w", err)
    }

    b.logger.Debug("Event published",
        zap.String("organization", string(orgID)),
        zap.String("eventType", string(eventType)),
        zap.String("action", action),
        zap.String("entityID", entityID),
        zap.String("correlationID", correlationID),
        zap.String("version", newVersion),
    )

    return nil
}
@coderabbitai coderabbitai bot (Contributor), Jan 7, 2026:

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's find and explore the sqlite_backend.go file
find . -name "sqlite_backend.go" -type f

Repository: wso2/api-platform

Length of output: 120


🏁 Script executed:

# Let's examine the sqlite_backend.go file, particularly around the Publish method and schema
cat -n ./gateway/gateway-controller/pkg/eventhub/sqlite_backend.go | head -200

Repository: wso2/api-platform

Length of output: 7014


🏁 Script executed:

# Let's continue viewing the file to see more context
wc -l ./gateway/gateway-controller/pkg/eventhub/sqlite_backend.go

Repository: wso2/api-platform

Length of output: 124


🏁 Script executed:

# Let's look for schema definition - likely in a schema initialization or in another file
cat -n ./gateway/gateway-controller/pkg/eventhub/sqlite_backend.go | tail -270

Repository: wso2/api-platform

Length of output: 9105


🏁 Script executed:

# Search for schema definition files - likely in sql, schema, or migration directories
find . -type f \( -name "*.sql" -o -name "*schema*" -o -name "*migration*" \) | grep -i -E "(eventhub|event)" | head -20

Repository: wso2/api-platform

Length of output: 43


🏁 Script executed:

# Search for all SQL files or schema initialization in the eventhub package directory
find ./gateway/gateway-controller/pkg/eventhub -type f

Repository: wso2/api-platform

Length of output: 388


🏁 Script executed:

# Let's look at backend.go and types.go for schema definition
cat -n ./gateway/gateway-controller/pkg/eventhub/backend.go

Repository: wso2/api-platform

Length of output: 4107


🏁 Script executed:

# Let's check types.go for Event structure and schema definition
cat -n ./gateway/gateway-controller/pkg/eventhub/types.go

Repository: wso2/api-platform

Length of output: 3422


🏁 Script executed:

# Search for CREATE TABLE statements in the codebase
rg -n "CREATE TABLE" --type=go

Repository: wso2/api-platform

Length of output: 957


🏁 Script executed:

# Let's examine the CREATE TABLE statements for events and organization_states
sed -n '222,260p' ./gateway/gateway-controller/pkg/storage/sqlite.go

Repository: wso2/api-platform

Length of output: 1594


🏁 Script executed:

# Find all calls to Publish method in the codebase
rg -n '\.Publish\s*\(' --type=go -A 5 -B 2

Repository: wso2/api-platform

Length of output: 2385


🏁 Script executed:

# Find all calls to PublishEvent in the gateway-controller code
rg -n '\.PublishEvent\s*\(' --type=go -A 8 -B 2 gateway/gateway-controller/

Repository: wso2/api-platform

Length of output: 9756


🏁 Script executed:

# Let's examine the api_deployment.go file to understand where CorrelationID comes from
sed -n '280,310p' ./gateway/gateway-controller/pkg/utils/api_deployment.go

Repository: wso2/api-platform

Length of output: 1290


🏁 Script executed:

# Let's find where params.CorrelationID is set/generated
rg -n 'CorrelationID' --type=go gateway/gateway-controller/ -B 3 -A 1 | head -60

Repository: wso2/api-platform

Length of output: 5422


🏁 Script executed:

# Let's look at the client code to understand where CorrelationID comes from in the event
sed -n '40,60p' ./gateway/gateway-controller/pkg/controlplane/events.go

Repository: wso2/api-platform

Length of output: 291


🏁 Script executed:

# Let's look at the controlplane client to see where the event comes from
sed -n '550,610p' ./gateway/gateway-controller/pkg/controlplane/client.go

Repository: wso2/api-platform

Length of output: 1743


correlation_id uniqueness constraint violation risk—PRIMARY KEY on external identifier.

The events table schema uses correlation_id as PRIMARY KEY (see gateway/gateway-controller/pkg/storage/sqlite.go line 245), yet the Publish method receives it as a parameter from external sources (control plane events via params.CorrelationID). Without a guarantee that control plane sends unique correlation IDs, duplicate inserts will fail with PRIMARY KEY constraint violations. This is evidenced by the test code at gateway/gateway-controller/pkg/eventhub/eventhub_test.go (lines 92, 125, 163, 202, 244) which reuses the same "test-correlation-id"—these tests would fail if they attempted concurrent or sequential invocations against a persistent database.

Either:

  1. Ensure the control plane service guarantees uniqueness of correlation IDs (e.g., via UUID generation on every event), or
  2. Modify the schema to use a surrogate primary key and add a UNIQUE constraint or index on correlation_id with appropriate conflict resolution if duplicates are valid.
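
If option 2 were ever needed, the migration could look like this sketch (hypothetical; the PR keeps correlation_id as the primary key):

// Hypothetical alternative schema: surrogate key plus a UNIQUE constraint,
// with duplicate correlation IDs ignored on insert rather than erroring.
if _, err := s.db.Exec(`CREATE TABLE events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    organization_id TEXT NOT NULL,
    correlation_id TEXT NOT NULL UNIQUE,
    event_data TEXT NOT NULL
);`); err != nil {
    return fmt.Errorf("failed to create events table: %w", err)
}
// Writers would then use INSERT OR IGNORE so a duplicate correlation_id
// becomes a silent no-op instead of a PRIMARY KEY constraint failure.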

@VirajSalaka (Contributor Author):

Let's assume the correlationID is expected to be unique at all times.

coderabbitai bot (Contributor):

@VirajSalaka, understood. I've noted that correlationID is expected to be unique at all times by design. This makes the PRIMARY KEY constraint on correlation_id appropriate.

To ensure this requirement is maintained, you may want to consider:

  1. Documenting this uniqueness requirement in the EventHub interface or PublishEvent method comments
  2. Ensuring that all callers (control plane client, API handlers, etc.) generate unique correlation IDs (e.g., using UUIDs)

This will help future contributors understand and maintain the invariant.


✏️ Learnings added
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/eventhub/sqlite_backend.go:98-156
Timestamp: 2026-01-09T08:39:19.158Z
Learning: In gateway/gateway-controller/pkg/eventhub/sqlite_backend.go, the events table uses correlation_id as PRIMARY KEY. The correlationID is expected to be unique at all times by design across all event sources (control plane, API handlers, etc.).



🧠 Learnings used
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

Comment on lines +217 to +259
if version == 5 {
// Add EventHub tables for multi-replica synchronization
s.logger.Info("Migrating to EventHub schema (version 6)")

// Create organization_states table
if _, err := s.db.Exec(`CREATE TABLE organization_states (
organization TEXT PRIMARY KEY,
version_id TEXT NOT NULL DEFAULT '',
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);`); err != nil {
return fmt.Errorf("failed to create organization_states table: %w", err)
}

if _, err := s.db.Exec(`CREATE INDEX IF NOT EXISTS idx_organization_states_updated ON organization_states(updated_at);`); err != nil {
return fmt.Errorf("failed to create organization_states index: %w", err)
}

// Create unified events table
if _, err := s.db.Exec(`CREATE TABLE events (
organization_id TEXT NOT NULL,
processed_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
originated_timestamp TIMESTAMP NOT NULL,
event_type TEXT NOT NULL,
action TEXT NOT NULL CHECK(action IN ('CREATE', 'UPDATE', 'DELETE')),
entity_id TEXT NOT NULL,
correlation_id TEXT NOT NULL,
event_data TEXT NOT NULL,
PRIMARY KEY (correlation_id)
);`); err != nil {
return fmt.Errorf("failed to create events table: %w", err)
}

if _, err := s.db.Exec(`CREATE INDEX IF NOT EXISTS idx_events_org_time ON events(organization_id, processed_timestamp);`); err != nil {
return fmt.Errorf("failed to create events organization-time index: %w", err)
}

if _, err := s.db.Exec("PRAGMA user_version = 6"); err != nil {
return fmt.Errorf("failed to set schema version to 6: %w", err)
}

s.logger.Info("Schema migrated to version 6 (EventHub tables)")
version = 6
}
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for INSERT statements into events table
rg -n "INSERT INTO events" --type=go

# Also search for correlation_id population/generation
rg -n "correlation_id" --type=go -A 3 -B 3

# Search for event insertion in the storage package
fd . gateway/gateway-controller/pkg/storage/ --type=f -name "*.go" | head -20

Repository: wso2/api-platform

Length of output: 50621


🏁 Script executed:

# Find the main storage implementation file
fd sqlite.go gateway/gateway-controller/pkg/storage/ --type=f

Repository: wso2/api-platform

Length of output: 109


🏁 Script executed:

# Search for event-related functions/methods in storage package
rg -n "func.*[Ee]vent" gateway/gateway-controller/pkg/storage/ --type=go -A 5

Repository: wso2/api-platform

Length of output: 43


🏁 Script executed:

# Look for correlation_id generation logic across the entire codebase
rg -n "correlation.?id|CorrelationID|correlation_id" --type=go | grep -i "uuid\|generate\|new\|create" | head -20

Repository: wso2/api-platform

Length of output: 1742


🏁 Script executed:

# Look at the eventhub sqlite_backend.go file around the INSERT statement
sed -n '110,130p' gateway/gateway-controller/pkg/eventhub/sqlite_backend.go

Repository: wso2/api-platform

Length of output: 698


🏁 Script executed:

# Find the PublishEvent function signature and implementation
rg -A 15 "func.*PublishEvent" gateway/gateway-controller/pkg/eventhub/ --type=go

Repository: wso2/api-platform

Length of output: 1669


🏁 Script executed:

# Check the correlation middleware for UUID/uniqueness handling
cat gateway/gateway-controller/pkg/api/middleware/correlation.go

Repository: wso2/api-platform

Length of output: 2925


🏁 Script executed:

# Look at the Event struct definition
rg -B 5 -A 15 "type.*Event.*struct" gateway/gateway-controller/pkg/eventhub/ --type=go | head -40

Repository: wso2/api-platform

Length of output: 2029


🏁 Script executed:

# Look at full Publish function in sqlite_backend.go to see if there's validation before INSERT
rg -B 10 -A 30 "func.*Publish\(" gateway/gateway-controller/pkg/eventhub/sqlite_backend.go --type=go

Repository: wso2/api-platform

Length of output: 1269


🏁 Script executed:

# Check if there's any unique constraint validation in eventhub package
rg -n "UNIQUE\|duplicate\|already exists\|conflict" gateway/gateway-controller/pkg/eventhub/ --type=go

Repository: wso2/api-platform

Length of output: 43


🏁 Script executed:

# Check how correlation_id is generated/passed in actual business logic, not just middleware
rg -B 5 "PublishEvent.*correlationID\|correlation_id" gateway/gateway-controller/pkg/utils/api_deployment.go --type=go | head -40

Repository: wso2/api-platform

Length of output: 43


Correlation_id must have application-level uniqueness validation before database insertion.

The migration correctly addresses the timestamp collision risk by changing PRIMARY KEY from (organization_id, processed_timestamp) to correlation_id. However, the Publish() function in sqlite_backend.go has no validation or deduplication before inserting events. If duplicate correlation_ids are submitted (by client header reuse or UUID collision), the PRIMARY KEY constraint will fail at the database level, causing PublishEvent to error. Add application-level uniqueness enforcement in the Publish function before the INSERT to prevent runtime failures.

@VirajSalaka VirajSalaka (Contributor Author), Jan 8, 2026:

The correlationID is not supposed to be duplicated, so we can leave this as is.

coderabbitai bot (Contributor):

@VirajSalaka, understood. Since the correlation_id is guaranteed to be unique by design at the generation point, the database PRIMARY KEY constraint serves as an appropriate safeguard without requiring additional application-level checks before insertion. Thank you for the clarification.


🧠 Learnings used
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
gateway/gateway-controller/pkg/utils/api_deployment.go (2)

252-317: Replica-sync publish path should use bounded context + fail fast if event publication is required for correctness.
In replica-sync mode you:

  • use context.Background() (no timeout),
  • log publish failures but still return success,
  • skip local snapshot/policy updates.

If the system relies on EventListener for convergence, a failed publish can leave the cluster stale while callers think deployment succeeded. Consider:

  • ctx, cancel := context.WithTimeout(context.Background(), ...)
  • returning an error (or at least surfacing a retryable error) when publish fails.

Also: in replica-sync mode, hard-coded organizationID := "default" looks like a placeholder; if org-scoping matters later, this will be a migration footgun.
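
A sketch of the bounded, fail-fast variant (the timeout value and PublishEvent signature are assumptions):

// Sketch: bound the publish and surface failures instead of returning success.
pubCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
if err := s.eventHub.PublishEvent(pubCtx, organizationID, eventhub.EventTypeAPI,
    "CREATE", apiID, correlationID, eventData); err != nil {
    return fmt.Errorf("failed to publish deployment event: %w", err)
}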


549-594: Critical: replica-sync + db==nil yields “success but nothing stored”.
When enableReplicaSync is true, you skip s.store.Add(...). If s.db is also nil, the config is persisted nowhere, but DeployAPIConfiguration will still return success.

Add a guard in saveOrUpdateConfig (or earlier) to reject this configuration.

🤖 Fix all issues with AI agents
In @gateway/gateway-controller/pkg/eventhub/sqlite_backend.go:
- Around line 377-406: deliverEvents currently returns immediately when any
subscriber channel is full, causing partial delivery and duplicate retries;
change deliverEvents (and its use of org.getSubscribers) to attempt delivery to
all subscribers: for each subscriber channel, try a non-blocking send, log
success or failure, and record which subscriber IDs failed; after looping,
update the poll state only once (after all sends) and return nil if all
succeeded or a composed error indicating which subscribers failed so callers can
decide retry semantics; alternatively, if only one subscriber is supported,
enforce that in Subscribe (reject multiple subscribers) and simplify
deliverEvents to assume a single channel. (A sketch of the best-effort fan-out follows after this file's items.)
- Around line 57-64: The Initialize(ctx) function currently derives the worker
contexts from the caller-supplied ctx which can cause background workers
(pollLoop, cleanupLoop) to exit prematurely; change the worker context creation
to use context.Background() instead of the passed ctx (i.e., set b.pollerCtx,
b.pollerCancel = context.WithCancel(context.Background()) and b.cleanupCtx,
b.cleanupCancel = context.WithCancel(context.Background())) so worker lifecycle
is controlled solely by Close() (which must call pollerCancel/cleanupCancel and
wait on b.wg), or alternatively remove the ctx parameter from Initialize and
document that Close() manages lifecycle; update Initialize, the fields
pollerCtx/pollerCancel/cleanupCtx/cleanupCancel, and Close accordingly.
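
Returning to the deliverEvents item above, a best-effort fan-out could look like this sketch (helper name and channel shape assumed):

// Sketch: attempt delivery to every subscriber, then report which ones
// failed so the caller can keep the poll state unchanged and retry.
func deliverToAll(subs map[string]chan<- []Event, events []Event) error {
    var failed []string
    for id, ch := range subs {
        select {
        case ch <- events: // non-blocking send
        default:
            failed = append(failed, id)
        }
    }
    if len(failed) > 0 {
        return fmt.Errorf("delivery failed for subscribers: %s", strings.Join(failed, ", "))
    }
    return nil
}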

In @gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go:
- Around line 60-96: The Subscribe method is racy and contradicts your
idempotent comment; replace the Load()/Store() sequence with a single
a.activeSubscriptions.LoadOrStore call so concurrent Subscribe callers don't
duplicate subscriptions. Only perform RegisterOrganization, create bridgeChan,
call a.eventHub.Subscribe, start the bridgeEvents goroutine, and log the
subscription when LoadOrStore indicates we actually stored the new subscription;
if LoadOrStore returns an existing value just return nil (idempotent). On
failure after storing (e.g., a.eventHub.Subscribe error), roll back by deleting
the key from a.activeSubscriptions and closing bridgeChan so state remains
consistent. (A sketch follows after this file's items.)
- Around line 174-186: Close currently only clears activeSubscriptions and
closes the underlying eventHub but never cancels the subscription contexts or
waits for bridge goroutines to finish; modify EventHubAdapter.Close to (1)
signal cancellation for each active subscription's context (the bridges listen
on ctx.Done() in bridgeEvents) instead of just calling
activeSubscriptions.Clear(), (2) track bridge goroutines with a sync.WaitGroup
incremented where bridge goroutines are started and decremented on exit, and (3)
wait for that WaitGroup to complete (ensuring in-flight events are delivered)
before calling a.eventHub.Close() and returning; update any types holding
subscription state to store the cancel functions and ensure Close invokes them
and then waits for the WaitGroup.
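
For the Subscribe race above, a sketch of the LoadOrStore-based flow (buffer size and stored value are placeholders):

// Sketch: race-free, idempotent Subscribe via sync.Map.LoadOrStore.
if _, loaded := a.activeSubscriptions.LoadOrStore(organizationID, struct{}{}); loaded {
    return nil // already subscribed: idempotent no-op
}
if err := a.eventHub.RegisterOrganization(ctx, organizationID); err != nil {
    a.activeSubscriptions.Delete(organizationID)
    return err
}
bridgeChan := make(chan []eventhub.Event, 64)
if err := a.eventHub.Subscribe(ctx, organizationID, bridgeChan); err != nil {
    // Roll back so a later Subscribe can retry cleanly.
    a.activeSubscriptions.Delete(organizationID)
    close(bridgeChan)
    return err
}
go a.bridgeEvents(ctx, organizationID, bridgeChan)
return nil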

In @gateway/gateway-controller/pkg/utils/api_deployment.go:
- Around line 132-168: NewAPIDeploymentService must enforce that enabling
replica-sync has required dependencies; change NewAPIDeploymentService to
validate that if enableReplicaSync == true then db and eventHub (and optionally
snapshotManager and store) are non-nil, return an error if any required
dependency is nil (i.e. change signature to return (*APIDeploymentService,
error)), update callers to handle the error, and keep construction otherwise
identical; this prevents creating a service that claims replica-sync but will be
a silent no-op. (A constructor sketch follows after this file's items.)
- Around line 889-916: The UpdateAPIConfiguration flow currently calls
s.updatePolicyConfiguration(...) unconditionally; gate that call behind the
replica-sync flag so it only runs when replica sync is disabled: check
!s.enableReplicaSync (same logic used in Deploy's non-replica-sync path) and
only invoke s.updatePolicyConfiguration(storedCfg, logger) in that branch,
skipping policy updates when enableReplicaSync is true until multi-replica
semantics are implemented.
- Around line 324-418: UpdateAPIConfiguration currently updates the in-memory
store and triggers xDS/policy updates directly, which breaks replica-sync;
change it so that after persisting with s.db.UpdateConfig (and only if
s.enableReplicaSync is true) it does NOT call s.store.Update,
s.triggerXDSSnapshotUpdate, or s.updatePolicyConfiguration directly — instead
publish the config-updated event using the same event path used by
saveOrUpdateConfig/EventListener so replicas reconcile via the listener; when
enableReplicaSync is false, keep the existing immediate s.store.Update,
s.triggerXDSSnapshotUpdate, and s.updatePolicyConfiguration behavior.
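
For the constructor validation item above, a sketch of the suggested guard (parameter list abbreviated):

// Sketch: only the validation is new; the rest of the construction is unchanged.
func NewAPIDeploymentService(enableReplicaSync bool, db storage.Storage,
    hub eventhub.EventHub /* ...existing params... */) (*APIDeploymentService, error) {
    if enableReplicaSync {
        if db == nil {
            return nil, fmt.Errorf("replica sync requires persistent storage")
        }
        if hub == nil {
            return nil, fmt.Errorf("replica sync requires an EventHub")
        }
    }
    // ... construct and return the service as before ...
    return &APIDeploymentService{ /* ... */ }, nil
}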
🧹 Nitpick comments (6)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (2)

75-98: Consider removing redundant string conversions.

The orgID parameter is already of type string, so conversions at lines 89 and 95 (string(orgID)) are unnecessary. While harmless, removing them improves clarity.

♻️ Proposed cleanup
-	_, err := b.db.ExecContext(ctx, query, string(orgID), "", time.Now())
+	_, err := b.db.ExecContext(ctx, query, orgID, "", time.Now())
 	if err != nil {
 		return fmt.Errorf("failed to initialize organization state: %w", err)
 	}
 
 	b.logger.Info("Organization registered in SQLite backend",
-		zap.String("organization", string(orgID)),
+		zap.String("organization", orgID),
 	)

346-375: Address the pagination TODO for production readiness.

The lack of pagination (line 346) could cause performance degradation or memory pressure when many events accumulate between poll cycles. Consider implementing cursor-based pagination or a configurable batch size limit.

Do you want me to generate an implementation or open an issue to track this improvement?
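
A bounded fetch per poll cycle might look like this sketch (batch size assumed to come from configuration):

// Sketch: cap the number of events read per poll; callers loop until a
// short page indicates the backlog is drained.
const fetchQuery = `
SELECT correlation_id, event_type, action, entity_id, event_data
FROM events
WHERE organization_id = ? AND processed_timestamp > ?
ORDER BY processed_timestamp
LIMIT ?
`
rows, err := b.db.QueryContext(ctx, fetchQuery, orgID, lastPoll, batchSize)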

gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go (1)

50-56: Guard against nil logger (avoid panics in error paths).
NewEventHubAdapter accepts logger *zap.Logger but never defends against nil; several paths call a.logger.*. Consider defaulting to zap.NewNop().

gateway/gateway-controller/pkg/utils/api_deployment.go (3)

58-129: Good direction: typed errors + explicit update params; consider preserving root cause in ParseError.
ParseError{Message: "Failed to parse configuration"} drops err (harder to debug). Suggest storing/wrapping the underlying error (even if you don’t expose it to HTTP).
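
A sketch of the wrapped variant, which keeps the HTTP-facing message while preserving the cause for logs and errors.Is/As:

// Sketch: keep the root cause on the typed error without exposing it to HTTP.
type ParseError struct {
    Message string
    Err     error // underlying cause, surfaced in logs and via errors.Is/As
}

func (e *ParseError) Error() string { return e.Message }
func (e *ParseError) Unwrap() error { return e.Err }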


420-504: Bound topic lifecycle concurrency + avoid hard-coded hub host/port.
Spawning a goroutine per topic (go func(topic string) { ... }) is unbounded and can overwhelm the controller / WebSubHub for large topic sets. Consider errgroup.Group with SetLimit(N) (or a semaphore worker pool).

Also, "localhost", 8083 should come from config (or routerConfig) rather than being embedded here.


918-931: Add nil-guard for snapshotManager (avoid panic in tests/modes).
triggerXDSSnapshotUpdate dereferences s.snapshotManager without checking. If any wiring path allows it to be nil, this will panic asynchronously (hard to trace).

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4c6a1c and 1f75156.

📒 Files selected for processing (3)
  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
  • gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go
  • gateway/gateway-controller/pkg/utils/api_deployment.go
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.
📚 Learning: 2026-01-09T08:39:25.435Z
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/eventhub/sqlite_backend.go:98-156
Timestamp: 2026-01-09T08:39:25.435Z
Learning: In gateway/gateway-controller/pkg/eventhub/sqlite_backend.go, the events table uses correlation_id as PRIMARY KEY. The correlationID is expected to be unique at all times by design across all event sources (control plane, API handlers, etc.).

Applied to files:

  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
📚 Learning: 2026-01-09T08:19:21.846Z
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/storage/interface.go:233-234
Timestamp: 2026-01-09T08:19:21.846Z
Learning: In gateway/gateway-controller/pkg/storage/interface.go, the Storage interface design philosophy of being "database-agnostic" means it should support any SQL database (SQLite, PostgreSQL, MySQL, etc.), not non-SQL storage backends. Therefore, exposing *sql.DB through GetDB() is appropriate and maintains the intended abstraction.

Applied to files:

  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
📚 Learning: 2026-01-07T17:41:01.083Z
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

Applied to files:

  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
  • gateway/gateway-controller/pkg/utils/api_deployment.go
📚 Learning: 2025-12-18T08:33:51.340Z
Learnt from: O-sura
Repo: wso2/api-platform PR: 485
File: gateway/policy-engine/internal/analytics/dto/unwrappedEvent.go:21-90
Timestamp: 2025-12-18T08:33:51.340Z
Learning: In gateway/policy-engine/internal/analytics/dto/unwrappedEvent.go, the JSON tag casing differences between DefaultFaultEvent (correlationID, applicationID, organizationID, environmentID, regionID with uppercase 'ID') and DefaultResponseEvent (correlationId, applicationId, organizationId, environmentId, regionId with lowercase 'Id') are intentional to maintain consistency with respective event type conventions and config file property standards.

Applied to files:

  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
📚 Learning: 2025-12-06T11:57:08.456Z
Learnt from: BLasan
Repo: wso2/api-platform PR: 140
File: gateway/gateway-controller/pkg/utils/api_deployment.go:88-105
Timestamp: 2025-12-06T11:57:08.456Z
Learning: In the gateway-controller Go codebase (file: gateway/gateway-controller/pkg/utils/api_deployment.go), only the async/websub API kind is currently supported. async/websocket and async/sse API kinds are not yet implemented.

Applied to files:

  • gateway/gateway-controller/pkg/utils/api_deployment.go
📚 Learning: 2025-11-08T13:06:22.133Z
Learnt from: thivindu
Repo: wso2/api-platform PR: 110
File: platform-api/src/internal/service/api.go:432-476
Timestamp: 2025-11-08T13:06:22.133Z
Learning: In the platform-api Go codebase (file: platform-api/src/internal/service/api.go), the DeployAPIRevision method is designed to automatically create API-gateway associations during deployment if they don't already exist. There is no requirement that associations must pre-exist before deployment; the system checks for existing associations and creates them on-the-fly as needed.

Applied to files:

  • gateway/gateway-controller/pkg/utils/api_deployment.go
📚 Learning: 2026-01-09T05:58:05.318Z
Learnt from: thivindu
Repo: wso2/api-platform PR: 644
File: gateway/gateway-controller/pkg/storage/memory.go:484-503
Timestamp: 2026-01-09T05:58:05.318Z
Learning: In gateway/gateway-controller/pkg/storage/memory.go, during API key rotation via the StoreAPIKey method, the API key ID is never modified. When an existing API key is found by name (existingKeyID != ""), the incoming apiKey.ID is guaranteed to be the same as existingKeyID, so no ID mismatch can occur during updates.

Applied to files:

  • gateway/gateway-controller/pkg/utils/api_deployment.go
📚 Learning: 2025-12-23T10:41:47.584Z
Learnt from: thivindu
Repo: wso2/api-platform PR: 389
File: gateway/gateway-controller/pkg/utils/api_key.go:257-380
Timestamp: 2025-12-23T10:41:47.584Z
Learning: In gateway/gateway-controller/pkg/utils/api_key.go, the RevokeAPIKey method intentionally proceeds with database deletion and xDS policy engine revocation even when the API key is not found in the memory store (apiKey is nil). This is by design to handle partial failure recovery: if a previous revocation attempt removed the key from memory but failed to clean up the database or policy engine, subsequent retries will still perform the cleanup operations to ensure eventual consistency.

Applied to files:

  • gateway/gateway-controller/pkg/utils/api_deployment.go
🧬 Code graph analysis (3)
gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go (2)
gateway/gateway-controller/pkg/eventhub/types.go (3)
  • EventHub (38-61)
  • Event (19-28)
  • EventType (9-9)
gateway/gateway-controller/pkg/eventlistener/event_source.go (2)
  • EventSource (60-101)
  • Event (29-50)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (3)
platform-api/src/internal/database/connection.go (1)
  • DB (32-34)
gateway/gateway-controller/pkg/eventhub/backend.go (2)
  • SQLiteBackendConfig (76-83)
  • DefaultSQLiteBackendConfig (86-92)
gateway/gateway-controller/pkg/eventhub/types.go (3)
  • EventType (9-9)
  • Event (19-28)
  • OrganizationState (31-35)
gateway/gateway-controller/pkg/utils/api_deployment.go (8)
gateway/gateway-controller/pkg/config/validator.go (2)
  • ValidationError (24-27)
  • Validator (32-34)
gateway/gateway-controller/pkg/storage/interface.go (1)
  • Storage (66-235)
gateway/gateway-controller/pkg/policyxds/manager.go (1)
  • PolicyManager (33-37)
gateway/gateway-controller/pkg/config/parser.go (2)
  • Parser (30-30)
  • NewParser (33-35)
gateway/gateway-controller/pkg/eventhub/types.go (2)
  • EventHub (38-61)
  • EventTypeAPI (13-13)
gateway/gateway-controller/pkg/models/stored_config.go (2)
  • StoredConfig (38-48)
  • StatusPending (32-32)
sdk/gateway/policyengine/v1/config.go (2)
  • Configuration (41-47)
  • PolicyInstance (18-37)
gateway/gateway-controller/pkg/xds/translator.go (1)
  • GenerateRouteName (114-117)
🔇 Additional comments (10)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (9)

14-43: LGTM: Clean backend structure and constructor.

The struct design and constructor follow good practices with proper default configuration handling and dependency injection.


100-158: LGTM: Proper transactional event publishing.

The atomic transaction ensures event insertion and state update consistency. The defer tx.Rollback() pattern is correct (Rollback is a no-op after Commit).

Note: Lines 126 and 139 have the same redundant string(orgID) conversions mentioned in the previous comment.


160-182: LGTM: Clean subscription management.

Proper delegation to the registry with appropriate logging.


184-215: LGTM: Cleanup implementation is sound.

The error from RowsAffected() (lines 192, 208) is safely ignored since it's only used for logging, and the main operation error is properly checked.


217-244: LGTM: Graceful shutdown implementation.

Proper context cancellation and wait group synchronization ensure clean shutdown of background workers.


246-264: LGTM: Standard polling loop pattern.

Clean ticker-based implementation with proper context handling.


266-317: LGTM: Correct change detection and retry logic.

The conditional poll state update (lines 308-314) correctly implements retry semantics: failed deliveries preserve the timestamp to ensure events are redelivered on the next poll cycle.


319-342: LGTM: Standard query pattern.

Proper row iteration and error handling throughout.


408-426: LGTM: Clean periodic cleanup implementation.

Proper ticker-based cleanup with context cancellation support.

gateway/gateway-controller/pkg/utils/api_deployment.go (1)

720-870: The concern about zero-value PolicyInstance due to map mismatch is unfounded. Both cfg.GetPolicies() and apiData.Policies are populated from identical deserialization of cfg.Configuration.Spec.AsAPIConfigData(), ensuring they reference the same policy collection. A mismatch cannot occur in practice.

While alternative implementations (e.g., building the map from apiData.Policies directly or avoiding the map pattern entirely) might be stylistically cleaner, the current approach is functionally correct and safe.

Comment on lines +57 to +64
	b.pollerCtx, b.pollerCancel = context.WithCancel(ctx)
	b.wg.Add(1)
	go b.pollLoop()

	// Start cleanup goroutine
	b.cleanupCtx, b.cleanupCancel = context.WithCancel(ctx)
	b.wg.Add(1)
	go b.cleanupLoop()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -nP -C5 '\bInitialize\s*\(' --type=go

Repository: wso2/api-platform

Length of output: 13308


🏁 Script executed:

cat -n gateway/gateway-controller/pkg/eventhub/sqlite_backend.go | head -100

Repository: wso2/api-platform

Length of output: 3181


🏁 Script executed:

rg -n 'func.*Close\(' gateway/gateway-controller/pkg/eventhub/sqlite_backend.go -A 15

Repository: wso2/api-platform

Length of output: 374


🏁 Script executed:

rg -n 'func.*pollLoop\|func.*cleanupLoop' gateway/gateway-controller/pkg/eventhub/sqlite_backend.go -A 20

Repository: wso2/api-platform

Length of output: 43


🏁 Script executed:

rg -n 'pollLoop|cleanupLoop' gateway/gateway-controller/pkg/eventhub/sqlite_backend.go -A 15

Repository: wso2/api-platform

Length of output: 1796


🏁 Script executed:

cat -n gateway/gateway-controller/pkg/eventhub/sqlite_backend.go | sed -n '150,220p'

Repository: wso2/api-platform

Length of output: 2612


Context parameter in Initialize() should be context.Background() or documented as initialization-only.

The background worker contexts (pollerCtx, cleanupCtx) are derived from the input ctx parameter using WithCancel(). If a caller passes a cancellable context that gets cancelled before Close() is called, the workers will terminate prematurely. In current usage, context.Background() is always passed (in main.go and all tests), so this doesn't manifest as a practical issue. However, the API signature suggests the context controls the operation and should be clarified: either document that a non-cancellable context is required, or refactor to not expose this parameter since the workers' lifecycle should be managed exclusively through Close().

🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/eventhub/sqlite_backend.go around lines 57 -
64, The Initialize(ctx) function currently derives the worker contexts from the
caller-supplied ctx which can cause background workers (pollLoop, cleanupLoop)
to exit prematurely; change the worker context creation to use
context.Background() instead of the passed ctx (i.e., set b.pollerCtx,
b.pollerCancel = context.WithCancel(context.Background()) and b.cleanupCtx,
b.cleanupCancel = context.WithCancel(context.Background())) so worker lifecycle
is controlled solely by Close() (which must call pollerCancel/cleanupCancel and
wait on b.wg), or alternatively remove the ctx parameter from Initialize and
document that Close() manages lifecycle; update Initialize, the fields
pollerCtx/pollerCancel/cleanupCtx/cleanupCancel, and Close accordingly.
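
A minimal sketch of the first option, assuming the pollerCtx/pollerCancel and cleanupCtx/cleanupCancel fields shown above and omitting the statement preparation and initialization guard the real method performs:

// Initialize starts the background workers. The caller-supplied context is
// deliberately ignored for worker lifetimes; Close() owns shutdown.
// (Sketch only - field and method names are taken from the snippet above.)
func (b *SQLiteBackend) Initialize(_ context.Context) error {
	b.pollerCtx, b.pollerCancel = context.WithCancel(context.Background())
	b.wg.Add(1)
	go b.pollLoop()

	b.cleanupCtx, b.cleanupCancel = context.WithCancel(context.Background())
	b.wg.Add(1)
	go b.cleanupLoop()
	return nil
}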

Comment on lines +377 to +406
// deliverEvents sends events to all subscribers of an organization
func (b *SQLiteBackend) deliverEvents(org *organization, events []Event) error {
	subscribers := org.getSubscribers()

	if len(subscribers) == 0 {
		b.logger.Debug("No subscribers for organization",
			zap.String("organization", string(org.id)),
			zap.Int("events", len(events)),
		)
		return nil
	}

	// TODO: (VirajSalaka) One subscriber is considered here. Handle multiple subscribers properly.
	for _, ch := range subscribers {
		select {
		case ch <- events:
			b.logger.Debug("Delivered events to subscriber",
				zap.String("organization", string(org.id)),
				zap.Int("events", len(events)),
			)
		default:
			b.logger.Error("Subscriber channel full, dropping events",
				zap.String("organization", string(org.id)),
				zap.Int("events", len(events)),
			)
			return fmt.Errorf("subscriber channel full")
		}
	}
	return nil
}

⚠️ Potential issue | 🔴 Critical

Fix multi-subscriber delivery logic to prevent partial delivery and duplicate events.

The current implementation has a critical flaw (lines 390-403): when one subscriber's channel is full, the method returns an error immediately, which:

  1. Prevents delivery to remaining subscribers in the loop, causing partial event delivery
  2. Blocks poll state update (line 308), forcing ALL subscribers—including those who already received the events—to receive duplicates on the next poll cycle

This violates consistency guarantees across replicas.

Recommended fix options:

  1. Track per-subscriber delivery state: Record which subscribers successfully received events and only retry failed deliveries
  2. Continue delivery on failure: Log the full channel but continue delivering to remaining subscribers, then return a partial failure indicator
  3. Architectural constraint: If truly only one subscriber is supported (per TODO comment), enforce this with a check and clear error message at Subscribe time
🔧 Option 2 implementation (continue on failure)
 func (b *SQLiteBackend) deliverEvents(org *organization, events []Event) error {
 	subscribers := org.getSubscribers()
 
 	if len(subscribers) == 0 {
 		b.logger.Debug("No subscribers for organization",
 			zap.String("organization", string(org.id)),
 			zap.Int("events", len(events)),
 		)
 		return nil
 	}
 
-	// TODO: (VirajSalaka) One subscriber is considered here. Handle multiple subscribers properly.
+	var deliveryErrors []error
 	for _, ch := range subscribers {
 		select {
 		case ch <- events:
 			b.logger.Debug("Delivered events to subscriber",
 				zap.String("organization", string(org.id)),
 				zap.Int("events", len(events)),
 			)
 		default:
 			b.logger.Error("Subscriber channel full, dropping events",
 				zap.String("organization", string(org.id)),
 				zap.Int("events", len(events)),
 			)
-			return fmt.Errorf("subscriber channel full")
+			deliveryErrors = append(deliveryErrors, fmt.Errorf("subscriber channel full"))
 		}
 	}
+	if len(deliveryErrors) > 0 {
+		return fmt.Errorf("failed to deliver to %d subscriber(s)", len(deliveryErrors))
+	}
 	return nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// deliverEvents sends events to all subscribers of an organization
func (b *SQLiteBackend) deliverEvents(org *organization, events []Event) error {
	subscribers := org.getSubscribers()

	if len(subscribers) == 0 {
		b.logger.Debug("No subscribers for organization",
			zap.String("organization", string(org.id)),
			zap.Int("events", len(events)),
		)
		return nil
	}

	var deliveryErrors []error
	for _, ch := range subscribers {
		select {
		case ch <- events:
			b.logger.Debug("Delivered events to subscriber",
				zap.String("organization", string(org.id)),
				zap.Int("events", len(events)),
			)
		default:
			b.logger.Error("Subscriber channel full, dropping events",
				zap.String("organization", string(org.id)),
				zap.Int("events", len(events)),
			)
			deliveryErrors = append(deliveryErrors, fmt.Errorf("subscriber channel full"))
		}
	}
	if len(deliveryErrors) > 0 {
		return fmt.Errorf("failed to deliver to %d subscriber(s)", len(deliveryErrors))
	}
	return nil
}
🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/eventhub/sqlite_backend.go around lines 377 -
406, deliverEvents currently returns immediately when any subscriber channel is
full, causing partial delivery and duplicate retries; change deliverEvents (and
its use of org.getSubscribers) to attempt delivery to all subscribers: for each
subscriber channel, try a non-blocking send, log success or failure, and record
which subscriber IDs failed; after looping, update the poll state only once
(after all sends) and return nil if all succeeded or a composed error indicating
which subscribers failed so callers can decide retry semantics; alternatively,
if only one subscriber is supported, enforce that in Subscribe (reject multiple
subscribers) and simplify deliverEvents to assume a single channel.
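
If the third option is preferred instead, the guard can live at Subscribe time. A sketch, where countSubscribers is a hypothetical registry helper (not in the current code):

// Subscribe rejects a second subscriber per organization until multi-subscriber
// delivery is implemented. countSubscribers is hypothetical, for illustration.
func (b *SQLiteBackend) Subscribe(orgID string, eventChan chan<- []Event) error {
	if b.registry.countSubscribers(orgID) > 0 {
		return fmt.Errorf("organization %s already has a subscriber; multiple subscribers are not supported yet", orgID)
	}
	return b.registry.addSubscriber(orgID, eventChan)
}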

Comment on lines +60 to +96
func (a *EventHubAdapter) Subscribe(ctx context.Context, organizationID string, eventChan chan<- []Event) error {
	// Check if already subscribed
	if _, exists := a.activeSubscriptions.Load(organizationID); exists {
		return fmt.Errorf("already subscribed to organization: %s", organizationID)
	}

	// Register organization with EventHub (idempotent operation)
	if err := a.eventHub.RegisterOrganization(organizationID); err != nil {
		a.logger.Debug("Organization may already be registered",
			zap.String("organization", organizationID),
			zap.Error(err),
		)
		// Continue - registration errors are usually not fatal
	}

	// Create a bridge channel that receives eventhub.Event and converts to generic Event
	bridgeChan := make(chan []eventhub.Event, 10)

	// Subscribe to EventHub
	if err := a.eventHub.Subscribe(organizationID, bridgeChan); err != nil {
		close(bridgeChan)
		return fmt.Errorf("failed to subscribe to eventhub: %w", err)
	}

	// Track this subscription
	a.activeSubscriptions.Store(organizationID, eventChan)

	// Start goroutine to convert and forward events
	go a.bridgeEvents(ctx, organizationID, bridgeChan, eventChan)

	a.logger.Info("Subscribed to event source",
		zap.String("organization", organizationID),
		zap.String("source", "eventhub"),
	)

	return nil
}

⚠️ Potential issue | 🟠 Major

Fix duplicate-subscribe race + make behavior consistent with your own “idempotent” comment.
Load() then Store() is racy under concurrent callers; use LoadOrStore. Also: comment says “idempotent subscription” but you return an error if already subscribed.

Proposed direction (sketch)
- // Check if already subscribed
- if _, exists := a.activeSubscriptions.Load(organizationID); exists {
-   return fmt.Errorf("already subscribed to organization: %s", organizationID)
- }
+ if _, loaded := a.activeSubscriptions.LoadOrStore(organizationID, eventChan); loaded {
+   // Either treat as idempotent success, or return a typed error.
+   return nil
+ }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func (a *EventHubAdapter) Subscribe(ctx context.Context, organizationID string, eventChan chan<- []Event) error {
	if _, loaded := a.activeSubscriptions.LoadOrStore(organizationID, eventChan); loaded {
		// Either treat as idempotent success, or return a typed error.
		return nil
	}

	// Register organization with EventHub (idempotent operation)
	if err := a.eventHub.RegisterOrganization(organizationID); err != nil {
		a.logger.Debug("Organization may already be registered",
			zap.String("organization", organizationID),
			zap.Error(err),
		)
		// Continue - registration errors are usually not fatal
	}

	// Create a bridge channel that receives eventhub.Event and converts to generic Event
	bridgeChan := make(chan []eventhub.Event, 10)

	// Subscribe to EventHub
	if err := a.eventHub.Subscribe(organizationID, bridgeChan); err != nil {
		close(bridgeChan)
		return fmt.Errorf("failed to subscribe to eventhub: %w", err)
	}

	// Start goroutine to convert and forward events
	go a.bridgeEvents(ctx, organizationID, bridgeChan, eventChan)

	a.logger.Info("Subscribed to event source",
		zap.String("organization", organizationID),
		zap.String("source", "eventhub"),
	)

	return nil
}
🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go around
lines 60 - 96, The Subscribe method is racy and contradicts your idempotent
comment; replace the Load()/Store() sequence with a single
a.activeSubscriptions.LoadOrStore call so concurrent Subscribe callers don't
duplicate subscriptions. Only perform RegisterOrganization, create bridgeChan,
call a.eventHub.Subscribe, start the bridgeEvents goroutine, and log the
subscription when LoadOrStore indicates we actually stored the new subscription;
if LoadOrStore returns an existing value just return nil (idempotent). On
failure after storing (e.g., a.eventHub.Subscribe error), roll back by deleting
the key from a.activeSubscriptions and closing bridgeChan so state remains
consistent.

Comment on lines +98 to +152
// bridgeEvents converts eventhub.Event to generic Event and forwards to the listener.
// This goroutine runs until the context is cancelled or the bridge channel is closed.
func (a *EventHubAdapter) bridgeEvents(
	ctx context.Context,
	organizationID string,
	from <-chan []eventhub.Event,
	to chan<- []Event,
) {
	defer func() {
		// Clean up subscription tracking
		a.activeSubscriptions.Delete(organizationID)
		a.logger.Debug("Bridge goroutine exiting",
			zap.String("organization", organizationID),
		)
	}()

	for {
		select {
		case <-ctx.Done():
			return

		case hubEvents, ok := <-from:
			if !ok {
				// EventHub channel closed
				a.logger.Warn("EventHub channel closed unexpectedly",
					zap.String("organization", organizationID),
				)
				return
			}

			// Convert eventhub.Event to generic Event
			genericEvents := make([]Event, len(hubEvents))
			for i, hubEvent := range hubEvents {
				eventCorrelationID := "event-" + hubEvent.CorrelationID
				genericEvents[i] = Event{
					OrganizationID: string(hubEvent.OrganizationID),
					EventType:      string(hubEvent.EventType),
					Action:         hubEvent.Action,
					EntityID:       hubEvent.EntityID,
					EventData:      hubEvent.EventData,
					CorrelationID:  eventCorrelationID,
					Timestamp:      hubEvent.ProcessedTimestamp,
				}
			}

			// Forward to listener
			select {
			case to <- genericEvents:
				// Successfully forwarded
			case <-ctx.Done():
				return
			}
		}
	}
}

⚠️ Potential issue | 🔴 Critical

Unsubscribe contract is currently broken (events can still be sent after Unsubscribe).
bridgeEvents keeps forwarding to `to` even after Unsubscribe() because it doesn’t observe any per-org stop signal (map deletion doesn’t affect the captured `to`). This violates the EventSource.Unsubscribe doc: “After calling Unsubscribe, no more events should be sent to the channel”.

Also, prefixing correlation IDs ("event-"+...) can break request tracing if other components expect the original correlation ID.

Comment on lines +154 to +172
// Unsubscribe implements EventSource.Unsubscribe.
// Note: The current EventHub implementation doesn't have an explicit unsubscribe method,
// so we just stop the bridge goroutine by removing the subscription tracking.
func (a *EventHubAdapter) Unsubscribe(organizationID string) error {
	if _, exists := a.activeSubscriptions.Load(organizationID); !exists {
		// Not subscribed - this is fine, make it idempotent
		return nil
	}

	// Remove from tracking - the bridge goroutine will detect context cancellation
	// when the listener stops
	a.activeSubscriptions.Delete(organizationID)

	a.logger.Info("Unsubscribed from event source",
		zap.String("organization", organizationID),
	)

	return nil
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

sync.Map.Clear() Go version introduced minimum requirement

💡 Result:

sync.Map.Clear was added in Go 1.23 — minimum required Go version: 1.23. [1][2]

Sources:
[1] pkg.go.dev sync package (Clear added in go1.23.0).
[2] go.dev source (sync.Map.Clear implementation).


🏁 Script executed:

# Find the EventHubAdapter implementation to understand the context better
find . -name "eventhub_adapter.go" -type f

Repository: wso2/api-platform

Length of output: 127


🏁 Script executed:

# Search for the full EventHubAdapter struct and methods
rg -A 200 "type EventHubAdapter struct" --max-count=1

Repository: wso2/api-platform

Length of output: 14853


🏁 Script executed:

# Look for goroutine/bridge/forwarding logic referenced in the comment
rg "bridge goroutine|Stop|Unsubscribe" gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go -B 2 -A 5

Repository: wso2/api-platform

Length of output: 764


Unsubscribe doesn't actually stop the bridge goroutine.

Simply deleting from activeSubscriptions leaves the bridgeEvents goroutine still running and listening on bridgeChan. If the EventHub sends events after unsubscribe, they'll accumulate in an unconsumable channel, potentially blocking the EventHub if it does blocking sends. The comment about "context cancellation" is misleading—Unsubscribe() doesn't cancel the context.

You need to either:

  • Cancel the subscription's context (requires tracking per-subscription contexts), or
  • Close the bridge channel to signal the goroutine to exit, or
  • Add explicit unsubscribe support to the EventHub interface.
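
A sketch of the first option, assuming the adapter stores a per-subscription cancel function next to the channel (the subscription type below is hypothetical; Subscribe would create it with context.WithCancel and pass the derived context to bridgeEvents):

// subscription pairs the listener channel with the cancel function of its
// bridge goroutine. (Hypothetical type - not part of the current code.)
type subscription struct {
	eventChan chan<- []Event
	cancel    context.CancelFunc
}

func (a *EventHubAdapter) Unsubscribe(organizationID string) error {
	v, ok := a.activeSubscriptions.LoadAndDelete(organizationID)
	if !ok {
		return nil // not subscribed - keep the call idempotent
	}
	// Cancel the per-subscription context so bridgeEvents exits and no
	// further events are forwarded to the listener channel.
	v.(*subscription).cancel()
	return nil
}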

Comment on lines +174 to +186
// Close implements EventSource.Close by delegating to EventHub.Close.
func (a *EventHubAdapter) Close() error {
	// Clean up all subscriptions
	a.activeSubscriptions.Clear()

	// Close the underlying EventHub
	if err := a.eventHub.Close(); err != nil {
		return fmt.Errorf("failed to close eventhub: %w", err)
	}

	a.logger.Info("Event source closed", zap.String("source", "eventhub"))
	return nil
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# First, let's locate the file and understand its structure
find . -name "eventhub_adapter.go" -type f | head -5

# Show the Close() method
rg -n --type=go 'func.*Close.*error' gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go -C 10

# Search for EventSource interface definition
rg -n --type=go 'type EventSource interface' --type=go -C 5

# Check activeSubscriptions field type
rg -n --type=go 'activeSubscriptions' gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go -B 5 -A 2

Repository: wso2/api-platform

Length of output: 4496


🏁 Script executed:

#!/bin/bash
# Let's also check the Subscribe method and goroutine handling
rg -n --type=go 'func.*Subscribe' gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go -A 15

# Look for any existing WaitGroup or goroutine tracking
rg -n --type=go 'sync.WaitGroup\|go \w+\(' gateway/gateway-controller/pkg/eventlistener/ --type=go

Repository: wso2/api-platform

Length of output: 820


🏁 Script executed:

#!/bin/bash
# Check the EventHub interface/struct definition
rg -n --type=go 'type EventHub' --type=go -A 10

# Look for the underlying EventHub.Close() implementation
find . -name "*.go" -type f -exec rg -l 'type EventHub' {} \;

Repository: wso2/api-platform

Length of output: 2406


🏁 Script executed:

#!/bin/bash
# Get the full EventSource interface definition, especially Close()
rg -n --type=go 'type EventSource interface' -A 50 gateway/gateway-controller/pkg/eventlistener/

# Check the bridge goroutine implementation
rg -n --type=go 'func.*bridgeEvents' gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go -A 50

# Look for context usage in the adapter
rg -n --type=go 'ctx.*context\|context.*Done\|context.*Cancel' gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go

Repository: wso2/api-platform

Length of output: 5686


🏁 Script executed:

#!/bin/bash
# Check if there's a context stored in EventHubAdapter
rg -n --type=go 'type EventHubAdapter' -A 15 gateway/gateway-controller/pkg/eventlistener/

# Look for any shutdown or stop mechanism
rg -n --type=go 'stop\|cancel\|shutdown\|done' gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go -i

Repository: wso2/api-platform

Length of output: 1659


Close() doesn't meet EventSource.Close contract (missing goroutine shutdown + wait for in-flight events).

activeSubscriptions.Clear() only removes tracking and doesn't signal the bridge goroutines to stop. The bridge goroutines listen on ctx.Done() (line 116 of bridgeEvents), but Close() never cancels these contexts. Additionally, the contract requires "wait for in-flight events to be delivered" (event_source.go:98), but there's no synchronization mechanism (WaitGroup) to wait for goroutine cleanup before returning.

Add context cancellation for all active subscriptions and use sync.WaitGroup to track and wait for all bridge goroutines to exit before closing the underlying EventHub.

🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/eventlistener/eventhub_adapter.go around
lines 174 - 186, Close currently only clears activeSubscriptions and closes the
underlying eventHub but never cancels the subscription contexts or waits for
bridge goroutines to finish; modify EventHubAdapter.Close to (1) signal
cancellation for each active subscription's context (the bridges listen on
ctx.Done() in bridgeEvents) instead of just calling activeSubscriptions.Clear(),
(2) track bridge goroutines with a sync.WaitGroup incremented where bridge
goroutines are started and decremented on exit, and (3) wait for that WaitGroup
to complete (ensuring in-flight events are delivered) before calling
a.eventHub.Close() and returning; update any types holding subscription state to
store the cancel functions and ensure Close invokes them and then waits for the
WaitGroup.
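
A sketch along those lines, reusing the hypothetical per-subscription cancel tracking from the earlier comment and assuming a wg sync.WaitGroup field that Subscribe increments around each bridge goroutine:

func (a *EventHubAdapter) Close() error {
	// Cancel every bridge goroutine via its per-subscription context.
	a.activeSubscriptions.Range(func(key, value any) bool {
		value.(*subscription).cancel()
		a.activeSubscriptions.Delete(key)
		return true
	})

	// Wait until all bridges have drained their in-flight events.
	a.wg.Wait()

	// Only then tear down the underlying EventHub.
	if err := a.eventHub.Close(); err != nil {
		return fmt.Errorf("failed to close eventhub: %w", err)
	}
	a.logger.Info("Event source closed", zap.String("source", "eventhub"))
	return nil
}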

Comment on lines 132 to 168
 type APIDeploymentService struct {
-	store           *storage.ConfigStore
-	db              storage.Storage
-	snapshotManager *xds.SnapshotManager
-	parser          *config.Parser
-	validator       config.Validator
-	routerConfig    *config.RouterConfig
-	httpClient      *http.Client
+	store             *storage.ConfigStore
+	db                storage.Storage
+	snapshotManager   *xds.SnapshotManager
+	policyManager     *policyxds.PolicyManager
+	parser            *config.Parser
+	validator         config.Validator
+	routerConfig      *config.RouterConfig
+	httpClient        *http.Client
+	eventHub          eventhub.EventHub
+	enableReplicaSync bool
 }

 // NewAPIDeploymentService creates a new API deployment service
 func NewAPIDeploymentService(
 	store *storage.ConfigStore,
 	db storage.Storage,
 	snapshotManager *xds.SnapshotManager,
+	policyManager *policyxds.PolicyManager,
 	validator config.Validator,
 	routerConfig *config.RouterConfig,
+	eventHub eventhub.EventHub,
+	enableReplicaSync bool,
 ) *APIDeploymentService {
 	return &APIDeploymentService{
-		store:           store,
-		db:              db,
-		snapshotManager: snapshotManager,
-		parser:          config.NewParser(),
-		validator:       validator,
-		httpClient:      &http.Client{Timeout: 10 * time.Second},
-		routerConfig:    routerConfig,
+		store:             store,
+		db:                db,
+		snapshotManager:   snapshotManager,
+		policyManager:     policyManager,
+		parser:            config.NewParser(),
+		validator:         validator,
+		httpClient:        &http.Client{Timeout: 10 * time.Second},
+		routerConfig:      routerConfig,
+		eventHub:          eventHub,
+		enableReplicaSync: enableReplicaSync,
 	}
 }

⚠️ Potential issue | 🔴 Critical

Enforce replica-sync prerequisites at construction time (avoid “successful no-op” deployments).
With enableReplicaSync=true, later code can skip in-memory updates and only optionally use db/eventHub. If any of those are nil, you can return success while persisting nothing and/or never triggering reconciliation.

Consider validating invariants in NewAPIDeploymentService (or at the start of deploy/update):

  • replica-sync requires db != nil and eventHub != nil (and likely snapshotManager != nil / store != nil depending on architecture).
🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/utils/api_deployment.go around lines 132 -
168, NewAPIDeploymentService must enforce that enabling replica-sync has
required dependencies; change NewAPIDeploymentService to validate that if
enableReplicaSync == true then db and eventHub (and optionally snapshotManager
and store) are non-nil, return an error if any required dependency is nil (i.e.
change signature to return (*APIDeploymentService, error)), update callers to
handle the error, and keep construction otherwise identical; this prevents
creating a service that claims replica-sync but will be a silent no-op.
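
A sketch of the guarded constructor under the suggested signature change (the specific error cases checked here are illustrative):

func NewAPIDeploymentService(
	store *storage.ConfigStore,
	db storage.Storage,
	snapshotManager *xds.SnapshotManager,
	policyManager *policyxds.PolicyManager,
	validator config.Validator,
	routerConfig *config.RouterConfig,
	eventHub eventhub.EventHub,
	enableReplicaSync bool,
) (*APIDeploymentService, error) {
	// Replica sync is a silent no-op without persistence and an event bus,
	// so refuse to construct the service in that configuration.
	if enableReplicaSync {
		if db == nil {
			return nil, fmt.Errorf("enableReplicaSync requires persistent storage")
		}
		if eventHub == nil {
			return nil, fmt.Errorf("enableReplicaSync requires an event hub")
		}
	}
	return &APIDeploymentService{
		store:             store,
		db:                db,
		snapshotManager:   snapshotManager,
		policyManager:     policyManager,
		parser:            config.NewParser(),
		validator:         validator,
		httpClient:        &http.Client{Timeout: 10 * time.Second},
		routerConfig:      routerConfig,
		eventHub:          eventHub,
		enableReplicaSync: enableReplicaSync,
	}, nil
}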

Comment on lines +324 to +418
// UpdateAPIConfiguration handles the complete API configuration update process
func (s *APIDeploymentService) UpdateAPIConfiguration(params APIUpdateParams) (*APIDeploymentResult, error) {
	handle := params.Handle

	// Parse configuration
	var apiConfig api.APIConfiguration
	err := s.parser.Parse(params.Data, params.ContentType, &apiConfig)
	if err != nil {
		params.Logger.Error("Failed to parse configuration", zap.Error(err))
		return nil, &ParseError{Message: "Failed to parse configuration"}
	}

	// Validate that the handle in the YAML matches the path parameter
	if apiConfig.Metadata.Name != "" {
		if apiConfig.Metadata.Name != handle {
			params.Logger.Warn("Handle mismatch between path and YAML metadata",
				zap.String("path_handle", handle),
				zap.String("yaml_handle", apiConfig.Metadata.Name))
			return nil, &HandleMismatchError{
				PathHandle: handle,
				YamlHandle: apiConfig.Metadata.Name,
			}
		}
	}

	// Validate configuration
	validationErrors := s.validator.Validate(&apiConfig)
	if len(validationErrors) > 0 {
		params.Logger.Warn("Configuration validation failed",
			zap.String("handle", handle),
			zap.Int("num_errors", len(validationErrors)))
		return nil, &APIValidationError{Errors: validationErrors}
	}

	if s.db == nil {
		return nil, &DatabaseUnavailableError{}
	}

	// Check if config exists
	existing, err := s.db.GetConfigByHandle(handle)
	if err != nil {
		params.Logger.Warn("API configuration not found", zap.String("handle", handle))
		return nil, &NotFoundError{Handle: handle}
	}

	// Update stored configuration
	now := time.Now()
	existing.Configuration = apiConfig
	existing.Status = models.StatusPending
	existing.UpdatedAt = now
	existing.DeployedAt = nil
	existing.DeployedVersion = 0

	// Handle AsyncWebSub topic lifecycle management
	if apiConfig.Kind == api.Asyncwebsub {
		if err := s.handleTopicLifecycle(existing, params.Logger); err != nil {
			return nil, err
		}
	}

	// Update database first (only if persistent mode)
	if s.db != nil {
		if err := s.db.UpdateConfig(existing); err != nil {
			params.Logger.Error("Failed to update config in database", zap.Error(err))
			return nil, fmt.Errorf("failed to persist configuration update: %w", err)
		}
	}

	// Update in-memory store
	if err := s.store.Update(existing); err != nil {
		if storage.IsConflictError(err) {
			params.Logger.Info("API configuration handle already exists",
				zap.String("id", existing.ID),
				zap.String("handle", handle))
			return nil, &ConflictError{Message: err.Error()}
		}
		params.Logger.Error("Failed to update config in memory store", zap.Error(err))
		return nil, fmt.Errorf("failed to update configuration in memory store: %w", err)
	}

	params.Logger.Info("API configuration updated",
		zap.String("id", existing.ID),
		zap.String("handle", handle))

	// Update xDS snapshot asynchronously
	s.triggerXDSSnapshotUpdate(existing.ID, params.CorrelationID, params.Logger)

	// Update derived policy configuration
	s.updatePolicyConfiguration(existing, params.Logger)

	return &APIDeploymentResult{
		StoredConfig: existing,
		IsUpdate:     true,
	}, nil
}

⚠️ Potential issue | 🔴 Critical

UpdateAPIConfiguration bypasses replica-sync design (in multi-replica it updates local store directly).
Based on the retrieved learning for saveOrUpdateConfig (multi-replica must update in-memory via EventListener for consistency), UpdateAPIConfiguration should likely follow the same rule: persist + publish event, and avoid immediate s.store.Update(...) / local xDS/policy update when enableReplicaSync is true.

Today it unconditionally:

  • updates DB,
  • updates in-memory store,
  • triggers xDS snapshot,
  • updates derived policies.

That can diverge replica state and/or double-apply changes.

Proposed direction (high level)
 func (s *APIDeploymentService) UpdateAPIConfiguration(params APIUpdateParams) (*APIDeploymentResult, error) {
+   if s.enableReplicaSync {
+     // 1) update persistent storage
+     // 2) publish EventHub event (UPDATE)
+     // 3) return without touching in-memory store/xDS/policy locally
+   }
    ...
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/utils/api_deployment.go around lines 324 -
418, UpdateAPIConfiguration currently updates the in-memory store and triggers
xDS/policy updates directly, which breaks replica-sync; change it so that after
persisting with s.db.UpdateConfig (and only if s.enableReplicaSync is true) it
does NOT call s.store.Update, s.triggerXDSSnapshotUpdate, or
s.updatePolicyConfiguration directly — instead publish the config-updated event
using the same event path used by saveOrUpdateConfig/EventListener so replicas
reconcile via the listener; when enableReplicaSync is false, keep the existing
immediate s.store.Update, s.triggerXDSSnapshotUpdate, and
s.updatePolicyConfiguration behavior.
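
A sketch of the replica-sync branch, placed right after the s.db.UpdateConfig call; the Publish signature, the OrganizationID field on StoredConfig, and the "UPDATE" action string are assumptions for illustration:

	if s.enableReplicaSync {
		// Persisted above; publish the event and let the EventListener on
		// every replica (including this one) apply in-memory/xDS updates.
		event := eventhub.Event{
			OrganizationID: existing.OrganizationID, // assumed field
			EventType:      eventhub.EventTypeAPI,
			Action:         "UPDATE", // assumed action value
			EntityID:       existing.ID,
			CorrelationID:  params.CorrelationID,
		}
		if err := s.eventHub.Publish(event); err != nil { // assumed signature
			return nil, fmt.Errorf("failed to publish update event: %w", err)
		}
		return &APIDeploymentResult{StoredConfig: existing, IsUpdate: true}, nil
	}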

Comment on lines +889 to +916
// TODO: (VirajSalaka) Fix working in multi-replica mode
// updatePolicyConfiguration builds and updates/removes derived policy config for an API
func (s *APIDeploymentService) updatePolicyConfiguration(storedCfg *models.StoredConfig, logger *zap.Logger) {
	if s.policyManager == nil {
		return
	}

	storedPolicy := s.BuildStoredPolicyFromAPI(storedCfg)
	if storedPolicy != nil {
		if err := s.policyManager.AddPolicy(storedPolicy); err != nil {
			logger.Error("Failed to update derived policy configuration", zap.Error(err))
		} else {
			logger.Info("Derived policy configuration updated",
				zap.String("policy_id", storedPolicy.ID),
				zap.Int("route_count", len(storedPolicy.Configuration.Routes)))
		}
	} else {
		// API no longer has policies, remove the existing policy configuration
		policyID := storedCfg.ID + "-policies"
		if err := s.policyManager.RemovePolicy(policyID); err != nil {
			// Log at debug level since policy may not exist if API never had policies
			logger.Debug("No policy configuration to remove", zap.String("policy_id", policyID))
		} else {
			logger.Info("Derived policy configuration removed (API no longer has policies)",
				zap.String("policy_id", policyID))
		}
	}
}

⚠️ Potential issue | 🟠 Major

Don’t call updatePolicyConfiguration in replica-sync mode until multi-replica semantics are implemented.
There’s an explicit TODO “Fix working in multi-replica mode”, but UpdateAPIConfiguration currently calls s.updatePolicyConfiguration(...) unconditionally. Gate it behind !s.enableReplicaSync (same as Deploy’s non-replica-sync path) until the event-driven policy update path exists.

🤖 Prompt for AI Agents
In @gateway/gateway-controller/pkg/utils/api_deployment.go around lines 889 -
916, The UpdateAPIConfiguration flow currently calls
s.updatePolicyConfiguration(...) unconditionally; gate that call behind the
replica-sync flag so it only runs when replica sync is disabled: check
!s.enableReplicaSync (same logic used in Deploy's non-replica-sync path) and
only invoke s.updatePolicyConfiguration(storedCfg, logger) in that branch,
skipping policy updates when enableReplicaSync is true until multi-replica
semantics are implemented.
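
The gate itself is small; a sketch of the call site in UpdateAPIConfiguration:

	// Apply derived policy config locally only in single-replica mode; with
	// replica sync enabled this must flow through the event-driven path.
	if !s.enableReplicaSync {
		s.updatePolicyConfiguration(existing, params.Logger)
	}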


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@gateway/gateway-controller/pkg/eventhub/sqlite_backend.go`:
- Around line 461-483: Subscribe and Unsubscribe currently skip the
initialization guard; add a call to ensureInitialized() at the start of both
SQLiteBackend.Subscribe and SQLiteBackend.Unsubscribe and return any error it
produces before proceeding to registry.addSubscriber/removeSubscriber and
logging. Specifically, in the methods named Subscribe and Unsubscribe on type
SQLiteBackend, perform: if err := b.ensureInitialized(); err != nil { return err
} as the first statement so behavior matches RegisterOrganization, Publish,
Cleanup and CleanupRange.
- Around line 524-554: The Close method currently holds b.mu while calling
b.wg.Wait which can deadlock with background goroutines that try to acquire the
mutex; fix by (1) acquire b.mu, check b.initialized and set b.initialized =
false and capture/call b.pollerCancel and b.cleanupCancel while still holding
the lock, (2) release b.mu before calling b.wg.Wait, and (3) after wg.Wait
return, re-acquire b.mu (if needed) and call b.closeStatements and final
logging; update SQLiteBackend.Close to call pollerCancel/cleanupCancel under the
lock but move wg.Wait to occur after unlocking and then perform closeStatements
under lock again to keep state mutations safe (references: SQLiteBackend.Close,
b.mu, b.wg, b.pollerCancel, b.cleanupCancel, b.closeStatements,
ensureInitialized/pollLoop which acquire the mutex).
♻️ Duplicate comments (2)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (2)

85-92: Background workers still derive context from caller's ctx.

Lines 85 and 90 create pollerCtx and cleanupCtx from the caller-provided ctx. If the caller cancels this context before Close() is called, background workers will terminate unexpectedly. Background workers should use context.Background() and rely exclusively on Close() for lifecycle management.


703-732: Partial delivery issue when multiple subscribers exist.

The loop returns immediately when any subscriber's channel is full (line 728), preventing delivery to remaining subscribers. Additionally, the TODO at line 715 acknowledges this limitation. If multiple subscribers are supported, subsequent subscribers won't receive events when an earlier one's channel is full, and all subscribers will receive duplicates on retry.

🧹 Nitpick comments (2)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (2)

298-375: Duplicate query definitions between prepareStatement and prepareStatements.

The SQL queries are defined twice: once in prepareStatement() (lines 160-200) for re-preparation during retry, and again here for initial preparation. This creates a maintenance risk where changes to one location might not be reflected in the other, potentially causing subtle inconsistencies.

Consider having prepareStatements() call prepareStatement() for each key to ensure consistency:

♻️ Proposed refactor
 func (b *SQLiteBackend) prepareStatements() (err error) {
 	defer func() {
 		if err != nil {
 			b.closeStatements()
 		}
 	}()
 
-	b.stmtGetAllStates, err = b.db.Prepare(`
-		SELECT organization, version_id, updated_at
-		FROM organization_states
-		ORDER by organization
-	`)
-	if err != nil {
-		return fmt.Errorf("failed to prepare getAllStates: %w", err)
+	keys := []statementKey{
+		stmtKeyGetAllStates,
+		stmtKeyGetEventsSince,
+		stmtKeyInsertEvent,
+		stmtKeyUpdateOrgState,
+		stmtKeyInsertOrgState,
+		stmtKeyCleanup,
+		stmtKeyCleanupRange,
 	}
-	// ... (remove all other inline preparations)
+
+	for _, key := range keys {
+		stmt, prepErr := b.prepareStatement(key)
+		if prepErr != nil {
+			return prepErr
+		}
+		b.setStatement(key, stmt)
+	}
+
+	b.logger.Info("Prepared statements initialized successfully")
+	return nil
 }

680-681: Address pagination TODO for large event sets.

Without pagination, querying events since a timestamp could return an unbounded number of rows, potentially causing memory pressure. The TODO acknowledges this.

Would you like me to help implement cursor-based pagination for this query, or open an issue to track this task?
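
For reference, a sketch of one batching approach; getEventsSinceLimit is a hypothetical LIMIT-bounded variant of the existing query, and the ProcessedTimestamp cursor assumes events are ordered by that column:

// pollEventsPaged fetches and delivers events in fixed-size batches so a
// large backlog is never loaded into memory at once.
func (b *SQLiteBackend) pollEventsPaged(org *organization, since time.Time, batch int) error {
	cursor := since
	for {
		events, err := b.getEventsSinceLimit(org.id, cursor, batch) // hypothetical
		if err != nil {
			return err
		}
		if len(events) == 0 {
			return nil
		}
		if err := b.deliverEvents(org, events); err != nil {
			return err // poll state is not advanced, so this batch is retried
		}
		cursor = events[len(events)-1].ProcessedTimestamp
		if len(events) < batch {
			return nil
		}
	}
}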

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f75156 and 48f0f0f.

📒 Files selected for processing (1)
  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/eventhub/sqlite_backend.go:98-156
Timestamp: 2026-01-09T08:39:25.435Z
Learning: In gateway/gateway-controller/pkg/eventhub/sqlite_backend.go, the events table uses correlation_id as PRIMARY KEY. The correlationID is expected to be unique at all times by design across all event sources (control plane, API handlers, etc.).
📚 Learning: 2026-01-09T08:39:25.435Z
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/eventhub/sqlite_backend.go:98-156
Timestamp: 2026-01-09T08:39:25.435Z
Learning: In gateway/gateway-controller/pkg/eventhub/sqlite_backend.go, the events table uses correlation_id as PRIMARY KEY. The correlationID is expected to be unique at all times by design across all event sources (control plane, API handlers, etc.).

Applied to files:

  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
📚 Learning: 2026-01-09T08:19:21.846Z
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/storage/interface.go:233-234
Timestamp: 2026-01-09T08:19:21.846Z
Learning: In gateway/gateway-controller/pkg/storage/interface.go, the Storage interface design philosophy of being "database-agnostic" means it should support any SQL database (SQLite, PostgreSQL, MySQL, etc.), not non-SQL storage backends. Therefore, exposing *sql.DB through GetDB() is appropriate and maintains the intended abstraction.

Applied to files:

  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
📚 Learning: 2026-01-07T17:41:01.083Z
Learnt from: VirajSalaka
Repo: wso2/api-platform PR: 642
File: gateway/gateway-controller/pkg/utils/api_deployment.go:570-589
Timestamp: 2026-01-07T17:41:01.083Z
Learning: In gateway/gateway-controller/pkg/utils/api_deployment.go, when enableReplicaSync is true (multi-replica mode), the in-memory store (s.store) is intentionally not updated immediately during DeployAPIConfiguration. Instead, all replicas (including the originating replica) update their in-memory stores through the event-driven path via the EventListener to ensure consistent state synchronization across the cluster. This means immediate reads may not find the config until the EventListener processes the event, which is expected behavior.

Applied to files:

  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
📚 Learning: 2025-12-18T08:33:51.340Z
Learnt from: O-sura
Repo: wso2/api-platform PR: 485
File: gateway/policy-engine/internal/analytics/dto/unwrappedEvent.go:21-90
Timestamp: 2025-12-18T08:33:51.340Z
Learning: In gateway/policy-engine/internal/analytics/dto/unwrappedEvent.go, the JSON tag casing differences between DefaultFaultEvent (correlationID, applicationID, organizationID, environmentID, regionID with uppercase 'ID') and DefaultResponseEvent (correlationId, applicationId, organizationId, environmentId, regionId with lowercase 'Id') are intentional to maintain consistency with respective event type conventions and config file property standards.

Applied to files:

  • gateway/gateway-controller/pkg/eventhub/sqlite_backend.go
🧬 Code graph analysis (1)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (4)
platform-api/src/internal/database/connection.go (1)
  • DB (32-34)
gateway/gateway-controller/pkg/eventhub/backend.go (2)
  • SQLiteBackendConfig (76-83)
  • DefaultSQLiteBackendConfig (86-92)
gateway/gateway-controller/pkg/eventhub/types.go (3)
  • EventType (9-9)
  • Event (19-28)
  • OrganizationState (31-35)
gateway/gateway-controller/pkg/eventhub/eventhub.go (1)
  • New (24-35)
🔇 Additional comments (4)
gateway/gateway-controller/pkg/eventhub/sqlite_backend.go (4)

28-66: LGTM!

The struct properly includes mutex protection for the initialized flag, addressing the previous data race concern. The constructor correctly separates object creation from lifecycle management (goroutine spawning happens in Initialize).


400-459: LGTM!

The Publish method correctly uses transactions with proper rollback handling. The mutex scope is appropriately narrow (only held during prepared statement access), and the atomic update of both event insertion and organization state version is well-implemented.


598-649: LGTM!

The polling logic efficiently fetches all organization states in a single query, correctly detects version changes, and properly handles the poll state update—only advancing the timestamp when delivery succeeds or when there are no new events. This ensures events are retried on the next poll cycle if delivery fails.


734-752: LGTM!

The cleanup loop properly handles context cancellation and uses the retention period configuration to determine the cutoff time. Error handling is appropriate with logging on failure.


Comment on lines +461 to +483
// Subscribe registers a channel to receive events for an organization
func (b *SQLiteBackend) Subscribe(orgID string, eventChan chan<- []Event) error {
	if err := b.registry.addSubscriber(orgID, eventChan); err != nil {
		return err
	}

	b.logger.Info("Subscription registered",
		zap.String("organization", string(orgID)),
	)
	return nil
}

// Unsubscribe removes a subscription channel for an organization
func (b *SQLiteBackend) Unsubscribe(orgID string, eventChan chan<- []Event) error {
	if err := b.registry.removeSubscriber(orgID, eventChan); err != nil {
		return err
	}

	b.logger.Info("Subscription removed",
		zap.String("organization", string(orgID)),
	)
	return nil
}

⚠️ Potential issue | 🟡 Minor

Missing ensureInitialized() check in Subscribe and Unsubscribe.

Unlike other methods (RegisterOrganization, Publish, Cleanup, CleanupRange), these methods don't verify the backend is initialized before proceeding. While the registry is created in the constructor so this technically works, it creates an inconsistent API where some methods require initialization and others don't.

🔧 Suggested fix
 func (b *SQLiteBackend) Subscribe(orgID string, eventChan chan<- []Event) error {
+	if err := b.ensureInitialized(); err != nil {
+		return err
+	}
 	if err := b.registry.addSubscriber(orgID, eventChan); err != nil {
 		return err
 	}
 	// ...
 }

 func (b *SQLiteBackend) Unsubscribe(orgID string, eventChan chan<- []Event) error {
+	if err := b.ensureInitialized(); err != nil {
+		return err
+	}
 	if err := b.registry.removeSubscriber(orgID, eventChan); err != nil {
 		return err
 	}
 	// ...
 }
🤖 Prompt for AI Agents
In `@gateway/gateway-controller/pkg/eventhub/sqlite_backend.go` around lines 461 -
483, Subscribe and Unsubscribe currently skip the initialization guard; add a
call to ensureInitialized() at the start of both SQLiteBackend.Subscribe and
SQLiteBackend.Unsubscribe and return any error it produces before proceeding to
registry.addSubscriber/removeSubscriber and logging. Specifically, in the
methods named Subscribe and Unsubscribe on type SQLiteBackend, perform: if err
:= b.ensureInitialized(); err != nil { return err } as the first statement so
behavior matches RegisterOrganization, Publish, Cleanup and CleanupRange.

Comment on lines +524 to +554
// Close gracefully shuts down the SQLite backend
func (b *SQLiteBackend) Close() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if !b.initialized {
		return nil
	}

	b.logger.Info("Shutting down SQLite backend")

	// Stop poller
	if b.pollerCancel != nil {
		b.pollerCancel()
	}

	// Stop cleanup loop
	if b.cleanupCancel != nil {
		b.cleanupCancel()
	}

	// Wait for goroutines
	b.wg.Wait()

	// Close all prepared statements
	b.closeStatements()

	b.initialized = false
	b.logger.Info("SQLite backend shutdown complete")
	return nil
}

⚠️ Potential issue | 🔴 Critical

Potential deadlock: Close() holds mutex during wg.Wait().

Close() acquires the mutex at line 527 and holds it through wg.Wait() at line 546. If a background worker (e.g., pollLoop → pollAllOrganizations → getAllStates → ensureInitialized) is mid-execution when Close() is called, it will attempt to acquire RLock and block indefinitely because Close() holds the exclusive Lock. Meanwhile, Close() waits for the goroutine to finish, creating a deadlock.

🐛 Proposed fix: Release lock before waiting
 func (b *SQLiteBackend) Close() error {
 	b.mu.Lock()
-	defer b.mu.Unlock()
 	if !b.initialized {
+		b.mu.Unlock()
 		return nil
 	}
+	b.initialized = false // Prevent new operations
+	b.mu.Unlock()
 
 	b.logger.Info("Shutting down SQLite backend")
 
 	// Stop poller
 	if b.pollerCancel != nil {
 		b.pollerCancel()
 	}
 
 	// Stop cleanup loop
 	if b.cleanupCancel != nil {
 		b.cleanupCancel()
 	}
 
 	// Wait for goroutines (without holding lock)
 	b.wg.Wait()
 
+	b.mu.Lock()
 	// Close all prepared statements
 	b.closeStatements()
-
-	b.initialized = false
+	b.mu.Unlock()
+
 	b.logger.Info("SQLite backend shutdown complete")
 	return nil
 }
🤖 Prompt for AI Agents
In `@gateway/gateway-controller/pkg/eventhub/sqlite_backend.go` around lines 524 -
554, The Close method currently holds b.mu while calling b.wg.Wait which can
deadlock with background goroutines that try to acquire the mutex; fix by (1)
acquire b.mu, check b.initialized and set b.initialized = false and capture/call
b.pollerCancel and b.cleanupCancel while still holding the lock, (2) release
b.mu before calling b.wg.Wait, and (3) after wg.Wait return, re-acquire b.mu (if
needed) and call b.closeStatements and final logging; update SQLiteBackend.Close
to call pollerCancel/cleanupCancel under the lock but move wg.Wait to occur
after unlocking and then perform closeStatements under lock again to keep state
mutations safe (references: SQLiteBackend.Close, b.mu, b.wg, b.pollerCancel,
b.cleanupCancel, b.closeStatements, ensureInitialized/pollLoop which acquire the
mutex).
