Fix Intermittent Test Failures #1301

Open
KaveeshaPiumini wants to merge 1 commit into asgardeo:main from KaveeshaPiumini:bug-fix

Conversation

Contributor

@KaveeshaPiumini KaveeshaPiumini commented Feb 5, 2026

Purpose

This pull request improves the reliability and robustness of integration tests for SMS and organizational unit (OU) registration flows, especially in resource-constrained environments. The main changes include increasing timeouts for waiting on asynchronous operations and introducing a generic retry mechanism with exponential backoff to handle transient failures during test execution.

Test reliability improvements:

  • Increased timeouts (from 500ms or 1s to 2s) when waiting for OTP/SMS messages to be sent in several test cases in sms_registration_test.go and ou_registration_test.go, reducing flakiness in slower environments.

  • Updated the helper function generateUniqueMobileNumber to use the full UnixNano value, which makes collisions between generated numbers in parallel test runs less likely.

Resilience to transient failures:

  • Added a new RetryWithBackoff utility in utils.go to retry operations with exponential backoff, improving test resilience to transient resource-related errors.

  • Refactored relevant test cases in ou_registration_test.go and sms_registration_test.go to use RetryWithBackoff when completing flows, ensuring tests only fail on permanent errors and not on temporary issues.

Related Issues

  • N/A

Related PRs

  • N/A

Checklist

  • Followed the contribution guidelines.
  • Manual test round performed and verified.
  • Documentation provided. (Add links if there are any)
  • Tests provided. (Add links if there are any)
    • Unit Tests
    • Integration Tests
  • Breaking changes. (Fill if applicable)
    • Breaking changes section filled.
    • breaking change label added.

Security checks

  • Followed secure coding standards in WSO2 Secure Coding Guidelines
  • Confirmed that this PR doesn't commit any keys, passwords, tokens, usernames, or other secrets.

Summary by CodeRabbit

  • Tests
    • Added a reusable retry-with-backoff utility for integration tests to handle transient failures.
    • Wrapped flow completion with retry loops (up to 3 attempts with short backoff), aborting early on error states.
    • Increased several timeouts/waits in registration and SMS tests for resource-constrained environments.
    • Strengthened status checks and conditional waits to improve OTP delivery resilience and overall test robustness.

Contributor

Copilot AI left a comment

Pull request overview

This pull request improves the reliability of integration tests for SMS and organizational unit (OU) registration flows by addressing intermittent failures that occur in resource-constrained environments. The changes focus on two main areas: increasing timeouts for asynchronous operations and introducing a retry mechanism with exponential backoff.

Changes:

  • Added a new RetryWithBackoff utility function in tests/integration/flow/common/utils.go to handle transient failures with exponential backoff
  • Increased wait times from 500ms-1s to 2 seconds in multiple test cases across sms_registration_test.go and ou_registration_test.go to accommodate slower environments
  • Modified generateUniqueMobileNumber to use the full UnixNano value instead of modulo for better uniqueness in parallel test runs

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/integration/flow/common/utils.go | Added RetryWithBackoff utility function for handling transient failures with exponential backoff |
| tests/integration/flow/registration/sms_registration_test.go | Increased SMS wait timeout to 2 seconds and added retry logic with backoff for flow completion in one test case |
| tests/integration/flow/registration/ou_registration_test.go | Increased OTP wait timeout to 2 seconds, added retry logic with backoff for flow completion, and updated mobile number generation to use full UnixNano for better uniqueness |


codecov bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.94%. Comparing base (1c5065f) to head (5bc4cd6).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1301   +/-   ##
=======================================
  Coverage   89.94%   89.94%           
=======================================
  Files         629      629           
  Lines       41116    41116           
  Branches     2390     2390           
=======================================
  Hits        36981    36981           
  Misses       2235     2235           
  Partials     1900     1900           
| Flag | Coverage Δ |
| --- | --- |
| backend-integration-postgres | 53.03% <ø> (ø) |
| backend-integration-sqlite | 53.00% <ø> (ø) |
| backend-unit | 82.49% <ø> (ø) |
| frontend-apps-develop-unit | 90.75% <ø> (ø) |
| frontend-apps-gate-unit | 84.62% <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.



coderabbitai bot commented Feb 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a RetryWithBackoff utility and integrates exponential-backoff retries plus longer waits into SMS and OU registration integration tests; also updates a mobile number formatting string in an OU registration test. All changes are within test code and a test utility file.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Retry Utility (tests/integration/flow/common/utils.go) | Adds RetryWithBackoff(operation func() error, maxRetries int, initialDelay time.Duration) implementing exponential backoff retries and returning a wrapped error after final failure. |
| OU Registration Tests (tests/integration/flow/registration/ou_registration_test.go) | Replaces short waits with longer waits (500ms → 2s) and wraps CompleteFlow calls in backoff retries (up to 3 attempts, 1s base, exponential); stops retrying on ERROR, continues only when INCOMPLETE; updates generateUniqueMobileNumber format from "+1234567%d" to "+1%d". |
| SMS Registration Tests (tests/integration/flow/registration/sms_registration_test.go) | Increases several fixed waits to 2s and wraps CompleteFlow in RetryWithBackoff for single-request flows (3 attempts, exponential backoff from 1s); retries halted on ERROR, require INCOMPLETE to continue. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hop the tests with careful flair,
I wait and try with backoff care,
Three hops, then one more patient peek,
OTPs arrive when timing's meek,
A nibble, a twitch — the flow's complete! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Fix Intermittent Test Failures' clearly and accurately summarizes the main objective of the PR: improving test reliability by fixing transient failures. |
| Description check | ✅ Passed | The description is comprehensive and well-structured, covering the Purpose and Approach sections with specific details about changes. All required template sections are present with appropriate content. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/integration/flow/registration/ou_registration_test.go`:
- Around line 885-888: generateUniqueMobileNumber currently uses
time.Now().UnixNano() producing >15-digit numbers that violate E.164; change
generateUniqueMobileNumber to produce a country-coded E.164-compliant string by
limiting the unique suffix to at most 10 digits (e.g., use UnixNano() %
10000000000 and format with zero padding so the total digits including the "+1"
country code do not exceed 15). Also update formatPhoneNumber to assert/validate
the final length is <=15 (and return an error or fail the test if not) so the
helper cannot produce non-compliant numbers; reference functions:
generateUniqueMobileNumber and formatPhoneNumber.
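A minimal sketch of the suggestion above, assuming a "+1" country code and a 10-digit zero-padded suffix. The real signatures of generateUniqueMobileNumber and formatPhoneNumber are not shown in this thread, so the function bodies here are assumptions, and `checkE164Length` is a hypothetical stand-in for the validation the review asks for. It checks only the E.164 length limit, not national numbering rules.

```go
package main

import (
	"fmt"
	"time"
)

// E.164 allows at most 15 digits, country code included.
const maxE164Digits = 15

// generateUniqueMobileNumber keeps the last 10 digits of the nanosecond
// timestamp, giving "+1" plus 10 digits (11 digits in total).
// Assumed shape; the PR's actual helper may differ.
func generateUniqueMobileNumber() string {
	suffix := time.Now().UnixNano() % 10_000_000_000
	return fmt.Sprintf("+1%010d", suffix)
}

// checkE164Length is a hypothetical validator: '+' followed by 1 to 15 digits.
func checkE164Length(number string) error {
	if len(number) < 2 || number[0] != '+' {
		return fmt.Errorf("%q must start with '+' followed by digits", number)
	}
	digits := number[1:]
	for _, r := range digits {
		if r < '0' || r > '9' {
			return fmt.Errorf("%q contains a non-digit character", number)
		}
	}
	if len(digits) > maxE164Digits {
		return fmt.Errorf("%q has %d digits, E.164 allows at most %d",
			number, len(digits), maxE164Digits)
	}
	return nil
}

func main() {
	n := generateUniqueMobileNumber()
	fmt.Println(n, "valid:", checkE164Length(n) == nil)

	// The "+1%d" format with a full UnixNano value exceeds the limit.
	tooLong := fmt.Sprintf("+1%d", time.Now().UnixNano())
	fmt.Println(tooLong, "error:", checkE164Length(tooLong))
}
```

The trade-off is that the 10-digit suffix repeats every 10 seconds of wall-clock time, so uniqueness relies on two calls not landing on the same nanosecond offset within that window.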
🧹 Nitpick comments (3)
tests/integration/flow/common/utils.go (1)

503-524: Consider handling edge case when maxRetries is 0 or negative.

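If maxRetries is 0 or negative, the loop never executes and the operation is never called. In the version quoted later in this thread, the function then falls through to its final fmt.Errorf with a nil lastErr, so the caller gets a misleading "operation failed after 0 attempts" error instead of a clear message about the invalid argument.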

🛠️ Suggested defensive check
 func RetryWithBackoff(operation func() error, maxRetries int, initialDelay time.Duration) error {
+	if maxRetries <= 0 {
+		return fmt.Errorf("maxRetries must be positive, got %d", maxRetries)
+	}
 	var lastErr error
 	delay := initialDelay
tests/integration/flow/registration/ou_registration_test.go (1)

824-846: Retry logic returns success on ERROR status, which may confuse readers.

When flowStep.FlowStatus == "ERROR", the closure returns nil, signaling success to RetryWithBackoff. The test then fails at the assertion on line 846. While functionally correct, this pattern is non-obvious since returning nil typically means "operation succeeded."

Consider returning a distinguishable error or using a different pattern to make the intent clearer.

♻️ Alternative approach using a sentinel error
+var errPermanentFailure = fmt.Errorf("permanent failure, stop retrying")
+
 			err = common.RetryWithBackoff(func() error {
 				flowStep, err = common.CompleteFlow(flowStep.FlowID, inputs, "action_001")
 				if err != nil {
 					return err
 				}
 				// Don't retry on ERROR status - these are permanent errors
 				if flowStep.FlowStatus == "ERROR" {
-					return nil // Stop retrying, let the test assertion handle it
+					return errPermanentFailure
 				}
 				if flowStep.FlowStatus != "INCOMPLETE" {
 					return fmt.Errorf("unexpected flow status: %s", flowStep.FlowStatus)
 				}
 				return nil
 			}, 3, 1*time.Second)
-			ts.Require().NoError(err)
+			if err != nil && !errors.Is(err, errPermanentFailure) {
+				ts.Require().NoError(err)
+			}
tests/integration/flow/registration/sms_registration_test.go (1)

551-567: Variable err from outer scope is reassigned inside the retry closure.

Line 555 reassigns err from the outer scope inside the closure. While this works in Go, it's error-prone and reduces readability. Consider using a distinct variable name inside the closure.

♻️ Suggested clarification
 	var otpStep *common.FlowStep
 	err = common.RetryWithBackoff(func() error {
-		otpStep, err = common.CompleteFlow(flowStep.FlowID, inputs, "action_001")
-		if err != nil {
-			return err
+		var completeErr error
+		otpStep, completeErr = common.CompleteFlow(flowStep.FlowID, inputs, "action_001")
+		if completeErr != nil {
+			return completeErr
 		}
 		// Don't retry on ERROR status - these are permanent errors
 		if otpStep.FlowStatus == "ERROR" {
 			return nil // Stop retrying, let the test assertion handle it
 		}
 		if otpStep.FlowStatus != "INCOMPLETE" {
 			return fmt.Errorf("unexpected flow status: %s", otpStep.FlowStatus)
 		}
 		return nil
 	}, 3, 1*time.Second)

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/integration/flow/common/utils.go`:
- Around line 503-524: Add unit tests exercising RetryWithBackoff: one test
where the supplied operation fails a fixed number of times then succeeds (verify
it returns nil and is called expected attempts), and one where it always fails
(verify the returned error wraps the final operation error and indicates
maxRetries). Use small initialDelay (e.g., 1ms) to keep tests fast, drive the
operation with a counter/closure to control when it succeeds, and assert call
counts and error wrapping; reference the RetryWithBackoff function, its
maxRetries and initialDelay parameters to locate where to invoke it.

Comment on lines +503 to +524
// RetryWithBackoff retries an operation with exponential backoff
// This helps handle transient failures that may occur under resource constraints
func RetryWithBackoff(operation func() error, maxRetries int, initialDelay time.Duration) error {
	var lastErr error
	delay := initialDelay

	for attempt := 0; attempt < maxRetries; attempt++ {
		lastErr = operation()
		if lastErr == nil {
			return nil
		}

		// Don't sleep after the last attempt
		if attempt < maxRetries-1 {
			time.Sleep(delay)
			// Exponential backoff: double the delay for next attempt
			delay *= 2
		}
	}

	return fmt.Errorf("operation failed after %d attempts: %w", maxRetries, lastErr)
}

⚠️ Potential issue | 🟠 Major

Add tests for RetryWithBackoff to meet the 80% coverage requirement.

This helper adds non‑trivial retry behavior but has no dedicated coverage (e.g., success-after-N attempts and failure-after-max). Please add unit/integration tests to keep the required threshold.

As per coding guidelines, "Add unit and integration tests for each new feature or bug fix to achieve combined coverage of at least 80%".
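The two cases requested here (success after N failures, and failure after the maximum number of attempts) could look roughly like the sketch below. RetryWithBackoff is copied from the snippet under review so the example is self-contained. The `check...` functions are hypothetical and return an error instead of using `*testing.T`, so they would need adapting to the project's actual test framework.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RetryWithBackoff is copied from the snippet under review.
func RetryWithBackoff(operation func() error, maxRetries int, initialDelay time.Duration) error {
	var lastErr error
	delay := initialDelay
	for attempt := 0; attempt < maxRetries; attempt++ {
		lastErr = operation()
		if lastErr == nil {
			return nil
		}
		if attempt < maxRetries-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("operation failed after %d attempts: %w", maxRetries, lastErr)
}

// checkSucceedsAfterFailures: the operation fails twice, then succeeds.
func checkSucceedsAfterFailures() error {
	calls := 0
	err := RetryWithBackoff(func() error {
		calls++
		if calls < 3 {
			return errors.New("transient")
		}
		return nil
	}, 5, time.Millisecond)
	if err != nil {
		return fmt.Errorf("expected nil, got %v", err)
	}
	if calls != 3 {
		return fmt.Errorf("expected 3 calls, got %d", calls)
	}
	return nil
}

// checkFailsAfterMax: the operation always fails; the last error is wrapped.
func checkFailsAfterMax() error {
	sentinel := errors.New("always fails")
	calls := 0
	err := RetryWithBackoff(func() error {
		calls++
		return sentinel
	}, 3, time.Millisecond)
	if !errors.Is(err, sentinel) {
		return fmt.Errorf("expected wrapped sentinel, got %v", err)
	}
	if calls != 3 {
		return fmt.Errorf("expected 3 calls, got %d", calls)
	}
	return nil
}

func main() {
	fmt.Println("succeeds after failures:", checkSucceedsAfterFailures())
	fmt.Println("fails after max:", checkFailsAfterMax())
}
```

A 1ms initial delay keeps the whole run in the low milliseconds while still exercising the sleep-and-double path.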

KaveeshaPiumini added the skip-changelog label (Skip generating changelog for a particular PR) on Feb 9, 2026
flowStep, err = common.CompleteFlow(flowStep.FlowID, inputs, "action_001")
// Retry flow completion with backoff to handle resource constraints
// Only retry if flow doesn't complete (INCOMPLETE status expected), not on ERROR
err = common.RetryWithBackoff(func() error {
Contributor

@ThaminduDilshan ThaminduDilshan Feb 9, 2026

I feel like this is not a good approach to fix this issue. With this, the test retries the failed operation, which may or may not result in a successful status.
Even if it eventually succeeds, IMO this could hide an actual issue with the implementation as well.

Contributor

Aren't you able to flag an issue with the dummy HTTP server implementation?

Labels

skip-changelog (Skip generating changelog for a particular PR), Type/Improvement

2 participants