Skip to content

Conversation

@sophiatev
Copy link
Contributor

This PR addresses an issue where a customer is running into an error when attempting to start a new orchestration:

Previous channel ManagedChannelImpl{logId=57, target=localhost:4001} was not shutdown properly!!!

The stack trace indicates that this occurs when attempting to build a new managedSideCarChannel in the DurableTaskGrpcClient. It appears that the previous managedSidecarChannel was not disposed properly in a call to close(). This can happen since we do nothing if the channel is not successfully terminated via a graceful call to shutdown() within the timeout (5 seconds). It can also happen if an InterruptedException is thrown, since we don't attempt any shutdown in that case.

To fix this, this PR initiates a forceful shutdown if the graceful shutdown fails in the timeframe. It also initiates a forceful shutdown in the case of an InterruptedException (but manually sets the thread's state to "interrupted" so that we do not lose this information).

Pull request checklist

  • My changes do not require documentation changes
    • Otherwise: Documentation issue linked to PR
  • My changes are added to the CHANGELOG.md
  • I have added all required tests (Unit tests, E2E tests)

Additional information

Additional PR information

@sophiatev sophiatev requested a review from a team as a code owner December 15, 2025 19:25
Copy link
Member

@halspang halspang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anyway to test or verify that this doesn't leave the system in a bad state? Even a test where we call shutdownNow and then recreate the connection and complete a request?

@sophiatev
Copy link
Contributor Author

Is there anyway to test or verify that this doesn't leave the system in a bad state? Even a test where we call shutdownNow and then recreate the connection and complete a request?

After spending much too long attempting to make this test, I think we are actually implicitly doing this for every test in the IntegrationTests class. This is because this method is called after each test which does a forceful shutdown on both gRPC channels being used.

@sophiatev sophiatev merged commit de93757 into main Dec 16, 2025
7 checks passed
@sophiatev sophiatev deleted the stevosyan/fix-grpc-channel-shutdown branch December 16, 2025 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants