Contributor

@pdoerner pdoerner commented Dec 11, 2025

What changed?

Many changes to support the Temporal SDK sending Temporal Failures instead of errors.

Depends on temporalio/api#682 and nexus-rpc/sdk-go#69

Why?

Consistency with other APIs and richer failure information.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

Still need to validate that an older SDK with a newer server, and vice versa, work correctly.

@pdoerner pdoerner requested a review from bergundy December 11, 2025 17:07
Member

@bergundy bergundy left a comment

I don't think I will have time to fully review this before I go on PTO, but the overall direction looks great.

When you plan on marking this ready for review, I would suggest adding the following tests:

  • Old server caller is compatible with a new server handler for start, cancel and callback requests
  • Encoded attributes returned from the SDK are passed through

@pdoerner pdoerner marked this pull request as ready for review December 17, 2025 18:10
@pdoerner pdoerner requested review from a team as code owners December 17, 2025 18:10
@pdoerner pdoerner requested review from gow and stephanos December 17, 2025 18:14
@pdoerner pdoerner requested a review from bergundy January 7, 2026 17:58
Member

@bergundy bergundy left a comment

I spent about half an hour reviewing this, and I still don't think I caught everything. Before we can merge this, we need:

  1. Definitions and tests for what happens at each boundary for new and old components on either side.
  2. End-to-end tests that cover the behavior with old SDKs.
  3. End-to-end tests that cover the behavior with a new SDK after we have an implementation for the new paths.
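
One way to make the boundary coverage explicit is to enumerate every old/new combination up front and drive the tests from that table. The sketch below is illustrative only: the `compatCase` type and `compatMatrix` function are hypothetical names, and the request kinds (start, cancel, callback) are taken from the earlier review comment rather than from the PR's code.

```go
package main

import "fmt"

// compatCase describes one caller/handler/request combination to test.
// This type is hypothetical and exists only for illustration.
type compatCase struct {
	Caller  string
	Handler string
	Request string
}

// compatMatrix enumerates every old/new pairing for each request kind.
func compatMatrix() []compatCase {
	versions := []string{"old", "new"}
	requests := []string{"start", "cancel", "callback"}
	var cases []compatCase
	for _, caller := range versions {
		for _, handler := range versions {
			for _, request := range requests {
				cases = append(cases, compatCase{
					Caller:  caller,
					Handler: handler,
					Request: request,
				})
			}
		}
	}
	return cases
}

func main() {
	for _, c := range compatMatrix() {
		fmt.Printf("caller=%s handler=%s request=%s\n", c.Caller, c.Handler, c.Request)
	}
}
```

Each entry could then become a subtest name, so a missing boundary definition shows up as a missing or failing case rather than an untested path.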

}
apiFailure.FailureInfo = &failurepb.Failure_ApplicationFailureInfo{
ApplicationFailureInfo: &failurepb.ApplicationFailureInfo{
// Make up a type here, it's not part of the Nexus Failure spec.
Member

We have NexusSDKFailureErrorFailureInfo in the API PR.

Contributor Author

NexusSDKFailureErrorFailureInfo doesn't have a Details field. We only get to this point if we get an unexpected error type, so I thought it was better to capture the full information.

}

- func OperationErrorToTemporalFailure(opErr *nexus.OperationError) (*failurepb.Failure, error) {
+ func OperationErrorToTemporalFailure(opErr *nexus.OperationError, retryable bool) (*failurepb.Failure, error) {
Member

You shouldn't need the retryable flag here; operation errors are non-retryable by definition, and the resulting failure object should be a NexusOperationFailureInfo. This function doesn't seem necessary anymore. You should already have the original nexus failure on the operation error, so all you need to do is convert it to a temporal failure.

Contributor Author

I also thought it would be unnecessary, but I had some trouble removing it because this is the only place where we have the specific handling for CanceledFailureInfo.

}

- func (c *HTTPClient) bestEffortHandlerErrorFromResponse(response *http.Response, body []byte) error {
+ func httpStatusCodeToHandlerErrorType(response *http.Response) (nexus.HandlerErrorType, error) {
Member

Can you please double check that we cover all of the error types in the nexus SDK?

}
return &nexus.HandlerError{
Type: errorType,
Message: response.Status,
Member

This includes the HTTP status code; we need to trim that.

RetryBehavior: retryBehavior,
},
},
nf, err := nexus.DefaultFailureConverter().ErrorToFailure(handlerErr)
Member

You should use the original failure here and then convert it to a temporal failure.

require.Equal(t, enumsspb.NEXUS_OPERATION_STATE_BACKING_OFF, op.State())
require.NotNil(t, op.LastAttemptFailure.GetNexusHandlerFailureInfo())
- require.Equal(t, "handler error (INTERNAL): internal server error", op.LastAttemptFailure.Message)
+ require.Equal(t, "internal server error", op.LastAttemptFailure.Message)
Member

Please also check the handler error type in all of these tests.

var failureErr *nexus.FailureError
var operationErr *nexus.OperationError
switch {
case errors.As(r.Error, &failureErr):
Member

You should always get an operation error here.

switch t := response.GetOutcome().(type) {
case *matchingservice.DispatchNexusTaskResponse_Failure:
oc.metricsHandler = oc.metricsHandler.WithTags(metrics.OutcomeTag("handler_error:" + t.Failure.GetNexusHandlerFailureInfo().GetType()))
nf, err := commonnexus.APIFailureToNexusFailure(t.Failure)
Member

You want to create a properly structured nexus failure here with metadata type set to nexus.HandlerError or nexus.OperationError and make sure the cause chain is populated correctly.
