Replace stored contexts with explicit tracing and cancellation metadata #67

@iand

Description

When an incoming event is queued by a behaviour's Notify method, the context supplied in the method call is queued alongside the event and reused when the event is actioned by Perform. This was primarily intended to preserve the tracing context for the event so that the event and its consequent outbound events could be traced through the system of coordinator, behaviours and state machines. Secondarily, it was intended to allow context cancellation to be effective through the state machines. (Originally the context would be checked for cancellation before actioning the associated event, but this has been lost through refactorings.)

However, these goals can only be attained if the context is consistently preserved everywhere. Currently the coordinator uses its own independent context when dispatching events between state machines, and the events emitted by a behaviour's Perform method are emitted without their associated context.

Additionally, storing the context can be harmful if it is used for an event generated as a side effect, such as a routing notification that adds a node to the include queue. Such an event should have its own independent context that is not subject to the parent context's cancellation.

We should remove the storage of context and use a different mechanism to carry tracing and cancellation metadata.

Proposal

Tracing

Extend BehaviourEvent to have a SpanContext method:

// SpanContext returns tracing information associated with the event.
SpanContext() trace.SpanContext

A SpanContext holds the trace id, span id and other tracing flags that should be associated with the event. See SpanContext in the OpenTelemetry specification.

Each outbound event that is generated as a direct result of actioning an inbound event should copy the SpanContext to the new event. Functions that process an event should use the SpanContext, for example:

ctx, span := c.tele.Tracer.Start(trace.ContextWithSpanContext(ctx, ev.SpanContext()), "Coordinator.AddNodes")
defer span.End()
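The rule that outbound events copy the SpanContext of the inbound event that caused them can be sketched roughly as follows. This is only an illustration under stated assumptions: to keep it runnable with the standard library, SpanContext here is a simplified stand-in for trace.SpanContext, and EventAddNode, the sc fields and the perform function are hypothetical names, not existing code.

```go
package main

import "fmt"

// SpanContext is a simplified stand-in for trace.SpanContext (assumption),
// used only so that this sketch has no external dependencies.
type SpanContext struct {
	TraceID string
	SpanID  string
}

// BehaviourEvent mirrors the proposed extension: every event exposes the
// tracing information associated with it.
type BehaviourEvent interface {
	SpanContext() SpanContext
}

// EventAddNode is a hypothetical inbound event.
type EventAddNode struct {
	NodeID string
	sc     SpanContext
}

func (e *EventAddNode) SpanContext() SpanContext { return e.sc }

// EventOutboundGetCloserNodes is pared down to the fields this sketch needs.
type EventOutboundGetCloserNodes struct {
	To string
	sc SpanContext
}

func (e *EventOutboundGetCloserNodes) SpanContext() SpanContext { return e.sc }

// perform actions an inbound event and copies its SpanContext onto the
// outbound event generated as a direct result.
func perform(in *EventAddNode) *EventOutboundGetCloserNodes {
	return &EventOutboundGetCloserNodes{To: in.NodeID, sc: in.SpanContext()}
}

func main() {
	in := &EventAddNode{
		NodeID: "node-1",
		sc:     SpanContext{TraceID: "trace-a", SpanID: "span-1"},
	}
	var out BehaviourEvent = perform(in)
	// Both events now report the same tracing information.
	fmt.Println(out.SpanContext() == in.SpanContext())
}
```

With this shape no context needs to be stored next to a queued event: whatever function eventually processes the event can rebuild a traced context from ev.SpanContext().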

When an event is submitted to the coordinator's Notify method (from an external source or as a result of calling a helper method like Coordinator.Bootstrap), a SpanContext should be created using a function like trace.SpanContextFromContext.

Cancellation/Deadlines

Events that initiate queries (EventStartFindCloserQuery, EventStartMessageQuery) and broadcasts (EventStartBroadcast) should include a Deadline field that can be used to specify a deadline for the query. The query state machines should use this to terminate the query once it has passed its deadline, and the relevant waitForQuery or waitForBroadcast functions can use it to create a context with an appropriate deadline.

Events that initiate outbound network requests (EventOutboundGetCloserNodes and EventOutboundSendMessage) should also carry a deadline, inherited from the query that ultimately generated the request event.
