51 changes: 49 additions & 2 deletions text/0000-opentelemetry-tracing.md
@@ -6,6 +6,7 @@ nav_order: 3

- Feature Name: OpenTelemetry Tracing Integration
- Start Date: 2021-01-12
- Update Date: 2021-12-06
- RFC PR: (leave this empty)
- Fabric Component: core, sdks
- Fabric Issue: (leave this empty)
@@ -70,9 +71,50 @@ The trace and span ID must be sent to the chain as a message header.
The SDK should use the standard environment variables to let users define whether and how they want to report
trace data to an endpoint of their choosing.
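
As a rough illustration of the environment-variable approach, the sketch below reads three variables defined by the OpenTelemetry specification (`OTEL_SDK_DISABLED`, `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`, `OTEL_EXPORTER_OTLP_ENDPOINT`). The `TraceConfig` type and `loadTraceConfig` helper are hypothetical names, and treating "no endpoint set" as "tracing off" is an assumption made here for the opt-in behaviour, not something this RFC or the specification mandates.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// TraceConfig is a hypothetical holder for the settings an SDK would read.
type TraceConfig struct {
	Enabled  bool
	Endpoint string
}

// loadTraceConfig takes a lookup function instead of calling os.Getenv
// directly, so the logic can be exercised without changing the process
// environment.
func loadTraceConfig(getenv func(string) string) TraceConfig {
	// OTEL_SDK_DISABLED=true switches the SDK off entirely.
	if strings.EqualFold(getenv("OTEL_SDK_DISABLED"), "true") {
		return TraceConfig{}
	}
	// The traces-specific endpoint takes precedence over the generic one.
	endpoint := getenv("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
	if endpoint == "" {
		endpoint = getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
	}
	// Assumption for this sketch: without an endpoint, nothing is reported.
	return TraceConfig{Enabled: endpoint != "", Endpoint: endpoint}
}

func main() {
	cfg := loadTraceConfig(os.Getenv)
	fmt.Printf("tracing enabled=%v endpoint=%q\n", cfg.Enabled, cfg.Endpoint)
}
```

In a real SDK the official OpenTelemetry exporter would normally read these variables itself; the sketch only makes the "how and if" decision explicit.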

To find bottlenecks in Fabric processing, the SDK can add tracing based on the txid, covering not only the gRPC calls but also each step of transaction processing. A sample is available at https://github.com/Hyperledger-TWGC/tape/tree/alpha and produces traces like this:

![A sample for full id tracing](https://user-images.githubusercontent.com/7820992/141783841-a4cee4c5-3275-4b58-b2f6-0c34e37d6e6f.png)

## Changes to peers and orderers
Peers and orderers capture and propagate trace information using an optional gRPC metadata header, if enabled.
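
A minimal sketch of how such an optional header could be written and read. It assumes the header follows the W3C Trace Context `traceparent` format (`00-<trace id>-<span id>-<flags>`), which this RFC does not specify. The `SpanContext` and `Metadata` types and the `Inject` and `Extract` helpers are hypothetical stand-ins, not Fabric or gRPC APIs, and the parsing is simplified.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceparentKey is the W3C Trace Context header name.
const traceparentKey = "traceparent"

// SpanContext holds the identifiers that travel between nodes.
type SpanContext struct {
	TraceID [16]byte
	SpanID  [8]byte
	Sampled bool
}

// Metadata is a simplified stand-in for gRPC metadata (one value per key).
type Metadata map[string]string

var errMalformed = errors.New("malformed traceparent header")

// Inject writes the span context into outgoing metadata.
func Inject(sc SpanContext, md Metadata) {
	flags := "00"
	if sc.Sampled {
		flags = "01"
	}
	md[traceparentKey] = fmt.Sprintf("00-%s-%s-%s",
		hex.EncodeToString(sc.TraceID[:]), hex.EncodeToString(sc.SpanID[:]), flags)
}

// Extract reads the span context from incoming metadata. A missing header is
// not an error, because the header is optional.
func Extract(md Metadata) (SpanContext, bool, error) {
	value, ok := md[traceparentKey]
	if !ok {
		return SpanContext{}, false, nil
	}
	parts := strings.Split(value, "-")
	if len(parts) != 4 || parts[0] != "00" ||
		len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return SpanContext{}, false, errMalformed
	}
	var sc SpanContext
	if _, err := hex.Decode(sc.TraceID[:], []byte(parts[1])); err != nil {
		return SpanContext{}, false, errMalformed
	}
	if _, err := hex.Decode(sc.SpanID[:], []byte(parts[2])); err != nil {
		return SpanContext{}, false, errMalformed
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil {
		return SpanContext{}, false, errMalformed
	}
	sc.Sampled = flags[0]&0x01 != 0
	return sc, true, nil
}

func main() {
	incoming := Metadata{traceparentKey: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
	sc, found, err := Extract(incoming)
	if err != nil || !found {
		fmt.Println("no usable trace context; processing continues without tracing")
		return
	}
	// A real node would replace sc.SpanID with the ID of its own span
	// before forwarding; the sketch forwards the context unchanged.
	outgoing := Metadata{}
	Inject(sc, outgoing)
	fmt.Println(outgoing[traceparentKey])
}
```

Treating a missing header as "no tracing" rather than as an error keeps the header optional, so nodes with tracing disabled can interoperate with nodes that have it enabled.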

To help find bottlenecks in Fabric processing, a sample branch at https://github.com/SamYuan1990/fabric/tree/opentracing23 enables tracing for the peer and orderer. The result looks like this:
![A sample for peer commit phase](https://user-images.githubusercontent.com/7820992/144846677-6044b69f-d72b-490f-bcf6-0d161a7c1755.png)

As a reminder, OpenTelemetry and OpenTracing already provide a system for coordinating and correlating traces based on trace ID, parent trace ID, and span ID, and these identifiers are better suited for correlation than the txid.
The txid can be added to the trace as an attribute, but it is not a valid trace identifier because the trace can originate outside Fabric.
Two hashes therefore exist: one generated by OpenTelemetry, and the txid. It is best to leave the ID of the trace or span to the system generating the trace, especially as traces will not originate only from Fabric.
The txid is recorded as a business-level attribute instead, which means spans need to be searchable by attribute value.
The txid attribute can then be used as a search condition in OpenTelemetry. For details see [Usage](#usage).
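
The separation between the two identifiers can be sketched as follows. The `Span` type and `startSpan` helper are hypothetical simplifications written for this illustration and are not the OpenTelemetry API; in the OpenTelemetry Go API the equivalent step would be setting a string attribute on a span. The attribute key `txid` matches the search condition used in this RFC.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// Span is a hypothetical, simplified stand-in for a tracing span.
type Span struct {
	TraceID    string // generated by the tracing system, never derived from the txid
	SpanID     string
	Name       string
	Attributes map[string]string
}

// randomHex returns n random bytes encoded as 2n hex characters.
func randomHex(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// startSpan creates a root span when parent is nil and otherwise joins the
// parent's trace, so a trace started outside Fabric keeps its own trace ID.
func startSpan(parent *Span, name string, attrs map[string]string) *Span {
	s := &Span{SpanID: randomHex(8), Name: name, Attributes: attrs}
	if parent != nil {
		s.TraceID = parent.TraceID
	} else {
		s.TraceID = randomHex(16)
	}
	return s
}

func main() {
	// The trace starts in a client application, outside Fabric.
	root := startSpan(nil, "client-request", nil)

	// Inside Fabric the txid is attached as an attribute only.
	txid := "31a230b23c67f854ec4e9505283270055e16259d0a07e210f27cb8decb64d086"
	endorse := startSpan(root, "endorsement", map[string]string{"txid": txid})

	fmt.Println("trace id:", endorse.TraceID)
	fmt.Println("txid attribute:", endorse.Attributes["txid"])
}
```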

### Peer
Tracing at peer nodes can be considered from two points of view.

- From the workflow point of view:
1. Endorsement, based on the txid. The endorsement interface processes each transaction with the txid as its unique identifier, so tracing is straightforward to apply.
1. Commit, based on block processing. From the workflow point of view, it is better to trace each block per channel.
1. Gossip, possibly also based on block processing (open question).

- From the business point of view: trace by txid, for transactions only.

### Orderer
The orderer handles envelopes through the consensus interface defined at
https://github.com/hyperledger/fabric/blob/main/orderer/consensus/consensus.go#L63
Envelopes can be traced in two phases: business-level envelope processing, or consensus at the foundation level.
- From the business point of view: trace by txid, for transactions only.

### Usage
As the screenshot below shows, spans can be searched by txid.
![txid based tracing](https://user-images.githubusercontent.com/7820992/150667822-9dfc93f7-2757-4008-8db1-0d4d589f3c7d.png)
Use
```
txid=${txid}
```
as the search condition in the OpenTelemetry system. For example, to search for one specific txid:
```
txid=31a230b23c67f854ec4e9505283270055e16259d0a07e210f27cb8decb64d086
```
The result includes the spans, and so the time spent, for both the endorsement and commit phases.

# Drawbacks
[drawbacks]: #drawbacks
@@ -82,6 +124,11 @@ OpenTelemetry is still relatively young, yet has reached maturity for traces sup
The OpenTelemetry reporting system happens securely over Protobuf, with containers and client applications sending data.
This requires that an OpenTelemetry-compatible endpoint is present to receive the data.

For example:
- On the SDK side, tracing each txid with OpenTelemetry requires analyzing blocks, which adds processing cost.

- Running OpenTelemetry alongside Fabric adds operational, monitoring, and integration effort.

# Rationale and alternatives
[alternatives]: #alternatives
