feat: Implement Cloud Run Logging trace correlation with contextvars #235

aaronjae22 · 2025-12-02T07:17:00Z

Closes #234

This could become the implementation we can use in both Trust Registry and Schema Index.

This PR enables Cloud Run trace correlation and enhanced Google Cloud logs metadata.

In the previous PR I was able to successfully get our loggers in Cloud Run so this is a iteration over that with a slightly different approach but adding more metadata to the logs.

Cloud Run automatically injects the X-Cloud-Trace-Context header into every request. And it has this format: TRACE_ID/SPAN_ID;o=TRACE_ENABLED.

I decided to somehow capture this header to get some useful information out of it. By doing so we can add important information such as request.path, request.method and some very useful labels but most importantly is that we can get the full request trace and get the entire request flow which will make debugging way more easier and complete since now we can see the complete request context.

Our current implementation (from the previous PR) achieved the task of displaying loggers into Cloud Logging but they are isolated logs.

For example, in the following screenshot we see that there is a log that indicates that OAuth callback received with no state parameter. This is a logger that exists in our codebase.

And the same user could generates different loggers such as the following but it's hard to understand how they are connected.

[Request Log] GET /callback 404
[App Log] OAuth callback received with no state parameter
[App Log] Successfully received authorization code in callback
[App Log] LOLA access denied: insufficient_scope

With Trace Correlation we'll have the following logging structure:

[Request Log] GET /callback 404  [Click on View Trace]
     └─ [App Log] OAuth callback received with no state parameter
     └─ [App Log] Successfully received authorization code in callback
     └─ [App Log] LOLA access denied: insufficient_scope

All logs grouped under the request and expandable. By clicking on "View Trace" we'll see full timeline with spans.

ContextVars

One of the key decision I had to make in this feature was how I'll get the X-Cloud-Trace-Context since google-cloud-logging library does not support this and it's known that trace correlation has to be done manually for these cases.

To achieve the goal of having better cloud logging structure we need to get per-request context (trace header, request info, etc). In my previous PR I started working with threading.local() but the implementation was not yet completely since I wanted to get the first phase of the implementation ready which was getting Cloud Logging to work as expected.

So I started researching about which approach could work the best for us and I took into consideration that we are in a WSGI environment.

ContextVars is thought for ASGI/async environments but it works perfectly in WSGI environments too since contextvars behave essentially like threading.

There are several pros that took me to consider contextvars instead of threading:

There is no performance issue
It's the modern Python standard moving forward
Django utilizes it internally
It allow us flexibility if we decide to move to an async env
Context is garbage collected when request completes
Less risk of cross-request leaks
Ensure true per-execution-context isolation

Implementation

LoggingTraceContextMiddleware serves as the entry and exit point for the trace correlation process. It captures trace information when a request first arrives and ensure that this information is cleaned up when the request completes.

When Cloud Run receives an HTTP request it automatically injects the X-Cloud-Trace-Context header into every incoming request. This header contains a unique trace identifier that Cloud Logging uses to group all logs related to that specific request. The middleware's first job is to extract this header value at the very beginning of request processing, before any application code has a chance to execute.

First, process_request extracts the trace header from the incoming request metadata and also the request path and method (we can capture more information if we want to). These information are then stored in contextvars.

Then process_response is called after view has executed and returned a response. The middleware job now is to clean up by clearing all the context variables that it set earlier. This avoids that trace information from one request could leak into subsequent requests.

Once the middleware handles the process of capturing and storing trace information, CloudRunTraceFilter is responsible for the actual enrichment of logs records. It access the contextvars that contains the trace information that was gathered previously.

This filter parses the header and extracts trace id and span id and formats it according to what Cloud Logging expects.

I decided to implement an error handling process that consists on having the parsing logic in a try-except block because I believed that we would prefer logs appear in Cloud Logging without trace correlation than risk breaking the application's logging entirely due to a malformed trace header. This will allow us to better debug too if something happens.

And lastly, the get_cloud_logging_handler checks if Cloud Logging should be enabled in the current environment by searching for USE_GCLOUD_LOGGING environment variable which is a feature flag that controls where we should use Cloud Logging or local console logging.

Google Cloud Logging is enabled only in Staging and Production. When it detects USE_GCLOUD_LOGGING first imports the necessary google cloud logging libraries and then it creates a CloudLoggingClient that makes the connection to Google Cloud Logging's API. This client handles authentication automatically via the service account that is in charge of executing the Cloud Run service (but it has to have the right permissions on Google Cloud IAM section). And lastly, it creates an instance of CloudRunTraceFilter and is where the whole process comes together.

HTTP Request
↓
LoggingTraceContextMiddleware.process_request()
→ Extracts X-Cloud-Trace-Context header
→ set_trace_context(trace, path, method)
→ Stores in contextvars
↓
Application Code Executes
→ logger.warning("OAuth callback...")
↓
CloudRunTraceFilter.filter()
→ Reads contextvars
→ Adds trace, spanId, request_path, request_method
→ Adds environment, service, revision
↓
CloudLoggingHandler
→ Sends enriched log to Cloud Logging
↓
LoggingTraceContextMiddleware.process_response()
→ clear_trace_context()
→ Cleans up for next request
↓
HTTP Response

lisad

The biggest question I have is if we're really going to copy all this and have 3 copies (one in schemaindex, one in trust registry, one in activitypub testbed. That feels like quite a bit of code we're going to have to maintain separately. I'm really annoyed at Google that this isn't easier and part of their libraries for google logging.

I can see a strong case for the investment for the Data Trust Registry. if we have errors there, we have to be able to figure out what happened. Especially with complicated things like domain verification.
I don't think we should rush to add it to SchemaIndex. That's a very simple site with simple interactions and we don't know if it's going to take off.
I am really conflicted about whether this goes in Testbed. On the one hand, debugging OAuth interactions with a trace ID is quite useful. On the other hand, it's an open source project. If anybody else were to run it, it is more than likely not on Google Cloud. They might only run it on localhost. I hate to add a bunch of stuff that is not directly related to ActivityPub and probably not useful to other users of the project. If we hand over the testbed to some other party to maintain, this gets in the way. I'd like to not have the cost of running the testbed forever under DTI.
If we refactor the middleware to be responsible for the context vars, then we perhaps have something we can maintain across 2 different projects - e.g. a note that explains what the middleware is for and that the copy in Data Trust Registry is the same as the copy elsewhere (if we have any elsewheres, that is). I also am considering what it would look like for the middleware (and possibly also the logging filter) to be packaged so it's the same code across 2 different projects, but that's complicated by the fact that one is open source and one isn't.

Let's discuss... these are all my thoughts for the moment

lisad · 2025-12-02T16:39:51Z

testbed/core/utils/logging_filters.py

+                if len(x_cloud_trace_context_parts) >= 1:
+                    trace_id = x_cloud_trace_context_parts[0]
+                    if trace_id:
+                        record.trace = f"projects/{self.project_id}/traces/{trace_id}"


If the record is being passed by value instead of by reference, then changes to the object wouldn't be carried along. Are we sure that this is an intent of the logging.Filter interface? It seems more like taking advantage of a (possibly temporary or situational) side effect of the interface.

https://docs.python.org/3/library/logging.html says :
Changed in version 3.12: You can now return a LogRecord instance from filters to replace the log record rather than modifying it in place. This allows filters attached to a Handler to modify the log record before it is emitted, without having side effects on other handlers.

lisad · 2025-12-02T16:54:03Z

testbed/core/utils/logging_filters.py

+    _request_method_var.set(request_method)


 def get_trace_context():


This is no longer used

lisad · 2025-12-02T19:17:09Z

testbed/core/utils/logging_filters.py


-# Thread-local storage for trace context
-_trace_local = threading.local()
+_trace_context_var = contextvars.ContextVar("trace_context", default=None)


I think we should move the contextvars and management of them to Middleware. The reason is that the middleware can be the utility that other things depend on. If we needed the request method, request path or trace_id anywhere else that we didn't have it, a ContextMiddleware could be called by other code as well as LoggingFilter.

It also just seems a little more robust for the logging filter to get answers from the middleware, rather than for the middleware to import the logging filter to push data to it. So this would reverse the import relationship.

aaronjae22 · 2026-01-11T18:20:53Z

UPDATE. We agreed on not implementing Cloud Run logging in this project and we moved forward to iterate over this implementation in Trust Registry and a minimal logging implementation in Schemas Pub.

Both implementation are working as expected now. Trust Registry has a middleware + filter architecture and it uses contextvars.

While doing some research about Google Cloud logging documentation I found this setup_logging information and I wanted to test it out to check if its works exactly like our Trust Registry implementation.

I found that we won't be able to add some custom fields that we included in our Trust Registry implementation but it should handle the trace correlation automatically.

This new implementation sits in the middle in our logging implementations, being Trust Registry the most complete, and Schemas Pub the most minimal.

When we call setup_logging() it will detect Django, then patches the logging system to be trace-aware and it handlers will access Django request object, extract X-Cloud-Trace-Context header and build LogEntry with trace fields in the right places.

If this works as I hope, I would like to keep this for tracking OAuth, and other useful information.

feat: Implement Cloud Run Logging trace correlation with contextvars

a51ed59

aaronjae22 self-assigned this Dec 2, 2025

aaronjae22 requested a review from lisad December 2, 2025 15:25

lisad requested changes Dec 2, 2025

View reviewed changes

feat: Implement Cloud Run logging with setup_logging()

448b312

aaronjae22 requested a review from lisad January 11, 2026 18:21

lisad approved these changes Jan 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Implement Cloud Run Logging trace correlation with contextvars #235

feat: Implement Cloud Run Logging trace correlation with contextvars #235

Uh oh!

aaronjae22 commented Dec 2, 2025 •

edited

Loading

Uh oh!

lisad left a comment

Uh oh!

lisad Dec 2, 2025

Uh oh!

lisad Dec 2, 2025

Uh oh!

lisad Dec 2, 2025

Uh oh!

lisad Dec 2, 2025

Uh oh!

aaronjae22 commented Jan 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		_request_method_var.set(request_method)


		def get_trace_context():

feat: Implement Cloud Run Logging trace correlation with contextvars #235

Are you sure you want to change the base?

feat: Implement Cloud Run Logging trace correlation with contextvars #235

Uh oh!

Conversation

aaronjae22 commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ContextVars

Implementation

Uh oh!

lisad left a comment

Choose a reason for hiding this comment

Uh oh!

lisad Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

lisad Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

lisad Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

lisad Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

aaronjae22 commented Jan 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

aaronjae22 commented Dec 2, 2025 •

edited

Loading