
Selective Environment Variable Inheritance

📋 Summary

Implements a security-first tier-based environment variable inheritance system that ensures MCP servers receive necessary system variables while preventing accidental credential leakage.


Problem Solved

Previously, MCP servers started by mcp-debug did not inherit any environment variables from the parent process. This caused servers to fail because they lacked critical system variables like:

  • PATH (couldn't find executables)
  • HOME, USER (couldn't resolve user context)
  • TMPDIR/TEMP (couldn't write temporary files)
  • SSL/TLS certificate paths (couldn't make secure network requests)

Simply inheriting all variables would fix functionality but create security risks by exposing sensitive credentials (AWS_ACCESS_KEY_ID, GITHUB_TOKEN, SSH_AUTH_SOCK, etc.) to untrusted or experimental MCP servers.


Solution: Tier-Based Inheritance

Tier 1 (Baseline) - Inherited by default

  • PATH, HOME, USER, SHELL
  • LANG, LC_ALL, TZ
  • TMPDIR, TEMP, TMP

Tier 2 (Network/TLS) - Opt-in via mode

  • SSL_CERT_FILE, SSL_CERT_DIR
  • REQUESTS_CA_BUNDLE, CURL_CA_BUNDLE
  • NODE_EXTRA_CA_CERTS

Security Model

Mode          Behavior
none          No inherited variables (only explicit env: entries)
tier1         Tier 1 baseline variables (default)
tier1+tier2   Tier 1 + Tier 2 network/TLS variables
all           Tier 1 + Tier 2 only (does NOT inherit all parent variables)

Key security principle: Variables beyond the defined tiers must be explicitly requested via extra or prefix. There is no mode that inherits all parent environment variables.

Additional Controls

  • extra: Explicit allow list of additional variables
  • prefix: Pattern matching (e.g., "MY_APP_*")
  • deny: Block specific variables even if otherwise allowed
  • allow_denied_if_explicit: Override denials for explicit env: entries
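
Taken together, the sketch below shows how these controls could combine in code. It is a minimal illustration only; the type name, field names, and function signature are assumptions, not the repo's actual envbuilder API.

```go
package envbuilder

import (
	"os"
	"strings"
)

// Tier definitions follow the PR description.
var tier1 = []string{"PATH", "HOME", "USER", "SHELL", "LANG", "LC_ALL", "TZ", "TMPDIR", "TEMP", "TMP"}
var tier2 = []string{"SSL_CERT_FILE", "SSL_CERT_DIR", "REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE", "NODE_EXTRA_CA_CERTS"}

// InheritConfig mirrors the documented options; field names are illustrative.
type InheritConfig struct {
	Mode   string   // "none", "tier1", "tier1+tier2", "all"
	Extra  []string // explicit allow list of additional variables
	Prefix []string // prefix patterns, e.g. "MY_APP_"
	Deny   []string // blocked even if otherwise allowed
}

// BuildEnv derives the child environment from the parent process, then
// layers explicit env: entries on top.
func BuildEnv(cfg InheritConfig, explicit map[string]string) []string {
	allowed := map[string]bool{}
	switch cfg.Mode {
	case "tier1":
		for _, k := range tier1 {
			allowed[k] = true
		}
	case "tier1+tier2", "all": // "all" still means Tier 1 + Tier 2 only
		for _, k := range tier1 {
			allowed[k] = true
		}
		for _, k := range tier2 {
			allowed[k] = true
		}
	}
	for _, k := range cfg.Extra {
		allowed[k] = true
	}
	denied := map[string]bool{}
	for _, k := range cfg.Deny {
		denied[k] = true
	}

	var env []string
	for _, kv := range os.Environ() {
		name, _, _ := strings.Cut(kv, "=")
		keep := allowed[name]
		for _, p := range cfg.Prefix {
			if strings.HasPrefix(name, p) {
				keep = true
			}
		}
		if keep && !denied[name] {
			env = append(env, kv)
		}
	}
	for k, v := range explicit { // explicit env: values always apply
		env = append(env, k+"="+v)
	}
	return env
}
```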

Configuration Examples

```yaml
proxy:
  defaultInherit:
    mode: tier1  # Security-first default

servers:
  - name: untrusted-server
    transport: stdio
    command: python3
    args: ["-m", "experimental_server"]
    # No inherit = uses proxy default (tier1)

  - name: trusted-server
    transport: stdio
    command: python3
    args: ["-m", "my_server"]
    inherit:
      mode: tier1+tier2  # Add network/TLS variables
      extra: ["PYTHONPATH", "VIRTUAL_ENV"]
      prefix: ["MY_APP_"]
      deny: ["SSH_AUTH_SOCK"]
    env:
      CUSTOM_VAR: "value"  # Explicit values always work
```

Commits

  • [f6f181d](ExactDoug@f6f181d) - [9c58f91](ExactDoug@9c58f91): Core implementation
  • [cbbfab1](ExactDoug@cbbfab1): Documentation corrections

Files Changed

  • Environment inheritance implementation in proxy core
  • DRAFT_ENV_INHERITANCE.md - Comprehensive documentation
  • Configuration schema updates

Doug Mortensen and others added 23 commits January 12, 2026 12:49
- Add MergeEnvironment() to merge parent env with overrides
- Update Connect() to use inherited environment
- Handle Windows case-insensitive env vars (PATH vs Path)
- Add comprehensive unit tests with platform-specific cases
- Maintain 100% backward compatibility (no schema changes)

Fixes: Environment variables are now inherited from the parent process
with per-server overrides applied, ensuring servers receive critical
system variables (PATH, HOME, SSL certs, proxies) while still allowing
custom configuration.
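
As an illustration of the Windows-aware merge described above, here is a minimal sketch; it is not the repo's actual MergeEnvironment() implementation, and the signature is assumed.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// mergeEnv applies per-server overrides on top of a parent environment.
// On Windows, variable names are case-insensitive, so "Path" from the
// parent and "PATH" from an override must collapse to a single entry.
func mergeEnv(parent []string, overrides map[string]string) []string {
	norm := func(k string) string {
		if runtime.GOOS == "windows" {
			return strings.ToUpper(k)
		}
		return k
	}

	merged := map[string]string{} // normalized name -> "NAME=value"
	order := []string{}           // preserve first-seen order of names
	for _, kv := range parent {
		name, value, _ := strings.Cut(kv, "=")
		key := norm(name)
		if _, seen := merged[key]; !seen {
			order = append(order, key)
		}
		merged[key] = name + "=" + value
	}
	for name, value := range overrides { // overrides win, regardless of case on Windows
		key := norm(name)
		if _, seen := merged[key]; !seen {
			order = append(order, key)
		}
		merged[key] = name + "=" + value
	}

	out := make([]string, 0, len(order))
	for _, key := range order {
		out = append(out, merged[key])
	}
	return out
}

func main() {
	parent := []string{`Path=C:\Windows`, "HOME=/home/doug"}
	merged := mergeEnv(parent, map[string]string{"PATH": "/custom/bin"})
	fmt.Println(merged) // on Windows, "Path" and "PATH" collapse to one entry
}
```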

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add tier-based inheritance modes (none, tier1, tier1+tier2, all)
- Tier1: Baseline vars (PATH, HOME, USER, SHELL, LANG, LC_ALL, TZ, TMPDIR, TEMP, TMP)
- Tier2: Network/TLS vars (SSL_CERT_FILE, SSL_CERT_DIR, REQUESTS_CA_BUNDLE, CURL_CA_BUNDLE, NODE_EXTRA_CA_CERTS)
- Support extras, prefixes, denylist, allow_denied_if_explicit
- Block HTTP_PROXY/HTTPS_PROXY/http_proxy/https_proxy/NO_PROXY/no_proxy by default (httpoxy mitigation)
- Add comprehensive unit tests with 12 test cases
- Update all 4 integration points (ProxyServer, Discoverer, DynamicProxyServer, DynamicWrapper)
- Add validation for inheritance config
- Default mode: tier1 (security-first)
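
As a small illustration of the httpoxy mitigation above, a default denylist might look like the following; the variable name defaultDeny is assumed, not necessarily what the repo uses.

```go
package envbuilder

// Proxy-related variables are blocked by default (httpoxy mitigation);
// servers that genuinely need them must request them explicitly.
var defaultDeny = []string{
	"HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy",
	"NO_PROXY", "no_proxy",
}
```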

Security improvements:
- Prevents accidental secret leakage to upstream MCP servers
- Controlled inheritance with auditable configuration
- Per-server and proxy-level defaults
- Explicit override support

Test coverage:
- 12 new unit tests for envbuilder (all passing)
- 9 existing tests maintained (zero regressions)
- Manual testing verified tier1, tier1+tier2, and mode=all scenarios
- HTTP_PROXY blocking and allow_denied_if_explicit override tested

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

When DynamicWrapper creates ProxyServer, it pre-assigns the baseServer
instance (which has management tools registered) to proxyServer.mcpServer
before calling Initialize().

Previously, ProxyServer.Initialize() unconditionally created a new mcpServer
instance, causing static server tools to be registered to an orphaned server
that never handled requests. This resulted in only management tools being
exposed (5 tools) while static config server tools (26 drmm_* tools) were
invisible.

Fix: Only create a new mcpServer if one doesn't already exist. This ensures
that static tools are registered to the same server instance that handles
incoming requests.

Result: All 31 tools now properly exposed (5 management + 26 static server).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Remove trailing whitespace in main.go
- Add clarifying comment about mark3labs/mcp-go stdio compatibility
- Remove mcp-debug binary from root (builds to ./bin/mcp-debug instead)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add ENV_INHERITANCE.md with complete guide (31 KB)
- Update README.md with environment inheritance section
- Add 4 example configs (basic, tier2, advanced, none)
- Add 4 test fixtures for inheritance testing
- Document tier system, modes, security rationale, troubleshooting

Documentation marked as DRAFT pending broader real-world testing.

Successfully tested with Datto RMM MCP server:
- Tier1 vars inherited (PATH, HOME, USER, SHELL, locale)
- DATTO_* prefix matching working
- Secrets properly blocked (AWS keys, etc.)
- All 31 tools exposed and functional

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add CLAUDE.md with project context and development guidelines for AI
assistants working on the mcp-debug codebase.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed json.MarshalIndent() to json.Marshal() in EnableRecording()
to fix a critical playback incompatibility. The playback parser expects
line-delimited JSON (JSONL), but MarshalIndent produces multi-line
pretty-printed JSON that breaks parsing.

The header was being written as:
{
  "start_time": "...",
  "server_info": "..."
}

But the parser reads line-by-line, so each line failed JSON parsing
and the entire header was silently ignored.

Now writes single-line JSON as expected:
{"start_time":"...","server_info":"...","messages":[]}

Verified:
- Recording file creates valid JSONL format
- Playback parser successfully loads sessions
- Header fields are correctly parsed

Fixes: Recording → playback pipeline now functional

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Static servers from config.yaml were using proxy.CreateProxyHandler
which had no recording capability. Only dynamically added servers
(via server_add) used DynamicWrapper.createDynamicProxyHandler with
recording.

Changes:
- Add RecorderFunc type and optional recorder parameter to proxy.CreateProxyHandler
- Add recording calls for requests, responses, and errors in CreateProxyHandler
- Add recorderFunc field to ProxyServer
- Inject DynamicWrapper.recordMessage into ProxyServer when recording enabled
- Update CreateHandlerForTool to accept optional recorder

This fixes the critical bug where recording files were created but
captured zero messages when using static servers.
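
For illustration, one way such an optional recorder hook could be threaded through a handler; the RecorderFunc signature here is an assumption, not necessarily the one added to proxy.CreateProxyHandler.

```go
package proxy

// RecorderFunc is a hypothetical signature for the recording hook; the
// actual type introduced by this commit may differ.
type RecorderFunc func(direction, serverName, toolName string, payload []byte)

// callWithRecording wraps a tool call so request, response, and error
// payloads reach the recorder whenever one is configured (nil disables it).
func callWithRecording(rec RecorderFunc, serverName, toolName string, req []byte,
	call func([]byte) ([]byte, error)) ([]byte, error) {

	if rec != nil {
		rec("request", serverName, toolName, req)
	}
	resp, err := call(req)
	if rec != nil {
		if err != nil {
			rec("error", serverName, toolName, []byte(err.Error()))
		} else {
			rec("response", serverName, toolName, resp)
		}
	}
	return resp, err
}
```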

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added detailed documentation for session recording feature in proxy mode:

New File:
- docs/RECORDING.md - Complete guide covering:
  - Recording format (JSONL structure)
  - What gets recorded (static/dynamic servers, management tools, errors)
  - Playback modes (client/server)
  - Common workflows (debugging, regression testing, documentation)
  - Tips & best practices (filtering, sensitive data, file management)
  - Troubleshooting guide
  - Current limitations and future enhancements

Updated:
- README.md - Added link to recording documentation
- README.md - Clarified what gets recorded (all servers, management tools)

The recording feature is now fully functional and documented for:
- Static servers (from config.yaml)
- Dynamic servers (via server_add)
- All management operations
- Request/response pairs with timestamps
- Error responses

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix: Enable Recording for Proxy Mode (Static & Dynamic Servers)
Corrected documentation to match the more secure implementation:
- Clarified that mode "none" means no inheritance (only explicit env: values)
- Clarified that mode "all" only inherits Tier 1 + Tier 2 (not all parent vars)
- Emphasized Tier 1 as guaranteed baseline (always inherited unless explicitly denied)
- Updated mode descriptions and examples throughout
- Corrected troubleshooting section to reflect actual behavior
- Added security benefits of the actual implementation
- Documentation now accurately represents the implemented security model

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
docs: correct ENV_INHERITANCE documentation to match implementation

Allows static servers (from config.yaml) to be managed dynamically using
server_disconnect/server_reconnect while preserving all configuration:
- Environment variables (including sensitive credentials)
- Inheritance settings (tier1/tier2 modes)
- Timeout configuration
- All other config.yaml settings

Key changes:
- populateStaticServers() adds static servers to dynamicServers map on init
- server_reconnect command parameter now optional
- When command omitted, uses stored ServerConfig (preserves env vars)
- When command provided, creates new config (existing behavior)

Use case: Hot-swap server binaries during development without exposing
secrets to MCP client. Particularly useful for servers with API keys
and credentials in config.yaml.

Example workflow:
  server_disconnect: {name: "my-server"}
  server_reconnect: {name: "my-server"}  # Uses stored config with secrets

⚠️  NEEDS TESTING: Implementation complete but requires validation with
real-world MCP servers (e.g., Datto RMM with API credentials).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Enable dynamic management of static servers with config preservation
When recording is enabled, all tool responses now include metadata
showing the recording file path and purpose. This helps users
understand that their interactions are being logged.

Metadata format:
📹 Recording: session.jsonl
   Full path: /path/to/session.jsonl
   Purpose: JSON-RPC message log for debugging and playback testing

Changes:
- Add recordFilename field to DynamicWrapper to store recording path
- Create addRecordingMetadata() helper to inject metadata into responses
- Update all tool response injection points (static, dynamic, management)
- Add metadataFunc parameter to CreateProxyHandler for static tools
- Update documentation in RECORDING.md and README.md

The metadata is:
- MCP protocol compliant (uses standard ContentItem array)
- Only active when --record flag is used
- Thread-safe with existing recording mutex
- Includes both relative and absolute paths

Tested with datto-rmm MCP server - all 8 tool types working correctly
with metadata appearing in responses and properly recorded to JSONL.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add recording metadata to tool responses
This commit fixes three critical bugs introduced by recent feature merges:

1. Static Server Registration Failure
   - populateStaticServers() now tracks failed servers instead of silently skipping them
   - Failed servers are added to dynamicServers with IsConnected=false
   - Enables server_reconnect to work for never-connected servers
   - Stores discovery error messages for debugging
   Files: integration/proxy_server.go, integration/dynamic_wrapper.go

2. Stdio Communication Race Conditions
   - Added requestMu mutex to serialize all I/O operations in StdioClient
   - Removed async goroutine in sendRequest() to prevent concurrent pipe access
   - Implemented copy-on-use pattern in createDynamicProxyHandler to prevent use-after-free
   - Fixes "write |1: broken pipe" errors during concurrent tool calls
   Files: client/stdio_client.go, integration/dynamic_wrapper.go

3. Recording Metadata Content Mutation
   - Implemented copy-on-write in addRecordingMetadata() to avoid mutating input
   - Creates new CallToolResult instead of appending to existing Content slice
   - Prevents protocol violations from shared slice mutations
   File: integration/dynamic_wrapper.go
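
A simplified sketch of the copy-on-write pattern from item 3; the ContentItem/CallToolResult types below are stand-ins, not the actual mark3labs/mcp-go definitions used by the repo.

```go
package integration

// ContentItem and CallToolResult are simplified stand-ins for the real
// MCP result types.
type ContentItem struct {
	Type string
	Text string
}

type CallToolResult struct {
	Content []ContentItem
}

// addRecordingMetadata returns a new result with a metadata item appended.
// The input is never mutated, so other handlers sharing the original
// Content slice are unaffected (copy-on-write).
func addRecordingMetadata(in *CallToolResult, recordFile string) *CallToolResult {
	out := &CallToolResult{Content: make([]ContentItem, 0, len(in.Content)+1)}
	out.Content = append(out.Content, in.Content...) // copy existing items
	out.Content = append(out.Content, ContentItem{
		Type: "text",
		Text: "📹 Recording: " + recordFile,
	})
	return out
}
```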

All fixes are backward compatible and maintain existing functionality.
Tests pass with race detector enabled.

Fixes issues reported in the user's Datto RMM server workflow.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The sendRequest() method was checking c.connected while holding requestMu, but the connected field is protected by c.mu (used in Connect() and Close()). This caused a data race where sendRequest could read stale values of the connected field.

Fix: Read connected field while holding c.mu, then proceed with requestMu for I/O operations.

This resolves the issue where reconnected servers would fail with 'client not connected' errors despite successful reconnection.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit fixes three critical bugs in the server_disconnect/server_reconnect workflow:

1. Stale Client Removal (CRITICAL)
   - server_disconnect now removes the closed client from w.proxyServer.clients list
   - Previously, dead clients remained in the list and could be found by populateStaticServers()
   - This caused "client not connected" errors after reconnect
   File: integration/dynamic_wrapper.go (lines 550-561)

2. Mutex Protection for Client List Access (CRITICAL)
   - server_reconnect now acquires w.proxyServer.mu before accessing w.proxyServer.clients
   - Handles case where client was removed by disconnect (appends if not found)
   - Prevents data race on slice access between reconnect and other operations
   File: integration/dynamic_wrapper.go (lines 703-722)

3. Atomic State Transition (HIGH)
   - Deferred setting serverInfo.IsConnected=true until AFTER all state updates complete
   - Client list, registry, and all internal state updated before marking as connected
   - Eliminates race window where IsConnected=true but proxy doesn't know about new client
   File: integration/dynamic_wrapper.go (lines 700, 747-748)

Root Cause:
The disconnect handler was not cleaning up the proxy server's client list, leaving
stale references that would be found and used after reconnect, causing tool calls
to fail with "client not connected" errors.

Testing:
- Build successful
- All tests pass with race detector (go test -race ./...)
- No data races detected

This resolves the issue where server_reconnect succeeds but subsequent tool calls
fail because they use the old dead client instead of the new connected client.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit fixes remaining race conditions in StdioClient where the connected
flag was being read without proper mutex protection.

Bugs Fixed:
1. CallTool() was checking c.connected without holding c.mu
2. Initialize() was checking c.connected without holding c.mu
3. ListTools() was checking c.connected without holding c.mu

These created race conditions where methods could read stale values of the
connected flag, causing "client not connected" errors even when the client
was actually connected.

Changes:
- Added mutex-protected reads of connected flag in CallTool, Initialize, ListTools
- Added debug logging to track connected flag state transitions:
  * Connect() logs when connected=true is set
  * CallTool() logs the connected state when called
  * Close() logs when connected=false is set
- Added "log" import to client package

Pattern Used:
All methods now use the same pattern as sendRequest():
```go
c.mu.Lock()
connected := c.connected
c.mu.Unlock()
```

This ensures atomic reads of the connected flag without holding the lock
during expensive operations.

Testing:
- Build successful
- Race detector clean (go test -race ./...)
- Debug logging added for troubleshooting

Created docs/debugging-plan.md documenting the investigation and fixes.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
After implementing mutex protection and debug logging (commit 81c9f04),
user testing revealed that the issue persisted. Debug logs provided critical evidence:

    04:33:54 - Connect() SUCCESS: datto-rmm - connected=true  (NEW CLIENT)
    04:34:57 - CallTool(): connected=false                    (OLD CLIENT)

This proves tool calls use the OLD client, not the NEW one after reconnect.

Root Cause Identified:
- Static servers use CreateProxyHandler() which captures client in closure
- Closures are immutable - reconnect creates new client but handlers use old one
- Dynamic servers use createDynamicProxyHandler() which looks up current client
- Dynamic pattern works because it doesn't capture client reference

Solution:
Make static servers use the dynamic handler pattern to eliminate closure capture.
Next commit will implement this fix.

Updated debugging-plan.md to document this investigation and findings.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This fixes the root cause of tool calls failing after server_reconnect.

Problem:
Static servers from config.yaml used CreateProxyHandler() which captured
the client reference in a closure at initialization time. When server_reconnect
created a new client instance, the old handler closures still referenced the
disconnected client, causing "client not connected" errors.

Debug evidence showed tool calls using OLD client (connected=false) instead
of NEW client (connected=true) after reconnect.

Solution:
Make all servers (static and dynamic) use the dynamic handler pattern that
looks up the current client at call time instead of capturing it in a closure.

Changes:

1. integration/proxy_server.go (lines 102-115):
   - Removed static handler creation with CreateProxyHandler()
   - Tools still registered in registry (needed for discovery)
   - Note added that handlers will be created by DynamicWrapper

2. integration/dynamic_wrapper.go:
   - Added createHandlersForAllTools() method
   - Called after populateStaticServers() in Initialize()
   - Creates dynamic handlers for ALL tools using createDynamicProxyHandler()
   - Dynamic handlers look up current client from dynamicServers map at call time

Pattern:
```go
// OLD (broken): Client captured in closure at init time
handler := proxy.CreateProxyHandler(mcpClient, ...)

// NEW (working): Client looked up at call time
handler := w.createDynamicProxyHandler(serverName, toolName)
  // -> looks up w.dynamicServers[serverName].Client every call
```

Benefits:
- Static servers can now be hot-swapped like dynamic servers
- server_reconnect updates dynamicServers map with new client
- All handlers automatically use the new client on next call
- No need to re-register handlers or update closures

Testing:
- Build successful
- Race detector clean (go test -race ./...)
- Ready for disconnect/reconnect testing

This completes the fix designed in plan mode.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fix: Enable hot-swapping for all servers by unifying handler pattern