feat(server): add VenvStackExecutor for stateful, sandboxed code execution#74
feat(server): add VenvStackExecutor for stateful, sandboxed code execution#74bxff wants to merge 5 commits intotilesprivacy:mainfrom
Conversation
…ution Introduces a secure code execution engine using venvstacks with macOS sandbox-exec isolation. Features: - VenvStackExecutor class for session-based code execution - Persistent state across multiple tool calls within a session - Isolated virtual environments per session via venvstacks - Kernel-level sandboxing via Apple's sandbox-exec (macOS) - Automatic cleanup of session environments - Network access denial and file system restrictions Components: - engine.py: VenvStackExecutor class, execute_sandboxed_venvstack function - sandbox.sb: Template sandbox profile for macOS isolation - test_venvstack_sandbox.py: Comprehensive test suite This provides the foundation for safe, stateful code interpreter functionality.
- Remove unused 'types' import - Add cross-platform support (Windows Scripts vs Unix bin) - Use base64+pickle for state serialization instead of repr() - Fix possibly unbound variable 'profile_path' issue - Replace bare except clauses with specific exception handling - Add cleanup_executor() function to manage session cache - Align use_sandbox default to False consistently - Improve docstrings with clearer parameter documentation
📝 WalkthroughWalkthroughAdds a stateful, per-session sandboxed execution system centered on a new VenvStackExecutor and supporting utilities. Implements session ID sanitization, safe pickle deserialization, sandbox profile generation, per-session temp directories, package installation logic, session state persistence, and optional macOS sandbox-exec integration. Introduces a VenvStacksManager to manage venvstacks lifecycles and resolve interpreters. Exposes get_or_create_executor, cleanup_executor, and execute_sandboxed_venvstack, and integrates the executor into the existing execute_sandboxed_code flow. Adds sandbox profile, venvstacks config, tests, and a new dependency on venvstacks. Sequence Diagram(s)sequenceDiagram
participant Client
participant API as MemAgentAPI
participant Manager as VenvStacksManager
participant Executor as VenvStackExecutor
participant Sandbox as sandbox-exec / subprocess
participant FS as Workspace/Temp FS
Client->>API: execute_sandboxed_venvstack(code, session_id, workspace, timeout, use_sandbox)
API->>Executor: get_or_create_executor(session_id, workspace)
Executor->>Manager: get_interpreter_python(framework) / ensure_stacks_ready (if using venvstacks)
Manager-->>Executor: interpreter_path or fallback info
Executor->>FS: create session temp dirs, validate paths, load session state (safe pickle)
Executor->>FS: prepare sandbox profile (if use_sandbox)
Executor->>Sandbox: launch python subprocess (optionally via sandbox-exec) with venv/venvstack
Sandbox->>FS: read/write workspace, venv, session state files
Sandbox-->>Executor: stdout/stderr, exit code, serialized session state
Executor->>FS: persist new session state (safe pickle), cleanup per policy
Executor-->>API: (result dict, combined output)
API-->>Client: return execution result
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@server/mem_agent/engine.py`:
- Around line 417-426: The sandbox profile substitution in _get_sandbox_profile
uses raw self.venv_path and self.workspace_path which can contain special
characters and enable sandbox bypass; before doing the replacements (and before
reading/using the profile), validate or escape these paths: in
_get_sandbox_profile validate self.venv_path and self.workspace_path against a
strict safe-path pattern (e.g. allow only alphanumerics, hyphen, underscore,
dot, slash) or explicitly reject/control newlines, quotes, parentheses and other
metacharacters; if a path fails validation, log a warning via logger.warning and
return "" (or raise), otherwise perform the profile.replace using the validated
values; reference SANDBOX_PROFILE_PATH, _get_sandbox_profile, self.venv_path and
self.workspace_path when making the change.
- Around line 606-621: The executor cache (_executor_cache) and accessors
get_or_create_executor and cleanup_executor are not thread-safe; protect
accesses by adding a module-level lock (e.g., _executor_cache_lock =
threading.Lock()) and use it to guard the check-then-create path in
get_or_create_executor (acquire lock before checking session_id and creating
VenvStackExecutor, then release), and also acquire the same lock inside
cleanup_executor when calling cleanup() and deleting the entry to avoid race
conditions and duplicate executor creation for the same session_id; ensure you
import threading and use the same lock for both functions.
- Around line 375-380: The code constructs venv_path using the raw session_id
which permits path traversal; change the initialization in the constructor that
sets self.session_id and self.venv_path to validate or sanitize session_id
(e.g., allow only a safe whitelist like [A-Za-z0-9_-] or replace/encode unsafe
chars) and then compute venv_path; additionally, after joining base_venv_path
and the sanitized id, verify the resulting path is inside base_venv_path using a
secure check (os.path.normpath + os.path.commonpath) and raise an error if
validation fails; update references to self.session_id/self.venv_path
accordingly.
- Around line 562-576: The pickle.loads call that deserializes subprocess output
(the pickle.loads(result.stdout) in server/mem_agent/engine.py which then calls
self._session_state.update) is an accepted-but-documented risk because the
subprocess runs user code; update the repository threat model/docs to explicitly
state this deserialization escalation vector and why it is accepted, and then
either replace pickle.loads with a safer alternative (e.g., JSON via json.loads
for simple state types and updating state accordingly) or implement a
RestrictedUnpickler that only permits safe types before calling
self._session_state.update so that if the sandbox is bypassed the parent process
isn’t trivially compromised.
In `@server/mem_agent/sandbox.sb`:
- Around line 40-44: The sandbox rule currently grants broad file-read* and
file-write* access to subpath "/private/var/folders" and "/tmp" via the allow
expression, which risks cross-session data exposure; change this to only allow
the specific session/workspace temp subdirectory (e.g., the same pattern used
for ${WORKSPACE_PATH}) by replacing the generic subpath entries with a subpath
that references the session-specific temp directory (or construct one and use
that symbol) so only files under that workspace/session are readable and
writable by the process.
- Around line 31-38: The sandbox profile is vulnerable because unescaped
${VENV_PATH} and ${WORKSPACE_PATH} are directly substituted from
StartRequest.memory_path into the profile (engine.py), allowing injection of
sandbox rules; fix it by validating and sanitizing any path before substitution:
either reject paths containing sandbox-significant characters (quotes,
parentheses, semicolons, newlines, backslashes) or escape them safely (escape
quotes, backslashes and remove/control newlines), and centralize this logic in
the function that performs the substitution (the code that replaces
"${VENV_PATH}" / "${WORKSPACE_PATH}" in engine.py) so all inputs are
normalized/validated before being interpolated into the sandbox profile.
🧹 Nitpick comments (5)
server/tests/test_venvstack_sandbox.py (2)
145-154: Unusederrorvariable should be prefixed with underscore.The
errorvariable is unpacked but never used in this test. Per the static analysis hint, prefix it with an underscore to indicate it's intentionally unused.💡 Suggested fix
- result, error = self.executor.execute(code, use_sandbox=True) + result, _error = self.executor.execute(code, use_sandbox=True)
165-170: Same unused variable issue here.💡 Suggested fix
- result, error = self.executor.execute(code, use_sandbox=True) + result, _error = self.executor.execute(code, use_sandbox=True)server/mem_agent/sandbox.sb (1)
80-81: Remove(trace deny)before production deployment.As the comment notes, this is for debugging. In production, tracing denied operations could leak information about sandbox behavior and may have performance overhead. Consider gating this with a build-time flag or removing it entirely.
server/mem_agent/engine.py (2)
541-541: Use spread operator for cleaner list construction.Per static analysis hint, prefer unpacking over concatenation.
💡 Suggested fix
- cmd = ["sandbox-exec", "-f", profile_path] + cmd + cmd = ["sandbox-exec", "-f", profile_path, *cmd]
578-583: Consider usinglogger.exceptionfor better stack traces.In exception handlers,
logger.exceptionautomatically includes the stack trace. Line 582 already uses it but redundantly passese. Lines 579 loses the traceback.💡 Suggested fix
except subprocess.TimeoutExpired: - logger.error(f"[SANDBOX] Timeout after {timeout}s") + logger.exception(f"[SANDBOX] Timeout after {timeout}s") return None, f"Execution timeout ({timeout}s)" except Exception as e: - logger.exception(f"[SANDBOX] Exception during execution: {e}") + logger.exception("[SANDBOX] Exception during execution") return None, f"Execution error: {e}"
…tacks
Security improvements:
1. **Sandbox profile path injection prevention**
- Add _validate_sandbox_path() to reject paths with metacharacters
- Validate venv_path, workspace_path, and session_temp_path before
substituting into sandbox profile
- Reject paths containing quotes, parentheses, semicolons, newlines
2. **Thread-safe executor cache**
- Add _executor_cache_lock (threading.Lock) to protect cache operations
- Guard get_or_create_executor() and cleanup_executor() with lock
- Prevent race conditions and duplicate executor creation
3. **Path traversal prevention**
- Add _sanitize_session_id() to whitelist alphanumerics, hyphen, underscore
- Add _validate_path_containment() using os.commonpath for secure check
- Verify venv_path stays inside base_venv_path
4. **Safer pickle deserialization**
- Implement RestrictedUnpickler with whitelist of safe types
- Add _safe_pickle_loads() to use restricted unpickler
- Limits damage if sandbox is bypassed and malicious pickle returned
5. **Session-specific temp directory isolation**
- Add session_temp_path for per-session temp files
- Update sandbox.sb to use ${SESSION_TEMP_PATH} instead of broad /tmp access
- Prevents cross-session data exposure via temp files
- Clean up session temp directory on cleanup()
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@server/mem_agent/engine.py`:
- Around line 576-586: The venv creation subprocess.run call in engine.py (the
block that checks python_path and runs [sys.executable, "-m", "venv",
self.venv_path]) has no timeout and can hang; add a reasonable timeout parameter
to subprocess.run (e.g., timeout=30 or configurable via self or class constant),
wrap the call in a try/except that catches subprocess.TimeoutExpired, log the
timeout (including self.venv_path and the timeout value) and raise or handle
cleanup appropriately (ensure self._initialized is not set on failure); keep the
change localized to the venv creation block where subprocess.run is invoked and
reference self.venv_path, python_path, and self._initialized.
🧹 Nitpick comments (4)
server/mem_agent/engine.py (4)
432-458: Consider removing("builtins", "type")from the safe types whitelist.The whitelist includes
("builtins", "type")which allows deserializing type objects. While unlikely to be exploited directly, type objects are not typical session state data and including them slightly expands the attack surface.If type objects aren't needed in session state, consider removing this entry:
("builtins", "bool"), - ("builtins", "type"), ("builtins", "NoneType"),
539-541: Remove unnecessary f-string prefix.The error message string has an
fprefix but contains no placeholders.- raise ValueError( - f"Invalid session_id: resulting venv_path escapes base directory" - ) + raise ValueError( + "Invalid session_id: resulting venv_path escapes base directory" + )
760-777: Uselogger.exceptionfor better error logging.When catching exceptions, use
logger.exceptioninstead oflogger.errorto automatically include the stack trace. This aids debugging.🔧 Suggested fix
except pickle.UnpicklingError as e: error_msg = f"Unsafe pickle data rejected: {e}" - logger.error(f"[SANDBOX] Security: {error_msg}") + logger.exception("[SANDBOX] Security: Unsafe pickle data rejected") return None, error_msg except Exception as e: error_msg = ( f"Failed to parse output: {e}\nStderr: {result.stderr.decode()}" ) - logger.error(f"[SANDBOX] Parse error: {error_msg}") + logger.exception("[SANDBOX] Parse error") return None, error_msg except subprocess.TimeoutExpired: - logger.error(f"[SANDBOX] Timeout after {timeout}s") + logger.exception(f"[SANDBOX] Timeout after {timeout}s") return None, f"Execution timeout ({timeout}s)"
733-736: Use spread operator for list concatenation.Minor style improvement per Ruff RUF005.
- cmd = ["sandbox-exec", "-f", profile_path] + cmd + cmd = ["sandbox-exec", "-f", profile_path, *cmd]
| if not os.path.exists(python_path): | ||
| # Create venv using system Python | ||
| subprocess.run( | ||
| [sys.executable, "-m", "venv", self.venv_path], | ||
| check=True, | ||
| stdout=subprocess.DEVNULL, | ||
| stderr=subprocess.DEVNULL, | ||
| ) | ||
| logger.info(f"Created venv at {self.venv_path}") | ||
|
|
||
| self._initialized = True |
There was a problem hiding this comment.
Add timeout to venv creation subprocess.
The subprocess.run call for venv creation has no timeout. If the venv creation hangs (e.g., due to disk issues or filesystem locks), the calling thread will block indefinitely.
🔧 Suggested fix
if not os.path.exists(python_path):
# Create venv using system Python
subprocess.run(
[sys.executable, "-m", "venv", self.venv_path],
check=True,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
+ timeout=60, # Reasonable timeout for venv creation
)
logger.info(f"Created venv at {self.venv_path}")🧰 Tools
🪛 Ruff (0.14.13)
578-578: subprocess call: check for execution of untrusted input
(S603)
🤖 Prompt for AI Agents
In `@server/mem_agent/engine.py` around lines 576 - 586, The venv creation
subprocess.run call in engine.py (the block that checks python_path and runs
[sys.executable, "-m", "venv", self.venv_path]) has no timeout and can hang; add
a reasonable timeout parameter to subprocess.run (e.g., timeout=30 or
configurable via self or class constant), wrap the call in a try/except that
catches subprocess.TimeoutExpired, log the timeout (including self.venv_path and
the timeout value) and raise or handle cleanup appropriately (ensure
self._initialized is not set on failure); keep the change localized to the venv
creation block where subprocess.run is invoked and reference self.venv_path,
python_path, and self._initialized.
Add proper venvstacks integration to VenvStackExecutor: - VenvStacksManager for building/exporting portable venvstacks - venvstacks.toml config with datascience and minimal frameworks - Automatic fallback to per-session venv when venvstacks not available - Shared Python environment reduces session startup time - Package installation blocked in shared mode (by design) The executor now tries venvstacks first, falling back to creating a per-session venv if stacks aren't built. This enables: - Fast session startup with pre-built environments - Consistent package versions across all sessions - Lower disk usage (shared layers) Added tests for security helpers and venvstacks integration.
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/mem_agent/engine.py (1)
322-326: UseRestrictedUnpicklerfor legacy sandbox output parsing.
execute_sandboxed_codestill deserializes user-controlled subprocess output withpickle.loads, reintroducing the escalation vector you just mitigated elsewhere.🔧 Suggested fix
- local_vars, error_msg = pickle.loads(result.stdout) + local_vars, error_msg = _safe_pickle_loads(result.stdout) + except pickle.UnpicklingError as e: + return None, f"Unsafe pickle data rejected: {e}"
🤖 Fix all issues with AI agents
In `@server/mem_agent/engine.py`:
- Around line 365-381: The whitelist currently rejects spaces and callers like
_get_sandbox_profile silently disable sandboxing when validation fails; update
_validate_sandbox_path to permit space characters in allowed paths (adjust the
_SAFE_PATH_PATTERN to include literal space) and change _get_sandbox_profile so
that when use_sandbox is True and _validate_sandbox_path(path) returns False it
fails closed (raise a clear exception such as ValueError or RuntimeError)
instead of falling back to no sandbox; apply the same fail-closed behavior to
the other related validation usage around the _get_sandbox_profile logic.
- Line 775: The execute() function currently logs raw user code with logger.info
which can leak secrets; change that call to either logger.debug or remove the
code payload and log only safe metadata (e.g., code length, language, or a
redacted preview). Update the call to logger.debug(f"[SANDBOX] execute() called
with code preview: {redacted_preview}") or log metadata like
logger.info(f"[SANDBOX] execute() called, code_length=%d", len(code)) and ensure
any preview is redacted and only emitted when logger.isEnabledFor(logging.DEBUG)
to avoid PII leakage; modify the existing logger.info usage accordingly.
- Around line 685-691: The sandbox whitelist currently always uses
venvstacks_manager.export_dir which fails for built-but-not-exported stacks;
change the selection logic in _get_sandbox_profile()/where python_env_path is
set so that when self._using_venvstacks and self._venvstacks_manager is true you
prefer export_dir only if the stack was exported (or export_dir is
present/valid), otherwise use self._venvstacks_manager.build_dir; if no
venvstacks manager or not using venvstacks, keep using self.venv_path. Update
references to python_env_path, _using_venvstacks, _venvstacks_manager,
export_dir and build_dir accordingly.
- Around line 785-817: The generated multi-line string assigned to wrapped_code
is indented inside the method, causing leading spaces in the produced script and
an IndentationError; import textwrap at the top of the file and wrap the
f-string template with textwrap.dedent(...) (i.e., set wrapped_code =
textwrap.dedent(f'''...''')) so the script has no unintended leading indentation
while keeping references like wrapped_code and self.workspace_path unchanged.
- Around line 383-411: The current _sanitize_session_id function can collapse
distinct session IDs into the same sanitized string; instead, detect when
sanitization changes the input (use _SAFE_SESSION_ID_PATTERN to check) and if it
does either raise ValueError or (preferred) append a stable hash suffix to
preserve uniqueness: compute a short hex digest (e.g. first 8 chars of
hashlib.sha256(session_id.encode()).hexdigest()) and append it (preceded by an
underscore) to the sanitized value; also add import hashlib near the top and
update tests to expect the hash-suffixed sanitized ID when invalid characters
are present.
In `@server/mem_agent/venvstacks_manager.py`:
- Around line 371-389: The function ensure_stacks_ready currently returns
success if manager.is_exported() is true even when the interpreter from
manager.get_interpreter_python(framework) is missing or invalid; change the
logic to always validate the interpreter before returning success: call
manager.get_interpreter_python(framework) and verify python and python.exists()
both when short-circuiting on manager.is_exported() and after any build/export
steps, and if the interpreter is missing attempt a rebuild/export (via
manager.build_stacks() and manager.export_stacks()) or return False with an
explicit error message referencing manager.export_dir and the missing
interpreter; ensure you reference the existing symbols manager.is_exported(),
manager.get_interpreter_python(framework), python.exists(),
manager.build_stacks(), manager.export_stacks(), and manager.export_dir when
locating and updating the code.
- Around line 126-147: The build_stacks method invokes the venvstacks CLI which
always writes output into a "__venvstacks__" subfolder, but is_built() currently
checks self.build_dir directly causing false negatives and infinite rebuilds in
ensure_stacks_ready(); update is_built() to check for (self.build_dir /
"__venvstacks__") existence (or, alternatively, switch build_stacks to use the
venvstacks Python API and pass the build_dir so outputs land in self.build_dir),
and keep references to the existing methods build_stacks, is_built, and
ensure_stacks_ready when making the change so the readiness check aligns with
where the CLI actually writes artifacts.
In `@server/tests/test_venvstack_sandbox.py`:
- Around line 141-176: The tests test_sandbox_network_denied and
test_sandbox_file_access_restricted capture two-return values from
self.executor.execute but bind the second to an unused name `error`; rename that
binding to `_` or `_error` (e.g., `result, _error = self.executor.execute(...)`
or `result, _ = ...`) in both functions to satisfy the linter while preserving
the returned result handling.
♻️ Duplicate comments (1)
server/mem_agent/engine.py (1)
615-621: Add timeout to venv creation subprocess.This can hang indefinitely if venv creation stalls. (Duplicate of prior review note.)
🔧 Suggested fix
if not os.path.exists(python_path): # Create venv using system Python subprocess.run( [sys.executable, "-m", "venv", self.venv_path], check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, + timeout=60, )
server/mem_agent/engine.py
Outdated
| # Prepare code with state restoration/saving | ||
| wrapped_code = f''' | ||
| import pickle | ||
| import base64 | ||
| import sys | ||
| import os | ||
|
|
||
| # Restore session state safely via pickle | ||
| _state_b64 = "{state_b64}" | ||
| _session_state = pickle.loads(base64.b64decode(_state_b64)) | ||
|
|
||
| # Make session state available as globals | ||
| globals().update(_session_state) | ||
|
|
||
| # Change to workspace | ||
| os.chdir({repr(self.workspace_path)}) | ||
|
|
||
| # Execute user code | ||
| {code} | ||
|
|
||
| # Capture updated state (only picklable values) | ||
| _new_state = {{}} | ||
| for k, v in dict(locals()).items(): | ||
| if not k.startswith("_"): | ||
| try: | ||
| pickle.dumps(v) | ||
| _new_state[k] = v | ||
| except Exception: | ||
| pass | ||
|
|
||
| # Output state | ||
| sys.stdout.buffer.write(pickle.dumps((_new_state, None))) | ||
| ''' |
There was a problem hiding this comment.
Wrapped script likely raises IndentationError.
The triple-quoted string is indented within the method, so the generated script starts with leading spaces and fails at top-level. Dedent the template before writing.
🔧 Suggested fix
- wrapped_code = f'''
+ wrapped_code = textwrap.dedent(f"""
import pickle
import base64
import sys
import os
@@
sys.stdout.buffer.write(pickle.dumps((_new_state, None)))
-'''
+""")Add import textwrap near the top of the file.
🧰 Tools
🪛 Ruff (0.14.13)
800-800: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🤖 Prompt for AI Agents
In `@server/mem_agent/engine.py` around lines 785 - 817, The generated multi-line
string assigned to wrapped_code is indented inside the method, causing leading
spaces in the produced script and an IndentationError; import textwrap at the
top of the file and wrap the f-string template with textwrap.dedent(...) (i.e.,
set wrapped_code = textwrap.dedent(f'''...''')) so the script has no unintended
leading indentation while keeping references like wrapped_code and
self.workspace_path unchanged.
Security improvements: - _get_sandbox_profile now fails closed when use_sandbox=True and validation fails (raises RuntimeError/ValueError) instead of silently disabling the sandbox - _sanitize_session_id appends a hash suffix when sanitization changes the input, preventing collisions like 'a/b' and 'a_b' - _SAFE_PATH_PATTERN now allows space characters for paths like '/Users/test/My Documents' - execute() now logs only safe metadata (code_length) instead of raw user code which could leak secrets Bug fixes: - _get_sandbox_profile now prefers export_dir if exported, otherwise falls back to build_dir for built-but-not-exported stacks - is_built() now checks for __venvstacks__ marker directory to align with where venvstacks CLI actually writes artifacts - ensure_stacks_ready() now always validates the interpreter exists before returning success, attempting rebuild if missing - Removed leading indentation from wrapped_code to prevent IndentationError in generated scripts Test updates: - Updated _sanitize_session_id tests to expect hash suffix - Added test for spaces in paths - Renamed unused 'error' variables to '_error' in sandbox tests
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@server/mem_agent/venvstacks_manager.py`:
- Around line 95-106: The subprocess.run call that executes venvstacks (the call
assigning to result using full_cmd) needs a timeout to avoid hanging; add a
timeout parameter (preferably configurable via a function/class option or
derived from the venvstacks subcommand like shorter for "show" and longer for
"build"/"lock"), pass it into subprocess.run, and catch
subprocess.TimeoutExpired to log via logger.error and raise VenvStacksError
(preserving context) when check is True; ensure existing behavior for non-zero
returncode (result.returncode, result.stderr) remains unchanged.
♻️ Duplicate comments (1)
server/mem_agent/engine.py (1)
624-629: Add timeout to venv creation subprocess.The
subprocess.runfor venv creation lacks a timeout. If venv creation hangs (e.g., due to disk issues or filesystem locks), the thread blocks indefinitely.🔧 Suggested fix
subprocess.run( [sys.executable, "-m", "venv", self.venv_path], check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, + timeout=120, # Reasonable timeout for venv creation )
🧹 Nitpick comments (5)
server/tests/test_venvstack_sandbox.py (1)
207-212: Test classes defined afterif __name__ == "__main__"block.The
TestSecurityHelpersand subsequent test classes are defined after theif __name__ == "__main__":block (lines 208-209). While this is valid Python, it's unconventional and can cause confusion. Consider moving theif __name__ == "__main__"block to the end of the file.♻️ Suggested reordering
Move lines 208-209 to the end of the file (after line 387) so all test classes are defined before the main entry point.
server/mem_agent/engine.py (3)
559-562: Remove extraneousfprefix from f-string without placeholders.Line 561 uses an f-string but contains no interpolation placeholders.
🔧 Suggested fix
if not _validate_path_containment(self.venv_path, self.base_venv_path): raise ValueError( - f"Invalid session_id: resulting venv_path escapes base directory" + "Invalid session_id: resulting venv_path escapes base directory" )
917-926: Uselogging.exceptionin exception handlers for automatic traceback.In exception handlers,
logging.exceptionautomatically includes the traceback. Also, line 932'slogger.exception(f"... {e}")includes the exception redundantly.♻️ Suggested fix
except pickle.UnpicklingError as e: error_msg = f"Unsafe pickle data rejected: {e}" - logger.error(f"[SANDBOX] Security: {error_msg}") + logger.exception("[SANDBOX] Security: %s", error_msg) return None, error_msg except Exception as e: error_msg = ( f"Failed to parse output: {e}\nStderr: {result.stderr.decode()}" ) - logger.error(f"[SANDBOX] Parse error: {error_msg}") + logger.exception("[SANDBOX] Parse error: %s", error_msg) return None, error_msg except subprocess.TimeoutExpired: - logger.error(f"[SANDBOX] Timeout after {timeout}s") + logger.exception("[SANDBOX] Timeout after %ds", timeout) return None, f"Execution timeout ({timeout}s)" except Exception as e: - logger.exception(f"[SANDBOX] Exception during execution: {e}") + logger.exception("[SANDBOX] Exception during execution") return None, f"Execution error: {e}"
884-884: Consider using list unpacking for command construction.Ruff suggests using
["sandbox-exec", "-f", profile_path, *cmd]instead of concatenation for cleaner code.♻️ Suggested fix
- cmd = ["sandbox-exec", "-f", profile_path] + cmd + cmd = ["sandbox-exec", "-f", profile_path, *cmd]server/mem_agent/venvstacks_manager.py (1)
344-353: Consider thread-safe initialization for global manager.The
get_venvstacks_manager()function has a potential race condition if called from multiple threads simultaneously. While the manager is mostly stateless after initialization (just paths), consider using a lock for consistency with the thread-safe patterns used inengine.py.♻️ Suggested fix
+import threading + # Global manager instance (lazy initialization) _global_manager: Optional[VenvStacksManager] = None +_global_manager_lock = threading.Lock() def get_venvstacks_manager() -> VenvStacksManager: """Get or create the global VenvStacksManager instance.""" global _global_manager - if _global_manager is None: - _global_manager = VenvStacksManager() + if _global_manager is None: + with _global_manager_lock: + if _global_manager is None: + _global_manager = VenvStacksManager() return _global_manager
| result = subprocess.run( | ||
| full_cmd, | ||
| capture_output=True, | ||
| text=True, | ||
| ) | ||
|
|
||
| if result.returncode != 0: | ||
| logger.error(f"venvstacks failed: {result.stderr}") | ||
| if check: | ||
| raise VenvStacksError(f"venvstacks command failed: {result.stderr}") | ||
|
|
||
| return result |
There was a problem hiding this comment.
Add timeout to venvstacks subprocess calls.
The subprocess.run call lacks a timeout. Venvstacks operations (especially build and lock) can take significant time or hang, blocking the caller indefinitely.
🔧 Suggested fix
+ # Long timeout for build operations which can take several minutes
+ timeout = 600 # 10 minutes
+
result = subprocess.run(
full_cmd,
capture_output=True,
text=True,
+ timeout=timeout,
)Consider making the timeout configurable or varying it based on the command type (e.g., shorter for show, longer for build).
🧰 Tools
🪛 Ruff (0.14.13)
95-95: subprocess call: check for execution of untrusted input
(S603)
104-104: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@server/mem_agent/venvstacks_manager.py` around lines 95 - 106, The
subprocess.run call that executes venvstacks (the call assigning to result using
full_cmd) needs a timeout to avoid hanging; add a timeout parameter (preferably
configurable via a function/class option or derived from the venvstacks
subcommand like shorter for "show" and longer for "build"/"lock"), pass it into
subprocess.run, and catch subprocess.TimeoutExpired to log via logger.error and
raise VenvStacksError (preserving context) when check is True; ensure existing
behavior for non-zero returncode (result.returncode, result.stderr) remains
unchanged.
Features
Security Hardening
VenvStacks Architecture
When venvstacks are built and exported, sessions share pre-built Python environments:
datascience(numpy, pandas, matplotlib, scikit-learn, scipy, seaborn, requests, pillow, sympy) orminimal(numpy, requests)interpreterandinterpreter-minimalThis reduces session startup time and disk usage. Falls back to per-session venvs when stacks aren't available.
Components
engine.py: VenvStackExecutor class, security helpers, execute_sandboxed_venvstack functionvenvstacks_manager.py: VenvStacksManager for building/exporting portable environmentsvenvstacks.toml: Stack configuration with datascience and minimal frameworkssandbox.sb: Template sandbox profile for macOS isolationtest_venvstack_sandbox.py: Comprehensive test suite (22 tests)This provides the foundation for safe, stateful code interpreter functionality.