@edgarpavlovsky edgarpavlovsky commented Nov 6, 2025

This pull request introduces new documentation and configuration files, restructures the project layout for clarity, and adds support for terminal-bench integration. The main themes are enforcing consistent development rules (especially around Python version and dependency management), improving project organization, and providing detailed instructions for benchmarking and integration.

Development rules and documentation:

  • Added .ai-rules.md, .claude.md, .cursorrules, and WARP.md to standardize AI assistant usage, enforce the Python 3.12+ requirement, and mandate uv for dependency management across environments and tools.
  • Updated README.md to clarify runtime state directory usage and the configuration file location, and reorganized the project structure to use a src/ directory for source code and a separate state/ directory for runtime data.

Terminal-bench benchmarking integration:

  • Added benchmark/README.md and benchmark/USAGE.md with comprehensive instructions for installing, running, and troubleshooting Fireteam as a terminal-bench agent, including real-time logging, state isolation, and advanced usage tips.
  • Added new adapter package files: benchmark/__init__.py and benchmark/adapters/__init__.py, establishing a clear entry point and import structure for terminal-bench integration.

Configuration and environment updates:

  • Added ANTHROPIC_API_KEY to .env.example to support Anthropic API integration for Claude agents.

Codebase cleanup:

  • Removed the old agents/base.py file, likely in favor of the new structure under src/agents/.

These changes collectively improve developer onboarding, enforce best practices, and enable robust benchmarking and integration workflows for Fireteam.

- Refactor StateManager to use centralized config for state directory
- Add state module initialization
- Add runs/ directory to .gitignore to exclude benchmark artifacts
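
A minimal sketch of what routing the state directory through one central config value might look like. The `DEFAULT_STATE_DIR` constant, the `FIRETEAM_STATE_DIR` environment variable, the `current_state.json` file name, and the `save`/`load` methods are all assumptions for illustration; the PR text does not show the actual `StateManager` implementation.

```python
import json
import os
from pathlib import Path

# Hypothetical central config value; the environment variable name is an assumption.
DEFAULT_STATE_DIR = Path(os.environ.get("FIRETEAM_STATE_DIR", "state"))


class StateManager:
    """Illustrative only: persists a JSON state file under a configurable directory."""

    def __init__(self, state_dir=None):
        # Fall back to the centralized default instead of a hard-coded path.
        self.state_dir = Path(state_dir) if state_dir is not None else DEFAULT_STATE_DIR
        self.state_dir.mkdir(parents=True, exist_ok=True)
        self.state_file = self.state_dir / "current_state.json"

    def save(self, state: dict) -> None:
        """Write the state dictionary to disk as JSON."""
        self.state_file.write_text(json.dumps(state))

    def load(self) -> dict:
        """Return the stored state, or an empty dict if nothing was saved yet."""
        if not self.state_file.exists():
            return {}
        return json.loads(self.state_file.read_text())
```

Passing an explicit `state_dir` (for example a temporary directory per benchmark run) is one way to get the state isolation the description mentions, while production code relies on the single default.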
@edgarpavlovsky edgarpavlovsky changed the title Add terminal-bench run artifacts and improve state manager configuration Terminal bench artifacts, evolving state management into memory management Nov 6, 2025
@edgarpavlovsky edgarpavlovsky changed the title Terminal bench artifacts, evolving state management into memory management Terminal bench artifacts + adding memory layer Nov 6, 2025
edgarpavlovsky and others added 12 commits November 6, 2025 18:17
- Created 165 total tests (161 unit + 4 new e2e/integration)
- Added test infrastructure (conftest.py, helpers.py)
- Enhanced MemoryManager with embedding_model parameter
- Added lightweight embedding tests for fast CI
- Added E2E hello world subprocess test
- Added terminal-bench integration test
- Created GitHub Actions workflow with 3 jobs
- Updated documentation and added TODO for improvements
- Fixed config.py .env loading from repo root
- All fast tests passing (163/163)
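
The `.env` fix above is not shown in this conversation. As a rough sketch of the general idea, the snippet below locates the repository root by walking upward from a starting path and then parses a `.env` file found there. The function names, the `.git` marker, and the hand-rolled parser are assumptions; the project may well use a library such as python-dotenv instead.

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = ".git") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found above {start}")


def read_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

In a module such as `config.py`, this could be used as `read_env_file(find_repo_root(Path(__file__)) / ".env")`, so the lookup no longer depends on the current working directory.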
Lightweight tests are already included in fast tests since they're not marked as slow/e2e/integration. Running them separately caused duplication.
- Marked all tests that load Qwen3 model (~1.2GB) as @pytest.mark.slow
- This excludes them from CI fast-tests job
- Fast tests now: 127 tests in ~25s (was 163 tests in ~60s)
- Slow memory tests: 36 tests (use heavy model)
- Lightweight tests: 2 tests (use 80MB model, run in fast job)

This prevents CI timeouts and keeps fast tests truly fast.
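
For readers unfamiliar with marker-based selection, here is a hedged sketch of a `conftest.py` that registers the three markers named in these commits. The marker names come from the commit messages; the descriptions and the `MARKERS` dictionary are illustrative, and the project's real `conftest.py` is not shown here.

```python
# conftest.py (sketch)

# Marker names are taken from the commit messages; descriptions are illustrative.
MARKERS = {
    "slow": "loads the heavy embedding model",
    "e2e": "end-to-end test that spawns a subprocess",
    "integration": "requires terminal-bench",
}


def pytest_configure(config):
    """pytest hook: register custom markers so they can be selected with -m."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, a test decorated with `@pytest.mark.slow` is deselected by an invocation such as `pytest -m "not slow and not e2e and not integration"`. That is one way a fast job could pick up only the unmarked tests; the exact command used in the workflow is an assumption.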
Temporarily running all tests on feature branches to validate before merging to main.
Use github.head_ref for pull request events to properly detect e/* branches.
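
The reason `github.head_ref` matters: on pull request events the regular ref name points at the merge ref rather than the source branch. The sketch below shows the same logic in Python using the corresponding default environment variables `GITHUB_HEAD_REF` and `GITHUB_REF_NAME`. The function names are hypothetical, and the actual check in this PR lives in the workflow file, which is not shown here.

```python
import os
from fnmatch import fnmatch


def source_branch(env=None) -> str:
    """Return the branch a workflow run originates from."""
    env = os.environ if env is None else env
    # GITHUB_HEAD_REF is only populated for pull request events; on pushes it is
    # empty, so fall back to GITHUB_REF_NAME (which is "<pr_number>/merge" on PRs).
    return env.get("GITHUB_HEAD_REF") or env.get("GITHUB_REF_NAME", "")


def is_feature_branch(branch: str, pattern: str = "e/*") -> bool:
    """Match the branch name against the feature-branch glob from the commit message."""
    return fnmatch(branch, pattern)
```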
- Use subprocess.call() to stream output directly to console
- Add --livestream flag for real-time terminal-bench output
- Simplify assertions to just check return code
- Remove output parsing (terminal-bench handles success/failure)
- This provides much better observability during long test runs
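
A minimal sketch of the streaming pattern described in this commit: `subprocess.call()` without redirected streams lets the child write straight to the console, and the test asserts only on the return code. The `run_streaming` helper and the stand-in command are assumptions; the real test invokes terminal-bench, whose exact command line is not reproduced here.

```python
import subprocess
import sys


def run_streaming(cmd: list) -> int:
    """Run `cmd`, letting the child write directly to this process's stdout/stderr."""
    # No stdout=/stderr= arguments: the child inherits the parent's streams,
    # so output appears in real time instead of being captured for parsing.
    return subprocess.call(cmd)


def test_child_process_succeeds():
    # Stand-in command; the real test would invoke terminal-bench instead.
    returncode = run_streaming([sys.executable, "-c", "print('hello world')"])
    assert returncode == 0
```

The trade-off is that the test can no longer inspect the output, which matches the commit's decision to let terminal-bench report success or failure through its exit status.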
Terminal-bench tests need local debugging. E2E tests remain enabled.

Fast tests (127 tests): ✅ Running
E2E tests (1 test): ✅ Running
Integration tests: ⏸️  Disabled for now
Only run on:
- Pull requests to main
- Direct pushes to main

This prevents duplicate runs (once on push, once on PR).
Co-authored-by: edgarpavlovsky <edgarpavlovsky@gmail.com>
@edgarpavlovsky edgarpavlovsky merged commit a9f057e into main Nov 7, 2025
4 checks passed

3 participants