
Add episode replay buffer for RL agents #20

Closed
joshgreaves wants to merge 1 commit into main from rowan/replay-buffer

Conversation

@joshgreaves
Contributor

@joshgreaves joshgreaves commented Jan 14, 2026

User description

Summary

  • Implement EpisodeReplayBuffer with support for concurrent episode collection from multiple agents
  • Add n-step return sampling with configurable discount factor γ
  • Provide episode-based storage with explicit lifecycle control (start, append, end)
  • Include capacity management with automatic eviction and full asyncio safety

Implementation Details

The buffer stores episodes as sequences of (observation, action, reward) tuples and supports uniform sampling over all valid time steps. N-step samples automatically handle episode boundaries and compute discount powers.
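The boundary handling and discount powers described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `NStepWindow`, `build_window`, and `sample_uniform` are hypothetical names, episodes are reduced to plain reward lists, and the window is assumed to be cut short when fewer than n steps remain.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class NStepWindow:
    """Hypothetical stand-in for the PR's NStepSample; fields are illustrative."""

    rewards: tuple[float, ...]  # rewards r_t .. r_{t+m-1}, with m <= n
    discounted_return: float    # sum over k of gamma**k * r_{t+k}
    discount: float             # gamma**m, the weight for a bootstrap value
    hit_episode_end: bool       # True when the window reaches the last stored step


def build_window(rewards: list[float], t: int, n: int, gamma: float) -> NStepWindow:
    """Build an n-step window starting at step t, truncated at the episode end."""
    m = min(n, len(rewards) - t)  # number of steps actually available
    window = tuple(rewards[t:t + m])
    ret = sum(gamma ** k * r for k, r in enumerate(window))
    return NStepWindow(window, ret, gamma ** m, t + m == len(rewards))


def sample_uniform(episodes: list[list[float]], n: int, gamma: float,
                   rng: random.Random) -> NStepWindow:
    """Pick one (episode, step) pair uniformly over all stored steps."""
    positions = [(e, t) for e, ep in enumerate(episodes) for t in range(len(ep))]
    e, t = rng.choice(positions)
    return build_window(episodes[e], t, n, gamma)
```

For example, `build_window([1.0, 2.0, 3.0], t=1, n=3, gamma=0.5)` has only two steps left, so it returns the rewards `(2.0, 3.0)`, a discounted return of 3.5, and a discount of 0.25.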

Key features:

  • Concurrency-safe access across asyncio tasks with internal locking
  • Efficient capacity management with FIFO eviction
  • Flexible n-step sampling that respects episode boundaries
  • Comprehensive error handling and validation
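As an illustration of the locking and lifecycle features listed above, here is a minimal sketch that assumes a single `asyncio.Lock` guards every mutation. `BufferSketch`, `EpisodeSketch`, `run_agent`, and all method signatures are hypothetical and are not taken from the PR.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class EpisodeSketch:
    """Hypothetical episode record; the PR's Episode class may differ."""

    steps: list[tuple[object, object, float]] = field(default_factory=list)
    finished: bool = False


class BufferSketch:
    """Illustrative buffer: every mutation happens while holding one asyncio.Lock."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._episodes: dict[int, EpisodeSketch] = {}
        self._next_id = 0

    async def start_episode(self) -> int:
        async with self._lock:
            episode_id = self._next_id
            self._next_id += 1
            self._episodes[episode_id] = EpisodeSketch()
            return episode_id

    async def append(self, episode_id: int, obs: object, action: object,
                     reward: float) -> None:
        async with self._lock:
            episode = self._episodes[episode_id]
            if episode.finished:
                raise ValueError("cannot append to a finished episode")
            episode.steps.append((obs, action, reward))

    async def end_episode(self, episode_id: int) -> None:
        async with self._lock:
            self._episodes[episode_id].finished = True

    async def total_steps(self) -> int:
        async with self._lock:
            return sum(len(ep.steps) for ep in self._episodes.values())


async def run_agent(buffer: BufferSketch, num_steps: int) -> None:
    """One agent collecting a single episode; awaits let other agents interleave."""
    episode_id = await buffer.start_episode()
    for step in range(num_steps):
        await buffer.append(episode_id, obs=step, action=0, reward=1.0)
        await asyncio.sleep(0)  # yield to the event loop
    await buffer.end_episode(episode_id)


async def main() -> int:
    buffer = BufferSketch()
    await asyncio.gather(*(run_agent(buffer, 5) for _ in range(4)))
    return await buffer.total_steps()
```

Running `asyncio.run(main())` returns 20 (4 agents with 5 steps each). Note that `asyncio.Lock` is documented as not thread-safe, so a design like this protects tasks on one event loop rather than OS threads.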

Test Plan

  • Episode lifecycle (start, append, end)
  • N-step sampling with boundary handling
  • Concurrent access patterns
  • Capacity and eviction policies
  • Edge cases and error conditions
  • All 35 tests passing
  • Ruff lint checks passing
  • Ruff format checks passing

🤖 Generated with Claude Code


Generated description

graph LR
EpisodeReplayBuffer_start_episode_("EpisodeReplayBuffer.start_episode"):::added
Episode_("Episode"):::added
EpisodeReplayBuffer_append_observation_action_reward_("EpisodeReplayBuffer.append_observation_action_reward"):::added
EpisodeReplayBuffer_end_episode_("EpisodeReplayBuffer.end_episode"):::added
EpisodeReplayBuffer_sample_n_step_("EpisodeReplayBuffer.sample_n_step"):::added
EpisodeReplayBuffer_build_n_step_sample_("EpisodeReplayBuffer._build_n_step_sample"):::added
NStepSample_("NStepSample"):::added
EpisodeReplayBuffer_evict_if_needed_("EpisodeReplayBuffer._evict_if_needed"):::added
EpisodeReplayBuffer_start_episode_ -- "Creates Episode instance for agent, registers and enforces capacity." --> Episode_
EpisodeReplayBuffer_append_observation_action_reward_ -- "Appends observation/action/reward to Episode; increments total_steps." --> Episode_
EpisodeReplayBuffer_end_episode_ -- "Marks Episode TERMINAL/TRUNCATED and appends final observation if provided." --> Episode_
EpisodeReplayBuffer_sample_n_step_ -- "Delegates per-sample construction to _build_n_step_sample." --> EpisodeReplayBuffer_build_n_step_sample_
EpisodeReplayBuffer_sample_n_step_ -- "Enumerates Episode steps to build uniform sampling positions." --> Episode_
EpisodeReplayBuffer_build_n_step_sample_ -- "Extracts obs/action/rewards, computes next_obs and flags." --> Episode_
EpisodeReplayBuffer_build_n_step_sample_ -- "Packages trajectory segment and metadata into NStepSample dataclass." --> NStepSample_
EpisodeReplayBuffer_start_episode_ -- "Triggers eviction to respect max_episodes/max_steps after creation." --> EpisodeReplayBuffer_evict_if_needed_
EpisodeReplayBuffer_append_observation_action_reward_ -- "Triggers eviction after appending to maintain capacity constraints." --> EpisodeReplayBuffer_evict_if_needed_
classDef added stroke:#15AA7A
classDef removed stroke:#CD5270
classDef modified stroke:#EDAC4C
linkStyle default stroke:#CBD5E1,font-size:13px

Introduces an asyncio-safe EpisodeReplayBuffer in the ares reinforcement learning framework for concurrent collection and episode-based storage of agent experiences. The component manages capacity with automatic eviction and supports n-step return sampling for RL algorithms that use multi-step targets.

Topic: N-Step Sampling & Returns
Details: Adds NStepSample generation and compute_discounted_return, so the buffer can provide multi-step trajectories with a configurable discount factor. The sampling logic (sample_n_step) selects experiences uniformly and truncates at episode boundaries. Tests cover scenarios including concurrency and edge cases, supported by pytest-asyncio configuration.
Modified files:
  • pyproject.toml
  • src/ares/contrib/rl/replay_buffer.py
  • tests/contrib/rl/test_replay_buffer.py
Latest contributors (2):
User | Commit | Date
joshua.greaves@gmail.com | fix-avoid-os.getlogin-... | January 13, 2026
ryanscais3@gmail.com | Add-DM-Env-Interface-3 | December 18, 2025
Topic: Replay Buffer Core
Details: Implements the core EpisodeReplayBuffer and its data structures (Episode, EpisodeStatus) to manage the lifecycle of agent experiences: starting, appending to, and ending episodes. Concurrent access is serialized with an asyncio.Lock, which protects coroutines on one event loop rather than OS threads. Capacity is managed with FIFO eviction, prioritizing finished episodes.
Modified files:
  • src/ares/contrib/__init__.py
  • src/ares/contrib/rl/__init__.py
  • src/ares/contrib/rl/replay_buffer.py
  • tests/contrib/__init__.py
  • tests/contrib/rl/__init__.py
  • tests/contrib/rl/test_replay_buffer.py
Latest contributors (0)
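The eviction behavior named under Replay Buffer Core can be sketched as below. This assumes that "prioritizing finished episodes" means finished episodes are evicted before in-progress ones, oldest first, which the description does not confirm. The function is a standalone stand-in for `_evict_if_needed`, the dictionary layout is hypothetical, and only the `max_episodes` limit is modeled.

```python
def evict_if_needed(episodes: dict[str, dict[str, bool]], max_episodes: int) -> list[str]:
    """Evict oldest-first, preferring finished episodes (assumed policy).

    `episodes` maps an episode id to {"finished": bool} in insertion order.
    The dict is modified in place; the evicted ids are returned.
    """
    if max_episodes < 0:
        raise ValueError("max_episodes must be non-negative")
    evicted: list[str] = []
    while len(episodes) > max_episodes:
        # Oldest finished episode if one exists, otherwise the oldest overall.
        victim = next(
            (eid for eid, ep in episodes.items() if ep["finished"]),
            next(iter(episodes)),
        )
        del episodes[victim]
        evicted.append(victim)
    return evicted
```

With episodes `a` (in progress), `b` (finished), `c` (in progress), and `d` (finished) inserted in that order and `max_episodes=2`, this sketch evicts `b` and then `d`, leaving the two in-progress episodes in place.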
This pull request is reviewed by Baz.

Implement EpisodeReplayBuffer with support for:
- Concurrent episode collection from multiple agents
- N-step return sampling with configurable discount factor
- Episode-based storage with explicit lifecycle control
- Capacity management with automatic eviction
- Full asyncio safety with internal locking

The buffer stores episodes as sequences of (observation, action, reward)
tuples and supports uniform sampling over all valid time steps. N-step
samples automatically handle episode boundaries and compute discount powers.

Includes comprehensive test suite covering:
- Episode lifecycle (start, append, end)
- N-step sampling with boundary handling
- Concurrent access patterns
- Capacity and eviction policies
- Edge cases and error conditions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@joshgreaves
Contributor Author

Closing as duplicate of #19

