Add episode replay buffer for RL agents by joshgreaves · Pull Request #20 · withmartian/ares

joshgreaves · 2026-01-14T22:42:20Z

User description

Summary

Implement EpisodeReplayBuffer with support for concurrent episode collection from multiple agents
Add n-step return sampling with configurable discount factor γ
Provide episode-based storage with explicit lifecycle control (start, append, end)
Include capacity management with automatic eviction and full asyncio safety

Implementation Details

The buffer stores episodes as sequences of (observation, action, reward) tuples and supports uniform sampling over all valid time steps. N-step samples automatically handle episode boundaries and compute discount powers.
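For illustration, the boundary handling described above can be sketched as follows. This is a minimal sketch and not the code in this PR: `n_step_return` and its signature are hypothetical, and it assumes an episode's rewards are available as a plain list.

```python
from typing import Sequence


def n_step_return(
    rewards: Sequence[float], start: int, n: int, gamma: float
) -> tuple[float, int]:
    """Return (discounted reward sum, number of steps actually used).

    Hypothetical helper: the window is clipped at the end of the episode
    instead of reading past it.
    """
    end = min(start + n, len(rewards))  # stop at the episode boundary
    total = 0.0
    for k, reward in enumerate(rewards[start:end]):
        total += (gamma ** k) * reward  # the k-th reward is discounted by gamma^k
    return total, end - start


rewards = [1.0, 2.0, 3.0]
full = n_step_return(rewards, start=0, n=2, gamma=0.5)     # window fits in the episode
clipped = n_step_return(rewards, start=2, n=5, gamma=0.5)  # only one step remains
```

The second return value is the number of steps m that were actually used, so a caller that bootstraps from a value estimate would discount that estimate by gamma^m rather than gamma^n.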

Key features:

Asyncio-safe concurrent access with internal locking
Efficient capacity management with FIFO eviction
Flexible n-step sampling that respects episode boundaries
Comprehensive error handling and validation
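As a rough picture of what internal locking means for concurrent collection, here is a small sketch. `TinyBuffer` is a hypothetical stand-in and not the `EpisodeReplayBuffer` from this PR; it only shows several coroutines appending through one `asyncio.Lock`.

```python
import asyncio


class TinyBuffer:
    """Hypothetical stand-in for illustration, not the PR's EpisodeReplayBuffer."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._steps: dict[str, list[tuple[int, int, float]]] = {}

    async def append(self, episode_id: str, obs: int, action: int, reward: float) -> None:
        # Every mutation happens while holding the lock.
        async with self._lock:
            self._steps.setdefault(episode_id, []).append((obs, action, reward))

    async def total_steps(self) -> int:
        async with self._lock:
            return sum(len(steps) for steps in self._steps.values())


async def collect(buffer: TinyBuffer, agent: str, n: int) -> None:
    for t in range(n):
        await buffer.append(agent, t, 0, 1.0)
        await asyncio.sleep(0)  # yield control so the agents interleave


async def main() -> int:
    buffer = TinyBuffer()
    # Three agents collect ten steps each, concurrently on one event loop.
    await asyncio.gather(*(collect(buffer, f"agent-{i}", 10) for i in range(3)))
    return await buffer.total_steps()


total = asyncio.run(main())
```

An `asyncio.Lock` coordinates coroutines on a single event loop. It does not protect against access from multiple OS threads.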

Test Plan

Episode lifecycle (start, append, end)
N-step sampling with boundary handling
Concurrent access patterns
Capacity and eviction policies
Edge cases and error conditions
All 35 tests passing
Ruff lint checks passing
Ruff format checks passing

🤖 Generated with Claude Code

Generated description

graph LR
EpisodeReplayBuffer_start_episode_("EpisodeReplayBuffer.start_episode"):::added
Episode_("Episode"):::added
EpisodeReplayBuffer_append_observation_action_reward_("EpisodeReplayBuffer.append_observation_action_reward"):::added
EpisodeReplayBuffer_end_episode_("EpisodeReplayBuffer.end_episode"):::added
EpisodeReplayBuffer_sample_n_step_("EpisodeReplayBuffer.sample_n_step"):::added
EpisodeReplayBuffer_build_n_step_sample_("EpisodeReplayBuffer._build_n_step_sample"):::added
NStepSample_("NStepSample"):::added
EpisodeReplayBuffer_evict_if_needed_("EpisodeReplayBuffer._evict_if_needed"):::added
EpisodeReplayBuffer_start_episode_ -- "Creates Episode instance for agent, registers and enforces capacity." --> Episode_
EpisodeReplayBuffer_append_observation_action_reward_ -- "Appends observation/action/reward to Episode; increments total_steps." --> Episode_
EpisodeReplayBuffer_end_episode_ -- "Marks Episode TERMINAL/TRUNCATED and appends final observation if provided." --> Episode_
EpisodeReplayBuffer_sample_n_step_ -- "Delegates per-sample construction to _build_n_step_sample." --> EpisodeReplayBuffer_build_n_step_sample_
EpisodeReplayBuffer_sample_n_step_ -- "Enumerates Episode steps to build uniform sampling positions." --> Episode_
EpisodeReplayBuffer_build_n_step_sample_ -- "Extracts obs/action/rewards, computes next_obs and flags." --> Episode_
EpisodeReplayBuffer_build_n_step_sample_ -- "Packages trajectory segment and metadata into NStepSample dataclass." --> NStepSample_
EpisodeReplayBuffer_start_episode_ -- "Triggers eviction to respect max_episodes/max_steps after creation." --> EpisodeReplayBuffer_evict_if_needed_
EpisodeReplayBuffer_append_observation_action_reward_ -- "Triggers eviction after appending to maintain capacity constraints." --> EpisodeReplayBuffer_evict_if_needed_
classDef added stroke:#15AA7A
classDef removed stroke:#CD5270
classDef modified stroke:#EDAC4C
linkStyle default stroke:#CBD5E1,font-size:13px

Introduces an asyncio-safe EpisodeReplayBuffer in the ares reinforcement learning framework, enabling concurrent collection and episode-based storage of agent experiences. The component manages capacity through automatic eviction and supports n-step return sampling with a configurable discount factor.

Topic Details

N-Step Sampling & Returns

Adds NStepSample generation and compute_discounted_return so the buffer can return multi-step trajectories with a configurable discount factor. The sampling logic (sample_n_step) selects time steps uniformly and truncates samples at episode boundaries. Tests cover concurrency and edge cases, with pytest-asyncio configured to run them.
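One way to read "uniform over time steps" is sketched below. The helper `sample_positions` is hypothetical, and sampling with replacement is an assumption; the PR's `sample_n_step` may differ on both counts.

```python
import random


def sample_positions(
    episode_lengths: dict[str, int], batch_size: int, rng: random.Random
) -> list[tuple[str, int]]:
    """Draw (episode_id, step_index) pairs uniformly over all stored steps."""
    # Enumerate every valid start position across all episodes.
    positions = [
        (episode_id, t)
        for episode_id, length in episode_lengths.items()
        for t in range(length)
    ]
    if not positions:
        raise ValueError("no steps available to sample from")
    # Assumption: draws are made with replacement.
    return [rng.choice(positions) for _ in range(batch_size)]


lengths = {"ep-a": 3, "ep-b": 1}
batch = sample_positions(lengths, batch_size=8, rng=random.Random(0))
```

Because the draw is uniform over steps rather than over episodes, a longer episode contributes proportionally more samples.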

Modified files (3)

pyproject.toml
src/ares/contrib/rl/replay_buffer.py
tests/contrib/rl/test_replay_buffer.py

Latest Contributors(2)

User	Commit	Date
joshua.greaves@gmail.com	fix-avoid-os.getlogin-...	January 13, 2026
ryanscais3@gmail.com	Add-DM-Env-Interface-3	December 18, 2025

Replay Buffer Core

Implements the core EpisodeReplayBuffer and associated data structures (Episode, EpisodeStatus) to manage the lifecycle of agent experiences, including starting, appending, and ending episodes. It serializes concurrent access from coroutines with an asyncio.Lock (which guards tasks on one event loop, not OS threads) and manages capacity with FIFO eviction that prioritizes finished episodes.
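The eviction policy could look roughly like the sketch below. `Ep` and `evict_if_needed` here are hypothetical simplifications, not the PR's `Episode` or `_evict_if_needed`. Only an episode-count limit is modeled, and falling back to the oldest in-progress episode when nothing is finished is an assumption.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Ep:
    """Hypothetical, simplified episode record."""

    finished: bool = False
    steps: list = field(default_factory=list)


def evict_if_needed(episodes: "OrderedDict[str, Ep]", max_episodes: int) -> list[str]:
    """Evict in insertion (FIFO) order, preferring finished episodes."""
    evicted = []
    while len(episodes) > max_episodes:
        # Oldest finished episode first.
        victim = next((eid for eid, ep in episodes.items() if ep.finished), None)
        if victim is None:
            # Assumption: with nothing finished, drop the oldest episode overall.
            victim = next(iter(episodes))
        del episodes[victim]
        evicted.append(victim)
    return evicted


episodes = OrderedDict()
episodes["e1"] = Ep(finished=False)  # oldest, still being collected
episodes["e2"] = Ep(finished=True)
episodes["e3"] = Ep(finished=False)
evicted = evict_if_needed(episodes, max_episodes=2)
```

In this sketch the finished episode `e2` is evicted even though `e1` is older, so in-progress episodes are kept whenever a finished one can be dropped instead.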

Modified files (6)

src/ares/contrib/__init__.py
src/ares/contrib/rl/__init__.py
src/ares/contrib/rl/replay_buffer.py
tests/contrib/__init__.py
tests/contrib/rl/__init__.py
tests/contrib/rl/test_replay_buffer.py

Latest Contributors(0)

User	Commit	Date


Implement EpisodeReplayBuffer with support for:
- Concurrent episode collection from multiple agents
- N-step return sampling with configurable discount factor
- Episode-based storage with explicit lifecycle control
- Capacity management with automatic eviction
- Full asyncio safety with internal locking

The buffer stores episodes as sequences of (observation, action, reward) tuples and supports uniform sampling over all valid time steps. N-step samples automatically handle episode boundaries and compute discount powers.

Includes comprehensive test suite covering:
- Episode lifecycle (start, append, end)
- N-step sampling with boundary handling
- Concurrent access patterns
- Capacity and eviction policies
- Edge cases and error conditions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

joshgreaves · 2026-01-14T22:44:55Z

Closing as duplicate of #19

joshgreaves closed this Jan 14, 2026

joshgreaves wants to merge 1 commit into main from rowan/replay-buffer
