Closed
Implement EpisodeReplayBuffer with support for:
- Concurrent episode collection from multiple agents
- N-step return sampling with configurable discount factor
- Episode-based storage with explicit lifecycle control
- Capacity management with automatic eviction
- Full asyncio safety with internal locking

The buffer stores episodes as sequences of (observation, action, reward) tuples and supports uniform sampling over all valid time steps. N-step samples automatically handle episode boundaries and compute discount powers.

Includes comprehensive test suite covering:
- Episode lifecycle (start, append, end)
- N-step sampling with boundary handling
- Concurrent access patterns
- Capacity and eviction policies
- Edge cases and error conditions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Closing as duplicate of #19
User description
Summary
`EpisodeReplayBuffer` with support for concurrent episode collection from multiple agents

Implementation Details
The buffer stores episodes as sequences of (observation, action, reward) tuples and supports uniform sampling over all valid time steps. N-step samples automatically handle episode boundaries and compute discount powers.
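The boundary handling described above can be illustrated with a minimal sketch. It is not the PR's code: `NStepWindow` and `build_window` are hypothetical names, and the sketch assumes that a window starting near the end of an episode is simply cut short and that the "discount power" is `gamma` raised to the number of steps actually covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NStepWindow:
    """Hypothetical stand-in for an n-step sample (not the PR's NStepSample)."""

    rewards: tuple[float, ...]  # rewards actually covered by the window
    steps: int                  # number of steps covered, at most n
    reached_end: bool           # True if the window touches the episode end
    discount: float             # gamma ** steps, used when bootstrapping


def build_window(rewards, start, n, gamma):
    """Cut an n-step window out of one episode, truncating at its end."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0 <= start < len(rewards):
        raise IndexError("start must index an existing step")
    end = min(start + n, len(rewards))  # never read past the episode boundary
    window = tuple(rewards[start:end])
    steps = len(window)
    return NStepWindow(
        rewards=window,
        steps=steps,
        reached_end=start + n >= len(rewards),
        discount=gamma ** steps,
    )
```

With four rewards, a request for three steps starting at index 2 yields only two rewards, so the discount power is `gamma ** 2` rather than `gamma ** 3`.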
Key features:
Test Plan
🤖 Generated with Claude Code
Generated description
Call graph (all nodes added in this PR):
- `EpisodeReplayBuffer.start_episode` -> `Episode`: Creates Episode instance for agent, registers and enforces capacity.
- `EpisodeReplayBuffer.append_observation_action_reward` -> `Episode`: Appends observation/action/reward to Episode; increments total_steps.
- `EpisodeReplayBuffer.end_episode` -> `Episode`: Marks Episode TERMINAL/TRUNCATED and appends final observation if provided.
- `EpisodeReplayBuffer.sample_n_step` -> `EpisodeReplayBuffer._build_n_step_sample`: Delegates per-sample construction to _build_n_step_sample.
- `EpisodeReplayBuffer.sample_n_step` -> `Episode`: Enumerates Episode steps to build uniform sampling positions.
- `EpisodeReplayBuffer._build_n_step_sample` -> `Episode`: Extracts obs/action/rewards, computes next_obs and flags.
- `EpisodeReplayBuffer._build_n_step_sample` -> `NStepSample`: Packages trajectory segment and metadata into NStepSample dataclass.
- `EpisodeReplayBuffer.start_episode` -> `EpisodeReplayBuffer._evict_if_needed`: Triggers eviction to respect max_episodes/max_steps after creation.
- `EpisodeReplayBuffer.append_observation_action_reward` -> `EpisodeReplayBuffer._evict_if_needed`: Triggers eviction after appending to maintain capacity constraints.

Introduces an asyncio-safe `EpisodeReplayBuffer` within the `ares` reinforcement learning framework, enabling concurrent collection and episode-based storage of agent experiences. This new component provides robust capacity management with automatic eviction and supports flexible n-step return sampling, crucial for various RL algorithms.

`NStepSample` generation and `compute_discounted_return`, allowing the buffer to provide multi-step trajectories with configurable discount factors. The sampling logic (`sample_n_step`) uniformly selects experiences, correctly truncates at episode boundaries, and is thoroughly tested for various scenarios, including concurrency and edge cases, supported by `pytest-asyncio` configuration.

Modified files (6)
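As a rough illustration of the two ideas in this summary, the sketch below computes a discounted return and draws uniform sampling positions. The function names and signatures (`discounted_return`, `sample_positions`) are assumptions made for this example; the signature of the PR's own `compute_discounted_return` is not shown in the description. Uniformity here is assumed to mean uniform over every (episode, time step) pair, so longer episodes are sampled proportionally more often.

```python
import random


def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], accumulated from the last reward backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def sample_positions(episode_lengths, batch_size, rng):
    """Draw (episode_id, step) pairs uniformly over all stored steps.

    episode_lengths maps a hypothetical episode id to its number of steps.
    """
    positions = [
        (episode_id, step)
        for episode_id, length in episode_lengths.items()
        for step in range(length)
    ]
    if not positions:
        raise ValueError("no steps available to sample")
    # Sampling with replacement: each draw is independent and uniform.
    return [rng.choice(positions) for _ in range(batch_size)]
```

Passing an explicit `random.Random` instance keeps the sampling reproducible in tests, for example `sample_positions({"a": 2, "b": 1}, 5, random.Random(0))`.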
`EpisodeReplayBuffer` and associated data structures (`Episode`, `EpisodeStatus`) to manage the lifecycle of agent experiences, including starting, appending, and ending episodes. It ensures coroutine-safe concurrent access using `asyncio.Lock` (which serializes coroutines on one event loop and is not a thread lock) and incorporates efficient capacity management with FIFO eviction, prioritizing finished episodes.

Modified files (12)
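The lifecycle and eviction behavior described here might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the PR's implementation: `TinyEpisodeBuffer` and its methods are hypothetical, capacity is counted in episodes only, and "prioritizing finished episodes" is interpreted as evicting the oldest finished episode before any in-progress one.

```python
import asyncio


class TinyEpisodeBuffer:
    """Hypothetical miniature buffer; every mutation happens under one lock."""

    def __init__(self, max_episodes):
        self._max_episodes = max_episodes
        self._episodes = {}  # insertion-ordered: oldest episode first
        self._lock = asyncio.Lock()
        self._next_id = 0

    async def start_episode(self):
        async with self._lock:
            episode_id = self._next_id
            self._next_id += 1
            self._episodes[episode_id] = {"steps": [], "finished": False}
            self._evict_if_needed(protect=episode_id)
            return episode_id

    async def append(self, episode_id, obs, action, reward):
        async with self._lock:
            episode = self._episodes[episode_id]
            if episode["finished"]:
                raise RuntimeError("cannot append to a finished episode")
            episode["steps"].append((obs, action, reward))

    async def end_episode(self, episode_id):
        async with self._lock:
            self._episodes[episode_id]["finished"] = True

    async def episode_ids(self):
        async with self._lock:
            return list(self._episodes)

    def _evict_if_needed(self, protect):
        # Caller must hold the lock. Never evicts the episode just created.
        while len(self._episodes) > self._max_episodes:
            candidates = [k for k in self._episodes if k != protect]
            if not candidates:
                return
            finished = [k for k in candidates if self._episodes[k]["finished"]]
            victim = finished[0] if finished else candidates[0]
            del self._episodes[victim]


async def demo():
    buffer = TinyEpisodeBuffer(max_episodes=2)
    first = await buffer.start_episode()
    second = await buffer.start_episode()
    await buffer.append(first, obs="o0", action="a0", reward=1.0)
    await buffer.end_episode(second)
    await buffer.start_episode()  # over capacity: finished `second` is evicted
    return await buffer.episode_ids()
```

Running `asyncio.run(demo())` leaves the in-progress first episode and the newest one in the buffer, because the finished episode is chosen for eviction even though it is not the oldest.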