Skip to content

Comments

Refactoring the Demuxers for chunked Tensor data#181

Open
rwkeane wants to merge 2 commits intomainfrom
refactor/demuxer-receives-chunks-b
Open

Refactoring the Demuxers for chunked Tensor data#181
rwkeane wants to merge 2 commits intomainfrom
refactor/demuxer-receives-chunks-b

Conversation

@rwkeane
Copy link
Owner

@rwkeane rwkeane commented Jun 20, 2025

This commit refactors TensorDemuxer and SmoothedTensorDemuxer to process SerializableTensorChunk objects, aligning with a chunk-based data model. Unit tests have been carefully updated to ensure minimal diffs for existing test cases while adding new tests for chunk-specific scenarios.

Core Refactoring:

  1. TensorDemuxer (tensor_demuxer.py):

    • I renamed on_update_received to on_chunk_received, accepting
      SerializableTensorChunk (using chunk.tensor for data and
      chunk.timestamp.as_datetime() for timestamp conversion).
    • I preserved internal keyframe storage and cascading update logic.
    • I ensured _on_keyframe_updated is called at most once per affected
      timestamp after a chunk is processed and causes a state change.
  2. SmoothedTensorDemuxer (smoothed_tensor_demuxer.py):

    • I updated on_chunk_received to delegate to the parent class's
      method for chunk processing.

Unit Test Updates (Iterative Approach):

  • Initial Broad Refactor & Subsequent Refinement: I iteratively refined test updates based on feedback to ensure minimal changes to
    existing test cases.
  • tensor_demuxer_unittest.py:
    • Existing tests: Calls to on_update_received(idx, val, ts) were
      transformed to on_chunk_received(Chunk(ts, idx, [val])).
    • Assertions for mock client notifications (call_count) were
      carefully adjusted to reflect the rule that notifications occur
      only if a keyframe's final calculated value changes, for both
      direct updates and cascades.
    • I added new tests to specifically cover scenarios with
      multi-update chunks.
  • smoothed_tensor_demuxer_unittest.py:
    • Minimal changes were required as these tests primarily verify
      smoothing logic by directly manipulating parent keyframe history,
      not by calling the data input method.
  • Test Helpers: I added MockSynchronizedTimestamp and an inlined
    SerializableTensorChunk (with tensor field) to test
    files to ensure compatibility with TensorDemuxer's expectation of
    timestamp.as_datetime() and field naming.
  • E2E Test Adjustments: I applied necessary fixes to
    tensor_e2etest.py for compatibility with SerializableTensorChunk
    (field name and timestamp object).

Code Quality and Verification:

  • Comments: I reviewed and refined application code comments to
    explain "why" not just "what," and removed placeholder/meta-comments.
  • Placeholder File: I deleted the temporary
    serializable_tensor_chunk_placeholder.py.
  • Static Analysis: All changes passed Black, Ruff, MyPy, and Pylint.
  • Testing: The full pytest suite (895 tests) passed consistently
    across two runs, confirming the correctness and integration of changes.

This work completes the transition of the demuxer components to the new chunk-based data transport model while maintaining test coverage and adhering to project standards.

google-labs-jules bot and others added 2 commits June 20, 2025 00:57
…nput and updating the corresponding tests.

Here's a summary of what I've done:

This commit refactors `TensorDemuxer` and `SmoothedTensorDemuxer` to process
`SerializableTensorChunk` objects, aligning with a chunk-based data model.
Unit tests have been carefully updated to ensure minimal diffs for existing
test cases while adding new tests for chunk-specific scenarios.

**Core Refactoring:**

1.  **`TensorDemuxer` (`tensor_demuxer.py`):**
    *   I renamed `on_update_received` to `on_chunk_received`, accepting
        `SerializableTensorChunk` (using `chunk.tensor` for data and
        `chunk.timestamp.as_datetime()` for timestamp conversion).
    *   I preserved internal keyframe storage and cascading update logic.
    *   I ensured `_on_keyframe_updated` is called at most once per affected
        timestamp after a chunk is processed and causes a state change.

2.  **`SmoothedTensorDemuxer` (`smoothed_tensor_demuxer.py`):**
    *   I updated `on_chunk_received` to delegate to the parent class's
        method for chunk processing.

**Unit Test Updates (Iterative Approach):**

*   **Initial Broad Refactor & Subsequent Refinement:** I iteratively refined test updates based on feedback to ensure minimal changes to
    existing test cases.
*   **`tensor_demuxer_unittest.py`:**
    *   Existing tests: Calls to `on_update_received(idx, val, ts)` were
        transformed to `on_chunk_received(Chunk(ts, idx, [val]))`.
    *   Assertions for mock client notifications (`call_count`) were
        carefully adjusted to reflect the rule that notifications occur
        only if a keyframe's final calculated value changes, for both
        direct updates and cascades.
    *   I added new tests to specifically cover scenarios with
        multi-update chunks.
*   **`smoothed_tensor_demuxer_unittest.py`:**
    *   Minimal changes were required as these tests primarily verify
        smoothing logic by directly manipulating parent keyframe history,
        not by calling the data input method.
*   **Test Helpers:** I added `MockSynchronizedTimestamp` and an inlined
    `SerializableTensorChunk` (with `tensor` field) to test
    files to ensure compatibility with `TensorDemuxer`'s expectation of
    `timestamp.as_datetime()` and field naming.
*   **E2E Test Adjustments:** I applied necessary fixes to
    `tensor_e2etest.py` for compatibility with `SerializableTensorChunk`
    (field name and timestamp object).

**Code Quality and Verification:**

*   **Comments:** I reviewed and refined application code comments to
    explain "why" not just "what," and removed placeholder/meta-comments.
*   **Placeholder File:** I deleted the temporary
    `serializable_tensor_chunk_placeholder.py`.
*   **Static Analysis:** All changes passed Black, Ruff, MyPy, and Pylint.
*   **Testing:** The full `pytest` suite (895 tests) passed consistently
    across two runs, confirming the correctness and integration of changes.

This work completes the transition of the demuxer components to the new
chunk-based data transport model while maintaining test coverage and adhering
to project standards.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant