Conversation

@jiashuy
Collaborator

@jiashuy jiashuy commented Jan 19, 2026

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@jiashuy jiashuy added the dynamicemb Related with dynamicemb label Jan 19, 2026
@jiashuy
Collaborator Author

jiashuy commented Jan 19, 2026

CI

@jiashuy
Collaborator Author

jiashuy commented Jan 20, 2026

CI

@jiashuy jiashuy requested a review from shijieliu January 20, 2026 02:38
@jiashuy
Collaborator Author

jiashuy commented Jan 20, 2026

CI

@jiashuy
Collaborator Author

jiashuy commented Jan 20, 2026

CI

@shijieliu
Collaborator

ci

@jiashuy
Collaborator Author

jiashuy commented Jan 27, 2026

Fix a bug and relaunch CI

@greptile-apps

greptile-apps bot commented Jan 27, 2026

Greptile Summary

This PR fixes critical issues related to empty batch handling in the dynamic embedding system and addresses a shared memory configuration bug in the optimizer.

Key Changes:

  • Added early return guards for empty batches (batch == 0) in KeyValueTable.update(), DynamicEmbeddingTable.lookup(), and DynamicEmbeddingTable.update() to prevent unnecessary operations and potential crashes
  • Fixed uninitialized memory bug by changing torch.empty() to torch.zeros() for num_missing_0 and num_missing_1 tensors in lookup operations
  • Added empty batch checks in CUDA functions (load_from_combined_table(), select(), select_index()) to avoid launching kernels with zero elements
  • Fixed Invalid Memory Access (IMA) in rowwise adagrad optimizer by properly configuring shared memory size via a lambda function parameter
  • Added comprehensive test coverage for empty batch scenarios and non-multiple-of-4 dimensions
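The early-return guard described in the first bullet can be sketched in plain Python. This is an illustrative stand-in, not the actual dynamicemb API: `lookup`, its return shape, and the hash-based body are hypothetical; only the `batch == 0` guard pattern mirrors the PR.

```python
def lookup(keys):
    """Illustrative empty-batch guard (hypothetical names, not dynamicemb's API).

    The real methods operate on tensors and launch CUDA kernels; the point
    here is only the guard: when the batch is empty, return a well-defined
    result immediately instead of running the normal path with zero elements.
    """
    batch = len(keys)
    if batch == 0:
        # Early return: no kernel launches, no scratch allocations.
        return 0, []
    # Normal path (stand-in for the real hash-table lookup).
    return batch, [hash(k) % 97 for k in keys]

print(lookup([]))      # empty batch takes the guard
print(lookup([3, 7]))  # non-empty batch takes the normal path
```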

Impact:
The changes prevent runtime errors and invalid memory access when processing empty batches, a situation that can arise in real workloads (for example, when a batch contains no keys for a given table). The shared memory fix resolves a critical bug that could cause GPU crashes.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • All changes are defensive bug fixes with clear intent and comprehensive test coverage. The empty batch guards prevent edge case crashes, the memory initialization fix addresses undefined behavior, and the shared memory configuration fix resolves a documented IMA issue. The test additions validate the fixes work correctly.
  • No files require special attention

Important Files Changed

Filename Overview
corelib/dynamicemb/dynamicemb/key_value_table.py Added early returns for empty batches in update() methods and changed torch.empty to torch.zeros for num_missing tensors to fix uninitialized memory issues
corelib/dynamicemb/src/dynamic_emb_op.cu Added early return for empty batches in load_from_combined_table() by moving size check before variable declarations
corelib/dynamicemb/src/index_calculation.cu Added empty batch checks in select() and select_index() functions to prevent unnecessary CUDA operations
corelib/dynamicemb/src/optimizer.cu Fixed shared memory configuration for rowwise adagrad optimizer by adding smem_size_f parameter with default lambda
corelib/dynamicemb/test/test_batched_dynamic_embedding_tables_v2.py Added comprehensive test for empty batch handling and expanded test coverage for non-multiple-of-4 dimensions
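The shared-memory fix listed for `optimizer.cu` follows a common CUDA launcher pattern: the launcher computes the dynamic shared-memory size from the block size via a caller-supplied callback with a default (the sequence diagram below states the sizing as `block_size * sizeof(float)`). A Python sketch of the pattern, with hypothetical names (`launch`, `smem_size_f`, `fake_kernel` are all illustrative, not the actual code):

```python
SIZEOF_FLOAT = 4  # bytes, matching CUDA's sizeof(float)

def launch(kernel, block_size, smem_size_f=None):
    """Mimic a CUDA launcher that accepts an optional shared-memory sizer.

    `smem_size_f` maps block size -> dynamic shared-memory bytes. The
    default lambda reserves one float per thread; launching with too few
    bytes (the pre-fix behavior) lets the kernel read past the allocation,
    i.e., an invalid memory access.
    """
    if smem_size_f is None:
        smem_size_f = lambda bs: bs * SIZEOF_FLOAT
    smem_size = smem_size_f(block_size)
    return kernel(block_size, smem_size)

def fake_kernel(block_size, smem_bytes):
    # A real kernel would do a block-wide reduction in shared memory;
    # here we only check the allocation covers one float per thread.
    return smem_bytes >= block_size * SIZEOF_FLOAT

assert launch(fake_kernel, 256)  # default sizer reserves enough memory
```

Passing a sizer that returns 0 reproduces the under-allocation the fix guards against.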

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant KeyValueTable
    participant DynamicEmbeddingTable
    participant CUDA as CUDA Operations
    participant Optimizer

    Note over Client,Optimizer: Empty Batch Handling Flow

    Client->>KeyValueTable: lookup(keys) with batch=0
    KeyValueTable->>KeyValueTable: Check batch == 0
    KeyValueTable-->>Client: Early return (0, empty tensors)

    Client->>DynamicEmbeddingTable: lookup(keys) with batch=0
    DynamicEmbeddingTable->>DynamicEmbeddingTable: Check batch == 0
    DynamicEmbeddingTable-->>Client: Early return (0, empty tensors)

    Client->>KeyValueTable: update(keys, grads) with batch=0
    KeyValueTable->>KeyValueTable: Check batch == 0
    KeyValueTable-->>Client: Early return (0, None, None)

    Client->>DynamicEmbeddingTable: update(keys, grads) with batch=0
    DynamicEmbeddingTable->>DynamicEmbeddingTable: Check batch == 0
    DynamicEmbeddingTable-->>Client: Early return (0, None, None)

    Note over CUDA: CUDA Layer Protection

    Client->>CUDA: load_from_combined_table(indices) with num_total=0
    CUDA->>CUDA: Check num_total == 0
    CUDA-->>Client: Early return (no kernel launch)

    Client->>CUDA: select(flags, inputs) with num_total=0
    CUDA->>CUDA: Check num_total == 0
    CUDA-->>Client: Early return (no kernel launch)

    Client->>CUDA: select_index(flags, indices) with num_total=0
    CUDA->>CUDA: Check num_total == 0
    CUDA-->>Client: Early return (no kernel launch)

    Note over Optimizer: Optimizer Fix

    Client->>Optimizer: rowwise_adagrad_for_combined_table()
    Optimizer->>Optimizer: Configure shared memory: smem_size = block_size * sizeof(float)
    Optimizer->>CUDA: Launch kernel with correct shared memory
    CUDA-->>Optimizer: Success
    Optimizer-->>Client: Complete
```

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines 549 to 550

```python
if batch == 0:
    return None, None, None
```

logic: Inconsistent return value with DynamicEmbeddingTable.update() which returns 0, None, None for empty batches (line 1371). Should this also return 0, None, None instead of None, None, None?

Suggested change

```diff
 if batch == 0:
-    return None, None, None
+    return 0, None, None
```
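Why the consistent tuple matters: a caller that treats the first element as a count breaks when it unpacks `None` instead of `0`. A minimal illustration (hypothetical names; `update_bad`/`update_good` only model the two return conventions, not the real methods):

```python
def update_bad(batch):
    """Empty-batch result is inconsistent with the non-empty path."""
    if batch == 0:
        return None, None, None       # count becomes None
    return batch, "evicted_keys", "evicted_vals"

def update_good(batch):
    """Empty-batch result keeps the count an int, like the non-empty path."""
    if batch == 0:
        return 0, None, None
    return batch, "evicted_keys", "evicted_vals"

# A caller accumulating counts works only with the consistent variant.
total = 0
for b in (0, 4, 0, 2):
    n, _, _ = update_good(b)
    total += n                        # safe: n is always an int
```

With `update_bad`, the same loop would raise a `TypeError` on the first empty batch when adding `None` to `total`.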

@greptile-apps
Copy link

greptile-apps bot commented Jan 27, 2026

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@jiashuy
Copy link
Collaborator Author

jiashuy commented Jan 27, 2026

CI

```diff
 missing = torch.logical_not(founds)
-num_missing_0: torch.Tensor = torch.empty(1, dtype=torch.long, device=device)
-num_missing_1: torch.Tensor = torch.empty(1, dtype=torch.long, device=device)
+num_missing_0: torch.Tensor = torch.zeros(1, dtype=torch.long, device=device)
```
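The motivation for `torch.zeros` here can be shown without torch: an uninitialized buffer holds arbitrary bytes, and any code path that reads the counter before a kernel writes it sees garbage. A pure-Python sketch, where the `GARBAGE` sentinel simulates what `torch.empty` may leave in memory (all names here are illustrative):

```python
GARBAGE = 0xDEADBEEF  # stand-in for whatever bytes torch.empty leaves behind

def count_missing(founds, *, zero_init):
    """Simulate filling a 1-element 'num_missing' counter buffer.

    zero_init=False models torch.empty (buffer starts with garbage);
    zero_init=True models torch.zeros (buffer starts at 0).  When the
    batch is empty, the counting 'kernel' never runs, so only the
    zero-initialized buffer reads back a correct count.
    """
    num_missing = [0 if zero_init else GARBAGE]
    if founds:  # non-empty batch: the kernel writes the real count
        num_missing[0] = sum(1 for f in founds if not f)
    return num_missing[0]

# Non-empty batch: both variants agree, so the bug hides.
assert count_missing([True, False, True], zero_init=False) == 1
# Empty batch: only the zero-initialized buffer is correct.
print(count_missing([], zero_init=False))  # garbage value
print(count_missing([], zero_init=True))   # 0, the correct answer
```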
Collaborator


why do we need torch.zeros here?
