Conversation

@jiashuy
Collaborator

@jiashuy jiashuy commented Jan 19, 2026

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@jiashuy jiashuy added the dynamicemb Related with dynamicemb label Jan 19, 2026
@jiashuy
Collaborator Author

jiashuy commented Jan 19, 2026

CI

@jiashuy
Collaborator Author

jiashuy commented Jan 20, 2026

CI

@jiashuy jiashuy requested a review from shijieliu January 20, 2026 02:38
@jiashuy
Collaborator Author

jiashuy commented Jan 20, 2026

CI

@jiashuy
Collaborator Author

jiashuy commented Jan 20, 2026

CI

@shijieliu
Collaborator

ci

@jiashuy
Collaborator Author

jiashuy commented Jan 27, 2026

Fix a bug and relaunch CI

@greptile-apps

greptile-apps bot commented Jan 27, 2026

Greptile Summary

This PR fixes critical issues related to empty batch handling in the dynamic embedding system and addresses a shared memory configuration bug in the optimizer.

Key Changes:

  • Added early return guards for empty batches (batch == 0) in KeyValueTable.update(), DynamicEmbeddingTable.lookup(), and DynamicEmbeddingTable.update() to prevent unnecessary operations and potential crashes
  • Fixed uninitialized memory bug by changing torch.empty() to torch.zeros() for num_missing_0 and num_missing_1 tensors in lookup operations
  • Added empty batch checks in CUDA functions (load_from_combined_table(), select(), select_index()) to avoid launching kernels with zero elements
  • Fixed Invalid Memory Access (IMA) in rowwise adagrad optimizer by properly configuring shared memory size via a lambda function parameter
  • Added comprehensive test coverage for empty batch scenarios and non-multiple-of-4 dimensions
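The early-return guard described in the first bullet can be sketched in plain Python. This is an illustrative stand-in, not the actual dynamicemb API: `lookup`, its return shape, and the hash-based body are hypothetical; only the `batch == 0` guard pattern mirrors the PR.

```python
def lookup(keys):
    """Illustrative empty-batch guard (hypothetical names, not dynamicemb's API).

    The real methods operate on tensors and launch CUDA kernels; the point
    here is only the guard: when the batch is empty, return a well-defined
    result immediately instead of running the normal path with zero elements.
    """
    batch = len(keys)
    if batch == 0:
        # Early return: no kernel launches, no scratch allocations.
        return 0, []
    # Normal path (stand-in for the real hash-table lookup).
    return batch, [hash(k) % 97 for k in keys]

print(lookup([]))      # empty batch takes the guard
print(lookup([3, 7]))  # non-empty batch takes the normal path
```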

Impact:
The changes prevent runtime errors and invalid memory access when processing empty batches, a situation that can arise in real workloads (for example, when a batch contains no keys for a given table). The shared memory fix resolves a critical bug that could cause GPU crashes.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • All changes are defensive bug fixes with clear intent and comprehensive test coverage. The empty batch guards prevent edge case crashes, the memory initialization fix addresses undefined behavior, and the shared memory configuration fix resolves a documented IMA issue. The test additions validate the fixes work correctly.
  • No files require special attention

Important Files Changed

Filename Overview
corelib/dynamicemb/dynamicemb/key_value_table.py Added early returns for empty batches in update() methods and changed torch.empty to torch.zeros for num_missing tensors to fix uninitialized memory issues
corelib/dynamicemb/src/dynamic_emb_op.cu Added early return for empty batches in load_from_combined_table() by moving size check before variable declarations
corelib/dynamicemb/src/index_calculation.cu Added empty batch checks in select() and select_index() functions to prevent unnecessary CUDA operations
corelib/dynamicemb/src/optimizer.cu Fixed shared memory configuration for rowwise adagrad optimizer by adding smem_size_f parameter with default lambda
corelib/dynamicemb/test/test_batched_dynamic_embedding_tables_v2.py Added comprehensive test for empty batch handling and expanded test coverage for non-multiple-of-4 dimensions
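The shared-memory fix listed for `optimizer.cu` follows a common CUDA launcher pattern: the launcher computes the dynamic shared-memory size from the block size via a caller-supplied callback with a default (the sequence diagram below states the sizing as `block_size * sizeof(float)`). A Python sketch of the pattern, with hypothetical names (`launch`, `smem_size_f`, `fake_kernel` are all illustrative, not the actual code):

```python
SIZEOF_FLOAT = 4  # bytes, matching CUDA's sizeof(float)

def launch(kernel, block_size, smem_size_f=None):
    """Mimic a CUDA launcher that accepts an optional shared-memory sizer.

    `smem_size_f` maps block size -> dynamic shared-memory bytes. The
    default lambda reserves one float per thread; launching with too few
    bytes (the pre-fix behavior) lets the kernel read past the allocation,
    i.e., an invalid memory access.
    """
    if smem_size_f is None:
        smem_size_f = lambda bs: bs * SIZEOF_FLOAT
    smem_size = smem_size_f(block_size)
    return kernel(block_size, smem_size)

def fake_kernel(block_size, smem_bytes):
    # A real kernel would do a block-wide reduction in shared memory;
    # here we only check the allocation covers one float per thread.
    return smem_bytes >= block_size * SIZEOF_FLOAT

assert launch(fake_kernel, 256)  # default sizer reserves enough memory
```

Passing a sizer that returns 0 reproduces the under-allocation the fix guards against.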

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant KeyValueTable
    participant DynamicEmbeddingTable
    participant CUDA as CUDA Operations
    participant Optimizer

    Note over Client,Optimizer: Empty Batch Handling Flow

    Client->>KeyValueTable: lookup(keys) with batch=0
    KeyValueTable->>KeyValueTable: Check batch == 0
    KeyValueTable-->>Client: Early return (0, empty tensors)

    Client->>DynamicEmbeddingTable: lookup(keys) with batch=0
    DynamicEmbeddingTable->>DynamicEmbeddingTable: Check batch == 0
    DynamicEmbeddingTable-->>Client: Early return (0, empty tensors)

    Client->>KeyValueTable: update(keys, grads) with batch=0
    KeyValueTable->>KeyValueTable: Check batch == 0
    KeyValueTable-->>Client: Early return (0, None, None)

    Client->>DynamicEmbeddingTable: update(keys, grads) with batch=0
    DynamicEmbeddingTable->>DynamicEmbeddingTable: Check batch == 0
    DynamicEmbeddingTable-->>Client: Early return (0, None, None)

    Note over CUDA: CUDA Layer Protection

    Client->>CUDA: load_from_combined_table(indices) with num_total=0
    CUDA->>CUDA: Check num_total == 0
    CUDA-->>Client: Early return (no kernel launch)

    Client->>CUDA: select(flags, inputs) with num_total=0
    CUDA->>CUDA: Check num_total == 0
    CUDA-->>Client: Early return (no kernel launch)

    Client->>CUDA: select_index(flags, indices) with num_total=0
    CUDA->>CUDA: Check num_total == 0
    CUDA-->>Client: Early return (no kernel launch)

    Note over Optimizer: Optimizer Fix

    Client->>Optimizer: rowwise_adagrad_for_combined_table()
    Optimizer->>Optimizer: Configure shared memory: smem_size = block_size * sizeof(float)
    Optimizer->>CUDA: Launch kernel with correct shared memory
    CUDA-->>Optimizer: Success
    Optimizer-->>Client: Complete
```

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines 549 to 550

```python
if batch == 0:
    return None, None, None
```

logic: Inconsistent return value with DynamicEmbeddingTable.update() which returns 0, None, None for empty batches (line 1371). Should this also return 0, None, None instead of None, None, None?

Suggested change

```diff
 if batch == 0:
-    return None, None, None
+    return 0, None, None
```
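Why the consistent tuple matters: a caller that treats the first element as a count breaks when it unpacks `None` instead of `0`. A minimal illustration (hypothetical names; `update_bad`/`update_good` only model the two return conventions, not the real methods):

```python
def update_bad(batch):
    """Empty-batch result is inconsistent with the non-empty path."""
    if batch == 0:
        return None, None, None       # count becomes None
    return batch, "evicted_keys", "evicted_vals"

def update_good(batch):
    """Empty-batch result keeps the count an int, like the non-empty path."""
    if batch == 0:
        return 0, None, None
    return batch, "evicted_keys", "evicted_vals"

# A caller accumulating counts works only with the consistent variant.
total = 0
for b in (0, 4, 0, 2):
    n, _, _ = update_good(b)
    total += n                        # safe: n is always an int
```

With `update_bad`, the same loop would raise a `TypeError` on the first empty batch when adding `None` to `total`.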

@greptile-apps
Copy link

greptile-apps bot commented Jan 27, 2026

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@jiashuy
Copy link
Collaborator Author

jiashuy commented Jan 27, 2026

CI

```diff
 missing = torch.logical_not(founds)
-num_missing_0: torch.Tensor = torch.empty(1, dtype=torch.long, device=device)
-num_missing_1: torch.Tensor = torch.empty(1, dtype=torch.long, device=device)
+num_missing_0: torch.Tensor = torch.zeros(1, dtype=torch.long, device=device)
```
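The motivation for `torch.zeros` here can be shown without torch: an uninitialized buffer holds arbitrary bytes, and any code path that reads the counter before a kernel writes it sees garbage. A pure-Python sketch, where the `GARBAGE` sentinel simulates what `torch.empty` may leave in memory (all names here are illustrative):

```python
GARBAGE = 0xDEADBEEF  # stand-in for whatever bytes torch.empty leaves behind

def count_missing(founds, *, zero_init):
    """Simulate filling a 1-element 'num_missing' counter buffer.

    zero_init=False models torch.empty (buffer starts with garbage);
    zero_init=True models torch.zeros (buffer starts at 0).  When the
    batch is empty, the counting 'kernel' never runs, so only the
    zero-initialized buffer reads back a correct count.
    """
    num_missing = [0 if zero_init else GARBAGE]
    if founds:  # non-empty batch: the kernel writes the real count
        num_missing[0] = sum(1 for f in founds if not f)
    return num_missing[0]

# Non-empty batch: both variants agree, so the bug hides.
assert count_missing([True, False, True], zero_init=False) == 1
# Empty batch: only the zero-initialized buffer is correct.
print(count_missing([], zero_init=False))  # garbage value
print(count_missing([], zero_init=True))   # 0, the correct answer
```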
Collaborator


why do we need torch.zeros here?
