Fix issue related to empty batch #271

base: main
Conversation
e4e4c66 to a239176: "Fix a bug and relaunch"
Greptile Summary

This PR fixes critical issues related to empty batch handling in the dynamic embedding system and addresses a shared memory configuration bug in the optimizer.

Key Changes:
- Early returns for batch == 0 in KeyValueTable and DynamicEmbeddingTable lookup() and update()
- Guards for num_total == 0 in the CUDA operations load_from_combined_table, select, and select_index, so no kernels are launched on empty inputs
- Shared memory for rowwise_adagrad_for_combined_table sized as smem_size = block_size * sizeof(float)

Confidence Score: 5/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant Client
participant KeyValueTable
participant DynamicEmbeddingTable
participant CUDA as CUDA Operations
participant Optimizer
Note over Client,Optimizer: Empty Batch Handling Flow
Client->>KeyValueTable: lookup(keys) with batch=0
KeyValueTable->>KeyValueTable: Check batch == 0
KeyValueTable-->>Client: Early return (0, empty tensors)
Client->>DynamicEmbeddingTable: lookup(keys) with batch=0
DynamicEmbeddingTable->>DynamicEmbeddingTable: Check batch == 0
DynamicEmbeddingTable-->>Client: Early return (0, empty tensors)
Client->>KeyValueTable: update(keys, grads) with batch=0
KeyValueTable->>KeyValueTable: Check batch == 0
KeyValueTable-->>Client: Early return (0, None, None)
Client->>DynamicEmbeddingTable: update(keys, grads) with batch=0
DynamicEmbeddingTable->>DynamicEmbeddingTable: Check batch == 0
DynamicEmbeddingTable-->>Client: Early return (0, None, None)
Note over CUDA: CUDA Layer Protection
Client->>CUDA: load_from_combined_table(indices) with num_total=0
CUDA->>CUDA: Check num_total == 0
CUDA-->>Client: Early return (no kernel launch)
Client->>CUDA: select(flags, inputs) with num_total=0
CUDA->>CUDA: Check num_total == 0
CUDA-->>Client: Early return (no kernel launch)
Client->>CUDA: select_index(flags, indices) with num_total=0
CUDA->>CUDA: Check num_total == 0
CUDA-->>Client: Early return (no kernel launch)
Note over Optimizer: Optimizer Fix
Client->>Optimizer: rowwise_adagrad_for_combined_table()
Optimizer->>Optimizer: Configure shared memory: smem_size = block_size * sizeof(float)
Optimizer->>CUDA: Launch kernel with correct shared memory
CUDA-->>Optimizer: Success
Optimizer-->>Client: Complete
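The guards in the diagram all follow the same early-return pattern. The sketch below is illustrative only: the class and method signatures are simplified stand-ins for the actual tables in this repo, assuming lookup returns a (count, values, founds)-style tuple and update returns (count, evicted_keys, evicted_values).

```python
import torch
from typing import Optional, Tuple


class EmptyBatchGuardSketch:
    """Illustrative stand-in for KeyValueTable / DynamicEmbeddingTable."""

    def __init__(self, dim: int, device: str = "cpu") -> None:
        self.dim = dim
        self.device = device

    def lookup(self, keys: torch.Tensor) -> Tuple[int, torch.Tensor, torch.Tensor]:
        batch = keys.numel()
        if batch == 0:
            # Early return: no CUDA kernels are launched for an empty batch.
            values = torch.empty(0, self.dim, device=self.device)
            founds = torch.empty(0, dtype=torch.bool, device=self.device)
            return 0, values, founds
        raise NotImplementedError("normal lookup path omitted in this sketch")

    def update(
        self, keys: torch.Tensor, grads: torch.Tensor
    ) -> Tuple[int, Optional[torch.Tensor], Optional[torch.Tensor]]:
        batch = keys.numel()
        if batch == 0:
            # Matches the empty-batch convention discussed in the review:
            # a zero count and no evicted keys/values.
            return 0, None, None
        raise NotImplementedError("normal update path omitted in this sketch")


# Usage: an empty key tensor short-circuits both calls.
table = EmptyBatchGuardSketch(dim=16)
empty_keys = torch.empty(0, dtype=torch.long)
print(table.lookup(empty_keys)[0])                   # 0
print(table.update(empty_keys, torch.empty(0, 16)))  # (0, None, None)
```

Returning fully formed (if empty) tensors from lookup keeps downstream code shape-agnostic, while update mirrors the 0, None, None convention discussed in the review comments below.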
5 files reviewed, 1 comment
    if batch == 0:
        return None, None, None
logic: This return value is inconsistent with DynamicEmbeddingTable.update(), which returns 0, None, None for empty batches (line 1371). Should this also return 0, None, None instead of None, None, None?
Suggested change:
      if batch == 0:
-         return None, None, None
+         return 0, None, None
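For context on the suggestion, here is a hedged illustration (the caller and result list below are hypothetical, not code from this repo) of why a uniform 0, None, None empty-batch return is easier on callers than None, None, None:

```python
from typing import List, Optional, Tuple

UpdateResult = Tuple[int, Optional[object], Optional[object]]

def count_updates(results: List[UpdateResult]) -> int:
    total = 0
    for num_updated, _evicted_keys, _evicted_values in results:
        # With a uniform (0, None, None) empty-batch return this is always an
        # int addition; a (None, None, None) return would raise a TypeError.
        total += num_updated
    return total

print(count_updates([(3, None, None), (0, None, None)]))  # 3
```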
Greptile found no issues! From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section. This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".
      missing = torch.logical_not(founds)
-     num_missing_0: torch.Tensor = torch.empty(1, dtype=torch.long, device=device)
-     num_missing_1: torch.Tensor = torch.empty(1, dtype=torch.long, device=device)
+     num_missing_0: torch.Tensor = torch.zeros(1, dtype=torch.long, device=device)
why do we need torch.zeros here?
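A plausible answer, not confirmed in this thread: torch.empty returns uninitialized memory, so the counter is only meaningful after a kernel writes to it. With the new early returns, the counting kernel can legitimately be skipped, and a zero-initialized counter then still reads as a valid "no misses" result. A minimal demonstration:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# torch.empty allocates without initializing: the contents are whatever bytes
# were already in the allocation and are only safe to read after something
# has written to the tensor.
num_missing_uninitialized = torch.empty(1, dtype=torch.long, device=device)

# torch.zeros always starts at 0, so skipping the counting kernel on an
# empty batch still leaves a well-defined "no misses" count.
num_missing_zeroed = torch.zeros(1, dtype=torch.long, device=device)

print(num_missing_zeroed.item())         # always 0
print(num_missing_uninitialized.item())  # arbitrary until explicitly written
```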
Description
Checklist