[WIP] Dynamic resizing #285
base: main
Greptile Summary
This PR implements dynamic resizing for embedding tables with significant architectural changes.
Key Changes:
Critical Issue Found:
Architecture Improvements:
Concerns:
Confidence Score: 1/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant App as Application
participant Table as DynamicEmbeddingTable
participant KVMap as KeyIndexMap (Hash Table)
participant VMM as VMMTensor/HostVMMTensor
participant Buffer as ExtendableBuffer
participant CUDA as CUDA/CU Driver API
Note over App,CUDA: Initialization Flow
App->>Table: Create table with capacity
Table->>Buffer: Create VmmDeviceBuffer/RegisteredHostBuffer
Buffer->>VMM: Initialize VMMTensor(capacity, dtype, device)
VMM->>CUDA: cuMemAddressReserve (reserve VA space)
VMM->>CUDA: cuMemCreate + cuMemMap (allocate initial pages)
VMM->>CUDA: cuMemSetAccess (set permissions)
VMM-->>Buffer: Return tensor wrapper
Table->>KVMap: Create hash table with capacity
Note over App,CUDA: Insert Flow (NO_EVICTION Mode)
App->>Table: insert(keys, values)
Table->>Table: Generate scores = arange(valid_rows, valid_rows+n)
Table->>KVMap: insert_and_evict(keys, scores)
alt Hash table full (eviction occurs)
KVMap-->>Table: Return evicted_keys, evicted_scores
Table->>Table: load_from_table(evicted_scores, evicted_values)
Table->>Table: rehash(capacity * 2)
Table->>KVMap: Create new hash table (2x capacity)
Table->>Table: Re-insert all entries to new hash table
Table->>Buffer: extend(capacity * 2)
Buffer->>VMM: extend(new_capacity)
VMM->>CUDA: cuMemCreate + cuMemMap (map new pages)
VMM->>CUDA: cuMemSetAccess (set new page permissions)
Table->>Table: store_to_table(scores, values)
Table->>Table: Recursive insert(evicted_keys, evicted_values)
else No eviction
Table->>Table: store_to_table(scores, values)
end
Note over App,CUDA: Lookup and Update Flow
App->>Table: update(keys, grads)
Table->>KVMap: lookup(keys) → indices
Table->>CUDA: optimizer_update_kernel(grads, table, indices)
CUDA->>VMM: Read/write embedding table via mapped memory
Note over App,CUDA: Memory Extension
Note right of VMM: All extensions preserve base address
VMM->>CUDA: Map additional pages at offset
CUDA-->>VMM: Extended virtual address range
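The insert flow in the diagram can be approximated in plain Python. The following is a minimal sketch, not the PR's implementation: `GrowableTable`, `key_to_row`, `rows`, and `_extend` are hypothetical names, a `dict` stands in for the KeyIndexMap, and a Python list stands in for the VMM-backed buffer. It also simplifies the evict-then-reinsert path shown above into a plain capacity check before each new row is assigned. What it does keep is the core idea: row indices are handed out sequentially from `valid_rows`, capacity doubles when the table is full, and rows that already exist never move when the storage grows.

```python
from typing import Dict, List, Optional, Sequence


class GrowableTable:
    """Toy model of a no-eviction table that doubles its capacity when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.valid_rows = 0  # number of rows handed out so far
        self.key_to_row: Dict[int, int] = {}  # stand-in for the hash table
        self.rows: List[Optional[float]] = [None] * capacity  # stand-in for the buffer

    def _extend(self, new_capacity: int) -> None:
        # Append storage at the end only; existing rows keep their indices,
        # loosely analogous to mapping extra pages behind a fixed base address.
        self.rows.extend([None] * (new_capacity - self.capacity))
        self.capacity = new_capacity

    def insert(self, keys: Sequence[int], values: Sequence[float]) -> None:
        for key, value in zip(keys, values):
            row = self.key_to_row.get(key)
            if row is None:
                if self.valid_rows == self.capacity:
                    self._extend(self.capacity * 2)
                row = self.valid_rows  # next sequential row index
                self.key_to_row[key] = row
                self.valid_rows += 1
            self.rows[row] = value

    def lookup(self, keys: Sequence[int]) -> List[int]:
        # Returns -1 for keys that were never inserted.
        return [self.key_to_row.get(key, -1) for key in keys]


table = GrowableTable(capacity=2)
table.insert([10, 11], [0.1, 0.2])
table.insert([12, 13, 14], [0.3, 0.4, 0.5])  # triggers two doublings: 2 -> 4 -> 8
```

Because growth only appends, indices returned by `lookup` before a resize remain valid afterwards, which is the property the "All extensions preserve base address" note in the diagram describes for the real buffer.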
19 files reviewed, 1 comment
policy = ScorePolicy.ASSIGN

if self._no_eviction and scores is None:
    scores = torch.arrange(
syntax: torch.arrange is not a valid PyTorch function; it should be torch.arange.
Suggested change:
- scores = torch.arrange(
+ scores = torch.arange(
Description
Checklist