Open
Labels
enhancement (New feature or request), torch (PyTorch neural network workflow and optimizations)
Description
We need to implement multi-GPU support for systems with two or more GPUs connected via PCIe x16 lanes.
- Leverage `device_map="auto"` to automatically shard Hugging Face models across the available GPUs.
- Use `torch.distributed` or `torch.nn.parallel.DistributedDataParallel` for efficient parallelism.
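The two options above can be sketched roughly as follows. This assumes `transformers` with `accelerate` installed for the sharded path, and a `torchrun` launch with the NCCL backend for the DDP path. `build_max_memory`, `load_sharded`, and `wrap_ddp` are hypothetical helper names, and the 90% memory fraction and 32 GiB CPU budget are placeholder assumptions, not tuned values.

```python
from typing import Dict, List, Union


def build_max_memory(gpu_total_bytes: List[int], fraction: float = 0.9,
                     cpu_gib: int = 32) -> Dict[Union[int, str], str]:
    """Hypothetical helper: per-device memory budget for device_map="auto".

    Leaves (1 - fraction) of each GPU free as headroom for activations and
    the CUDA context. The fraction and CPU budget are assumptions.
    """
    budget: Dict[Union[int, str], str] = {}
    for idx, total in enumerate(gpu_total_bytes):
        # Round down to whole GiB so the budget never exceeds the card.
        budget[idx] = f"{int(total * fraction) // 2**30}GiB"
    budget["cpu"] = f"{cpu_gib}GiB"  # spill-over budget for offloaded layers
    return budget


def load_sharded(model_id: str):
    """Hypothetical helper: shard one Hugging Face model across all visible GPUs.

    Imports are local so this module loads even without torch/transformers.
    """
    import torch
    from transformers import AutoModelForCausalLM

    totals = [torch.cuda.get_device_properties(i).total_memory
              for i in range(torch.cuda.device_count())]
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",                    # accelerate places layers per device
        max_memory=build_max_memory(totals),  # per-device caps from above
        torch_dtype=torch.float16,
    )


def wrap_ddp(model):
    """Hypothetical helper: wrap a model for DistributedDataParallel.

    Expects one process per GPU, launched by torchrun, which sets LOCAL_RANK.
    """
    import os

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    # Each process holds a full replica; gradients are all-reduced on backward().
    return DDP(model.to(local_rank), device_ids=[local_rank])
```

The two paths address different situations: `device_map="auto"` splits a single model's layers across GPUs within one process (useful when the model does not fit on one card), while DDP keeps a full replica on every GPU and synchronizes gradients, so it only helps when the model already fits on each device. A DDP script would typically be started with something like `torchrun --nproc_per_node=2 train.py`.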
Extend torch_node to track multi-device configurations, and ensure this device metadata is broadcast across the P2P network for smarter job routing and coordination.
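Below is one possible shape for the multi-device metadata that `torch_node` could collect and broadcast. The sketch assumes peers exchange JSON payloads. `GpuInfo`, `DeviceMetadata`, `collect_device_metadata`, and `can_run_job` are hypothetical names; the real `torch_node` structures and P2P message format may differ. The GPU peer-access matrix, built with `torch.cuda.can_device_access_peer`, serves only as a rough signal of whether GPUs on a node can exchange data directly.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class GpuInfo:
    """Hypothetical per-GPU record."""
    index: int
    name: str
    total_memory_bytes: int
    compute_capability: str  # e.g. "8.6"


@dataclass
class DeviceMetadata:
    """Hypothetical node-level record broadcast to peers."""
    node_id: str
    gpus: List[GpuInfo] = field(default_factory=list)
    # peer_access[i][j] is True when GPU i can directly access GPU j's memory.
    peer_access: List[List[bool]] = field(default_factory=list)

    @property
    def gpu_count(self) -> int:
        return len(self.gpus)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "DeviceMetadata":
        data = json.loads(payload)
        gpus = [GpuInfo(**g) for g in data["gpus"]]
        return cls(node_id=data["node_id"], gpus=gpus,
                   peer_access=data["peer_access"])


def collect_device_metadata(node_id: str) -> DeviceMetadata:
    """Gather local GPU info; returns an empty GPU list without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return DeviceMetadata(node_id=node_id)
    if not torch.cuda.is_available():
        return DeviceMetadata(node_id=node_id)

    n = torch.cuda.device_count()
    gpus = []
    for i in range(n):
        props = torch.cuda.get_device_properties(i)
        gpus.append(GpuInfo(i, props.name, props.total_memory,
                            f"{props.major}.{props.minor}"))
    # A device trivially "reaches" itself; only query distinct pairs.
    peer = [[i == j or torch.cuda.can_device_access_peer(i, j)
             for j in range(n)] for i in range(n)]
    return DeviceMetadata(node_id, gpus, peer)


def can_run_job(meta: DeviceMetadata, min_gpus: int,
                min_bytes_per_gpu: int) -> bool:
    """Hypothetical routing check: enough GPUs with enough memory each?"""
    eligible = [g for g in meta.gpus if g.total_memory_bytes >= min_bytes_per_gpu]
    return len(eligible) >= min_gpus
```

A receiving peer would call `DeviceMetadata.from_json(...)` on the broadcast payload and use a check like `can_run_job` to decide whether to route a multi-GPU job to that node.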