Multi GPU Support #46

@mattjhawken

Description

We need to implement multi-GPU support for systems with two or more GPUs connected via x16 PCIe lanes.

  • Leverage device_map="auto" to automatically shard HuggingFace models across the available GPUs
  • Use torch.distributed or torch.nn.parallel.DistributedDataParallel (DDP) for efficient parallelism
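The DDP path can be sketched as a minimal single-process run on CPU with the gloo backend; in production each GPU would get its own process (launched via torchrun) and the backend would be nccl. The toy Linear model here is illustrative, not the project's actual network:

```python
# Minimal single-process DistributedDataParallel sketch (gloo backend, CPU).
# Assumptions: rank 0, world_size 1, and a stand-in Linear model.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(8, 2)   # stand-in for the real network
ddp_model = DDP(model)          # gradients are all-reduced across ranks

x = torch.randn(4, 8)
loss = ddp_model(x).sum()
loss.backward()                 # triggers the (here trivial) all-reduce

dist.destroy_process_group()
```

With multiple processes, each rank would pin its own GPU (device_ids=[local_rank]) and the same backward call synchronizes gradients across all of them.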

Extend torch_node to track multi-device configurations, and ensure this device metadata is broadcast across the P2P network for smarter job routing and coordination.
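A sketch of the per-node device metadata that torch_node could broadcast; the dict schema and helper name are assumptions for illustration, not existing API:

```python
# Sketch: enumerate local GPUs so peers on the P2P network can route jobs
# to capable nodes. Schema and function name are hypothetical.
import torch

def collect_device_metadata() -> dict:
    devices = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        devices.append({
            "index": i,
            "name": props.name,
            "total_memory_mb": props.total_memory // (1024 * 1024),
            "multi_processor_count": props.multi_processor_count,
        })
    # On a CPU-only host this degrades gracefully to gpu_count=0.
    return {"gpu_count": len(devices), "gpus": devices}

metadata = collect_device_metadata()
print(metadata)
```

Broadcasting this dict alongside the node's existing heartbeat would let the scheduler prefer nodes with enough free GPUs (and memory) for a given job.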

Metadata

    Labels

    enhancement: New feature or request
    torch: PyTorch neural network workflow and optimizations
