Title:
Dimension mismatch in rotary position embedding when using multi-GPU inference with VideoAlign-based video reward model
Body:
Hi,
I am working with a video reward model based on VideoAlign. My training configuration uses FPS = 4.
During inference, videos longer than 5 seconds run out of memory (OOM) on a single A100-80G.
To avoid the OOM, I moved to multi-GPU inference on H20 GPUs, modifying my inference code to wrap the model in nn.DataParallel whenever more than one GPU is detected:
import os
import torch
import torch.nn as nn

# Helpers (load_configs_from_json, DataConfig, ModelConfig, PEFTLoraConfig,
# TrainingConfig, create_model_and_processor, load_model_from_checkpoint)
# come from the VideoAlign codebase.

class VideoVLMRewardInference:
    def __init__(self, load_from_pretrained, load_from_pretrained_step=-1, device=None, dtype=torch.bfloat16):
        config_path = os.path.join(load_from_pretrained, "VideoReward/model_config.json")
        data_config, _, model_config, peft_lora_config, inference_config = load_configs_from_json(config_path)

        data_config = DataConfig(**data_config)
        model_config = ModelConfig(**model_config)
        peft_lora_config = PEFTLoraConfig(**peft_lora_config)
        training_args = TrainingConfig(
            load_from_pretrained=load_from_pretrained,
            load_from_pretrained_step=load_from_pretrained_step,
            gradient_checkpointing=False,
            disable_flash_attn2=False,
            bf16=(dtype == torch.bfloat16),
            fp16=(dtype == torch.float16),
            output_dir="",
        )

        model, processor, peft_config = create_model_and_processor(
            model_config=model_config,
            peft_lora_config=peft_lora_config,
            training_args=training_args,
        )
        model, checkpoint_step = load_model_from_checkpoint(
            model,
            load_from_pretrained,
            load_from_pretrained_step,
        )
        model.eval()

        # Wrap in DataParallel when more than one GPU is visible.
        if torch.cuda.device_count() > 1:
            print(f"Using {torch.cuda.device_count()} GPUs for inference...")
            self.device = "cuda"
            model = nn.DataParallel(model)
        else:
            self.device = "cuda:0" if torch.cuda.is_available() else "cpu"

        self.model = model.to(self.device)
        self.processor = processor
        self.data_config = data_config
        self.inference_config = inference_config
When running multi-GPU inference, I get the following error:
File ".../transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 372, in forward
q = apply_rotary_pos_emb_vision(q.unsqueeze(0), rotary_pos_emb).squeeze(0)
File ".../transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 255, in apply_rotary_pos_emb_vision
output = (tensor * cos) + (rotate_half(tensor) * sin)
RuntimeError: The size of tensor a (8960) must match the size of tensor b (20480) at non-singleton dimension 1
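My current (possibly wrong) reading of the failure: nn.DataParallel scatters every input tensor along dim 0, but Qwen2-VL's vision inputs are a single flattened patch sequence whose length must equal the rotary cos/sin tables built from video_grid_thw. Once pixel_values and the grid information are split independently across replicas, the sequence length seen by attention no longer matches the rotary tables. A toy illustration of the shape arithmetic (the grid values below are assumptions for illustration, not dumped from my run):

import torch

# Hypothetical grid for one long video: t, h, w in patch units (illustrative only).
video_grid_thw = torch.tensor([[20, 32, 32]])

# The vision tower builds one rotary row per patch: seq_len = sum(t * h * w).
seq_len = int(video_grid_thw.prod(dim=-1).sum())   # 20480 rows in cos/sin

# Qwen2-VL feeds patches as one flattened sequence along dim 0
# (second dim 1176 = 3 channels * 2 temporal patches * 14 * 14 by default).
pixel_values = torch.randn(seq_len, 1176)

# DataParallel-style splitting along dim 0 gives each replica fewer patches
# than the rotary table expects, so the broadcast in
# apply_rotary_pos_emb_vision fails at the sequence dimension.
per_replica = pixel_values.chunk(2, dim=0)
print(seq_len, [c.shape[0] for c in per_replica])  # 20480 vs [10240, 10240]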
Question:
How can I fix the dimension mismatch in the rotary position embedding when switching from single-GPU to multi-GPU inference? Or is nn.DataParallel simply the wrong parallelism strategy for this model?
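For reference, the direction I am considering instead of nn.DataParallel is sharding the model itself across GPUs, so each forward pass still sees the full patch sequence. A minimal sketch with accelerate, assuming the VideoAlign wrapper exposes a standard nn.Module (the max_memory figures are placeholders for H20 cards):

import torch
from accelerate import dispatch_model, infer_auto_device_map

# `model` is the reward model from create_model_and_processor above,
# *before* any DataParallel wrapping.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "90GiB", 1: "90GiB"},  # placeholder caps per GPU
    dtype=torch.bfloat16,
)
model = dispatch_model(model, device_map=device_map)
model.eval()
# Inputs go to the first device; accelerate's hooks move activations between
# GPUs. The vision blocks may additionally need no_split_module_classes so a
# block is not split across devices.

Would something like this be the recommended approach here?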