
Commit c67393b

quic-mamta and mamtsing authored

[QEff finetune] : fix qaic device for pp+ddp (#544)

- Fix qaic device for pp+ddp
- This fix is required for SDK version 1.21
- Without this fix, pp+ddp worked fine on 1.20.0.194

Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Co-authored-by: Mamta Singh <mamtsing@qti.qualcomm.com>
1 parent f214e43 commit c67393b

File tree

1 file changed: +2 −3 lines changed

QEfficient/cloud/finetune.py

Lines changed: 2 additions & 3 deletions
@@ -80,9 +80,8 @@ def setup_distributed_training(train_config: TrainConfig) -> None:
     assert torch_device.index is None, f"DDP requires only device type, got: {torch_device}"
     dist_backend_map = {"cpu": "gloo", "qaic": "qccl", "cuda": "gloo"}
     dist.init_process_group(backend=dist_backend_map[torch_device.type])
-    if not train_config.enable_pp:
-        # from here onward "qaic/cuda" will automatically map to "qaic:i/cuda:i", where i = process rank
-        getattr(torch, torch_device.type).set_device(dist.get_rank())
+    # from here onward "qaic/cuda" will automatically map to "qaic:i/cuda:i", where i = process rank
+    getattr(torch, torch_device.type).set_device(dist.get_rank() * train_config.num_pp_stages)


 def setup_seeds(seed: int) -> None:
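The device-index arithmetic behind this change can be illustrated with a minimal standalone sketch (the helper names below are hypothetical, not part of the repo): when each DDP replica spans `num_pp_stages` consecutive pipeline-parallel devices, data-parallel rank `r` should start at device `r * num_pp_stages`, which is exactly the value now passed to `set_device`.

```python
def base_device_index(ddp_rank: int, num_pp_stages: int) -> int:
    """First device index owned by a DDP rank when each replica
    spans num_pp_stages consecutive pipeline-parallel devices."""
    return ddp_rank * num_pp_stages


def devices_for_rank(ddp_rank: int, num_pp_stages: int) -> list:
    """All device indices occupied by one DDP rank's pipeline stages."""
    base = base_device_index(ddp_rank, num_pp_stages)
    return list(range(base, base + num_pp_stages))


if __name__ == "__main__":
    # 2 DDP replicas x 2 pipeline stages -> 4 devices, no overlap:
    # rank 0 owns devices [0, 1], rank 1 owns devices [2, 3]
    for rank in range(2):
        print(rank, devices_for_rank(rank, 2))
```

With `num_pp_stages == 1` this reduces to the old behavior (`set_device(rank)`), so the unconditional call covers both the plain-DDP and PP+DDP cases.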
