Commit 4cfa7a2

mlunar-meta authored and meta-codesync[bot] committed
Rebasing NCCL 2.28 with latest changes.
Summary: This diff stack implements the updates required to rebase NCCLX, with CTRAN integration, onto NCCL 2.28. It incorporates all changes introduced in version 2.27 and applies them on top of NCCL's latest stable release (2.28). The primary goal is to enable CTRAN support for NCCLX under NCCL 2.28, keeping it compatible with the upstream release and able to use its latest enhancements. It also adds checks to FastInitTest to verify that CTRAN is enabled as expected. This diff specifically implements all of the rebasing tasks.

Reviewed By: zhiyongww

Differential Revision: D85349488

fbshipit-source-id: 370f26f555a224db86c35e03dcf4f4133460c52b
1 parent 9698e1c

2 files changed (4 additions, 2 deletions)

comms/ncclx/v2_27/meta/commstate/FactoryCommStateX.cc

Lines changed: 2 additions & 2 deletions
@@ -27,9 +27,9 @@ void initRankTopologyFrom(CommStateX* _CommStateX, void* _comm) {
     // each node is already ordered by local rank
     // NOTE: two GPUs on the same node may be with different nodeId because
     // they don't have direct NVL access. To keep same nodeId in statex, we
-    // use hostname+nodeId to make it unique
+    // use hostHash+nodeId to make it unique
     std::string host(
-        std::string(comm->peerInfo[r].hostname) +
+        std::to_string(comm->peerInfo[r].hostHash) + "_" +
         std::to_string(rankState.nodeId));
     _CommStateX->hostToRanks_[host].emplace_back(r);
 
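This change swaps the key used to bucket ranks in `hostToRanks_` from the peer's hostname to its `hostHash`, joined to `nodeId` with an underscore. The sketch below reproduces only that grouping step in isolation. `PeerInfoSketch`, `RankStateSketch`, and `groupRanksByHost` are hypothetical stand-ins, not NCCLX types; only the key expression is taken from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the fields the diff reads; the real NCCLX
// peer-info and rank-state structures carry many more members.
struct PeerInfoSketch {
  uint64_t hostHash;
};
struct RankStateSketch {
  int nodeId;
};

// Buckets ranks by a "<hostHash>_<nodeId>" key, the same key expression
// initRankTopologyFrom builds after this commit. Ranks are visited in
// increasing order, so each bucket lists its ranks in ascending order.
std::map<std::string, std::vector<int>> groupRanksByHost(
    const std::vector<PeerInfoSketch>& peers,
    const std::vector<RankStateSketch>& states) {
  assert(peers.size() == states.size());  // one entry per rank
  std::map<std::string, std::vector<int>> hostToRanks;
  for (int r = 0; r < static_cast<int>(peers.size()); ++r) {
    std::string host(std::to_string(peers[r].hostHash) + "_" +
                     std::to_string(states[r].nodeId));
    hostToRanks[host].emplace_back(r);
  }
  return hostToRanks;
}
```

One property of the new key format is worth noting; it is an observation about string concatenation, not a rationale stated in the commit. The old key joined the hostname and `nodeId` with no separator, so a hostname ending in a digit could in principle collide with a different host: "node1" + "23" and "node12" + "3" both yield "node123". With the "_" separator and a purely numeric hash, keys such as `1_23` and `12_3` stay distinct.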

comms/ncclx/v2_27/meta/tests/SimpleCtranInitTest.cc

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ class SimpleCtranInitTest : public ::testing::Test {
     setenv("NCCL_CTRAN_ENABLE", "1", 1);
     setenv("NCCL_COLLTRACE", "trace", 1);
     setenv("NCCL_USE_MEM_CACHE", "1", 1);
+    setenv("NCCL_LAZY_SETUP_CHANNELS", "1", 1);
+    setenv("NCCL_RUNTIME_CONNECT", "1", 1);
 
     ncclCvarInit();
     NCCLCHECK_TEST(ncclCudaLibraryInit());
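
The fixture configures NCCLX purely through environment variables, each set with overwrite enabled (the third `setenv` argument) before `ncclCvarInit()` runs; this commit adds `NCCL_LAZY_SETUP_CHANNELS` and `NCCL_RUNTIME_CONNECT` to that list. The sketch below gathers the same five settings into one table and reads each value back with `getenv` after setting it. `kCtranTestEnv` and `applyCtranTestEnv` are hypothetical names for illustration; the sketch does not call into NCCL and makes no claim about what these variables control.

```cpp
#include <stdlib.h>  // POSIX setenv
#include <cstdlib>   // std::getenv
#include <cstring>   // std::strcmp
#include <string>
#include <utility>
#include <vector>

// The environment the SimpleCtranInitTest fixture sets after this commit.
// The last two entries are the ones the diff adds.
const std::vector<std::pair<std::string, std::string>> kCtranTestEnv = {
    {"NCCL_CTRAN_ENABLE", "1"},
    {"NCCL_COLLTRACE", "trace"},
    {"NCCL_USE_MEM_CACHE", "1"},
    {"NCCL_LAZY_SETUP_CHANNELS", "1"},
    {"NCCL_RUNTIME_CONNECT", "1"},
};

// Hypothetical helper: sets every entry with overwrite enabled, as the
// fixture does, and returns how many variables read back with the
// expected value.
int applyCtranTestEnv() {
  int applied = 0;
  for (const auto& entry : kCtranTestEnv) {
    const std::string& name = entry.first;
    const std::string& value = entry.second;
    if (setenv(name.c_str(), value.c_str(), 1) != 0) {
      continue;  // setenv failed; leave this entry uncounted
    }
    const char* current = std::getenv(name.c_str());
    if (current != nullptr && std::strcmp(current, value.c_str()) == 0) {
      ++applied;
    }
  }
  return applied;
}
```

Keeping the settings in a single table makes it easy to see which variables a test depends on; a GoogleTest fixture could call a helper like this from `SetUp()` and fail fast if the returned count is lower than the table size.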
