Conversation

@yingxudeng (Collaborator)

No description provided.

@yingxudeng (Collaborator, Author)

We are currently encountering compilation issues when introducing ProcessGroupHCCL. We might need to update the image again.

    : ProcessGroup(device) {
  c10::intrusive_ptr<c10d_npu::ProcessGroupHCCL::Options> hccl_pg_options =
      c10d_npu::ProcessGroupHCCL::Options::create();
  // hccl_pg_options->group_name = group_name;
Collaborator:

#if TORCH_VERSION_MAJOR >= 2 && TORCH_VERSION_MINOR >= 7
    pg_options->group_name = group_name;
#endif 

Collaborator Author:

Thanks for the suggestion! I'll add the version check.

To ensure forward compatibility (e.g., for PyTorch 3.0, where MINOR would be 0), I'll adjust the logic to also cover the case where MAJOR > 2.

#include "npu_process_group.h"
#include "xllm_kernels/core/include/atb_speed/base/external_comm_manager.h"
#include "xllm_kernels/core/include/atb_speed/utils/singleton.h"
#include "xllm_kernels/models/base/param/mapping.h"
Collaborator:

Lines 22-24 seem unused?

Collaborator Author:

Thanks for the review. These lines were not added by me, but I verified that lines 22-23 are actually necessary for atb_speed::GetSingleton. However, line 24 is indeed unused, so I will remove it.

limitations under the License.
==============================================================================*/

#include "npu_process_group.h"
Collaborator:

npu_process_group.cpp should be deleted, because npu_process_group.h is enough, like cuda/mlu_process_group.h.

Collaborator Author:

Thanks for the review. However, I would prefer to keep the .cpp file: defining implementations directly in the header is generally considered bad practice, so I would like to maintain this separation.

  // for (int i = 0; i < outputs.size(); ++i) {
  //   outputs[i].copy_(flattened_output[i], /*non_blocking=*/true);
  // }
std::unique_ptr<xllm::ProcessGroup> create_process_group(
Collaborator:

The create_process_group function can be placed into an anonymous namespace in collective_communicator.cpp for all devices.

Collaborator Author (@yingxudeng, Dec 1, 2025):

Thanks for the feedback.

Regarding the suggestion to consolidate the create_process_group functions, I have a concern: since ProcessGroupHCCL, ProcessGroupCncl, and ProcessGroupNccl are device-specific implementations, moving them into collective_communicator.cpp would introduce excessive #if/#elif preprocessor directives.

@yingxudeng force-pushed the feat/npu_backend_torch branch from 9b68705 to 3b6f3d1 on December 1, 2025, 10:34.
@yingxudeng (Collaborator, Author)

I have updated the three images on the main branch, and the build is now passing. Would you mind taking a look at the code to see if it is ready for merging? I would appreciate your review. @yq33victor
