Skip to content

Conversation

@Trunrain
Copy link

@Trunrain Trunrain commented Dec 1, 2025

What this PR does / why we need it?
Optimization of the fused operator for Qwen3 32B: Matmul, AllReduce, Add, and RMSNorm

Does this PR introduce any user-facing change?
No

How was this patch tested?

vLLM version: v0.11.2
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new fused operator, MatmulAllreduceAddRmsnorm, for Ascend NPUs, including its kernel implementation, host-side logic, tiling configuration, and PyTorch bindings. The changes are extensive and add significant new functionality. My review has identified a few areas for improvement regarding code correctness and maintainability, such as redundant conditional logic, use of a floating-point literal for an integer constant, and inconsistent use of deprecated intrinsics in the kernel code. Addressing these points will enhance the overall quality and robustness of the implementation.

Comment on lines 75 to 79
if (op::GetCurrentPlatformInfo().GetSocVersion() == op::SocVersion::ASCEND910B) {
NnopbaseSetHcclServerType(executor, NNOPBASE_HCCL_SERVER_TYPE_MTE);
} else {
NnopbaseSetHcclServerType(executor, NNOPBASE_HCCL_SERVER_TYPE_MTE);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The if and else blocks have identical logic, both setting the HCCL server type to NNOPBASE_HCCL_SERVER_TYPE_MTE. This redundancy can be removed by calling NnopbaseSetHcclServerType once, outside of any conditional check.

        NnopbaseSetHcclServerType(executor, NNOPBASE_HCCL_SERVER_TYPE_MTE);

static constexpr int32_t SWIZZLE_DIRECT_ONE = 1;
static constexpr int32_t COMMNPUSPLIT_ONE = 1;
static constexpr int32_t COMMDATASPLIT_SIXTEEN = 16;
constexpr int32_t SECOND_TO_MS = 1e3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using a floating-point literal 1e3 to initialize an integer constant SECOND_TO_MS can lead to precision issues and is less clear than using an integer literal. It's better to use 1000 for integer constants.

constexpr int32_t SECOND_TO_MS = 1000;

@Trunrain Trunrain changed the title [kernel] Add MatmulAllreduceAddRmsnorm aclnn operator [kernel] [Kernel] add custom op MatmulAllreduceAddRmsnorm Dec 1, 2025
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Copy link

github-actions bot commented Dec 1, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

@Trunrain Trunrain force-pushed the main branch 2 times, most recently from a1981e7 to a34cae6 Compare December 1, 2025 11:37
@Trunrain Trunrain changed the title [kernel] [Kernel] add custom op MatmulAllreduceAddRmsnorm [Kernel] add custom op MatmulAllreduceAddRmsnorm Dec 2, 2025
@Trunrain Trunrain force-pushed the main branch 9 times, most recently from 8df9054 to 80cdaf5 Compare December 3, 2025 07:00
int residualDimNum = residualShape->GetDimNum();
int bs = 1;
int m = 0;
int n = 0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to add comments for m/n to improve readability.

addOutShape->SetDim(SHAPE_INDEX1, n);
}

return GRAPH_SUCCESS;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If neither branch is taken, is it appropriate to return success directly?

addOutShape->SetDimNum(residualDimNum);
addOutShape->SetDim(SHAPE_INDEX0, m);
addOutShape->SetDim(SHAPE_INDEX1, n);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to extract common logic into a helper function or inline code block.

}

__aicore__ inline void Process()
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to add comments to the core processing logic to improve readability.

((this->m0 * this->k0 + this->cube_matrix_size - 1) / this->cube_matrix_size * this->cube_matrix_size +
(this->n0 * this->k0 + this->cube_matrix_size - 1) / this->cube_matrix_size * this->cube_matrix_size) *
(this->is_int8 ? DOUBLE_BUFFER_SIZE : 1);
L0AB_PINGPONG_BUFFER_LEN = L0AB_PINGPONG_BUFFER_SIZE / sizeof(MmadDtype);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a global constant and a member variable with the same name (L0AB_PINGPONG_BUFFER_LEN), which violates the principle of "avoiding namespace pollution."

int32_t swizzl_count;
int32_t swizzl_direct;

int32_t L1_PINGPONG_BUFFER_LEN;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Class member variables should be named using lowercase letters and underscores.

@github-actions
Copy link

github-actions bot commented Dec 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants