-
Aight, added an issue for it then, I'll probably start a PR when I'm done with the current ones.
-
While looking over some LLM-generated suggestions for GGML code (and then looking over the codebase) I realized we totally messed up with OP_TRI :)
There is actually an existing operation that's supposed to act as a triangular mask, with semantics similar to PyTorch's triu/tril. However, for reasons unknown to me, it's been split into two separate operations: OP_DIAG_MASK_INF and OP_DIAG_MASK_ZERO. Unlike TRI, those operations take a diagonal offset (similar to triu/tril). One of them fills the region above the diagonal with zeroes and the other with -Inf (but neither touches the rest of the matrix).
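For reference, here's roughly what the DIAG_MASK_* pair seems to boil down to in PyTorch terms (just an illustration of the semantics, not the actual ggml kernels; the helper name and the exact meaning of the offset are my guess):

```python
import torch

# Illustrative sketch: overwrite everything strictly above the (offset) diagonal
# with a constant, leave the rest of the matrix untouched.
def diag_mask(x: torch.Tensor, offset: int, value: float) -> torch.Tensor:
    # boolean mask of the strictly-upper region, shifted right by `offset` columns
    upper = torch.triu(torch.ones_like(x, dtype=torch.bool), diagonal=offset + 1)
    return x.masked_fill(upper, value)

x = torch.arange(16.0).reshape(4, 4)
print(diag_mask(x, 0, float("-inf")))  # DIAG_MASK_INF-like behaviour
print(diag_mask(x, 0, 0.0))            # DIAG_MASK_ZERO-like behaviour
```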
I'm not sure whether the DIAG_MASK_* operations are actually used anywhere. I think the TRI version is more universal (it can mask with any value and has both upper and lower variants). The only thing I'm wondering about is whether it would be better to keep the PyTorch-compatible version with a "diagonal offset" (so technically 0 would be the current LOWER/UPPER and -1 would be LOWER_WITH_DIAG/UPPER_WITH_DIAG) and just retain LOWER/UPPER, or whether the current version (just "with and without diagonal") is OK. I'm not aware of any usage that actually needs an offset triangular matrix, but maybe there are some? If we're happy with the current API, we could just get rid of the DIAG_MASK_* operators (maybe after rewriting their kernels in terms of TRI).
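For reference, this is how the offsets behave on the PyTorch side (which offset would correspond to which TRI variant depends on how the enum is defined, so I'm not asserting a mapping here):

```python
import torch

x = torch.ones(4, 4)

# tril(x, diagonal=0)  keeps the lower triangle *including* the main diagonal,
# tril(x, diagonal=-1) keeps only the strictly-lower part (diagonal excluded),
# so the "with diagonal" / "without diagonal" variants are just two fixed
# offsets of the same operation, and any other offset falls out for free.
print(torch.tril(x, diagonal=0))
print(torch.tril(x, diagonal=-1))
print(torch.triu(x, diagonal=0))   # upper triangle including the diagonal
print(torch.triu(x, diagonal=1))   # strictly-upper part
```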