-
Aight, added an issue for it then, I'll probably start a PR when I'm done with the current ones.
-
While looking over some LLM-generated suggestions for GGML code (and then looking over the codebase) I realized we totally messed up with OP_TRI :)
There is actually an existing operation that's supposed to act as a triangular mask, with semantics similar to PyTorch's triu/tril. However, for reasons unknown to me, it's been split into two separate operations: OP_DIAG_MASK_INF and OP_DIAG_MASK_ZERO. Unlike TRI, those operations take a diagonal offset (similar to triu/tril). One of them fills the region above the diagonal with zeroes and the other with -Inf (but neither touches the rest of the matrix).
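For reference, here's roughly what the DIAG_MASK_* pair seems to boil down to in PyTorch terms (just an illustration of the semantics, not the actual ggml kernels; the helper name and the exact meaning of the offset are my guess):

```python
import torch

# Illustrative sketch: overwrite everything strictly above the (offset) diagonal
# with a constant, leave the rest of the matrix untouched.
def diag_mask(x: torch.Tensor, offset: int, value: float) -> torch.Tensor:
    # boolean mask of the strictly-upper region, shifted right by `offset` columns
    upper = torch.triu(torch.ones_like(x, dtype=torch.bool), diagonal=offset + 1)
    return x.masked_fill(upper, value)

x = torch.arange(16.0).reshape(4, 4)
print(diag_mask(x, 0, float("-inf")))  # DIAG_MASK_INF-like behaviour
print(diag_mask(x, 0, 0.0))            # DIAG_MASK_ZERO-like behaviour
```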
I'm not sure whether the DIAG_MASK_* operations are actually used anywhere. I think the TRI version is more universal (it can mask with any value and has both upper and lower variants). The only thing I'm wondering about is whether it would be better to keep the PyTorch-compatible version with a "diagonal offset" (so technically 0 would be the current LOWER/UPPER and -1 would be LOWER_WITH_DIAG/UPPER_WITH_DIAG) and just retain LOWER/UPPER, or whether the current version (just "with and without diagonal") is OK. I'm not aware of any usage that actually needs an offset triangular matrix, but maybe there are some? If we're happy with the current API, we could just get rid of the DIAG_MASK_* operators (maybe after rewriting their kernels in terms of TRI).
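For reference, this is how the offsets behave on the PyTorch side (which offset would correspond to which TRI variant depends on how the enum is defined, so I'm not asserting a mapping here):

```python
import torch

x = torch.ones(4, 4)

# tril(x, diagonal=0)  keeps the lower triangle *including* the main diagonal,
# tril(x, diagonal=-1) keeps only the strictly-lower part (diagonal excluded),
# so the "with diagonal" / "without diagonal" variants are just two fixed
# offsets of the same operation, and any other offset falls out for free.
print(torch.tril(x, diagonal=0))
print(torch.tril(x, diagonal=-1))
print(torch.triu(x, diagonal=0))   # upper triangle including the diagonal
print(torch.triu(x, diagonal=1))   # strictly-upper part
```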