Conversation

@lhez lhez commented Nov 7, 2025

This PR adds the fast div/mod that the CUDA and Vulkan backends have had for some time, and that the Hexagon backend is about to gain. The implementation is ported from CUDA. Performance currently stays about the same, but I think it is worth having, and we can apply it to more ops in the future.
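For context, the "fast div/mod" being ported replaces integer `/` and `%` by a runtime-invariant divisor with a precomputed magic multiplier and shift (the Granlund–Montgomery scheme), so kernel index math needs only a multiply-high, an add, and a shift. Below is a minimal, self-contained C sketch of that scheme; the struct layout and the names `fastdiv_init`, `fastdiv_u32`, and `fastmod_u32` are illustrative and not taken from this PR's OpenCL code.

```c
// Sketch of fast division/modulo by a runtime-invariant 32-bit divisor
// using a precomputed magic multiplier and shift (Granlund–Montgomery style).
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t mp; // magic multiplier m' = floor(2^32 * (2^L - d) / d) + 1
    uint32_t l;  // post-shift L = ceil(log2(d))
    uint32_t d;  // original divisor, kept for the modulo path
} fastdiv_vals;

// Precompute the magic values once per divisor (typically a tensor
// dimension that is constant for the whole kernel launch).
static fastdiv_vals fastdiv_init(uint32_t d) {
    assert(d != 0);
    uint32_t l = 0;
    while (l < 32 && ((uint64_t)1 << l) < d) {
        l++;
    }
    fastdiv_vals v;
    v.mp = (uint32_t)((((uint64_t)1 << 32) * (((uint64_t)1 << l) - d)) / d + 1);
    v.l  = l;
    v.d  = d;
    return v;
}

// n / d using one widening multiply, one add, and one shift instead of a
// hardware divide. The intermediate sum is kept in 64 bits so it cannot
// overflow for any 32-bit n.
static uint32_t fastdiv_u32(uint32_t n, fastdiv_vals v) {
    uint32_t hi = (uint32_t)(((uint64_t)n * v.mp) >> 32);
    return (uint32_t)(((uint64_t)hi + n) >> v.l);
}

// n % d derived from the quotient: r = n - (n / d) * d.
static uint32_t fastmod_u32(uint32_t n, fastdiv_vals v) {
    return n - fastdiv_u32(n, v) * v.d;
}

int main(void) {
    // Spot-check against the hardware divide for a few divisors.
    const uint32_t divisors[] = { 1, 3, 7, 32, 4096, 12345 };
    for (size_t i = 0; i < sizeof(divisors) / sizeof(divisors[0]); i++) {
        fastdiv_vals v = fastdiv_init(divisors[i]);
        for (uint32_t n = 0; n < 100000; n += 37) {
            assert(fastdiv_u32(n, v) == n / divisors[i]);
            assert(fastmod_u32(n, v) == n % divisors[i]);
        }
    }
    printf("fastdiv/fastmod spot-checks passed\n");
    return 0;
}
```

In the backends, the magic values are computed on the host and passed to the kernel as arguments, so the hot index calculations avoid hardware integer division entirely.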

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) labels Nov 7, 2025
@lhez lhez marked this pull request as ready for review November 10, 2025 19:42

@max-krasnyansky max-krasnyansky left a comment

Tested on Gen5.

I see a little perf bump with llama-3.2-1B-Q4_0.

Before:
 llama_perf_context_print: prompt eval time =  247.46 ms / 205 tokens (  1.21 ms per token, 828.41 tokens per second)
 llama_perf_context_print:        eval time = 1597.53 ms /  63 runs   ( 25.36 ms per token,  39.44 tokens per second)

After:
 llama_perf_context_print: prompt eval time =  244.81 ms / 205 tokens (  1.19 ms per token, 837.39 tokens per second)
 llama_perf_context_print:        eval time = 1558.21 ms /  63 runs   ( 24.73 ms per token,  40.43 tokens per second)

@max-krasnyansky max-krasnyansky merged commit ece0f5c into ggml-org:master Nov 10, 2025
123 of 128 checks passed