Conversation

@lhez lhez commented Nov 7, 2025

This PR adds the fast div/mod that the CUDA and Vulkan backends have had for some time, and that the Hexagon backend is about to gain. The implementation is ported from CUDA. Performance currently stays about the same, but I think it is worth having, and we can apply it to more ops in the future.
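For context, the "fast div/mod" being ported replaces integer `/` and `%` by a runtime-invariant divisor with a precomputed magic multiplier and shift (the Granlund–Montgomery scheme), so kernel index math needs only a multiply-high, an add, and a shift. Below is a minimal, self-contained C sketch of that scheme; the struct layout and the names `fastdiv_init`, `fastdiv_u32`, and `fastmod_u32` are illustrative and not taken from this PR's OpenCL code.

```c
// Sketch of fast division/modulo by a runtime-invariant 32-bit divisor
// using a precomputed magic multiplier and shift (Granlund–Montgomery style).
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t mp; // magic multiplier m' = floor(2^32 * (2^L - d) / d) + 1
    uint32_t l;  // post-shift L = ceil(log2(d))
    uint32_t d;  // original divisor, kept for the modulo path
} fastdiv_vals;

// Precompute the magic values once per divisor (typically a tensor
// dimension that is constant for the whole kernel launch).
static fastdiv_vals fastdiv_init(uint32_t d) {
    assert(d != 0);
    uint32_t l = 0;
    while (l < 32 && ((uint64_t)1 << l) < d) {
        l++;
    }
    fastdiv_vals v;
    v.mp = (uint32_t)((((uint64_t)1 << 32) * (((uint64_t)1 << l) - d)) / d + 1);
    v.l  = l;
    v.d  = d;
    return v;
}

// n / d using one widening multiply, one add, and one shift instead of a
// hardware divide. The intermediate sum is kept in 64 bits so it cannot
// overflow for any 32-bit n.
static uint32_t fastdiv_u32(uint32_t n, fastdiv_vals v) {
    uint32_t hi = (uint32_t)(((uint64_t)n * v.mp) >> 32);
    return (uint32_t)(((uint64_t)hi + n) >> v.l);
}

// n % d derived from the quotient: r = n - (n / d) * d.
static uint32_t fastmod_u32(uint32_t n, fastdiv_vals v) {
    return n - fastdiv_u32(n, v) * v.d;
}

int main(void) {
    // Spot-check against the hardware divide for a few divisors.
    const uint32_t divisors[] = { 1, 3, 7, 32, 4096, 12345 };
    for (size_t i = 0; i < sizeof(divisors) / sizeof(divisors[0]); i++) {
        fastdiv_vals v = fastdiv_init(divisors[i]);
        for (uint32_t n = 0; n < 100000; n += 37) {
            assert(fastdiv_u32(n, v) == n / divisors[i]);
            assert(fastmod_u32(n, v) == n % divisors[i]);
        }
    }
    printf("fastdiv/fastmod spot-checks passed\n");
    return 0;
}
```

In the backends, the magic values are computed on the host and passed to the kernel as arguments, so the hot index calculations avoid hardware integer division entirely.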

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) labels Nov 7, 2025
@lhez lhez marked this pull request as ready for review November 10, 2025 19:42

@max-krasnyansky max-krasnyansky left a comment

Tested on Gen5.

I see a little perf bump with llama-3.2-1B-Q4_0.

Before:
 llama_perf_context_print: prompt eval time =  247.46 ms / 205 tokens (  1.21 ms per token, 828.41 tokens per second)
 llama_perf_context_print:        eval time = 1597.53 ms /  63 runs   ( 25.36 ms per token,  39.44 tokens per second)

After:
 llama_perf_context_print: prompt eval time =  244.81 ms / 205 tokens (  1.19 ms per token, 837.39 tokens per second)
 llama_perf_context_print:        eval time = 1558.21 ms /  63 runs   ( 24.73 ms per token,  40.43 tokens per second)

@max-krasnyansky max-krasnyansky merged commit ece0f5c into ggml-org:master Nov 10, 2025
123 of 128 checks passed