Clean support for Non-Nvidia hardware vendors like Apple Silicon #63
Conversation
Pull Request Overview
This PR adds support for non-Nvidia hardware (specifically Apple Silicon) by introducing a scheduler detection system that adapts execution strategies based on the underlying hardware. The changes enable the application to select between Flash Attention (optimized for Nvidia GPUs) and parallel head processing (for non-Nvidia hardware like Apple Silicon), along with conditional normalization steps.
Key changes:
- Introduced a SchedulerType parameter across FFN layer constructors and logits layers to enable hardware-specific code paths (see the sketch after this list)
- Added conditional execution logic for attention mechanisms and normalization based on detected hardware
- Updated TornadoVM submodule reference to support non-Nvidia backends
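The overview describes a detect-then-dispatch flow: determine what hardware the scheduler is running on, then pick the attention strategy accordingly. Below is a minimal, self-contained sketch of that idea only; the enum values, the vendor-string heuristic, and the class name are assumptions, and the PR's actual SchedulerDetectionService and SchedulerType will differ in detail.

```java
// Illustrative sketch only: the detection heuristic and enum values are assumed,
// not taken from the PR. The real SchedulerDetectionService presumably queries the
// TornadoVM backend/driver rather than a system property.
public class SchedulerDispatchSketch {

    enum SchedulerType { NVIDIA, NON_NVIDIA }

    static SchedulerType detect() {
        String vendor = System.getProperty("gpu.vendor", "apple").toLowerCase();
        return vendor.contains("nvidia") ? SchedulerType.NVIDIA : SchedulerType.NON_NVIDIA;
    }

    // Strategy selection as described in the overview: Flash Attention on Nvidia,
    // parallel head processing on everything else (e.g. Apple Silicon).
    static String attentionTaskFor(SchedulerType type) {
        return type == SchedulerType.NVIDIA ? "flash-attention" : "parallel-attention";
    }

    public static void main(String[] args) {
        SchedulerType type = detect();
        System.out.println("Scheduler: " + type + " -> attention task: " + attentionTaskFor(type));
    }
}
```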
Reviewed Changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| AbstractFFNLayers.java | Added schedulerType field and shouldUseFinalNormalization() helper method to support hardware-specific optimizations (sketched below the table) |
| LlamaQ8_0FFNLayers.java | Integrated scheduler-based attention configuration and conditional normalization tasks |
| LlamaFP16FFNLayers.java | Integrated scheduler-based attention configuration and conditional normalization tasks |
| Qwen3Q8_0FFNLayers.java, Qwen2Q8_0FFNLayers.java, Phi3Q8_0FFNLayers.java | Updated constructors to accept and pass schedulerType parameter |
| Qwen3FP16FFNLayers.java, Qwen2FP16FFNLayers.java, Phi3FP16FFNLayers.java | Updated constructors to accept and pass schedulerType parameter |
| LogitsQ8_0Layer.java, LogitsFP16Layer.java | Added conditional normalization task based on scheduler type |
| QuantizedLayerPlanner.java | Integrated SchedulerDetectionService to determine hardware type |
| All LayerPlanner files | Updated layer instantiation to pass schedulerType parameter |
| TransformerComputeKernelsLayered.java | Added new processHeadsParallel method for non-Nvidia hardware |
| set_paths, external/tornadovm | Updated TornadoVM submodule and adjusted path configuration |
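For the AbstractFFNLayers row, here is a rough sketch of how the new field and helper might fit together. The field and method names come from the table; the class shape, the nested enum, and the direction of the normalization condition are assumptions, since the summary does not spell them out.

```java
// Sketch under assumptions: only the schedulerType field and the
// shouldUseFinalNormalization() name are confirmed by the file summary.
public abstract class AbstractFFNLayersSketch {

    enum SchedulerType { NVIDIA, NON_NVIDIA }      // stand-in for the PR's real enum

    protected final SchedulerType schedulerType;   // handed down by the layer planners

    protected AbstractFFNLayersSketch(SchedulerType schedulerType) {
        this.schedulerType = schedulerType;
    }

    // Concrete FFN layers call this while building their task graph to decide
    // whether to append the final normalization task. The condition shown here
    // (normalize only on Nvidia) is an assumption, not the PR's exact rule.
    protected boolean shouldUseFinalNormalization() {
        return schedulerType == SchedulerType.NVIDIA;
    }
}
```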
} else {
    return unifiedLayer.task("parallel-attention", TransformerComputeKernelsLayered::processHeadsParallel,
            state.wrapQ, state.wrapKeyCache, state.wrapValueCache, state.wrapXb,
            config.numberOfHeads(), config.headSize(), config.kvDim(), config.kvMul(), config.contextLength(),
Copilot AI (Nov 7, 2025)
The contextLength() parameter appears twice in the method call on line 171. The parameter at position 8 is seqLen according to the method signature, but config.contextLength() is passed there, and contextLength is passed again at position 11. Verify whether seqLen should really be config.contextLength() or a different value, such as the current position.
Suggested change:
- config.numberOfHeads(), config.headSize(), config.kvDim(), config.kvMul(), config.contextLength(),
+ config.numberOfHeads(), config.headSize(), config.kvDim(), config.kvMul(), state.positionHolder,
src/main/java/org/beehive/gpullama3/tornadovm/layers/type/fp16/LlamaFP16FFNLayers.java
Pull Request Overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated no new comments.
…THREAD_SCALE_FOR_LOGITS to 8. Adjust model loader logic for DEEPSEEK matching.
…/non-nvidia-hardware
  } else if (lowerName.contains("qwen3")) {
      return ModelType.QWEN_3;
- } else if (lowerName.contains("deepseek r1 distill")) {
+ } else if (lowerName.contains("deepseek")) {
I think we still need this.
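If the removed check is still needed, ordering matters: the more specific "deepseek r1 distill" match has to sit before the generic "deepseek" fallback, or the distilled models will never hit their own branch. A self-contained sketch of that ordering, in which every ModelType constant except QWEN_3 is hypothetical:

```java
public class ModelTypeMatchSketch {

    // Only QWEN_3 is confirmed by the diff above; the other constants are hypothetical.
    enum ModelType { QWEN_3, DEEPSEEK_R1_DISTILL, DEEPSEEK, UNKNOWN }

    static ModelType resolveModelType(String lowerName) {
        if (lowerName.contains("qwen3")) {
            return ModelType.QWEN_3;
        } else if (lowerName.contains("deepseek r1 distill")) {
            return ModelType.DEEPSEEK_R1_DISTILL;  // specific check first
        } else if (lowerName.contains("deepseek")) {
            return ModelType.DEEPSEEK;             // generic fallback
        }
        return ModelType.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(resolveModelType("deepseek r1 distill qwen 7b")); // DEEPSEEK_R1_DISTILL
        System.out.println(resolveModelType("deepseek v2 lite"));            // DEEPSEEK
    }
}
```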
…te comment in AbstractLayer
Pull Request Overview
Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.
} else {
    return unifiedLayer.task("parallel-attention", TransformerComputeKernelsLayered::processHeadsParallel,
            state.wrapQ, state.wrapKeyCache, state.wrapValueCache, state.wrapXb,
            config.numberOfHeads(), config.headSize(), config.kvDim(), config.kvMul(), config.contextLength(),
Copilot AI (Nov 11, 2025)
The parameter layerIndex is passed twice in the processHeadsParallel call - once as the second-to-last parameter and once as the last parameter. According to the method signature in TransformerComputeKernelsLayered.java (line 282-283), the parameters should be: q, key_cache, value_cache, xb, nHeads, headSize, kvDim, kvMul, seqLen, positionHolder, wrapAtt, layer, contextLength. Here layerIndex appears to be passed as both layer and contextLength, which is incorrect.
Suggested change:
- config.numberOfHeads(), config.headSize(), config.kvDim(), config.kvMul(), config.contextLength(),
+ config.numberOfHeads(), config.headSize(), config.kvDim(), config.kvMul(),
return unifiedLayer.task("parallel-attention", TransformerComputeKernelsLayered::processHeadsParallel,
        state.wrapQ, state.wrapKeyCache, state.wrapValueCache, state.wrapXb,
        config.numberOfHeads(), config.headSize(), config.kvDim(), config.kvMul(), config.contextLength(),
        state.positionHolder, state.wrapAtt, layerIndex, config.contextLength());
Copilot AI (Nov 11, 2025)
The parameter layerIndex is passed twice in the processHeadsParallel call - once as the second-to-last parameter and once as the last parameter. According to the method signature in TransformerComputeKernelsLayered.java (line 282-283), the parameters should be: q, key_cache, value_cache, xb, nHeads, headSize, kvDim, kvMul, seqLen, positionHolder, wrapAtt, layer, contextLength. Here layerIndex appears to be passed as both layer and contextLength, which is incorrect.
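To make the positional mapping easier to check, here is a self-contained sketch that writes the quoted call out against the parameter order cited above. Buffer types are stubbed with plain arrays and the numeric values are illustrative; the real kernel in TransformerComputeKernelsLayered takes TornadoVM buffer types.

```java
public class ProcessHeadsParallelMappingSketch {

    // Parameter order as quoted by the review comment:
    // q, key_cache, value_cache, xb, nHeads, headSize, kvDim, kvMul,
    // seqLen, positionHolder, wrapAtt, layer, contextLength
    static void processHeadsParallel(float[] q, float[] keyCache, float[] valueCache, float[] xb,
                                     int nHeads, int headSize, int kvDim, int kvMul,
                                     int seqLen, int[] positionHolder, float[] wrapAtt,
                                     int layer, int contextLength) {
        // kernel body omitted in this sketch
    }

    public static void main(String[] args) {
        int contextLength = 2048, layerIndex = 3;   // illustrative values
        float[] q = {}, k = {}, v = {}, xb = {}, att = {};
        int[] positionHolder = {0};

        // Counting positions this way, the two occurrences of config.contextLength()
        // fill the seqLen and contextLength slots, and layerIndex fills layer; the
        // open question raised by the reviews is whether seqLen should really be the
        // full context length rather than the current position.
        processHeadsParallel(q, k, v, xb,
                /* nHeads */ 32, /* headSize */ 128, /* kvDim */ 1024, /* kvMul */ 4,
                /* seqLen */ contextLength, positionHolder, att,
                /* layer */ layerIndex, /* contextLength */ contextLength);
    }
}
```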
closed for #66