Offload device #145768

ZuseZ4 · 2025-08-22T21:56:19Z

LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it,b ut likely gonna need it very soon in one of the follow-up PRs (e.g. to request shared memory). So we might as well already add it.

This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled.

WIP

r? @ghost

ZuseZ4 · 2025-08-22T23:20:13Z

@oli-obk Probably stupid question, but where do set parameter names?
When lowering any GPUKernel Function (including amdgpu_kernel) I add one ptr argument, so for a one-ptr function I instead generate

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
define amdgpu_kernel void @kernel_1(ptr noundef writeonly captures(none) %x, ptr noundef readnone captures(none) %0) unnamed_addr #0 {
start:
  %1 = addrspacecast ptr %x to ptr addrspace(1)
  store double 2.100000e+01, ptr addrspace(1) %1, align 8
  ret void
}

but want to generate
define amdgpu_kernel void @kernel_1(ptr noundef writeonly captures(none) %0, ptr noundef readnone captures(none) %x) (aka ignore the first argument). My PR (asdf commit) just copies the first arg and the last arg get's a generic name, so I assume we have a list of names on the rust side, which we match starting at the first arg. However, I couldn't find a relevant location where we're calling LLVM functions on the rust side to set those names. Do you have any ideas?

Edit: I decided I'll just do the full rewrite of the module on llvm-ir level instead of MIR, since that's what I know best (and seathlin mentioned MIR is probably too low eithr way). I'll add more details later

ZuseZ4 · 2025-08-27T05:31:56Z

I'll clean this up further later to minimze the amount of c++, but I lost a bit of patience with LLVM's C API, so I just did 100% of the work with the C++ API. With this and the previous (review ready) patch, Rust's amdgcn target runs on a GPU, without manual LLVM-IR rewriting. I'll port it back from C++ to Rust.
This should also run on Nvidia or Intel GPUs, I just didn't got to test it yet.

ZuseZ4 · 2025-09-01T07:38:29Z

compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp

  llvm::PrintStatistics(OS);
 }

+extern "C" void LLVMRustOffloadMapper(LLVMModuleRef M, LLVMValueRef OldFn, LLVMValueRef NewFn) {


I think this is too niche to be worth wrapping in Rust. We would need to introduce the ValueToValueMapTy, and handle the mapping of one value to another. Plus we'd need to expose the used CloneFunctionInto as well as the CloneFunctionChangeTypes.

ZuseZ4 · 2025-09-01T07:56:58Z

compiler/rustc_codegen_llvm/src/back/write.rs

+    }
+
+    let consider_offload = config.offload.contains(&config::Offload::Enable);
+    if consider_offload && (cgcx.target_arch == "amdgpu" || cgcx.target_arch == "nvptx64") {


these should probably be combined to a target_is_gpu, similar to target_is_like_darwin and target_is_like_aix

Yea that seems like a good idea to keep them in one location. Maybe as a method on target_arch?

ZuseZ4 · 2025-09-01T08:07:13Z

r? @oli-obk

This adds two more commits on top of the other pr which fixes the host code generation.
Together, these 4 commits are enough to compile a rust function and run it on a gpu, I left instructions in the dev guide for it: https://rustc-dev-guide.rust-lang.org/offload/usage.html

rustbot · 2025-09-01T08:07:17Z

oli-obk is not on the review rotation at the moment.
They may take a while to respond.

bors · 2025-10-05T23:16:34Z

☔ The latest upstream changes (presumably #147384) made this pull request unmergeable. Please resolve the merge conflicts.

oli-obk · 2025-10-19T11:22:11Z

This adds two more commits on top of the other pr which fixes the host code generation.

Which other PR specifically?

ZuseZ4 · 2025-10-19T20:37:31Z

This one: #142696

ZuseZ4 · 2025-10-21T15:15:29Z

@oli-obk I dropped the first two commits which now landed, so now it's down to 100 lines.
In the second commit I moved most of my C++ code to Rust, it still has some pure C++ left though, because I feel writing wrappers for it would introduce too many otherwise useless wrappers and be a lot of overhead. E.g. CloneFunctionChangeType or ValueToValueMapTy.

I'll have a look at what happened to my test update, the changes should be visible in the IR we generate.

ZuseZ4 · 2025-10-25T23:49:38Z

I figured out why we have no test, this change was to fix device code. I think we can't easily test this GPU target (amdgcn) since it's tier 3, so we don't even have core available (in CI at least). https://rustc-dev-guide.rust-lang.org/offload/usage.html#compile-instructions
The host code which we can test is unchanged. The nvidia target would work in the future, but Marcelo was having some issues with running it on his NVIDIA gpu, and I'd need to set up an nvidia server which I don't use right now.

compiler/rustc_codegen_llvm/src/back/lto.rs

compiler/rustc_codegen_llvm/src/back/write.rs

oli-obk

r=me with nits

View changes since this review

rustbot · 2025-11-05T20:09:00Z

These commits modify compiler targets.
(See the Target Tier Policy.)

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

rustbot · 2025-11-06T04:44:03Z

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

ZuseZ4 · 2025-11-06T08:47:29Z

@bors r=oli-obk rollup

bors · 2025-11-06T08:47:32Z

📌 Commit 360b38c has been approved by oli-obk

It is now in the queue for this repository.

Offload device LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it,b ut likely gonna need it very soon in one of the follow-up PRs (e.g. to request shared memory). So we might as well already add it. This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled. WIP r? `@ghost`

Offload device LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it,b ut likely gonna need it very soon in one of the follow-up PRs (e.g. to request shared memory). So we might as well already add it. This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled. WIP r? ``@ghost``

Rollup of 12 pull requests Successful merges: - #145768 (Offload device) - #145992 (Stabilize `vec_deque_pop_if`) - #147416 (Early return if span is from expansion so we dont get empty span and ice later on) - #147808 (btree: cleanup difference, intersection, is_subset) - #148520 (style: Use binary literals instead of hex literals in doctests for `highest_one` and `lowest_one`) - #148559 (Add typo suggestion for a misspelt Cargo environment variable) - #148567 (Fix incorrect precedence caused by range expression) - #148570 (Fix mismatched brackets in generated .dir-locals.el) - #148575 (fix dev guide link in rustc_query_system/dep_graph/README.md) - #148578 (core docs: add notes about availability of `Atomic*::from_mut_slice`) - #148603 (Backport 1.91.1 relnotes to main) - #148609 (Sync str::rsplit_once example with str::split_once) r? `@ghost` `@rustbot` modify labels: rollup

Rollup merge of #145768 - ZuseZ4:offload-device, r=oli-obk Offload device LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it,b ut likely gonna need it very soon in one of the follow-up PRs (e.g. to request shared memory). So we might as well already add it. This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled. WIP r? ```@ghost```

This comment has been minimized.

Sign in to view

ZuseZ4 force-pushed the offload-device branch from 773389b to cdbbe9c Compare September 1, 2025 02:53

This comment has been minimized.

Sign in to view

ZuseZ4 commented Sep 1, 2025

View reviewed changes

This comment has been minimized.

Sign in to view

ZuseZ4 commented Sep 1, 2025

View reviewed changes

rustbot assigned oli-obk Sep 1, 2025

ZuseZ4 marked this pull request as ready for review September 1, 2025 08:07

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 1, 2025

ZuseZ4 mentioned this pull request Jul 24, 2025

Tracking Issue for GPU-offload #131513

Open

5 tasks

ZuseZ4 force-pushed the offload-device branch from dd5af93 to ee30fc9 Compare October 21, 2025 14:54

This comment has been minimized.

Sign in to view

ZuseZ4 force-pushed the offload-device branch from ee30fc9 to 0637c67 Compare October 21, 2025 15:02

This comment has been minimized.

Sign in to view

ZuseZ4 force-pushed the offload-device branch from 0637c67 to 37ceb03 Compare October 25, 2025 23:53

oli-obk reviewed Nov 4, 2025

View reviewed changes

compiler/rustc_codegen_llvm/src/back/lto.rs Outdated Show resolved Hide resolved

oli-obk reviewed Nov 4, 2025

View reviewed changes

compiler/rustc_codegen_llvm/src/back/write.rs Outdated Show resolved Hide resolved

oli-obk reviewed Nov 4, 2025

View reviewed changes

compiler/rustc_codegen_llvm/src/back/write.rs Outdated Show resolved Hide resolved

oli-obk approved these changes Nov 4, 2025

View reviewed changes

This comment has been minimized.

Sign in to view

ZuseZ4 force-pushed the offload-device branch from a5df8df to af2579c Compare November 6, 2025 04:43

Fix device code generation, to account for an implicit dyn_ptr argument.

360b38c

ZuseZ4 force-pushed the offload-device branch from af2579c to 360b38c Compare November 6, 2025 08:46

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 6, 2025

matthiaskrgr mentioned this pull request Nov 6, 2025

Rollup of 10 pull requests #148571

Closed

GuillaumeGomez mentioned this pull request Nov 6, 2025

Rollup of 9 pull requests #148574

Closed

jhpratt mentioned this pull request Nov 7, 2025

Rollup of 12 pull requests #148624

Merged

bors merged commit c637939 into rust-lang:master Nov 7, 2025
11 checks passed

rustbot added this to the 1.93.0 milestone Nov 7, 2025

Offload device #145768

Offload device #145768

Uh oh!

Conversation

ZuseZ4 commented Aug 22, 2025

Uh oh!

This comment has been minimized.

ZuseZ4 commented Aug 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

ZuseZ4 commented Aug 27, 2025

Uh oh!

This comment has been minimized.

ZuseZ4 Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

ZuseZ4 Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

oli-obk Nov 4, 2025

Choose a reason for hiding this comment

Uh oh!

ZuseZ4 commented Sep 1, 2025

Uh oh!

rustbot commented Sep 1, 2025

Uh oh!

bors commented Oct 5, 2025

Uh oh!

oli-obk commented Oct 19, 2025

Uh oh!

ZuseZ4 commented Oct 19, 2025

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

ZuseZ4 commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ZuseZ4 commented Oct 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

oli-obk left a comment • edited by rustbot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rustbot commented Nov 5, 2025

Uh oh!

This comment has been minimized.

This comment has been minimized.

rustbot commented Nov 6, 2025

Uh oh!

ZuseZ4 commented Nov 6, 2025

Uh oh!

bors commented Nov 6, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ZuseZ4 commented Aug 22, 2025 •

edited

Loading

ZuseZ4 commented Oct 21, 2025 •

edited

Loading

ZuseZ4 commented Oct 25, 2025 •

edited

Loading

oli-obk left a comment •

edited by rustbot

Loading