Skip to content

Conversation

@ZuseZ4
Copy link
Member

@ZuseZ4 ZuseZ4 commented Aug 22, 2025

LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it,b ut likely gonna need it very soon in one of the follow-up PRs (e.g. to request shared memory). So we might as well already add it.

This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled.

WIP

r? @ghost

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 22, 2025
@rust-log-analyzer

This comment has been minimized.

@ZuseZ4
Copy link
Member Author

ZuseZ4 commented Aug 22, 2025

@oli-obk Probably stupid question, but where do set parameter names?
When lowering any GPUKernel Function (including amdgpu_kernel) I add one ptr argument, so for a one-ptr function I instead generate

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
define amdgpu_kernel void @kernel_1(ptr noundef writeonly captures(none) %x, ptr noundef readnone captures(none) %0) unnamed_addr #0 {
start:
  %1 = addrspacecast ptr %x to ptr addrspace(1)
  store double 2.100000e+01, ptr addrspace(1) %1, align 8
  ret void
}

but want to generate
define amdgpu_kernel void @kernel_1(ptr noundef writeonly captures(none) %0, ptr noundef readnone captures(none) %x) (aka ignore the first argument). My PR (asdf commit) just copies the first arg and the last arg get's a generic name, so I assume we have a list of names on the rust side, which we match starting at the first arg. However, I couldn't find a relevant location where we're calling LLVM functions on the rust side to set those names. Do you have any ideas?

Edit: I decided I'll just do the full rewrite of the module on llvm-ir level instead of MIR, since that's what I know best (and seathlin mentioned MIR is probably too low eithr way). I'll add more details later

@rust-log-analyzer

This comment has been minimized.

@ZuseZ4
Copy link
Member Author

ZuseZ4 commented Aug 27, 2025

I'll clean this up further later to minimze the amount of c++, but I lost a bit of patience with LLVM's C API, so I just did 100% of the work with the C++ API. With this and the previous (review ready) patch, Rust's amdgcn target runs on a GPU, without manual LLVM-IR rewriting. I'll port it back from C++ to Rust.
This should also run on Nvidia or Intel GPUs, I just didn't got to test it yet.

@rust-log-analyzer

This comment has been minimized.

llvm::PrintStatistics(OS);
}

extern "C" void LLVMRustOffloadMapper(LLVMModuleRef M, LLVMValueRef OldFn, LLVMValueRef NewFn) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too niche to be worth wrapping in Rust. We would need to introduce the ValueToValueMapTy, and handle the mapping of one value to another. Plus we'd need to expose the used CloneFunctionInto as well as the CloneFunctionChangeTypes.

@rust-log-analyzer

This comment has been minimized.

}

let consider_offload = config.offload.contains(&config::Offload::Enable);
if consider_offload && (cgcx.target_arch == "amdgpu" || cgcx.target_arch == "nvptx64") {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should probably be combined to a target_is_gpu, similar to target_is_like_darwin and target_is_like_aix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that seems like a good idea to keep them in one location. Maybe as a method on target_arch?

@ZuseZ4
Copy link
Member Author

ZuseZ4 commented Sep 1, 2025

r? @oli-obk

This adds two more commits on top of the other pr which fixes the host code generation.
Together, these 4 commits are enough to compile a rust function and run it on a gpu, I left instructions in the dev guide for it: https://rustc-dev-guide.rust-lang.org/offload/usage.html

@ZuseZ4 ZuseZ4 marked this pull request as ready for review September 1, 2025 08:07
@rustbot
Copy link
Collaborator

rustbot commented Sep 1, 2025

oli-obk is not on the review rotation at the moment.
They may take a while to respond.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 1, 2025
@ZuseZ4 ZuseZ4 mentioned this pull request Jul 24, 2025
5 tasks
@bors
Copy link
Collaborator

bors commented Oct 5, 2025

☔ The latest upstream changes (presumably #147384) made this pull request unmergeable. Please resolve the merge conflicts.

@oli-obk
Copy link
Contributor

oli-obk commented Oct 19, 2025

This adds two more commits on top of the other pr which fixes the host code generation.

Which other PR specifically?

@ZuseZ4
Copy link
Member Author

ZuseZ4 commented Oct 19, 2025

This one: #142696

@rustbot

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@ZuseZ4
Copy link
Member Author

ZuseZ4 commented Oct 21, 2025

@oli-obk I dropped the first two commits which now landed, so now it's down to 100 lines.
In the second commit I moved most of my C++ code to Rust, it still has some pure C++ left though, because I feel writing wrappers for it would introduce too many otherwise useless wrappers and be a lot of overhead. E.g. CloneFunctionChangeType or ValueToValueMapTy.

I'll have a look at what happened to my test update, the changes should be visible in the IR we generate.

@ZuseZ4
Copy link
Member Author

ZuseZ4 commented Oct 25, 2025

I figured out why we have no test, this change was to fix device code. I think we can't easily test this GPU target (amdgcn) since it's tier 3, so we don't even have core available (in CI at least). https://rustc-dev-guide.rust-lang.org/offload/usage.html#compile-instructions
The host code which we can test is unchanged. The nvidia target would work in the future, but Marcelo was having some issues with running it on his NVIDIA gpu, and I'd need to set up an nvidia server which I don't use right now.

Copy link
Contributor

@oli-obk oli-obk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rustbot
Copy link
Collaborator

rustbot commented Nov 5, 2025

These commits modify compiler targets.
(See the Target Tier Policy.)

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Nov 6, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@ZuseZ4
Copy link
Member Author

ZuseZ4 commented Nov 6, 2025

@bors r=oli-obk rollup

@bors
Copy link
Collaborator

bors commented Nov 6, 2025

📌 Commit 360b38c has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 6, 2025
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Nov 6, 2025
Offload device

LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it,b ut likely gonna need it very soon in one of the follow-up PRs (e.g. to request shared memory). So we might as well already add it.

This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled.

WIP

r? `@ghost`
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Nov 6, 2025
Offload device

LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it,b ut likely gonna need it very soon in one of the follow-up PRs (e.g. to request shared memory). So we might as well already add it.

This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled.

WIP

r? ``@ghost``
bors added a commit that referenced this pull request Nov 7, 2025
Rollup of 12 pull requests

Successful merges:

 - #145768 (Offload device)
 - #145992 (Stabilize `vec_deque_pop_if`)
 - #147416 (Early return if span is from expansion so we dont get empty span and ice later on)
 - #147808 (btree: cleanup difference, intersection, is_subset)
 - #148520 (style: Use binary literals instead of hex literals in doctests for `highest_one` and `lowest_one`)
 - #148559 (Add typo suggestion for a misspelt Cargo environment variable)
 - #148567 (Fix incorrect precedence caused by range expression)
 - #148570 (Fix mismatched brackets in generated .dir-locals.el)
 - #148575 (fix dev guide link in rustc_query_system/dep_graph/README.md)
 - #148578 (core docs: add notes about availability of `Atomic*::from_mut_slice`)
 - #148603 (Backport 1.91.1 relnotes to main)
 - #148609 (Sync str::rsplit_once example with str::split_once)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit c637939 into rust-lang:master Nov 7, 2025
11 checks passed
@rustbot rustbot added this to the 1.93.0 milestone Nov 7, 2025
rust-timer added a commit that referenced this pull request Nov 7, 2025
Rollup merge of #145768 - ZuseZ4:offload-device, r=oli-obk

Offload device

LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it,b ut likely gonna need it very soon in one of the follow-up PRs (e.g. to request shared memory). So we might as well already add it.

This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled.

WIP

r? ```@ghost```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants