Offload device #145768
@oli-obk Probably a stupid question, but where do I set parameter names? but want to generate
Edit: I decided I'll just do the full rewrite of the module at the LLVM-IR level instead of MIR, since that's what I know best (and seathlin mentioned MIR is probably too low either way). I'll add more details later.
I'll clean this up further later to minimize the amount of C++, but I lost a bit of patience with LLVM's C API, so I just did 100% of the work with the C++ API. With this and the previous (review-ready) patch, Rust's amdgcn target runs on a GPU without manual LLVM-IR rewriting. I'll port it back from C++ to Rust.
      llvm::PrintStatistics(OS);
    }

    extern "C" void LLVMRustOffloadMapper(LLVMModuleRef M, LLVMValueRef OldFn, LLVMValueRef NewFn) {
I think this is too niche to be worth wrapping in Rust. We would need to introduce the ValueToValueMapTy, and handle the mapping of one value to another. Plus we'd need to expose the used CloneFunctionInto as well as the CloneFunctionChangeTypes.
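To illustrate what the value map is for, here is a conceptual sketch in plain Rust. It does not use LLVM at all: `ValueId`, `Inst`, and `clone_with_map` are hypothetical names for a toy model, and the real `CloneFunctionInto` does considerably more (it also creates new instructions, remaps metadata, and so on). The sketch only shows the core idea of copying a body while redirecting references from old arguments to new ones.

```rust
use std::collections::HashMap;

/// Toy value id; stands in for an LLVM value (hypothetical model).
type ValueId = u32;

/// Toy instruction that defines `dest` from a list of operands.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Inst {
    dest: ValueId,
    operands: Vec<ValueId>,
}

/// Copy `body`, replacing every operand that has an entry in `vmap`.
/// Operands without an entry are kept unchanged.
fn clone_with_map(body: &[Inst], vmap: &HashMap<ValueId, ValueId>) -> Vec<Inst> {
    body.iter()
        .map(|inst| Inst {
            dest: inst.dest,
            operands: inst
                .operands
                .iter()
                .map(|v| *vmap.get(v).unwrap_or(v))
                .collect(),
        })
        .collect()
}

fn main() {
    // Old function takes arguments 0 and 1; the new one takes 10 and 11.
    let body = vec![
        Inst { dest: 2, operands: vec![0, 1] },
        Inst { dest: 3, operands: vec![2, 0] },
    ];
    let vmap = HashMap::from([(0, 10), (1, 11)]);
    let cloned = clone_with_map(&body, &vmap);
    println!("{cloned:?}");
}
```

Exposing the equivalent machinery through a Rust wrapper would mean exposing both the map type and the cloning entry point, which is the cost the comment above refers to.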
    }

    let consider_offload = config.offload.contains(&config::Offload::Enable);
    if consider_offload && (cgcx.target_arch == "amdgpu" || cgcx.target_arch == "nvptx64") {
These should probably be combined into a target_is_gpu, similar to target_is_like_darwin and target_is_like_aix.
Yea that seems like a good idea to keep them in one location. Maybe as a method on target_arch?
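A minimal sketch of the suggested helper, assuming the architecture is still compared as a plain string as in the condition under review. The name `is_gpu_arch` is hypothetical and not an existing rustc API; the two architecture strings are copied from that condition.

```rust
/// Hypothetical helper: true if `arch` is one of the GPU architectures
/// named in the condition under review.
fn is_gpu_arch(arch: &str) -> bool {
    matches!(arch, "amdgpu" | "nvptx64")
}

fn main() {
    // Stands in for `consider_offload` from the snippet under review.
    let offload_enabled = true;
    for arch in ["amdgpu", "nvptx64", "x86_64"] {
        if offload_enabled && is_gpu_arch(arch) {
            println!("{arch}: run the device-side offload handling");
        } else {
            println!("{arch}: skip");
        }
    }
}
```

Keeping the list of GPU architectures in one place means a newly supported GPU target only has to be added once.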
r? @oli-obk This adds two more commits on top of the other PR, which fixes the host code generation.
☔ The latest upstream changes (presumably #147384) made this pull request unmergeable. Please resolve the merge conflicts.
Which other PR specifically?
This one: #142696
@oli-obk I dropped the first two commits, which have now landed, so it's down to 100 lines. I'll have a look at what happened to my test update; the changes should be visible in the IR we generate.
I figured out why we have no test: this change fixes device code. I think we can't easily test this GPU target (amdgcn) since it's tier 3, so we don't even have core available (in CI at least). https://rustc-dev-guide.rust-lang.org/offload/usage.html#compile-instructions
r=me with nits
These commits modify compiler targets. Some changes occurred in compiler/rustc_codegen_ssa
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
@bors r=oli-obk rollup
Offload device
LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it, but we will likely need it very soon in one of the follow-up PRs (e.g. to request shared memory), so we might as well add it already. This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled. WIP r? `@ghost`
Rollup of 12 pull requests
Successful merges:
- #145768 (Offload device)
- #145992 (Stabilize `vec_deque_pop_if`)
- #147416 (Early return if span is from expansion so we dont get empty span and ice later on)
- #147808 (btree: cleanup difference, intersection, is_subset)
- #148520 (style: Use binary literals instead of hex literals in doctests for `highest_one` and `lowest_one`)
- #148559 (Add typo suggestion for a misspelt Cargo environment variable)
- #148567 (Fix incorrect precedence caused by range expression)
- #148570 (Fix mismatched brackets in generated .dir-locals.el)
- #148575 (fix dev guide link in rustc_query_system/dep_graph/README.md)
- #148578 (core docs: add notes about availability of `Atomic*::from_mut_slice`)
- #148603 (Backport 1.91.1 relnotes to main)
- #148609 (Sync str::rsplit_once example with str::split_once)
r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of #145768 - ZuseZ4:offload-device, r=oli-obk
LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it, but we will likely need it very soon in one of the follow-up PRs (e.g. to request shared memory), so we might as well add it already.
This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled.
WIP
r? @ghost
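The effect described in this PR can be modelled with a small, self-contained Rust sketch. This is not rustc code: `Abi`, `Param`, `FnSig`, and `add_dyn_ptr` are hypothetical names for a toy signature model, and appending the parameter at the end of the list is an assumption, since the description does not say where the argument is placed.

```rust
/// Toy calling-convention tag (hypothetical; not rustc's real ABI type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Abi {
    Rust,
    GpuKernel,
}

/// Toy parameter: a name and an LLVM-style type string.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Param {
    name: String,
    ty: String,
}

/// Toy function signature.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FnSig {
    name: String,
    abi: Abi,
    params: Vec<Param>,
}

/// Return a copy of `sig` with an extra `dyn_ptr: ptr` parameter, but only
/// for GPU kernels and only when offload is enabled.
/// Assumption: the parameter is appended last; the PR text does not specify.
fn add_dyn_ptr(sig: &FnSig, offload_enabled: bool) -> FnSig {
    let mut out = sig.clone();
    if offload_enabled && sig.abi == Abi::GpuKernel {
        out.params.push(Param { name: "dyn_ptr".into(), ty: "ptr".into() });
    }
    out
}

fn main() {
    let kernel = FnSig {
        name: "kernel".into(),
        abi: Abi::GpuKernel,
        params: vec![Param { name: "x".into(), ty: "ptr".into() }],
    };
    let host = FnSig { name: "helper".into(), abi: Abi::Rust, params: vec![] };

    println!("{:?}", add_dyn_ptr(&kernel, true));
    println!("{:?}", add_dyn_ptr(&kernel, false));
    println!("{:?}", add_dyn_ptr(&host, true));
}
```

Only the first call changes the signature; functions with another ABI, or builds without the offload feature, are left untouched.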