-
Notifications
You must be signed in to change notification settings - Fork 77
[AMDGPU] Search TheRock-based device libraries #739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: amd-staging
Are you sure you want to change the base?
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
Even though TheRock is still in preview, we can/should put this upstream. |
This patch is a follow up to llvm#170471.
So that we can actually test the workflow before comitting into tree.
This reverts commit 671a8ce. This stil broke the clangd-ubuntu-tsan bot. It seems like somehow the PATH variable is not getting propagated in the system-include-extractor.test test.
Currently Apple unrolling preferences are not applied to apple-a17.
This does not test most of the functionality (i.e., that writing to an existing comment still works), but does ensure that the plumbing works and things are not completely broken. Reviewers: tstellar, cmtice Reviewed By: cmtice Pull Request: llvm#170209
…_epi64 (llvm#170442) This PR adds constexpr support for the AVX2 cross-lane permute intrinsics _mm256_permute4x64_pd and _mm256_permute4x64_epi64 Resolves llvm#169304
This is designed to allow a workflow (e.g., premerge) upload comments across multiple jobs. Subsequent PRs will wire this up within the issue-write workflow to support reading comments from multiple files. Reviewers: tstellar, cmtice Reviewed By: cmtice Pull Request: llvm#170216
The logic for narrowing to single scalar recipes is in two different places: narrowToSingleScalarRecipes and legalizeAndOptimizeInductions. Consolidate them.
We should actually run the test workflow when touching the workflow we are attempting to test.
It appears that using it will result in callers mistakenly thinking that complex f128 returns is done the sret-way, when it should be returned in registers.
This changed the name of one of the outputs that issue-write used to control whether or not it ran. This patch should fix that.
The libcall lowering decisions should be program dependent, depending on the current module's RuntimeLibcallInfo. We need another related analysis derived from that plus the current function's subtarget to provide concrete lowering decisions. This takes on a somewhat unusual form. It's a Module analysis, with a lookup keyed on the subtarget. This is a separate module analysis from RuntimeLibraryAnalysis to avoid that depending on codegen. It's not a function pass to avoid depending on any particular function, to avoid repeated subtarget map lookups in most of the use passes, and to avoid any recomputation in the common case of one subtarget (and keeps it reusable across repeated compilations). This also switches ExpandFp and PreISelIntrinsicLowering as a sample function and module pass. Note this is not yet wired up to SelectionDAG, which is still using the LibcallLoweringInfo constructed inside of TargetLowering.
This reverts commit 5e5937c, since the added test fails on the `lldb-x86_64-win` buildbot. https://lab.llvm.org/buildbot/#/builders/211/builds/4246
This is so that we can read from multiple files emitted by the premerge workflow. Reviewers: tstellar, cmtice Reviewed By: cmtice Pull Request: llvm#170411
…vm#170388) Generate one global per static shared variable so the alignment can be set separately. Dynamic shared memory is unchanged.
Adds more test coverage for llvm#170223.
…on. (llvm#170474) Follow-up to llvm#169576 to enable UDiv canonicalization if the start of the AddRec is not constant. The fold is not restricted to constant start values, as long as we are able to compute a constant remainder. The fold is only applied if the subtraction of the remainder can be folded into to start expression, but that is just to avoid creating more complex AddRecs. For reference, the proof from llvm#169576 is https://alive2.llvm.org/ce/z/iu2tav PR: llvm#170474
This PR adds unrolling for vector.create_mask op based on the targetShape. Each unrolled vector computes its local mask size in each dimension (d) as: min(max(originalMaskSize[d] - offset[d], 0), unrolledMaskSize[d]).
…0420) Disable test due to CASTests failing nondeterministically on some windows CI.
Only some fortran source files in flang/test/Lower/OpenMP have been modified. More cleanups will be performed in subsequent commits.
…170414) Now that the issue-write workflow can support writing comments from multiple files, make the premerge workflow write out comments from both x86_64 Linux and Windows. AArch64 Linux right now is left out as the premerge advisor does not currently support it.
…#169559) Introduce Hexagon-specific passes to generate widening vector instructions for integer and floating-point operations using generic LLVM intrinsics. This enables widening operations for short vectors and improves type legalization by allowing operands to be widened to appropriate types. The patch also includes a shuffle optimization pass to relocate and validate shufflevector instructions during widening legalization. Co-authored-by: Jyotsna Verma <jverma@qti.qualcomm.com> Co-authored-by: Yashas Andaluri <yandalur@qti.qualcomm.com> Co-authored-by: Fateme Hosseini <fhossein@qti.qualcomm.com> Co-authored-by: Muntasir Mallick <mallick@qti.qualcomm.com> Co-authored-by: Tatiana Larina <larina@qti.qualcomm.com> Co-authored-by: Kaushik Kulkarni <kauskulk@qti.qualcomm.com> Co-authored-by: Jyotsna Verma <jverma@qti.qualcomm.com> Co-authored-by: Yashas Andaluri <yandalur@qti.qualcomm.com> Co-authored-by: Muntasir Mallick <mallick@qti.qualcomm.com> Co-authored-by: Tatiana Larina <larina@qti.qualcomm.com> Co-authored-by: Kaushik Kulkarni <kauskulk@qti.qualcomm.com>
This is an NFC patch that aims to simplify how the scanner calls `Consumer.handleBuildCommand()` by doing it directly in `DependencyScanningAction` instead of going through the `setLastCC1Arguments()` and `takeLastCC1Arguments()` dance with the client.
Shared memory needs to be aligned like in the bulk load operation.
…is incomplete (llvm#163654) Indexing in .debug_str section could lead to integer overflow when in DWARF32 format. https://github.com/llvm/llvm-project/blob/e61e6251b692ffe71910bad22b82e41313f003cf/llvm/lib/DWP/DWP.cpp#L35C30-L35C47 This can lead to missing symbols from the DWARF info, and hurts profile quality. As a workaround, we may use information from the symbol table (.symtab), and recover the missing symbols with addresses and ranges. Output: ``` # Before ... warning: 6.64%(338916009/5106344252) of samples are from ranges that do not belong to any functions. # After ... warning: 0.07%(3501587/4906133148) of samples are from ranges that do not belong to any functions. warning: 5.71%(280266919/4906133148) of samples are from ranges that belong to functions recovered from symbol table. ``` We see 0.4% - 1.35% performance improvements on our internal services where profiles are generated with binaries that have .debug_str section over 4GB.
This commit adds two Clang builtins for PowerPC AMO load operations: __builtin_amo_lwat for 32-bit unsigned operations __builtin_amo_ldat for 64-bit unsigned operations Also adds an amo.h header that maps GCC's AMO functions to these Clang builtins for compatibility.
Similar to llvm#161881, we will need private/firstprivate/reduction representation for acc kernels for automatic privatization
This PR adds hardware-measured latencies/Occupancy for all RVV load/stores to the SpacemiT-X60 scheduling model.
…#167254) When emitting the assembly .set directive, KCFI needs to use getZExtValue(). However, this means that FileCheck pattern matching can't match between the .set directive and the IR when the high bit of a 32-bit value is set. We had gotten lucky with the existing tests happening to just not have had the high bit set. The coming hash change will expose this, though. LLVM IR's default printing behavior uses APInt::operator<<, which calls APInt::print(OS, /*isSigned=*/true). This means KCFI operand bundles in call instructions print as signed (e.g. [ "kcfi"(i32 -1208803271) ]), and KCFI type metadata prints as signed (e.g. !3 = !{i32 -1208803271}). Changing the IR to print unsigned i32 values would impact hundreds of existing tests, so it is best to just leave it be. Update the KCFI .set direct to use getSExtValue() in a comment so that we can both build correctly and use FileCheck with pattern matching in tests. KCFI generates hashes in two places. Instead of exposing the hash implementation in both places, introduce a helper that wraps the specific hash implementation in a single place, llvm::getKCFITypeID. In order to transition between KCFI hash, we need to be able to specify them. Add the Clang option -fsanitize-kcfi-hash= and a IR module option "kcfi-hash" that can choose between xxHash64 and FNV-1a. Default to xxHash64 to stay backward compatible, as we'll need to also update rustc to take a new option to change the hash to FNV-1a for interop with the coming GCC KCFI.
|
@lamb-j |
On Mac `unsigned long long` corresponds to `uint64_t` so `%llu` is needed. Simply always using `%llu` and upcasting to `unsigned long long` should make it work fine.
…ndencyScanningTool (NFC) (llvm#169962) This patch is the first of two in refactoring Clang's dependency scanning tooling to remove its dependency on clangDriver. It separates Tooling/DependencyScanningTool.cpp from the rest of clangDependencyScanning and moves clangDependencyScanning out of clangTooling into its own library. No functional changes are introduced. The follow-up patch (llvm#169964) will restrict clangDependencyScanning to handling only -cc1 command line inputs and will move all functionality related to handling driver commands into clangTooling. (Tooling/DependencyScanningTool.cpp). This is part of a broader effort to support driver-managed builds for compilations using C++ named modules and/or Clang modules. It is required for linking the dependency scanning tooling against the driver without introducing cyclic dependencies, which would otherwise cause build failures when dynamic linking is enabled. The RFC for this change can be found here: https://discourse.llvm.org/t/rfc-new-clangoptions-library-remove-dependency-on-clangdriver-from-clangfrontend-and-flangfrontend/88773?u=naveen-seth
5ed03dd to
d3314f0
Compare
d3314f0 to
aee3336
Compare
TheRock has slightly different path (relative to ROCM_PATH) for device libraries. This commit adds search path for device libraries on a TheRock-based distribution. Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
aee3336 to
b426723
Compare
TheRock (the latest ROCm distribution) has slightly different path (relative to
${ROCM_PATH}) for device libraries.This commit adds search path for device libraries on a TheRock-based distribution.
Background / Reason of this PR
With TheRock-based distribution, we can install local ROCm SDK as Python packages and that would be helpful a lot.
To fully utilize TheRock-based toolchain, we normally set the environment variable
${ROCM_PATH}to$(rocm-sdk --path root)to make sure that external programs/libraries can locate TheRock-based ROCm SDK.One of the problems is: some programs (such as vLLM) don't use
hipccbut instead, attempt to use$(rocm-sdk path --root)/lib/llvm/bin/clang++directly. Second, TheRock has slightly different toolchain layout than the release versions of ROCm SDK.${ROCM_PATH}/amdgcn/bitcode${ROCM_PATH}/lib/llvm/amdgcn/bitcodeClang does not search the latter by default, causing a compiler error and fails to configure the ROCm toolchain on such Clang-dependent programs.
The workaround below is working yet inconvenient so I hope this issue is resolved in either:
Workaround (without this PR)
With the current state, we can avoid device library issues by:
Why PR to ROCm repository and not upstream?
Because TheRock is in the preview state.