Skip to content

Conversation

@a4lg
Copy link

@a4lg a4lg commented Dec 3, 2025

TheRock (the latest ROCm distribution) has slightly different path (relative to ${ROCM_PATH}) for device libraries.
This commit adds search path for device libraries on a TheRock-based distribution.

Background / Reason of this PR

With TheRock-based distribution, we can install local ROCm SDK as Python packages and that would be helpful a lot.

To fully utilize TheRock-based toolchain, we normally set the environment variable ${ROCM_PATH} to $(rocm-sdk --path root) to make sure that external programs/libraries can locate TheRock-based ROCm SDK.

One of the problems is: some programs (such as vLLM) don't use hipcc but instead, attempt to use $(rocm-sdk path --root)/lib/llvm/bin/clang++ directly. Second, TheRock has slightly different toolchain layout than the release versions of ROCm SDK.

ROCm SDK Device libraries path
Production Release (7.1) ${ROCM_PATH}/amdgcn/bitcode
TheRock (7.9 / Nightly) ${ROCM_PATH}/lib/llvm/amdgcn/bitcode

Clang does not search the latter by default, causing a compiler error and fails to configure the ROCm toolchain on such Clang-dependent programs.

The workaround below is working yet inconvenient so I hope this issue is resolved in either:

  1. TheRock distribution (separate PR required): Change the path of device libraries or make links/copies to the same path as the production releases.
  2. ROCm LLVM (this PR): Add TheRock search path

Workaround (without this PR)

With the current state, we can avoid device library issues by:

export HIP_DEVICE_LIB_PATH=$(rocm-sdk path --root)/lib/llvm/amdgcn/bitcode

Why PR to ROCm repository and not upstream?

Because TheRock is in the preview state.

@github-actions
Copy link

github-actions bot commented Dec 3, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@z1-cciauto
Copy link
Collaborator

@lamb-j
Copy link
Collaborator

lamb-j commented Dec 3, 2025

Even though TheRock is still in preview, we can/should put this upstream.

charles-zablit and others added 27 commits December 3, 2025 20:14
So that we can actually test the workflow before comitting into tree.
This reverts commit 671a8ce.

This stil broke the clangd-ubuntu-tsan bot. It seems like somehow the
PATH variable is not getting propagated in the
system-include-extractor.test test.
Currently Apple unrolling preferences are not applied to apple-a17.
This does not test most of the functionality (i.e., that writing to an
existing comment still works), but does ensure that the plumbing works
and things are not completely broken.

Reviewers: tstellar, cmtice

Reviewed By: cmtice

Pull Request: llvm#170209
…_epi64 (llvm#170442)

This PR adds constexpr support for the AVX2 cross-lane permute
intrinsics _mm256_permute4x64_pd and _mm256_permute4x64_epi64

Resolves llvm#169304
This is designed to allow a workflow (e.g., premerge) upload comments
across multiple jobs. Subsequent PRs will wire this up within the
issue-write workflow to support reading comments from multiple files.

Reviewers: tstellar, cmtice

Reviewed By: cmtice

Pull Request: llvm#170216
The logic for narrowing to single scalar recipes is in two different
places: narrowToSingleScalarRecipes and legalizeAndOptimizeInductions.
Consolidate them.
We should actually run the test workflow when touching the workflow we
are attempting to test.
It appears that using it will result in callers mistakenly thinking that
complex f128 returns is done the sret-way, when it should be returned in
registers.
This changed the name of one of the outputs that issue-write used to
control whether or not it ran. This patch should fix that.
The libcall lowering decisions should be program dependent,
depending on the current module's RuntimeLibcallInfo. We need
another related analysis derived from that plus the current
function's subtarget to provide concrete lowering decisions.

This takes on a somewhat unusual form. It's a Module analysis,
with a lookup keyed on the subtarget. This is a separate module
analysis from RuntimeLibraryAnalysis to avoid that depending on
codegen. It's not a function pass to avoid depending on any
particular function, to avoid repeated subtarget map lookups in
most of the use passes, and to avoid any recomputation in the
common case of one subtarget (and keeps it reusable across
repeated compilations).

This also switches ExpandFp and PreISelIntrinsicLowering as
a sample function and module pass. Note this is not yet wired
up to SelectionDAG, which is still using the LibcallLoweringInfo
constructed inside of TargetLowering.
This is so that we can read from multiple files emitted by the premerge
workflow.

Reviewers: tstellar, cmtice

Reviewed By: cmtice

Pull Request: llvm#170411
…vm#170388)

Generate one global per static shared variable so the alignment can be
set separately. Dynamic shared memory is unchanged.
…on. (llvm#170474)

Follow-up to llvm#169576 to enable
UDiv canonicalization if the start of the AddRec is not constant.

The fold is not restricted to constant start values, as long as we are
able to compute a constant remainder. The fold is only applied if the
subtraction of the remainder can be folded into to start expression, but
that is just to avoid creating more complex AddRecs.

For reference, the proof from llvm#169576 is
https://alive2.llvm.org/ce/z/iu2tav

PR: llvm#170474
This PR adds unrolling for vector.create_mask op based on the
targetShape. Each unrolled vector computes its local mask size in each
dimension (d) as:
min(max(originalMaskSize[d] - offset[d], 0), unrolledMaskSize[d]).
…0420)

Disable test due to CASTests failing nondeterministically on some
windows CI.
Only some fortran source files in flang/test/Lower/OpenMP have been
modified. More cleanups will be performed in subsequent commits.
…170414)

Now that the issue-write workflow can support writing comments from
multiple files, make the premerge workflow write out comments from both
x86_64 Linux and Windows. AArch64 Linux right now is left out as the
premerge advisor does not currently support it.
…#169559)

Introduce Hexagon-specific passes to generate widening vector
instructions for integer and floating-point operations using generic
LLVM intrinsics. This enables widening operations for short vectors and
improves type legalization by allowing operands to be widened to
appropriate types. The patch also includes a shuffle optimization pass
to relocate and validate shufflevector instructions during widening
legalization.

Co-authored-by: Jyotsna Verma <jverma@qti.qualcomm.com>
Co-authored-by: Yashas Andaluri <yandalur@qti.qualcomm.com>
Co-authored-by: Fateme Hosseini <fhossein@qti.qualcomm.com>
Co-authored-by: Muntasir Mallick <mallick@qti.qualcomm.com>
Co-authored-by: Tatiana Larina <larina@qti.qualcomm.com>
Co-authored-by: Kaushik Kulkarni <kauskulk@qti.qualcomm.com>

Co-authored-by: Jyotsna Verma <jverma@qti.qualcomm.com>
Co-authored-by: Yashas Andaluri <yandalur@qti.qualcomm.com>
Co-authored-by: Muntasir Mallick <mallick@qti.qualcomm.com>
Co-authored-by: Tatiana Larina <larina@qti.qualcomm.com>
Co-authored-by: Kaushik Kulkarni <kauskulk@qti.qualcomm.com>
This is an NFC patch that aims to simplify how the scanner calls
`Consumer.handleBuildCommand()` by doing it directly in
`DependencyScanningAction` instead of going through the
`setLastCC1Arguments()` and `takeLastCC1Arguments()` dance with the
client.
Shared memory needs to be aligned like in the bulk load operation.
…is incomplete (llvm#163654)

Indexing in .debug_str section could lead to integer overflow when in
DWARF32 format.

https://github.com/llvm/llvm-project/blob/e61e6251b692ffe71910bad22b82e41313f003cf/llvm/lib/DWP/DWP.cpp#L35C30-L35C47

This can lead to missing symbols from the DWARF info, and hurts profile
quality. As a workaround, we may use information from the symbol table
(.symtab), and recover the missing symbols with addresses and ranges.

Output:
```
# Before
...
warning: 6.64%(338916009/5106344252) of samples are from ranges that do not belong to any functions.

# After
...
warning: 0.07%(3501587/4906133148) of samples are from ranges that do not belong to any functions.
warning: 5.71%(280266919/4906133148) of samples are from ranges that belong to functions recovered from symbol table.
```

We see 0.4% - 1.35% performance improvements on our internal services
where profiles are generated with binaries that have .debug_str section
over 4GB.
clementval and others added 7 commits December 3, 2025 22:38
This commit adds two Clang builtins for PowerPC AMO load operations:

__builtin_amo_lwat for 32-bit unsigned operations
__builtin_amo_ldat for 64-bit unsigned operations

Also adds an amo.h header that maps GCC's AMO functions to these Clang
builtins for compatibility.
Similar to llvm#161881, we will
need private/firstprivate/reduction representation for acc kernels for
automatic privatization
This PR adds hardware-measured latencies/Occupancy for all RVV
load/stores to the SpacemiT-X60 scheduling model.
…#167254)

When emitting the assembly .set directive, KCFI needs to use
getZExtValue(). However, this means that FileCheck pattern matching can't
match between the .set directive and the IR when the high bit of a 32-bit
value is set. We had gotten lucky with the existing tests happening to
just not have had the high bit set. The coming hash change will expose
this, though.

LLVM IR's default printing behavior uses APInt::operator<<, which calls
APInt::print(OS, /*isSigned=*/true). This means KCFI operand bundles in
call instructions print as signed (e.g. [ "kcfi"(i32 -1208803271) ]),
and KCFI type metadata prints as signed (e.g. !3 = !{i32 -1208803271}).
Changing the IR to print unsigned i32 values would impact hundreds of
existing tests, so it is best to just leave it be.

Update the KCFI .set direct to use getSExtValue() in a comment so that
we can both build correctly and use FileCheck with pattern matching in
tests.

KCFI generates hashes in two places. Instead of exposing the hash
implementation in both places, introduce a helper that wraps the
specific hash implementation in a single place, llvm::getKCFITypeID.

In order to transition between KCFI hash, we need to be able to specify
them. Add the Clang option -fsanitize-kcfi-hash= and a IR module option
"kcfi-hash" that can choose between xxHash64 and FNV-1a. Default to
xxHash64 to stay backward compatible, as we'll need to also update rustc
to take a new option to change the hash to FNV-1a for interop with the
coming GCC KCFI.
@a4lg
Copy link
Author

a4lg commented Dec 3, 2025

@lamb-j
Thanks for letting me know. I'll submit the same PR to upstream (and keep this PR open for a while).

llvmgnsyncbot and others added 3 commits December 3, 2025 23:21
On Mac `unsigned long long` corresponds to `uint64_t` so `%llu` is
needed.

Simply always using `%llu` and upcasting to `unsigned long long` should
make it work fine.
…ndencyScanningTool (NFC) (llvm#169962)

This patch is the first of two in refactoring Clang's dependency
scanning tooling to remove its dependency on clangDriver.

It separates Tooling/DependencyScanningTool.cpp from the rest of
clangDependencyScanning and moves clangDependencyScanning out of
clangTooling into its own library. No functional changes are
introduced.

The follow-up patch (llvm#169964) will restrict clangDependencyScanning to
handling only -cc1 command line inputs and will move all functionality
related to handling driver commands into clangTooling.
(Tooling/DependencyScanningTool.cpp).

This is part of a broader effort to support driver-managed builds for
compilations using C++ named modules and/or Clang modules. It is
required for linking the dependency scanning tooling against the driver
without introducing cyclic dependencies, which would otherwise cause
build failures when dynamic linking is enabled.

The RFC for this change can be found here:

https://discourse.llvm.org/t/rfc-new-clangoptions-library-remove-dependency-on-clangdriver-from-clangfrontend-and-flangfrontend/88773?u=naveen-seth
@a4lg a4lg force-pushed the amdgpu-therock-device-lib-path branch from 5ed03dd to d3314f0 Compare December 4, 2025 00:10
@a4lg a4lg force-pushed the amdgpu-therock-device-lib-path branch from d3314f0 to aee3336 Compare December 4, 2025 00:11
@z1-cciauto
Copy link
Collaborator

TheRock has slightly different path (relative to ROCM_PATH)
for device libraries.  This commit adds search path for device libraries
on a TheRock-based distribution.

Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
@a4lg a4lg force-pushed the amdgpu-therock-device-lib-path branch from aee3336 to b426723 Compare December 4, 2025 00:58
@z1-cciauto
Copy link
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.