@ronlieb (Collaborator) commented Nov 28, 2025

Cherry-picked from the amd-mainline version: #638

skc7 and others added 8 commits November 14, 2025 21:17
…) (#541)

This PR introduces a new pass, "lower-workdistribute". Fortran array
statements are lowered to FIR as fir.do_loop unordered. The
"lower-workdistribute" pass mainly identifies "fir.do_loop unordered"
ops nested in target{teams{workdistribute{fir.do_loop unordered}}} and
lowers them to target{teams{parallel{wsloop{loop_nest}}}}. It hoists all
the other ops outside the target region, replaces heap allocation on the
target with omp.target_allocmem and deallocation with omp.target_freemem
from the host, and replaces the runtime function "Assign" with
omp.target_memcpy from the host.
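
To make the nesting rewrite concrete, here is a minimal Python sketch over a hypothetical toy IR (`Op`, `show`, and `lower_workdistribute` are illustrative names, not the MLIR or Flang API). It assumes a single target{teams{workdistribute{...}}} nest, turns each unordered `fir.do_loop` into parallel{wsloop{loop_nest}}, and hoists every other op in front of the target op; the allocation and Assign rewrites are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Toy op: a name, an 'unordered' flag for loops, and one nested region."""
    name: str
    unordered: bool = False
    body: List["Op"] = field(default_factory=list)


def show(op: Op) -> str:
    """Render nesting in the brace notation used above, e.g. target{teams{...}}."""
    label = op.name + (" unordered" if op.unordered else "")
    if not op.body:
        return label
    return label + "{" + ",".join(show(child) for child in op.body) + "}"


def lower_workdistribute(target: Op) -> List[Op]:
    """Lower target{teams{workdistribute{...}}} to target{teams{parallel{wsloop{loop_nest}}}}.

    Unordered loops become parallel{wsloop{loop_nest}}; every other op is
    hoisted so that it runs before the target op. Returns the new op sequence.
    """
    (teams,) = target.body
    (workdistribute,) = teams.body
    assert (target.name, teams.name, workdistribute.name) == ("target", "teams", "workdistribute")

    hoisted: List[Op] = []
    lowered: List[Op] = []
    for op in workdistribute.body:
        if op.name == "fir.do_loop" and op.unordered:
            loop_nest = Op("loop_nest", body=op.body)  # keep the loop body as-is
            lowered.append(Op("parallel", body=[Op("wsloop", body=[loop_nest])]))
        else:
            hoisted.append(op)
    return hoisted + [Op("target", body=[Op("teams", body=lowered)])]


src = Op("target", body=[Op("teams", body=[Op("workdistribute", body=[
    Op("arith.constant"),
    Op("fir.do_loop", unordered=True),
])])])
print([show(op) for op in lower_workdistribute(src)])
# ['arith.constant', 'target{teams{parallel{wsloop{loop_nest}}}}']
```

The actual pass operates on MLIR operations and values; this toy only tracks op names and nesting and does not attempt to model anything else.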

This pass implements the following rewrites and optimisations:

- **FissionWorkdistribute**: finds the parallelizable ops within a
teams{workdistribute} region and moves them into their own
teams{workdistribute} region.
- **WorkdistributeRuntimeCallLower**: finds the FortranAAssign calls
nested in teams{workdistribute{}} and lowers them to an unordered do loop
if the source is a scalar and the destination is an array. Other runtime
calls are not handled currently.
- **WorkdistributeDoLower**: finds the fir.do_loop unordered nested in
teams{workdistribute{fir.do_loop unordered}} and lowers it to
teams{parallel{distribute{wsloop{loop_nest}}}}.
- **TeamsWorkdistributeToSingle**: hoists all the ops inside
teams{workdistribute{}} before the teams op.
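
A minimal Python sketch of two of these steps on a hypothetical list-of-dicts IR: `lower_assign` models WorkdistributeRuntimeCallLower (the `src_rank`/`dest_rank` fields are assumptions standing in for the scalar/array check), and `fission` models FissionWorkdistribute under the assumption that each parallelizable op is placed in a region of its own while runs of other ops stay together in program order.

```python
from typing import Any, Callable, Dict, List

Op = Dict[str, Any]  # hypothetical toy op, e.g. {"kind": "do_loop", "unordered": True}


def lower_assign(op: Op) -> Op:
    """Scalar-to-array FortranAAssign becomes an unordered loop; any other
    runtime call (or op) is returned unchanged."""
    if (op["kind"] == "call" and op["callee"] == "FortranAAssign"
            and op["src_rank"] == 0 and op["dest_rank"] > 0):
        return {"kind": "do_loop", "unordered": True}
    return op


def is_parallel(op: Op) -> bool:
    """In this toy, only unordered loops count as parallelizable."""
    return op["kind"] == "do_loop" and bool(op.get("unordered", False))


def fission(body: List[Op], parallel: Callable[[Op], bool]) -> List[List[Op]]:
    """Split one workdistribute body into consecutive regions: each
    parallelizable op alone, other ops grouped between them in order."""
    regions: List[List[Op]] = []
    pending: List[Op] = []
    for op in body:
        if parallel(op):
            if pending:
                regions.append(pending)
                pending = []
            regions.append([op])
        else:
            pending.append(op)
    if pending:
        regions.append(pending)
    return regions


body: List[Op] = [
    {"kind": "call", "callee": "FortranAAssign", "src_rank": 0, "dest_rank": 1},
    {"kind": "scalar"},
    {"kind": "do_loop", "unordered": True},
]
regions = fission([lower_assign(op) for op in body], is_parallel)
print([[op["kind"] for op in region] for region in regions])
# [['do_loop'], ['scalar'], ['do_loop']]
```

In this sketch each returned group would then be wrapped in its own teams{workdistribute} region; how the real pass decides what counts as parallelizable is not modelled here.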

The work in this PR is cherry-picked and updated from @ivanradanov's
commits in the coexecute implementation:

[flang_workdistribute_iwomp_2024](https://github.com/ivanradanov/llvm-project/commits/flang_workdistribute_iwomp_2024)

Paper related to this work by @ivanradanov: ["Automatic Parallelization
and OpenMP Offloading of Fortran Array Notation"](https://www.osti.gov/servlets/purl/2449728)

Will work on script changes for aomp and npsdb after it lands.
- Add add_subdirectory(utils) to the offload CMakeLists.txt
- Use the new macro add_openmp_util to install utils into llvm/bin

ROCm 7.2 changed the PCI layout/info, which really messes up xnack=1
performance and necessitates a forced path to numactl:

      -nr  use numactl ROCR_VISIBLE_DEVICES
      -nm  use numactl OMPI_COMM_WORLD_LOCAL_RANK
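
The two options can be read as choosing the local GPU from different environment variables before pinning the process with numactl. The Python sketch below assumes that reading; the `GPU_TO_NUMA` table, the `numactl_prefix` helper, and the choice of `--cpunodebind`/`--membind` are illustrative assumptions, not the script's actual logic, which the commit does not show.

```python
import os
from typing import Dict, List, Mapping

# HYPOTHETICAL GPU index -> NUMA node table; a real node would derive this
# from its PCI topology (the commit notes ROCm 7.2 changed the PCI layout/info).
GPU_TO_NUMA: Dict[int, int] = {0: 0, 1: 0, 2: 1, 3: 1}


def numactl_prefix(mode: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a numactl command prefix for this process.

    -nr: GPU index taken from ROCR_VISIBLE_DEVICES (first entry; assumed numeric)
    -nm: GPU index taken from OMPI_COMM_WORLD_LOCAL_RANK
    """
    if mode == "-nr":
        gpu = int(env["ROCR_VISIBLE_DEVICES"].split(",")[0])
    elif mode == "-nm":
        gpu = int(env["OMPI_COMM_WORLD_LOCAL_RANK"])
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    node = GPU_TO_NUMA[gpu]
    # Bind both CPU scheduling and memory allocation to the GPU's NUMA node.
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}"]


print(numactl_prefix("-nr", {"ROCR_VISIBLE_DEVICES": "2"}) + ["./app"])
# ['numactl', '--cpunodebind=1', '--membind=1', './app']
```

Forcing the binding from an explicit variable, rather than trusting the PCI info, is one way to keep placement stable when the reported layout changes.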
…#167486)

Attempt to only define used subregisters when creating IMPLICIT_DEF
fix-ups for live interval subranges. At the MIR level, this avoids entire
(wide) registers appearing to become live, rather than relying only on
transient LiveIntervals dead definitions for unused subregisters.

(cherry picked from commit b1c4b55)
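
The idea of defining only the used subregisters can be illustrated with lane masks. This Python sketch assumes a hypothetical 128-bit register made of four 32-bit lanes and a made-up set of subregister names; given the lanes that are actually used, it greedily picks the widest subregisters that fit entirely inside those lanes, so the IMPLICIT_DEF never covers an unused lane. It is neither LLVM's `LaneBitmask` API nor the commit's algorithm.

```python
from typing import Dict, List

# HYPOTHETICAL register layout: bit i of a mask stands for 32-bit lane i.
SUBREGS: Dict[str, int] = {
    "full": 0b1111,
    "sub0_sub1": 0b0011,
    "sub2_sub3": 0b1100,
    "sub0": 0b0001,
    "sub1": 0b0010,
    "sub2": 0b0100,
    "sub3": 0b1000,
}


def implicit_def_targets(used_lanes: int) -> List[str]:
    """Choose subregisters covering exactly `used_lanes`, widest first."""
    remaining = used_lanes
    picked: List[str] = []
    # Stable sort: widest masks first, ties keep the dict's order.
    for name, mask in sorted(SUBREGS.items(), key=lambda kv: -bin(kv[1]).count("1")):
        if remaining == 0:
            break
        if (mask & remaining) == mask:  # mask lies entirely within lanes still to define
            picked.append(name)
            remaining &= ~mask
    assert remaining == 0, "single-lane subregisters guarantee full coverage"
    return picked


print(implicit_def_targets(0b0111))  # ['sub0_sub1', 'sub2']
print(implicit_def_targets(0b1111))  # ['full']
```

Only when every lane is used does the whole register get defined, which mirrors the goal stated in the commit message of not making wide registers live unnecessarily.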
@ronlieb (Collaborator, Author) commented Nov 28, 2025

Oops, I stacked this on another PR; will redo it.

@ronlieb closed this Nov 28, 2025

5 participants