4 changes: 4 additions & 0 deletions .github/actions/spelling/allow.txt
@@ -17,9 +17,11 @@ CWP
CXI
Ceph
Containerfile
Containerfiles
DNS
Dockerfiles
Dufourspitze
EFA
EMPA
ETHZ
Ehrenfest
@@ -76,6 +78,7 @@ MeteoSwiss
NAMD
NICs
NVMe
NVSHMEM
Nordend
OpenFabrics
OAuth
@@ -102,6 +105,7 @@ ROCm
RPA
Roboto
Roothaan
SHMEM
SSHService
STMV
Scopi
67 changes: 60 additions & 7 deletions docs/software/communication/index.md
@@ -1,7 +1,29 @@
[](){#ref-software-communication}
# Communication Libraries

CSCS provides common communication libraries optimized for the [Slingshot 11 network on Alps][ref-alps-hsn].
!!! todo "list of ideas to integrate in this page"
* communication libraries are part of the "base" or "core" layer in your environment, alongside compilers and CUDA (on NVIDIA GPU systems).
* we provide base containers that start with compilers+CUDA
* have a section "installing/getting comm libs":
* CE (build your own) and uenv (it comes with the label) sub-sections
* Conda, pre-built (ORCA, ANSYS, etc)

Communication libraries are used by scientific and AI workloads to communicate between processes, and they must be built and configured correctly to get the best performance.
Broadly speaking, there are two levels of communication:

* **intra-node** communication between processes on the same node.
* **inter-node** communication between processes on different nodes, over the [Slingshot 11 network][ref-alps-hsn] that connects the nodes on Alps.

Communication libraries, like MPI and NCCL, need to be configured to use the [libfabric][ref-communication-libfabric] library that has an optimised back end for Slingshot 11.
As such, they are part of the base layer of libraries and tools required to fully utilize the hardware on Alps:

* **CPU**: compilers with support for building applications optimized for the CPU architecture of the node.
* **GPU**: CUDA and ROCm provide compilers and runtime libraries for NVIDIA and AMD GPUs respectively.
* **Network**: libfabric, MPI, NCCL/RCCL, and NVSHMEM need to be configured for the Slingshot network.

CSCS provides communication libraries optimised for libfabric and Slingshot in uenv, and guidance on how to configure container images similarly.
This section of the documentation provides advice on how to build and install software to use these libraries, and how to deploy them.

For most scientific applications relying on MPI, [Cray MPICH][ref-communication-cray-mpich] is recommended.
[MPICH][ref-communication-mpich] and [OpenMPI][ref-communication-openmpi] may also be used, with limitations.
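
As a quick orientation, the sketch below shows how an MPI application is typically compiled and launched once a uenv providing Cray MPICH (for example `prgenv-gnu`) is active; the source file, node, and task counts are placeholders:

```console
$ mpicc -O2 -o hello_mpi hello_mpi.c          # Cray MPICH compiler wrapper provided by the uenv
$ srun --nodes=2 --ntasks-per-node=4 ./hello_mpi
```
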
@@ -12,9 +34,40 @@ NCCL and RCCL have to be configured with a plugin using [libfabric][ref-communication-libfabric]

See the individual pages for each library for information on how to use and best configure the libraries.

* [Cray MPICH][ref-communication-cray-mpich]
* [MPICH][ref-communication-mpich]
* [OpenMPI][ref-communication-openmpi]
* [NCCL][ref-communication-nccl]
* [RCCL][ref-communication-rccl]
* [libfabric][ref-communication-libfabric]
<div class="grid cards" markdown>

- __Low Level__

Learn about the base installation of libfabric and its dependencies

[:octicons-arrow-right-24: libfabric][ref-communication-libfabric]

</div>
<div class="grid cards" markdown>

- __MPI__

Cray MPICH is the most optimized and best-tested MPI implementation on Alps, and is used by uenv.

[:octicons-arrow-right-24: Cray MPICH][ref-communication-cray-mpich]

For compatibility in containers:

[:octicons-arrow-right-24: MPICH][ref-communication-mpich]

OpenMPI can also be built in containers or in uenv:

[:octicons-arrow-right-24: OpenMPI][ref-communication-openmpi]

</div>
<div class="grid cards" markdown>

- __Machine Learning__

NCCL and RCCL provide GPU-optimized collective communication for NVIDIA and AMD GPUs respectively.

[:octicons-arrow-right-24: NCCL][ref-communication-nccl]

[:octicons-arrow-right-24: RCCL][ref-communication-rccl]

</div>
147 changes: 142 additions & 5 deletions docs/software/communication/libfabric.md
@@ -1,16 +1,153 @@
[](){#ref-communication-libfabric}
# Libfabric

[Libfabric](https://ofiwg.github.io/libfabric/), or Open Fabrics Interfaces (OFI), is a low level networking library that abstracts away various networking backends.
It is used by Cray MPICH, and can be used together with OpenMPI, NCCL, and RCCL to make use of the [Slingshot network on Alps][ref-alps-hsn].
[Libfabric](https://ofiwg.github.io/libfabric/), or Open Fabrics Interfaces (OFI), is a low-level networking library that provides an abstract interface to different network back ends.
It is the interface chosen by HPE for the [Slingshot network on Alps][ref-alps-hsn], and by AWS for their [EFA network interface](https://aws.amazon.com/hpc/efa/).

To fully take advantage of the network on Alps:

* libfabric and its dependencies must be available in your environment (uenv or container);
* and communication libraries like Cray MPICH, OpenMPI, NCCL, and RCCL have to be built or configured to use libfabric.

??? question "What about UCX?"
    [Unified Communication X (UCX)](https://openucx.org/) is a low-level library that targets the same layer as libfabric.
    Specifically, it provides an open, standards-based networking API.

    By targeting UCX or libfabric, MPI and NCCL do not need to implement low-level support for each type of network hardware.

    A downside of having two standards instead of one is that pre-built software (for example Conda packages and container images) often ships with MPI built against UCX, which does not provide a back end for Slingshot 11.
Trying to run these images will lead to errors, or very poor performance.
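
    A rough heuristic for spotting a UCX-only MPI build is to inspect the packages and shared-library dependencies of the installation; this is a sketch with placeholder paths and package names, and it is not definitive, because some MPI builds load UCX or libfabric as plugins at runtime:

    ```console
    $ conda list | grep -Ei 'mpich|openmpi|ucx|libfabric'
    $ ldd /path/to/libmpi.so | grep -Ei 'ucx|ucp|fabric'
    ```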

## Using libfabric

### uenv

If you are using a uenv provided by CSCS, such as [prgenv-gnu][ref-uenv-prgenv-gnu], [Cray MPICH][ref-communication-cray-mpich] is linked against libfabric and the high-speed network will be used.
No changes are required in applications.
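
To check that the Slingshot provider is visible from within a uenv, the `fi_info` utility shipped with libfabric can be used; this is a sketch, and it assumes the libfabric command-line utilities are available in your environment:

```console
$ fi_info -p cxi | head
```

If the `cxi` provider is listed, libfabric can drive the Slingshot 11 network.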

If you are using containers, the system libfabric can be loaded into your container using the [CXI hook provided by the container engine][ref-ce-cxi-hook].
Using the hook is essential to make full use of the Alps network.
### Container Engine

If you are using [containers][ref-container-engine], the simplest approach is to load libfabric into your container using the [CXI hook provided by the container engine][ref-ce-cxi-hook].
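
As a sketch, the hook is typically enabled through an annotation in the environment definition file (EDF); the image reference and annotation key below are examples, so check the [CXI hook][ref-ce-cxi-hook] page for the authoritative names:

```toml
# ~/.edf/comm-test.toml -- example EDF that enables the CXI hook
image = "quay.io#ethcscs/comm-fwk:ofi1.22-ucx1.19-cuda12.8"

[annotations]
com.hooks.cxi.enabled = "true"
```

The container is then launched with, for example, `srun --environment=comm-test <command>`.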

Alternatively, it is possible to build libfabric and its dependencies into your container.

!!! example "Installing libfabric in a container for NVIDIA nodes"
The following lines demonstrate how to configure and build GDRCopy and libfabric in a container image.

Note that it is assumed that CUDA is already installed in the base image.
```Dockerfile
# Install GDRCopy
ARG gdrcopy_version=2.5.1
RUN git clone --depth 1 --branch v${gdrcopy_version} https://github.com/NVIDIA/gdrcopy.git \
&& cd gdrcopy \
&& export CUDA_PATH=${CUDA_HOME:-$(echo $(which nvcc) | grep -o '.*cuda')} \
&& make CC=gcc CUDA=$CUDA_PATH lib \
&& make lib_install \
&& cd ../ && rm -rf gdrcopy

# Install libfabric
ARG libfabric_version=1.22.0
RUN git clone --branch v${libfabric_version} --depth 1 https://github.com/ofiwg/libfabric.git \
&& cd libfabric \
&& ./autogen.sh \
&& ./configure --prefix=/usr --with-cuda=/usr/local/cuda --enable-cuda-dlopen \
--enable-gdrcopy-dlopen --enable-efa \
&& make -j$(nproc) \
&& make install \
&& ldconfig \
&& cd .. \
&& rm -rf libfabric
```
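
The recipe can be built into an image with any OCI builder; a minimal sketch using `podman` (the image name and tag are placeholders):

```console
$ podman build -f Dockerfile -t libfabric-base:latest .
```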

!!! todo
In the above recipe `CUDA_PATH` is "calculated" for GDRCopy, but hard-coded to `/usr/local/cuda` for libfabric.
How about just hard-coding it everywhere, to simplify the recipe?

!!! todo
Should we include the EFA and UCX support here? It is not needed to run on Alps, and might confuse readers.

??? note "The full containerfile for GH200"

The Containerfile below is based on the NVIDIA CUDA image, which provides a complete CUDA installation.

- Communication frameworks are built with explicit support for CUDA and GDRCopy.

Some additional features are enabled to increase the portability of the container to non-Alps systems:

- The libfabric [EFA](https://aws.amazon.com/hpc/efa/) provider is enabled with the `--enable-efa` flag, for compatibility with derived images on AWS infrastructure.
- This image also packages the UCX communication framework to allow building a broader set of software (e.g. some OpenSHMEM implementations) and to support optimized InfiniBand communication.

```Dockerfile
ARG ubuntu_version=24.04
ARG cuda_version=12.8.1
FROM docker.io/nvidia/cuda:${cuda_version}-cudnn-devel-ubuntu${ubuntu_version}

RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install -y \
build-essential \
ca-certificates \
pkg-config \
automake \
autoconf \
libtool \
cmake \
gdb \
strace \
wget \
git \
bzip2 \
python3 \
gfortran \
rdma-core \
numactl \
libconfig-dev \
libuv1-dev \
libfuse-dev \
libfuse3-dev \
libyaml-dev \
libnl-3-dev \
libnuma-dev \
libsensors-dev \
libcurl4-openssl-dev \
libjson-c-dev \
libibverbs-dev \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*

# Install GDRCopy
ARG gdrcopy_version=2.5.1
RUN git clone --depth 1 --branch v${gdrcopy_version} https://github.com/NVIDIA/gdrcopy.git \
&& cd gdrcopy \
&& export CUDA_PATH=${CUDA_HOME:-$(echo $(which nvcc) | grep -o '.*cuda')} \
&& make CC=gcc CUDA=$CUDA_PATH lib \
&& make lib_install \
&& cd ../ && rm -rf gdrcopy

# Install libfabric
ARG libfabric_version=1.22.0
RUN git clone --branch v${libfabric_version} --depth 1 https://github.com/ofiwg/libfabric.git \
&& cd libfabric \
&& ./autogen.sh \
&& ./configure --prefix=/usr --with-cuda=/usr/local/cuda --enable-cuda-dlopen --enable-gdrcopy-dlopen --enable-efa \
&& make -j$(nproc) \
&& make install \
&& ldconfig \
&& cd .. \
&& rm -rf libfabric

# Install UCX
ARG UCX_VERSION=1.19.0
RUN wget https://github.com/openucx/ucx/releases/download/v${UCX_VERSION}/ucx-${UCX_VERSION}.tar.gz \
&& tar xzf ucx-${UCX_VERSION}.tar.gz \
&& cd ucx-${UCX_VERSION} \
&& mkdir build \
&& cd build \
&& ../configure --prefix=/usr --with-cuda=/usr/local/cuda --with-gdrcopy=/usr/local --enable-mt --enable-devel-headers \
&& make -j$(nproc) \
&& make install \
&& cd ../.. \
&& rm -rf ucx-${UCX_VERSION}.tar.gz ucx-${UCX_VERSION}
```

## Tuning libfabric

@@ -21,4 +158,4 @@
See the [Cray MPICH known issues page][ref-communication-cray-mpich-known-issues] for issues when using Cray MPICH together with libfabric.
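
Until a dedicated tuning guide is added, a simple way to see which provider libfabric selects at runtime is to raise its log level; this is a debugging aid rather than a performance setting, and the application name is a placeholder:

```console
$ export FI_LOG_LEVEL=info    # standard libfabric logging variable
$ srun -N 2 ./my_app 2> libfabric.log
```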

!!! todo
More options?
- add environment variable tuning guide
105 changes: 105 additions & 0 deletions docs/software/container-engine/guidelines-images/image-comm-fwk.md
@@ -0,0 +1,105 @@
[](){#ref-ce-guidelines-images-commfwk}
# Communication frameworks image

This page describes a container image providing foundational software components for achieving efficient execution on Alps nodes with NVIDIA GPUs.

The most important factor in the performance of containerized applications is effective use of the high-speed network;
therefore, this image mainly installs communication frameworks and libraries, alongside general utility tools.
In particular, the [libfabric](https://ofiwg.github.io/libfabric/) framework (also known as Open Fabrics Interfaces - OFI) is required to interface applications with the Slingshot high-speed network.

At runtime, the container engine [CXI hook][ref-ce-cxi-hook] will replace the libfabric libraries inside the container with the corresponding libraries on the host system.
This will ensure access to the Slingshot interconnect.

This image is not intended to be used on its own, but to serve as a base to build higher-level software (e.g. MPI implementations) and application stacks.
For this reason, no performance results are provided on this page.

A build of this image is currently hosted on the [Quay.io](https://quay.io/) registry at the following reference:
`quay.io/ethcscs/comm-fwk:ofi1.22-ucx1.19-cuda12.8`.
The image name `comm-fwk` is a shortened form of "communication frameworks".
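
A minimal sketch of a derived image that uses the published tag above as a base and builds an MPI implementation on top of it; the MPICH version and configure options are illustrative, not a tested recipe:

```Dockerfile
FROM quay.io/ethcscs/comm-fwk:ofi1.22-ucx1.19-cuda12.8

# Build MPICH against the libfabric installation provided by the base image
ARG mpich_version=4.3.0
RUN wget https://www.mpich.org/static/downloads/${mpich_version}/mpich-${mpich_version}.tar.gz \
    && tar xzf mpich-${mpich_version}.tar.gz \
    && cd mpich-${mpich_version} \
    && ./configure --prefix=/usr --with-device=ch4:ofi --with-libfabric=/usr \
    && make -j$(nproc) \
    && make install \
    && ldconfig \
    && cd .. \
    && rm -rf mpich-${mpich_version} mpich-${mpich_version}.tar.gz
```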

## Contents

- Ubuntu 24.04
- CUDA 12.8.1
- GDRCopy 2.5.1
- Libfabric 1.22.0
- UCX 1.19.0
Review comments on lines +22 to +26 (Member):

> This is bound to be out of date, at least temporarily. Should we provide some instructions to the user on how to retrieve this information from the container or registry instead?

> It is also a bit redundant with the versions explicitly set below.


## Containerfile
```Dockerfile
ARG ubuntu_version=24.04
ARG cuda_version=12.8.1
FROM docker.io/nvidia/cuda:${cuda_version}-cudnn-devel-ubuntu${ubuntu_version}

RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install -y \
build-essential \
ca-certificates \
pkg-config \
automake \
autoconf \
libtool \
cmake \
gdb \
strace \
wget \
git \
bzip2 \
python3 \
gfortran \
rdma-core \
numactl \
libconfig-dev \
libuv1-dev \
libfuse-dev \
libfuse3-dev \
libyaml-dev \
libnl-3-dev \
libnuma-dev \
libsensors-dev \
libcurl4-openssl-dev \
libjson-c-dev \
libibverbs-dev \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*

# Install GDRCopy
ARG gdrcopy_version=2.5.1
RUN git clone --depth 1 --branch v${gdrcopy_version} https://github.com/NVIDIA/gdrcopy.git \
&& cd gdrcopy \
&& export CUDA_PATH=${CUDA_HOME:-$(echo $(which nvcc) | grep -o '.*cuda')} \
&& make CC=gcc CUDA=$CUDA_PATH lib \
&& make lib_install \
&& cd ../ && rm -rf gdrcopy

# Install libfabric
ARG libfabric_version=1.22.0
RUN git clone --branch v${libfabric_version} --depth 1 https://github.com/ofiwg/libfabric.git \
&& cd libfabric \
&& ./autogen.sh \
&& ./configure --prefix=/usr --with-cuda=/usr/local/cuda --enable-cuda-dlopen --enable-gdrcopy-dlopen --enable-efa \
&& make -j$(nproc) \
&& make install \
&& ldconfig \
&& cd .. \
&& rm -rf libfabric

# Install UCX
ARG UCX_VERSION=1.19.0
RUN wget https://github.com/openucx/ucx/releases/download/v${UCX_VERSION}/ucx-${UCX_VERSION}.tar.gz \
&& tar xzf ucx-${UCX_VERSION}.tar.gz \
&& cd ucx-${UCX_VERSION} \
&& mkdir build \
&& cd build \
&& ../configure --prefix=/usr --with-cuda=/usr/local/cuda --with-gdrcopy=/usr/local --enable-mt --enable-devel-headers \
&& make -j$(nproc) \
&& make install \
&& cd ../.. \
&& rm -rf ucx-${UCX_VERSION}.tar.gz ucx-${UCX_VERSION}
```

## Notes
- The image is based on an official NVIDIA CUDA image, and therefore already provides the NCCL library, alongside a complete CUDA installation.
- Communication frameworks are built with explicit support for CUDA and GDRCopy.
- The libfabric [EFA](https://aws.amazon.com/hpc/efa/) provider is included to leave open the possibility of experimenting with derived images on AWS infrastructure.
- Although only the libfabric framework is required to support Alps' Slingshot network, this image also packages the UCX communication framework to allow building a broader set of software (e.g. some OpenSHMEM implementations) and to support optimized InfiniBand communication as well.