52 changes: 46 additions & 6 deletions jupyter/datascience/ubi9-python-3.12/Dockerfile.cpu
Member Author:
@coderabbitai create github issue to capture

jupyter/datascience/ubi9-python-3.12/Dockerfile.cpu (1)
189-190: Consistent use of extra PyPI index is pragmatic but warrants a comment on sustainability.

The --extra-index-url https://pypi.org/simple pattern appears consistently across Power-specific builds to compensate for gaps in the AIPCC index (missing cython, numpy, protobuf, etc.). While this is an appropriate workaround, consider adding a top-level comment in the file (near the first occurrence or in the file header) documenting that this is a temporary mitigation for AIPCC index limitations and should be revisited if/when the AIPCC index coverage improves.

Also applies to: 241-246, 347-348, 361-363, 428-428

Contributor (coderabbitai):
🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to capture the review comment about documenting the --extra-index-url workaround

gh issue create \
  --repo opendatahub-io/notebooks \
  --title "Document temporary AIPCC index workaround in Dockerfile" \
  --body "## Context

This issue tracks the suggestion from PR #2678 to add documentation about the use of \`--extra-index-url https://pypi.org/simple\` as a temporary workaround for gaps in the AIPCC index.

## Description

The \`--extra-index-url https://pypi.org/simple\` pattern appears consistently across Power-specific builds (ppc64le/s390x) in \`jupyter/datascience/ubi9-python-3.12/Dockerfile.cpu\` to compensate for missing packages in the AIPCC index (e.g., cython, numpy>=1.22, protobuf>=4.25.1, cryptography dependencies).

## Requested Action

Add a top-level comment in the Dockerfile (near the first occurrence or in the file header) documenting that:
- This is a temporary mitigation for AIPCC index limitations
- It should be revisited if/when the AIPCC index coverage improves
- List the known missing packages that necessitate this workaround

## Affected Lines

The pattern appears at multiple locations:
- Lines 189-190 (pyarrow builder)
- Lines 241-246 (onnx builder)
- Lines 347-348 (pyarrow wheel installation)
- Lines 361-363 (onnx wheel installation)
- Line 428 (pylock.toml installation)

## References

- PR: https://github.com/opendatahub-io/notebooks/pull/2678
- Review comment: https://github.com/opendatahub-io/notebooks/pull/2678#discussion_r2527010335
- Requested by: @jiridanek" \
  --assignee jiridanek

Length of output: 55


@jiridanek I've created GitHub issue #2679 to capture this documentation improvement.

The issue documents the need to add a top-level comment in the Dockerfile explaining that the --extra-index-url https://pypi.org/simple pattern is a temporary workaround for AIPCC index limitations, along with references to the affected lines and known missing packages (cython, numpy>=1.22, protobuf>=4.25.1, cryptography dependencies).
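
For reference, a minimal sketch of the requested header comment (hypothetical wording, to be adapted by the maintainers) could look like:

# NOTE: Temporary workaround for AIPCC index gaps on ppc64le/s390x builds.
# The AIPCC index is currently missing some packages needed to build wheels in this file
# (e.g. cython, numpy>=1.22, protobuf>=4.25.1, and cryptography build dependencies),
# so several pip/uv invocations below add --extra-index-url https://pypi.org/simple.
# Revisit and drop the extra index once AIPCC coverage improves; tracked in issue #2679.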

@@ -40,6 +40,13 @@ EOF
# cpu-base #
####################
FROM ${BASE_IMAGE} AS cpu-base
USER 0
RUN /bin/bash <<'EOF'
set -Eeuxo pipefail
if command -v subscription-manager &> /dev/null; then
subscription-manager identity &>/dev/null && subscription-manager refresh || echo "Not registered, skipping refresh."
fi
EOF

WORKDIR /opt/app-root/bin

@@ -144,7 +151,13 @@ RUN --mount=type=cache,target=/root/.cache/pip \
set -Eeuxo pipefail
if [ "$TARGETARCH" = "ppc64le" ] || [ "$TARGETARCH" = "s390x" ]; then
# Install build dependencies (shared for pyarrow and onnx)
dnf install -y cmake make gcc-c++ pybind11-devel wget
packages=(
cmake make gcc-c++ pybind11-devel wget
# pyarrow -DARROW_S3=ON pulls in AWS C++ SDK
# libcurl-devel is required to build AWS C++ SDK
libcurl-devel openssl-devel
)
dnf install -y "${packages[@]}"
dnf clean all
# Build and collect pyarrow wheel
git clone --depth 1 --branch "apache-arrow-17.0.0" https://github.com/apache/arrow.git
@@ -173,7 +186,8 @@ if [ "$TARGETARCH" = "ppc64le" ] || [ "$TARGETARCH" = "s390x" ]; then
make -j$(nproc) VERBOSE=1
make install -j$(nproc)
cd ../../python
pip install --no-cache-dir -r requirements-build.txt
# aipcc index is missing cython, and maybe more
pip install --no-cache-dir --extra-index-url https://pypi.org/simple -r requirements-build.txt
PYARROW_WITH_PARQUET=1 \
PYARROW_WITH_DATASET=1 \
PYARROW_WITH_FILESYSTEM=1 \
@@ -224,10 +238,12 @@ if [ "${TARGETARCH}" = "ppc64le" ]; then
cd onnx
git checkout ${ONNX_VERSION}
git submodule update --init --recursive
pip install --no-cache-dir -r requirements.txt
# aipcc index does not have numpy>=1.22
pip install --no-cache-dir --extra-index-url https://pypi.org/simple -r requirements.txt
CMAKE_ARGS="-DPython3_EXECUTABLE=$(which python3.12)"
export CMAKE_ARGS
pip wheel . -w /root/onnx_wheel
# protobuf>=4.25.1
pip wheel --extra-index-url https://pypi.org/simple . -w /root/onnx_wheel
else
echo "Skipping ONNX build on non-Power"
mkdir -p /root/onnx_wheel
@@ -328,7 +344,8 @@ COPY --from=pyarrow-builder /tmp/wheels /tmp/wheels
RUN /bin/bash <<'EOF'
set -Eeuxo pipefail
if [ "$TARGETARCH" = "ppc64le" ] || [ "$TARGETARCH" = "s390x" ]; then
pip install --no-cache-dir /tmp/wheels/*.whl
# aipcc is lacking numpy>=1.16.6
pip install --no-cache-dir --extra-index-url https://pypi.org/simple /tmp/wheels/*.whl
else
echo "Skipping wheel install for $TARGETARCH"
fi
@@ -342,7 +359,8 @@ COPY --from=onnx-builder /root/onnx_wheel/ /onnxwheels/
RUN /bin/bash <<'EOF'
set -Eeuxo pipefail
if [ "${TARGETARCH}" = "ppc64le" ]; then
pip install --no-cache-dir /onnxwheels/*.whl
# aipcc is sure to lack something, so add extra index preemptively
pip install --no-cache-dir --extra-index-url https://pypi.org/simple /onnxwheels/*.whl
else
echo "Skipping ONNX/OpenBLAS install on non-Power"
fi
@@ -368,6 +386,26 @@ else
fi
EOF

RUN /bin/bash <<'EOF'
set -Eeuxo pipefail
if [ "${TARGETARCH}" = "ppc64le" ]; then
packages=(
# required to compile pillow
zlib-devel libjpeg-turbo-devel
# optional pillow deps https://pillow.readthedocs.io/en/latest/installation/building-from-source.html#external-libraries
#libtiff-devel libwebp-devel openjpeg2-devel lcms2-devel freetype-devel
#libimagequant-devel harfbuzz-devel fribidi-devel

# required to compile maturin
openssl-devel

# required to compile scikit-learn
gcc-gfortran
)
dnf install -y "${packages[@]}"
fi
EOF

USER 1001:0

# Install Python packages and Jupyterlab extensions from pylock.toml
@@ -382,10 +420,12 @@ echo "Installing software and packages"
# we often don't know the correct hashes and `--require-hashes` would therefore fail on non amd64, where building is common.
if [ "$TARGETARCH" = "ppc64le" ] || [ "$TARGETARCH" = "s390x" ]; then
# We need special flags and environment variables when building packages
# aipcc does not have some dependencies to build cryptography==43.0.3 on ppc64le, so we need to use pypi.org/simple
GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
CFLAGS="-O3" CXXFLAGS="-O3" \
uv pip install --strict --no-deps --no-cache --no-config --no-progress \
--verify-hashes --compile-bytecode --index-strategy=unsafe-best-match \
--extra-index-url https://pypi.org/simple \
--requirements=./pylock.toml
else
# This may have to download and compile some dependencies, and as we don't lock requirements from `build-system.requires`,