Enabling GPUs (--gpus=all) messes up the library search path inside docker #156

@leofang

I'm not sure whether this is the right repository for this bug; if it isn't, please point me to the right place.

1. Issue or feature description

While hunting down an error in our internal Blossom CI, we found that --gpus=all causes a failure when importing _sqlite3, an extension module from the Python standard library.

2. Steps to reproduce the issue

This works fine:

$ docker run -it --rm quay.io/pypa/manylinux2014_x86_64
[root@107e83787741 /]# /opt/python/cp38-cp38/bin/python3 -c "import _sqlite3"
[root@107e83787741 /]# /opt/python/cp39-cp39/bin/python3 -c "import _sqlite3"
[root@107e83787741 /]# /opt/python/cp310-cp310/bin/python3 -c "import _sqlite3"

but once we enable GPUs it fails with Python 3.10:

$ docker run -it --rm --gpus=all quay.io/pypa/manylinux2014_x86_64
[root@cbf428498785 /]# /opt/python/cp38-cp38/bin/python3 -c "import _sqlite3"
[root@cbf428498785 /]# /opt/python/cp39-cp39/bin/python3 -c "import _sqlite3"
[root@cbf428498785 /]# /opt/python/cp310-cp310/bin/python3 -c "import _sqlite3"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /opt/python/cp310-cp310/lib/python3.10/lib-dynload/_sqlite3.cpython-310-x86_64-linux-gnu.so: undefined symbol: sqlite3_trace_v2

We've figured out why 3.8 and 3.9 still work (with the search path messed up, they end up loading libsqlite3.so.0 from elsewhere inside the container), but that is irrelevant here.
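For anyone trying to confirm the same symptom, here is a minimal diagnostic sketch. It is not part of our CI, and the helper names `has_symbol` and `loaded_paths` are made up for illustration. It assumes a Linux container (it reads /proc/self/maps) and uses only the standard `ctypes` module. It asks the dynamic loader for libsqlite3.so.0 through the default search path, checks whether that copy exports sqlite3_trace_v2, and lists where the library was mapped from.

```python
import ctypes
from typing import List


def has_symbol(library: str, symbol: str) -> bool:
    """Return True if `library` can be loaded and exports `symbol`."""
    try:
        lib = ctypes.CDLL(library)  # resolved via the normal loader search path
    except OSError:
        return False  # library not found or not loadable
    # ctypes raises AttributeError on lookup when the symbol is missing,
    # so hasattr() doubles as an "is it exported?" check.
    return hasattr(lib, symbol)


def loaded_paths(name_fragment: str) -> List[str]:
    """List files mapped into this process whose path contains `name_fragment`.

    Linux only; returns an empty list if /proc/self/maps is unavailable.
    """
    paths = set()
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                fields = line.split(maxsplit=5)
                if len(fields) == 6 and name_fragment in fields[5]:
                    paths.add(fields[5].strip())
    except OSError:
        return []
    return sorted(paths)


if __name__ == "__main__":
    exported = has_symbol("libsqlite3.so.0", "sqlite3_trace_v2")
    print("sqlite3_trace_v2 exported:", exported)
    print("libsqlite3 mapped from:", loaded_paths("libsqlite3"))
```

Running this with the same interpreter, once with and once without --gpus=all, should show whether a different libsqlite3.so.0 is being picked up in the two cases.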

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
  • Kernel version from uname -a
  • Any relevant kernel output lines from dmesg
  • Driver information from nvidia-smi -a
  • Docker version from docker version
  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
  • NVIDIA container library version from nvidia-container-cli -V
  • NVIDIA container library logs (see troubleshooting)
  • Docker command, image and tag used
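Several of the items above are plain command outputs, so they can be gathered in one pass. The following is only a sketch: the `collect` helper and the `COMMANDS` list are illustrative names, and the list holds only a subset of the commands from the checklist. Commands that are not installed are skipped rather than treated as errors. The nvidia-container-cli info command is left out because it writes its debug log to /dev/tty.

```python
import shutil
import subprocess
from typing import Dict, List

# A subset of the commands from the checklist above.
COMMANDS: List[List[str]] = [
    ["uname", "-a"],
    ["nvidia-smi", "-a"],
    ["docker", "version"],
    ["nvidia-container-cli", "-V"],
]


def collect(commands: List[List[str]]) -> Dict[str, str]:
    """Run each command and return its combined stdout/stderr, keyed by command line."""
    report: Dict[str, str] = {}
    for argv in commands:
        key = " ".join(argv)
        if shutil.which(argv[0]) is None:
            report[key] = "<not installed>"
            continue
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error output alongside normal output
            universal_newlines=True,
        )
        report[key] = result.stdout
    return report


if __name__ == "__main__":
    for command, output in collect(COMMANDS).items():
        print("$ " + command)
        print(output)
```

The output can then be pasted into the issue under the matching bullet.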
