6 changes: 6 additions & 0 deletions packages/backend/src/workers/provider/LlamaCppPython.ts
@@ -119,6 +119,12 @@ export class LlamaCppPython extends InferenceProvider {
Type: 'bind',
});

devices.push({
Collaborator commented:

question: This is a flag, not a path to share. What is the rationale for doing that?

@limam-B (Author) commented on Nov 26, 2025:

This is a CDI (Container Device Interface) device identifier. Podman uses nvidia.com/gpu=all as a CDI spec name to automatically mount all NVIDIA GPU devices.
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

[screenshot: the equivalent Linux code using the same CDI device pattern]

@limam-B (Author) commented:

Based on that link (I'm not an expert on this), I believe Podman's Devices array accepts CDI device names like nvidia.com/gpu=all in PathOnHost, so when Podman sees this format it automatically resolves it via CDI and mounts all GPU devices.
This is the same pattern used for Linux (see the screenshot above).
The alternative would be the --device CLI flag, but since we're using the API, this is the equivalent approach.
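For illustration, a minimal sketch of the two equivalent forms; the device field names match the diff, and the surrounding context is an assumption:

```typescript
// CLI form: Podman resolves the CDI name and injects all NVIDIA GPUs:
//   podman run --device nvidia.com/gpu=all <image>
//
// API form (as in this diff): the same CDI name goes into PathOnHost.
// When Podman sees the vendor.com/class=name format, it resolves it via
// the CDI spec instead of treating it as a host device path.
const devices: { PathOnHost: string; PathInContainer: string; CgroupPermissions: string }[] = [];

devices.push({
  PathOnHost: 'nvidia.com/gpu=all', // CDI device name, not a filesystem path
  PathInContainer: '',              // unused for CDI names
  CgroupPermissions: '',
});
```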

PathOnHost: 'nvidia.com/gpu=all',
Contributor commented:

question: What happens if the Podman machine does not have the NVIDIA CDI spec installed?

@limam-B (Author) commented on Dec 1, 2025:

I guess if CDI isn't configured, Podman will fail to resolve nvidia.com/gpu=all and the container won't start.
But users enabling GPU support should have nvidia-container-toolkit installed, which generates the CDI spec.
Maybe we should add a check like the Linux case does with isNvidiaCDIConfigured()?
I'll confirm by reproducing this scenario and post more details later on.

@limam-B (Author) commented on Dec 1, 2025:

Test Scenario: What happens without CDI?

Check the current CDI status in the Podman machine:

podman machine ssh cat /etc/cdi/nvidia.yaml

The file exists, so CDI is configured.

Temporarily disable CDI:

  1. SSH into the Podman machine:

     podman machine ssh

  2. Back up (disable) the CDI config:

     sudo mv /etc/cdi/nvidia.yaml /etc/cdi/nvidia.yaml.disabled

  3. Exit the SSH session:

     exit

(To restore afterwards, move the file back: sudo mv /etc/cdi/nvidia.yaml.disabled /etc/cdi/nvidia.yaml)

Test Results

Inference server with [ GPU ENABLED | no CDI ] in AI Lab:

[screenshot: container fails to start with a clear CDI resolution error]

Inference server with [ GPU DISABLED | no CDI ] in AI Lab:

[screenshot: inference server starts normally in CPU mode]

Why this behavior is correct:

Thanks to the conditional checks in the code:

[screenshots: the GPU-enabled conditionals guarding the devices.push() calls]

The CDI device is only added when GPU is explicitly enabled in settings.
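For readers without the screenshots, a hedged sketch of the shape of that conditional (names like gpuEnabled are illustrative assumptions, not the actual code):

```typescript
// Assumptions: gpuEnabled comes from AI Lab settings; devices is the array
// passed to the container-create call. Names are illustrative only.
declare const gpuEnabled: boolean;
declare const devices: { PathOnHost: string; PathInContainer: string; CgroupPermissions: string }[];

// The CDI device is pushed only when GPU support is explicitly enabled,
// so CPU-mode containers never reference the CDI spec.
if (gpuEnabled) {
  devices.push({
    PathOnHost: 'nvidia.com/gpu=all',
    PathInContainer: '',
    CgroupPermissions: '',
  });
}
```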


Conclusion:

This is the correct behavior, since RamaLama requires CDI:
https://github.com/containers/ramalama/blob/main/docs/ramalama-cuda.7.md

  • CPU mode is unaffected (no CDI device is added when GPU is disabled)
  • GPU mode gives a clear error when CDI is missing
  • GPU mode works when CDI is properly configured
  • RamaLama requires CDI (documented)
  • The AI Lab extension requires CDI (documented)

Background:

The "magic trick" in #1824 worked with the old ai-lab-playground-chat-cuda image (CUDA embedded).
RamaLama images expect CDI injection instead , this change happened in e34d59f.

We should update the AI Lab documentation to mention CDI is required for WSL GPU support. Maybe?

PathInContainer: '',
CgroupPermissions: '',
});

devices.push({
PathOnHost: '/dev/dxg',
PathInContainer: '/dev/dxg',