Ollama backend, but all on CPU #2107
Unanswered
StevenD07 asked this question in Algorithm + Paper
Replies: 0 comments
I’m using Ollama as the backend for GraphRAG.
Does anyone know why my CPU usage is extremely high while GPU usage remains almost zero?
Has anyone else encountered the same issue?
ps aux | grep ollama | grep -v grep
dyding 3776687 98.6 0.4 52559636 1111420 pts/69 Sl+ Oct08 9585:32 /net/kihara/home/dyding/ollama/bin/ollama runner --model /net/kihara/home/dyding/.ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 --ctx-size 16384 --batch-size 512 --n-gpu-layers 33 --threads 24 --parallel 4 --port 46347
I think Ollama is correctly deployed on the GPU:
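One thing worth checking in the ps output above is the offload split: the runner was started with --n-gpu-layers 33 and --threads 24, and in llama.cpp-based runners any layers beyond the offloaded count run on the CPU using that thread count, which could explain high CPU alongside low GPU usage if the model has more than 33 layers. A minimal sketch for pulling those two flags out of a runner command line (hard-coded here from the output above rather than read live via ps, so it runs anywhere):

```shell
# Command line copied (abbreviated) from the ps output above.
cmdline='ollama runner --ctx-size 16384 --batch-size 512 --n-gpu-layers 33 --threads 24 --parallel 4'

# Extract the GPU-offload layer count and the CPU thread count.
gpu_layers=$(printf '%s\n' "$cmdline" | grep -o -- '--n-gpu-layers [0-9]*' | awk '{print $2}')
threads=$(printf '%s\n' "$cmdline" | grep -o -- '--threads [0-9]*' | awk '{print $2}')

echo "layers offloaded to GPU: $gpu_layers"
echo "CPU threads available for non-offloaded work: $threads"
```

On the live system the same extraction can be run against `ps aux | grep 'ollama runner'`; recent Ollama versions also report the CPU/GPU split directly in the PROCESSOR column of `ollama ps`.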
(ollama) dyding@mayura:~/graphrag$ /net/kihara/home/dyding/ollama/bin/ollama serve
time=2025-10-20T13:50:29.347-04:00 level=INFO source=routes.go:1234 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/net/kihara/home/dyding/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-10-20T13:50:29.389-04:00 level=INFO source=images.go:479 msg="total blobs: 30"
time=2025-10-20T13:50:29.418-04:00 level=INFO source=images.go:486 msg="total unused blobs removed: 0"
time=2025-10-20T13:50:29.442-04:00 level=INFO source=routes.go:1287 msg="Listening on 127.0.0.1:11434 (version 0.9.0)"
time=2025-10-20T13:50:29.442-04:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-10-20T13:50:30.265-04:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-e78c0530-8c90-da02-5fc0-4ea5f469dcb1 library=cuda variant=v12 compute=8.6 driver=12.2 name="NVIDIA RTX A6000" total="47.5 GiB" available="17.7 GiB"
time=2025-10-20T13:50:30.265-04:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-4cdedc48-70c5-91c9-6493-51e3f262a283 library=cuda variant=v12 compute=8.6 driver=12.2 name="NVIDIA RTX A6000" total="47.5 GiB" available="16.0 GiB"
time=2025-10-20T13:50:30.265-04:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-b4c32ffe-4492-3901-98d2-a063fc3f5047 library=cuda variant=v12 compute=7.5 driver=12.2 name="Quadro RTX 8000" total="47.5 GiB" available="18.5 GiB"
time=2025-10-20T13:50:30.265-04:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-ee9d5906-deef-f6ea-0703-3ca87b5983a5 library=cuda variant=v12 compute=7.5 driver=12.2 name="Quadro RTX 8000" total="47.5 GiB" available="47.3 GiB"