docs/faq.md: 20 additions, 0 deletions
@@ -333,3 +333,23 @@ The currently available K/V cache quantization types are:
How much the cache quantization impacts the model's response quality will depend on the model and the task. Models that have a high GQA count (e.g. Qwen2) may see a larger impact on precision from quantization than models with a low GQA count.

You may need to experiment with different quantization types to find the best balance between memory usage and quality.

## How do I bypass the available memory check before loading a model?

By default, Ollama checks whether your system has enough available memory before loading a model, to prevent out-of-memory errors that could crash your system or cause instability.

You can bypass this safety check by setting the `OLLAMA_SKIP_MEMORY_CHECK` environment variable to `1`.
343
+
344
+
### When to use this option
345
+
346
+
- You have swap space configured and accept slower performance
347
+
- You're running on a system with non-standard memory reporting
348
+
- You're debugging memory-related issues
349
+
- You understand the risks and have adequate system monitoring
350
+
351
+
### Important Warnings
352
+
353
+
- System instability: Loading models without sufficient memory can cause system freezes or crashes
354
+
- Performance degradation: Your system may become unresponsive due to excessive swapping
355
+
- Data loss risk: System crashes could result in unsaved work being lost
envconfig/config.go: 9 additions, 0 deletions
@@ -226,6 +226,12 @@ var (
MaxQueue = Uint("OLLAMA_MAX_QUEUE", 512)
)

var (
// Bypass the memory check during model load. This is an expert-only setting, to be used only when the system is guaranteed
// to have enough memory, or can procure it at runtime by evicting blocks from caches (e.g. the ZFS ARC).
"NO_PROXY": {"NO_PROXY", String("NO_PROXY")(), "No proxy"},

// Overrides
"OLLAMA_SKIP_MEMORY_CHECK": {"OLLAMA_SKIP_MEMORY_CHECK", AvailableMemoryCheckOverride(), "Bypass checking for available memory before loading models. (e.g. OLLAMA_SKIP_MEMORY_CHECK=1)"},
slog.Warn("model request too large for system", "requested", format.HumanBytes2(systemMemoryRequired), "available", available, "total", format.HumanBytes2(systemTotalMemory), "free", format.HumanBytes2(systemFreeMemory), "swap", format.HumanBytes2(systemSwapFreeMemory))
-return nil, fmt.Errorf("model requires more system memory (%s) than is available (%s)", format.HumanBytes2(systemMemoryRequired), format.HumanBytes2(available))
+// Env variable to bypass ollama's memory check guardrail.
slog.Warn("failure while computing ZFS Arc cache size:", "error", err)
+}
+}
+if systemMemoryRequired > available {
+slog.Warn("model request too large for system", "requested", format.HumanBytes2(systemMemoryRequired), "available", available, "total", format.HumanBytes2(systemTotalMemory), "free", format.HumanBytes2(systemFreeMemory), "swap", format.HumanBytes2(systemSwapFreeMemory))
+return nil, fmt.Errorf("model requires more system memory (%s) than is available (%s)", format.HumanBytes2(systemMemoryRequired), format.HumanBytes2(available))