
Commit 04656b1

Merge branch 'JamePeng:main' into main
2 parents: ade9fc8 + 67baa01

File tree

1 file changed: +10 -26 lines changed


README.md

Lines changed: 10 additions & 26 deletions
@@ -2,7 +2,7 @@
   <img src="https://raw.githubusercontent.com/abetlen/llama-cpp-python/main/docs/icon.svg" style="height: 5rem; width: 5rem">
 </p>
 
-# Python Bindings for [`llama.cpp`](https://github.com/ggerganov/llama.cpp)
+# Python Bindings for [`llama.cpp`](https://github.com/ggml-org/llama.cpp)
 
 [![Documentation Status](https://readthedocs.org/projects/llama-cpp-python/badge/?version=latest)](https://llama-cpp-python.readthedocs.io/en/latest/?badge=latest)
 [![Tests](https://github.com/abetlen/llama-cpp-python/actions/workflows/test.yaml/badge.svg?branch=main)](https://github.com/abetlen/llama-cpp-python/actions/workflows/test.yaml)
@@ -12,7 +12,7 @@
 [![PyPI - Downloads](https://static.pepy.tech/badge/llama-cpp-python/month)](https://pepy.tech/projects/llama-cpp-python)
 [![Github All Releases](https://img.shields.io/github/downloads/abetlen/llama-cpp-python/total.svg?label=Github%20Downloads)]()
 
-Simple Python bindings for **@ggerganov's** [`llama.cpp`](https://github.com/ggerganov/llama.cpp) library.
+Simple Python bindings for **@ggerganov's** [`llama.cpp`](https://github.com/ggml-org/llama.cpp) library.
 This package provides:
 
 - Low-level access to C API via `ctypes` interface.
@@ -32,7 +32,7 @@ Documentation is available at [https://llama-cpp-python.readthedocs.io/en/latest
 
 Requirements:
 
-- Python 3.8+
+- Python 3.9+
 - C compiler
   - Linux: gcc or clang
   - Windows: Visual Studio or MinGW
@@ -125,27 +125,11 @@ CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
 
 It is also possible to install a pre-built wheel with CUDA support. As long as your system meets some requirements:
 
-- CUDA Version is 12.1, 12.2, 12.3, 12.4 or 12.5
-- Python Version is 3.10, 3.11 or 3.12
-
-```bash
-pip install llama-cpp-python \
-  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>
-```
+- CUDA Version is 12.4, 12.6 or 12.8
+- Python Version is 3.10, 3.11, 3.12 or 3.13
 
-Where `<cuda-version>` is one of the following:
-- `cu121`: CUDA 12.1
-- `cu122`: CUDA 12.2
-- `cu123`: CUDA 12.3
-- `cu124`: CUDA 12.4
-- `cu125`: CUDA 12.5
-
-For example, to install the CUDA 12.1 wheel:
-
-```bash
-pip install llama-cpp-python \
-  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
-```
+Check the releases page:
+https://github.com/JamePeng/llama-cpp-python/releases
 
 </details>
 
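The changed lines above replace the `--extra-index-url` flow with wheels picked manually from the releases page. As a minimal sketch (not part of the README), one way to identify a matching asset is to compute the CPython tag a compatible wheel filename must contain; the `PYTAG` variable and commented install line below are illustrative:

```shell
# Pre-built wheels target CPython 3.10-3.13; compute the cpXY tag
# (e.g. cp312 for Python 3.12) that a matching wheel filename contains.
PYTAG=$(python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])')
echo "look for a release wheel containing: ${PYTAG}"
# Then install the matching asset from the releases page, e.g.:
# pip install <wheel-url-from-https://github.com/JamePeng/llama-cpp-python/releases>
```

The CUDA version (12.4, 12.6 or 12.8 per the new requirements) must also match the wheel's build; check the asset names on the releases page for both tags.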
@@ -602,9 +586,9 @@ messages = [
 </details>
 
 <details>
-<summary>Loading a Local Image With Qwen3VL(Thinking/No Thinking)</summary>
+<summary>Loading a Local Image With Qwen3VL(Thinking/Instruct)</summary>
 
-This script demonstrates how to load a local image, encode it as a base64 Data URI, and pass it to a local Qwen3-VL model (with the 'use_think_prompt' parameter enabled for thinking model, disabled for instruct model) for processing using the llama-cpp-python library.
+This script demonstrates how to load a local image, encode it as a base64 Data URI, and pass it to a local Qwen3-VL model (with the 'force_reasoning' parameter enabled for thinking model, disabled for instruct model) for processing using the llama-cpp-python library.
 
 ```python
 # Import necessary libraries
@@ -623,7 +607,7 @@ MMPROJ_PATH = r"./mmproj-Qwen3-VL-8b-Thinking-F16.gguf"
 llm = Llama(
     model_path=MODEL_PATH,
     # Set up the chat handler for Qwen3-VL, specifying the projector path
-    chat_handler=Qwen3VLChatHandler(clip_model_path=MMPROJ_PATH, use_think_prompt=True),
+    chat_handler=Qwen3VLChatHandler(clip_model_path=MMPROJ_PATH, force_reasoning=True),
     n_gpu_layers=-1, # Offload all layers to the GPU
     n_ctx=10240, # Set the context window size
     swa_full=True,
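The rename to `force_reasoning` only affects handler construction; the base64 Data URI step the changed description mentions can be sketched on its own. The `image_to_data_uri` helper below is illustrative (not part of the library), and the handler usage is kept in comments since it needs local model files:

```python
import base64
from pathlib import Path

def image_to_data_uri(path: str, mime: str = "image/png") -> str:
    """Encode a local image file as a base64 Data URI for chat messages."""
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{data}"

# Usage with the Qwen3-VL handler (paths are hypothetical; not run here):
# from llama_cpp import Llama
# from llama_cpp.llama_chat_format import Qwen3VLChatHandler
# llm = Llama(
#     model_path="./Qwen3-VL-8B-Thinking.gguf",
#     chat_handler=Qwen3VLChatHandler(
#         clip_model_path="./mmproj-Qwen3-VL-8b-Thinking-F16.gguf",
#         force_reasoning=True,  # True for Thinking models, False for Instruct
#     ),
#     n_gpu_layers=-1,
# )
# llm.create_chat_completion(messages=[{
#     "role": "user",
#     "content": [
#         {"type": "image_url", "image_url": {"url": image_to_data_uri("cat.png")}},
#         {"type": "text", "text": "Describe this image."},
#     ],
# }])
```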

0 commit comments
