
Commit ea4ca57

Update README.md for Qwen3VL example (Thinking/No Thinking)
Signed-off-by: JamePeng <jame_peng@sina.com>
1 parent 5c34564 commit ea4ca57

File tree: 1 file changed

README.md: 95 additions & 0 deletions
@@ -507,6 +507,7 @@ Below are the supported multi-modal models and their respective chat handlers (P
| [minicpm-v-2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) | `MiniCPMv26ChatHandler` | `minicpm-v-2.6`, `minicpm-v-4.0` |
| [gemma3](https://huggingface.co/unsloth/gemma-3-27b-it-GGUF) | `Gemma3ChatHandler` | `gemma3` |
| [qwen2.5-vl](https://huggingface.co/unsloth/Qwen2.5-VL-3B-Instruct-GGUF) | `Qwen25VLChatHandler` | `qwen2.5-vl` |
| [qwen3-vl](https://huggingface.co/unsloth/Qwen3-VL-8B-Thinking-GGUF) | `Qwen3VLChatHandler` | `qwen3-vl` |

Then you'll need to use a custom chat handler to load the clip model and process the chat messages and images.
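
If you prefer to pull the files straight from the Hugging Face repo linked in the table, a minimal sketch of the same setup using the `from_pretrained` helpers is shown below. Note that `Qwen3VLChatHandler.from_pretrained`, the wildcard filenames, and the quantization choice are assumptions for illustration, not something this commit adds; the commit's full local-path example follows in the next hunk.

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Qwen3VLChatHandler

# Assumption: the handler exposes the usual from_pretrained helper and the repo
# ships a file matching each wildcard pattern below.
chat_handler = Qwen3VLChatHandler.from_pretrained(
    repo_id="unsloth/Qwen3-VL-8B-Thinking-GGUF",
    filename="*mmproj*",  # multi-modal projector file
)

llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3-VL-8B-Thinking-GGUF",
    filename="*Q4_K_M.gguf",  # pick any quantization present in the repo
    chat_handler=chat_handler,
    n_ctx=8192,  # leave room for the image tokens plus the reply
)
```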

@@ -600,6 +601,100 @@ messages = [

</details>

<details>
<summary>Loading a Local Image With Qwen3VL (Thinking/No Thinking)</summary>

This script demonstrates how to load a local image, encode it as a base64 data URI, and pass it to a local Qwen3-VL model through the llama-cpp-python library. Set the `use_think_prompt` parameter to `True` for the Thinking model and to `False` for the Instruct (non-thinking) model.

```python
# Import necessary libraries
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Qwen3VLChatHandler
import base64
import os

# --- Model Configuration ---
# Define the path to the main model file
MODEL_PATH = r"./Qwen3-VL-8B-Thinking-F16.gguf"
# Define the path to the multi-modal projector file
MMPROJ_PATH = r"./mmproj-Qwen3-VL-8b-Thinking-F16.gguf"

# --- Initialize the Llama Model ---
llm = Llama(
    model_path=MODEL_PATH,
    # Set up the chat handler for Qwen3-VL, specifying the projector path
    chat_handler=Qwen3VLChatHandler(clip_model_path=MMPROJ_PATH, use_think_prompt=True),
    n_gpu_layers=-1,  # Offload all layers to the GPU
    n_ctx=10240,      # Set the context window size
    swa_full=True,    # Use a full-size KV cache for sliding-window attention layers
)

# --- Helper Function to Convert Image to Base64 Data URI ---
def image_to_base64_data_uri(file_path):
    """
    Reads an image file, determines its MIME type, and converts it
    to a base64 encoded Data URI.
    """
    # Get the file extension to determine MIME type
    extension = os.path.splitext(file_path)[1].lower()

    # Determine the MIME type based on the file extension
    if extension == '.png':
        mime_type = 'image/png'
    elif extension in ('.jpg', '.jpeg'):
        mime_type = 'image/jpeg'
    elif extension == '.gif':
        mime_type = 'image/gif'
    elif extension == '.svg':
        mime_type = 'image/svg+xml'
    else:
        # Use a generic stream type for unsupported formats
        mime_type = 'application/octet-stream'
        print(f"Warning: Unsupported image type for file: {file_path}. Using a generic MIME type.")

    # Read the image file in binary mode
    with open(file_path, "rb") as img_file:
        # Encode the binary data to base64 and decode to UTF-8
        base64_data = base64.b64encode(img_file.read()).decode('utf-8')
        # Format as a Data URI string
        return f"data:{mime_type};base64,{base64_data}"

# --- Main Logic for Image Processing ---

# 1. Create a list containing all image paths
image_paths = [
    r'./scene.jpeg',
    # Add more image paths here if needed
]

# 2. Create an empty list to store the message objects (images and text)
images_messages = []

# 3. Loop through the image path list, convert each image to a Data URI,
#    and add it to the message list as an image_url object.
for path in image_paths:
    data_uri = image_to_base64_data_uri(path)
    images_messages.append({"type": "image_url", "image_url": {"url": data_uri}})

# 4. Add the final text prompt at the end of the list
images_messages.append({"type": "text", "text": "Describe the images."})

# 5. Use this list to build the chat completion request
res = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an AI assistant who perfectly describes images."},
        # The user's content is the list containing both images and text
        {"role": "user", "content": images_messages}
    ]
)

# Print the assistant's response
print(res["choices"][0]["message"]["content"])

```

</details>

### Speculative Decoding

`llama-cpp-python` supports speculative decoding which allows the model to generate completions based on a draft model.
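
A minimal sketch of that pattern using the `LlamaPromptLookupDecoding` draft model from `llama_cpp.llama_speculative` is shown below; the model path is a placeholder, not part of this commit.

```python
from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

# The draft model proposes tokens by looking them up in the prompt;
# the main model then verifies them, which can speed up generation.
llm = Llama(
    model_path="path/to/model.gguf",  # placeholder path
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=10),  # smaller values can work better on CPU-only machines
)

output = llm.create_completion("Q: Name the planets in the solar system? A: ", max_tokens=64)
print(output["choices"][0]["text"])
```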
