Commit 11dee17 (1 parent: 7a16be5)

Update doc and link new doc file.
This is not finished documentation.

3 files changed: +43 −131 lines

docs/Examples/LLavaInteractiveModeExecute.md

This file was deleted (0 additions, 129 deletions).

docs/Examples/MtmdInteractiveModeExecute.md (new file, 41 additions):
# MTMD interactive mode

`MtmdInteractiveModeExecute` shows how to pair a multimodal projection with a text model so the chat loop can reason over images supplied at runtime. The sample lives in `LLama.Examples/Examples/MtmdInteractiveModeExecute.cs` and reuses the interactive executor provided by LLamaSharp.

## Workflow

- Resolve the model, multimodal projection, and sample image paths via `UserSettings`.
- Create `ModelParams` for the text model and capture the MTMD defaults with `MtmdContextParams.Default()`.
- Load the base model and context, then initialize `SafeMtmdWeights` with the multimodal projection file.
- Ask the helper for a media marker (`mtmdParameters.MediaMarker ?? NativeApi.MtmdDefaultMarker() ?? "<media>"`) and feed it into an `InteractiveExecutor`.

```cs
var mtmdParameters = MtmdContextParams.Default();

using var model = await LLamaWeights.LoadFromFileAsync(parameters);
using var context = model.CreateContext(parameters);

// MTMD init: load the multimodal projection alongside the text model.
using var clipModel = await SafeMtmdWeights.LoadFromFileAsync(
    multiModalProj,
    model,
    mtmdParameters);

var mediaMarker = mtmdParameters.MediaMarker
    ?? NativeApi.MtmdDefaultMarker()
    ?? "<media>";

var ex = new InteractiveExecutor(context, clipModel);
```

## Handling user input

- Prompts can include image paths wrapped in braces (for example `{c:/image.jpg}`); the loop searches for those markers with regular expressions.
- Every referenced file is loaded through `SafeMtmdWeights.LoadMedia`, producing `SafeMtmdEmbed` instances that are queued for the next tokenization call.
- When the user provides images, the executor clears its KV cache (`MemorySequenceRemove`) before replacing each brace-wrapped path in the prompt with the multimodal marker.
- The embeds collected for the current turn are copied into `ex.Embeds`, so the executor submits both the text prompt and the pending media to the helper before generation.
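The brace-path handling described above can be sketched as a small helper. This is an illustrative assumption, not the sample's actual code: the helper name, the regex, and the tuple return shape are all hypothetical, and the real loop additionally loads each path via `SafeMtmdWeights.LoadMedia`.

```cs
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PromptMedia
{
    // Hypothetical helper: extract brace-wrapped paths from the prompt and
    // replace each with the media marker, so the text model sees only the
    // marker while the paths are collected for later media loading.
    public static (string Prompt, List<string> Paths) Extract(string input, string mediaMarker)
    {
        var paths = new List<string>();
        var rewritten = Regex.Replace(input, @"\{([^{}]+)\}", m =>
        {
            paths.Add(m.Groups[1].Value); // path inside the braces
            return mediaMarker;           // stand-in token for the embed
        });
        return (rewritten, paths);
    }
}
```

For example, `PromptMedia.Extract("{c:/image.jpg} describe this", "<media>")` yields the rewritten prompt `<media> describe this` and the single queued path `c:/image.jpg`.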
## Running the sample

1. Ensure the model and projection paths returned by `UserSettings` exist locally.
2. Start the example (for instance from the examples host application) and observe the initial description printed to the console.
3. Type text normally, or reference new images by including their path inside braces. Type `/exit` to end the conversation.

This walkthrough mirrors the logic in the sample so you can adapt it for your own multimodal workflows.

mkdocs.yml (2 additions, 2 deletions)

```diff
@@ -38,7 +38,7 @@ nav:
     - Interactive executor - basic: Examples/InteractiveModeExecute.md
     - Kernel memory integration - basic: Examples/KernelMemory.md
     - Kernel-memory - save & load: Examples/KernelMemorySaveAndLoad.md
-    - LLaVA - basic: Examples/LLavaInteractiveModeExecute.md
+    - MTMD interactive: Examples/MtmdInteractiveModeExecute.md
     - ChatSession - load & save: Examples/LoadAndSaveSession.md
     - Executor - save/load state: Examples/LoadAndSaveState.md
     - Quantization: Examples/QuantizeModel.md
@@ -254,4 +254,4 @@ markdown_extensions:
       custom_checkbox: true
   - pymdownx.tilde
   - pymdownx.tabbed:
-      alternate_style: true
+      alternate_style: true
```
