@@ -164,11 +164,14 @@ Check models below.
 
 ## Download Model Files
 
-Download `FP16` quantized .gguf files from:
+Download `FP16` quantized `Llama-3` .gguf files from:
 - https://huggingface.co/beehive-lab/Llama-3.2-1B-Instruct-GGUF-FP16
 - https://huggingface.co/beehive-lab/Llama-3.2-3B-Instruct-GGUF-FP16
 - https://huggingface.co/beehive-lab/Llama-3.2-8B-Instruct-GGUF-FP16
 
+Download `FP16` quantized `Mistral` .gguf files from:
+- https://huggingface.co/collections/beehive-lab/mistral-gpullama3java-684afabb206136d2e9cd47e0
+
 Please be gentle with [huggingface.co](https://huggingface.co) servers:
 
 **Note**: FP16 models are first-class citizens for the current version.
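+
+Interrupted transfers are common with multi-gigabyte files; `wget -c` resumes a partial download instead of starting over (a usage sketch, not project tooling; the 1B filename is inferred from the naming pattern of the other models):
+
+```bash
+# Resume a partially downloaded model file rather than re-fetching it from scratch
+wget -c https://huggingface.co/beehive-lab/Llama-3.2-1B-Instruct-GGUF-FP16/resolve/main/beehive-llama-3.2-1b-instruct-fp16.gguf
+```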
@@ -181,6 +184,9 @@ wget https://huggingface.co/beehive-lab/Llama-3.2-3B-Instruct-GGUF-FP16/resolve/
 
 # Llama 3 (8B) - FP16
 wget https://huggingface.co/beehive-lab/Llama-3.2-8B-Instruct-GGUF-FP16/resolve/main/beehive-llama-3.2-8b-instruct-fp16.gguf
+
+# Mistral (7B) - FP16
+wget https://huggingface.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/resolve/main/Mistral-7B-Instruct-v0.3.fp16.gguf
 ```
 
 **[Experimental]** You can download the Q8 and Q4 models used in the original implementation of Llama3.java, but for now they will be dequantized to FP16 for TornadoVM support:
@@ -201,7 +207,7 @@ curl -L -O https://huggingface.co/mukel/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/
 
 ## Running `llama-tornado`
 
-To execute Llama3 models with TornadoVM on GPUs use the `llama-tornado` script with the `--gpu` flag.
+To execute Llama3 or Mistral models with TornadoVM on GPUs, use the `llama-tornado` script with the `--gpu` flag.
 
 ### Usage Examples
 
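+For instance, a basic GPU run might look like the following (a minimal sketch; the model filename assumes the FP16 file downloaded above):
+
+```bash
+# Run the 1B Llama model on the GPU with a one-shot prompt
+./llama-tornado --gpu --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke"
+```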
@@ -246,11 +252,11 @@ First, check your GPU specifications. If your GPU has high memory capacity, you
 
 ### GPU Memory Requirements by Model Size
 
-| Model Size | Recommended GPU Memory |
-|------------|------------------------|
-| 1B models  | 7GB (default)          |
-| 3B models  | 15GB                   |
-| 8B models  | 20GB+                  |
+| Model Size  | Recommended GPU Memory |
+|-------------|------------------------|
+| 1B models   | 7GB (default)          |
+| 3-7B models | 15GB                   |
+| 8B models   | 20GB+                  |
 
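+As a sketch of how these budgets map to a command line, a 7B model could be paired with the recommended 15GB like this (assuming the `--gpu-memory` option from the configuration list below):
+
+```bash
+# Raise the GPU memory budget to 15GB for a 7B Mistral model
+./llama-tornado --gpu --model Mistral-7B-Instruct-v0.3.fp16.gguf --gpu-memory 15GB --prompt "tell me a joke"
+```
+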
 **Note**: If you still encounter memory issues, try:
 
@@ -288,6 +294,7 @@ LLaMA Configuration:
                         Maximum number of tokens to generate (default: 512)
   --stream STREAM       Enable streaming output (default: True)
   --echo ECHO           Echo the input prompt (default: False)
+  --suffix SUFFIX       Suffix for fill-in-the-middle request (Codestral) (default: None)
 
 Mode Selection:
   -i, --interactive     Run in interactive/chat mode (default: False)
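+
+For example, the new `--suffix` option could drive a fill-in-the-middle request against a Codestral-style model (a sketch; the model filename is hypothetical, and the flag is the one added above):
+
+```bash
+# Fill-in-the-middle: the model generates the code between --prompt and --suffix
+./llama-tornado --gpu --model codestral-22b-fp16.gguf \
+    --prompt "def fibonacci(n):" \
+    --suffix "    return result"
+```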