
Commit 4887326

Update openvino in modelui model list (#26)
# ADDED / CHANGES

Added OpenVINO models to the `modelui.py` list:

- `EmbeddedLLM/Meta-Llama-3.1-8B-Instruct-int4-sym-ov`
- `EmbeddedLLM/Phi-3-mini-4k-instruct-int4-sym-ov`
- `OpenVINO/Phi-3-mini-4k-instruct-int8-ov`
- `EmbeddedLLM/Phi-3-mini-128k-instruct-int4-ov-model`
- `OpenVINO/Phi-3-mini-128k-instruct-int8-ov`
- `EmbeddedLLM/Phi-3-medium-4k-instruct-int4-sym-ov`
- `OpenVINO/Phi-3-medium-4k-instruct-int8-ov`
- `EmbeddedLLM/Phi-3-medium-128k-instruct-int4-sym-ov`
- `EmbeddedLLM/Qwen2-7B-Instruct-int4-sym-ov`
- `OpenVINO/Mistral-7B-Instruct-v0.2-int4-ov`
- `OpenVINO/Mistral-7B-Instruct-v0.2-int8-ov`
- `EmbeddedLLM/Mistral-7B-Instruct-v0.3-int4-sym-ov`
- `OpenVINO/open_llama_3b_v2-int8-ov`
- `OpenVINO/open_llama_7b_v2-int4-ov`
- `OpenVINO/open_llama_7b_v2-int8-ov`
- `OpenVINO/TinyLlama-1.1B-Chat-v1.0-int4-ov`
- `OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov`
- `EmbeddedLLM/neural-chat-7b-v1-1-int4-sym-ov`
- `OpenVINO/neural-chat-7b-v1-1-int8-ov`
- `OpenVINO/starcoder2-15b-int4-ov`
- `OpenVINO/starcoder2-15b-int8-ov`
- `OpenVINO/persimmon-8b-chat-int4-ov`
- `OpenVINO/persimmon-8b-chat-int8-ov`
- `EmbeddedLLM/RedPajama-INCITE-Instruct-3B-v1-int4-sym-ov`
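The `modelui.py` diff itself is not expanded on this page. As a rough sketch only — the list name and surrounding structure below are assumptions, not the actual code — the change amounts to appending these repo IDs to the UI's model list:

```python
# Hypothetical sketch of the modelui.py change; the real file is not shown in
# this view, so the list name and structure here are assumptions. Only the
# repo IDs are taken from the commit message (abbreviated with "...").
OPENVINO_MODELS = [
    "EmbeddedLLM/Meta-Llama-3.1-8B-Instruct-int4-sym-ov",
    "EmbeddedLLM/Phi-3-mini-4k-instruct-int4-sym-ov",
    "OpenVINO/Phi-3-mini-4k-instruct-int8-ov",
    # ... the remaining entries from the list above ...
    "EmbeddedLLM/RedPajama-INCITE-Instruct-3B-v1-int4-sym-ov",
]
```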
1 parent 25fcb76 commit 4887326

5 files changed: +281 -41 lines changed

README.md

Lines changed: 6 additions & 21 deletions
@@ -7,7 +7,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 | Model architectures | Gemma <br/> Llama \* <br/> Mistral + <br/>Phi <br/> | | |
 | Platform | Linux <br/> Windows | | |
 | Architecture | x86 <br/> x64 <br/> | Arm64 | |
-| Hardware Acceleration | CUDA<br/>DirectML<br/>IpexLLM | QNN <br/> ROCm | OpenVINO |
+| Hardware Acceleration | CUDA<br/>DirectML<br/>IpexLLM<br/>OpenVINO | QNN <br/> ROCm | |
 
 \* The Llama model architecture supports similar model families such as CodeLlama, Vicuna, Yi, and more.
 
@@ -21,9 +21,6 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 ## Table Content
 
 - [Supported Models](#supported-models-quick-start)
-  - [Onnxruntime DirectML Models](./docs/model/onnxruntime_directml_models.md)
-  - [Onnxruntime CPU Models](./docs/model/onnxruntime_cpu_models.md)
-  - [Ipex-LLM Models](./docs/model/ipex_models.md)
 - [Getting Started](#getting-started)
 - [Installation From Source](#installation)
 - [Launch OpenAI API Compatible Server](#launch-openai-api-compatible-server)
@@ -34,22 +31,10 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 - [Acknowledgements](#acknowledgements)
 
 ## Supported Models (Quick Start)
-
-| Models | Parameters | Context Length | Link |
-| --- | --- | --- | --- |
-| Gemma-2b-Instruct v1 | 2B | 8192 | [EmbeddedLLM/gemma-2b-it-onnx](https://huggingface.co/EmbeddedLLM/gemma-2b-it-onnx) |
-| Llama-2-7b-chat | 7B | 4096 | [EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml) |
-| Llama-2-13b-chat | 13B | 4096 | [EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml) |
-| Llama-3-8b-chat | 8B | 8192 | [luweigen/Llama-3-8B-Instruct-int4-onnx-directml](https://huggingface.co/luweigen/Llama-3-8B-Instruct-int4-onnx-directml) |
-| Mistral-7b-v0.3-instruct | 7B | 32768 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx) |
-| Phi-3-mini-4k-instruct-062024 | 3.8B | 4096 | [EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx/tree/main/onnx/directml/Phi-3-mini-4k-instruct-062024-int4) |
-| Phi3-mini-4k-instruct | 3.8B | 4096 | [microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) |
-| Phi3-mini-128k-instruct | 3.8B | 128k | [microsoft/Phi-3-mini-128k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx) |
-| Phi3-medium-4k-instruct | 17B | 4096 | [microsoft/Phi-3-medium-4k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml) |
-| Phi3-medium-128k-instruct | 17B | 128k | [microsoft/Phi-3-medium-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml) |
-| Openchat-3.6-8b | 8B | 8192 | [EmbeddedLLM/openchat-3.6-8b-20240522-onnx](https://huggingface.co/EmbeddedLLM/openchat-3.6-8b-20240522-onnx) |
-| Yi-1.5-6b-chat | 6B | 32k | [EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx](https://huggingface.co/EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx) |
-| Phi-3-vision-128k-instruct | | 128k | [EmbeddedLLM/Phi-3-vision-128k-instruct-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-vision-128k-instruct-onnx/tree/main/onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4) |
+* Onnxruntime DirectML Models [Link](./docs/model/onnxruntime_directml_models.md)
+* Onnxruntime CPU Models [Link](./docs/model/onnxruntime_cpu_models.md)
+* Ipex-LLM Models [Link](./docs/model/ipex_models.md)
+* OpenVINO-LLM Models [Link](./docs/model/openvino_models.md)
 
 ## Getting Started
 
@@ -122,7 +107,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 
 ### Launch Chatbot Web UI
 
-1. `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost`. **Note:** To find out more of the supported arguments. `ellm_chatbot --help`.
+1. `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost --model_name <served_model_name>`. **Note:** To find out more of the supported arguments. `ellm_chatbot --help`.
 
 ![asset/ellm_chatbot_vid.webp](asset/ellm_chatbot_vid.webp)

docs/model/ipex_models.md

Lines changed: 10 additions & 1 deletion
@@ -1,11 +1,18 @@
 # Model Powered by Ipex-LLM
 
 ## Verified Models
+Verified models can be found from EmbeddedLLM IpexLLM model collections
+* EmbeddedLLM IpexLLM Model collections: [link](https://huggingface.co/collections/EmbeddedLLM/ipex-llm-genai-66c85eadb05bb4dedd5e70ca)
+
 | Model | Model Link |
 | --- | --- |
-| Phi-3 | [link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
+| Phi-3-mini-4k-instruct | [link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
+| Phi-3-mini-128k-instruct | [link](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) |
+| Phi-3-medium-4k-instruct | [link](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) |
+| Phi-3-medium-128k-instruct | [link](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct) |
 
 ## Supported Models by Ipex-LLM
+Unverified models, but supported by Upstream IpexLLM could be found in the following model collections.
 
 | Model | Model Link |
 | --- | --- |
@@ -64,6 +71,8 @@
 
 Resources from: https://github.com/intel-analytics/ipex-llm/
 
+## Contribution
+We welcome contributions to the verified model list.
 
 ## Qwen2 Model (Experimental)
 1. Upgrade `transformers`. `pip install --upgrade transformers~=4.42.3`.
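For orientation, here is a minimal sketch of loading one of the verified models through ipex-llm's `transformers`-style API. This is not the repository's own loading path (EmbeddedLLM wraps this in its engine classes); it only assumes `ipex-llm` is installed and uses a model ID from the verified table above:

```python
# Minimal sketch: 4-bit loading via ipex-llm's drop-in transformers API.
# Assumes `pip install ipex-llm`; not the EmbeddedLLM engine code path.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, load_in_4bit=True, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("What is Ipex-LLM?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```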

docs/model/openvino_models.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Model Powered by OpenVINO-LLM
+
+Unverified models, but supported by Upstream OpenVINO could be found in the following model collections.
+* Intel OpenVINO Model collections: [link](https://huggingface.co/collections/OpenVINO/llm-6687aaa2abca3bbcec71a9bd)
+
+## Contribution
+We welcome contributions to the verified model list.
+
+## Verified Models
+Verified models can be found from EmbeddedLLM OpenVINO model collections
+* EmbeddedLLM OpenVINO Model collections: [link](https://huggingface.co/collections/EmbeddedLLM/openvino-genai-66be290e5b64185087d4b624)
+
+| Model | Model Link |
+| --- | --- |
+| Meta-Llama-3.1-8B-Instruct-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Meta-Llama-3.1-8B-Instruct-int4-sym-ov/tree/main/) |
+| Phi-3-mini-4k-instruct-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-int4-sym-ov/tree/main/) |
+| Phi-3-mini-4k-instruct-int8-ov | [Link](https://huggingface.co/OpenVINO/Phi-3-mini-4k-instruct-int8-ov/tree/main/) |
+| Phi-3-mini-128k-instruct-int4-ov-model | [Link](https://huggingface.co/EmbeddedLLM/Phi-3-mini-128k-instruct-int4-ov-model/tree/main/) |
+| Phi-3-mini-128k-instruct-int8-ov | [Link](https://huggingface.co/OpenVINO/Phi-3-mini-128k-instruct-int8-ov/tree/main/) |
+| Phi-3-medium-4k-instruct-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Phi-3-medium-4k-instruct-int4-sym-ov/tree/main/) |
+| Phi-3-medium-4k-instruct-int8-ov | [Link](https://huggingface.co/OpenVINO/Phi-3-medium-4k-instruct-int8-ov/tree/main/) |
+| Phi-3-medium-128k-instruct-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Phi-3-medium-128k-instruct-int4-sym-ov/tree/main/) |
+| Qwen2-7B-Instruct-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Qwen2-7B-Instruct-int4-sym-ov/tree/main/) |
+| Mistral-7B-Instruct-v0.2-int4-ov | [Link](https://huggingface.co/OpenVINO/Mistral-7B-Instruct-v0.2-int4-ov/tree/main/) |
+| Mistral-7B-Instruct-v0.2-int8-ov | [Link](https://huggingface.co/OpenVINO/Mistral-7B-Instruct-v0.2-int8-ov/tree/main/) |
+| Mistral-7B-Instruct-v0.3-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Mistral-7B-Instruct-v0.3-int4-sym-ov/tree/main/) |
+| open_llama_3b_v2-int8-ov | [Link](https://huggingface.co/OpenVINO/open_llama_3b_v2-int8-ov/tree/main/) |
+| open_llama_7b_v2-int4-ov | [Link](https://huggingface.co/OpenVINO/open_llama_7b_v2-int4-ov/tree/main/) |
+| open_llama_7b_v2-int8-ov | [Link](https://huggingface.co/OpenVINO/open_llama_7b_v2-int8-ov/tree/main/) |
+| TinyLlama-1.1B-Chat-v1.0-int4-ov | [Link](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-int4-ov/tree/main/) |
+| TinyLlama-1.1B-Chat-v1.0-int8-ov | [Link](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov/tree/main/) |
+| neural-chat-7b-v1-1-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/neural-chat-7b-v1-1-int4-sym-ov/tree/main/) |
+| neural-chat-7b-v1-1-int8-ov | [Link](https://huggingface.co/OpenVINO/neural-chat-7b-v1-1-int8-ov/tree/main/) |
+| starcoder2-15b-int4-ov | [Link](https://huggingface.co/OpenVINO/starcoder2-15b-int4-ov/tree/main/) |
+| starcoder2-15b-int8-ov | [Link](https://huggingface.co/OpenVINO/starcoder2-15b-int8-ov/tree/main/) |
+| persimmon-8b-chat-int4-ov | [Link](https://huggingface.co/OpenVINO/persimmon-8b-chat-int4-ov/tree/main/) |
+| persimmon-8b-chat-int8-ov | [Link](https://huggingface.co/OpenVINO/persimmon-8b-chat-int8-ov/tree/main/) |
+| RedPajama-INCITE-Instruct-3B-v1-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/RedPajama-INCITE-Instruct-3B-v1-int4-sym-ov/tree/main/) |
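These repos ship pre-converted OpenVINO IR weights, so they can also be loaded outside this project with Hugging Face's optimum-intel — a minimal sketch, assuming `optimum[openvino]` is installed (the repo's own `OpenVinoEngine` is the integration this commit targets):

```python
# Minimal sketch: load a pre-converted OpenVINO IR checkpoint with optimum-intel.
# Assumes `pip install optimum[openvino]`; this is not the repo's OpenVinoEngine path.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "OpenVINO/Phi-3-mini-4k-instruct-int8-ov"
model = OVModelForCausalLM.from_pretrained(model_id)  # already IR, so no export step
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```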

src/embeddedllm/engine.py

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ def __init__(self, model_path: str, vision: bool, device: str = "xpu", backend:
                 self.device == "gpu" or self.device == "cpu"
             ), f"To run openvino on cpu, set `backend` to `openvino` and `device` to `cpu`. EmbeddedLLMEngine load model with openvino on Intel processor."
             self.engine = OpenVinoEngine(self.model_path, self.vision, self.device)
-            logger.info(f"Initializing openvino backend (GPU): OpenVinoEngine")
+            logger.info(f"Initializing openvino backend ({self.device.upper()}): OpenVinoEngine")
         elif self.backend in ("directml", "cuda"):
             from embeddedllm.backend.onnxruntime_engine import OnnxruntimeEngine
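From the hunk header above, `EmbeddedLLMEngine.__init__` takes `model_path`, `vision`, `device`, and `backend`. A minimal usage sketch of the OpenVINO path, with illustrative argument values (the constructor may take further parameters not visible in this diff):

```python
# Sketch based only on the signature and assertion visible in the diff;
# argument values are illustrative, not taken from the repository's docs.
from embeddedllm.engine import EmbeddedLLMEngine

engine = EmbeddedLLMEngine(
    model_path="EmbeddedLLM/Phi-3-mini-4k-instruct-int4-sym-ov",
    vision=False,
    device="cpu",        # the diff's assertion allows only "gpu" or "cpu" here
    backend="openvino",  # routes construction to OpenVinoEngine
)
```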