
Commit 4887326

Update openvino in modelui model list (#26)
# ADDED / CHANGES

Added OpenVINO models to the `modelui.py` list:

- `EmbeddedLLM/Meta-Llama-3.1-8B-Instruct-int4-sym-ov`
- `EmbeddedLLM/Phi-3-mini-4k-instruct-int4-sym-ov`
- `OpenVINO/Phi-3-mini-4k-instruct-int8-ov`
- `EmbeddedLLM/Phi-3-mini-128k-instruct-int4-ov-model`
- `OpenVINO/Phi-3-mini-128k-instruct-int8-ov`
- `EmbeddedLLM/Phi-3-medium-4k-instruct-int4-sym-ov`
- `OpenVINO/Phi-3-medium-4k-instruct-int8-ov`
- `EmbeddedLLM/Phi-3-medium-128k-instruct-int4-sym-ov`
- `EmbeddedLLM/Qwen2-7B-Instruct-int4-sym-ov`
- `OpenVINO/Mistral-7B-Instruct-v0.2-int4-ov`
- `OpenVINO/Mistral-7B-Instruct-v0.2-int8-ov`
- `EmbeddedLLM/Mistral-7B-Instruct-v0.3-int4-sym-ov`
- `OpenVINO/open_llama_3b_v2-int8-ov`
- `OpenVINO/open_llama_7b_v2-int4-ov`
- `OpenVINO/open_llama_7b_v2-int8-ov`
- `OpenVINO/TinyLlama-1.1B-Chat-v1.0-int4-ov`
- `OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov`
- `EmbeddedLLM/neural-chat-7b-v1-1-int4-sym-ov`
- `OpenVINO/neural-chat-7b-v1-1-int8-ov`
- `OpenVINO/starcoder2-15b-int4-ov`
- `OpenVINO/starcoder2-15b-int8-ov`
- `OpenVINO/persimmon-8b-chat-int4-ov`
- `OpenVINO/persimmon-8b-chat-int8-ov`
- `EmbeddedLLM/RedPajama-INCITE-Instruct-3B-v1-int4-sym-ov`
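The `modelui.py` diff itself is not expanded on this page. As a rough sketch only — the list name and surrounding structure below are assumptions, not the actual code — the change amounts to appending these repo IDs to the UI's model list:

```python
# Hypothetical sketch of the modelui.py change; the real file is not shown in
# this view, so the list name and structure here are assumptions. Only the
# repo IDs are taken from the commit message (abbreviated with "...").
OPENVINO_MODELS = [
    "EmbeddedLLM/Meta-Llama-3.1-8B-Instruct-int4-sym-ov",
    "EmbeddedLLM/Phi-3-mini-4k-instruct-int4-sym-ov",
    "OpenVINO/Phi-3-mini-4k-instruct-int8-ov",
    # ... the remaining entries from the list above ...
    "EmbeddedLLM/RedPajama-INCITE-Instruct-3B-v1-int4-sym-ov",
]
```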
1 parent 25fcb76 commit 4887326

5 files changed: +281 -41 lines changed

README.md

Lines changed: 6 additions & 21 deletions
@@ -7,7 +7,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 | Model architectures | Gemma <br/> Llama \* <br/> Mistral + <br/>Phi <br/> | | |
 | Platform | Linux <br/> Windows | | |
 | Architecture | x86 <br/> x64 <br/> | Arm64 | |
-| Hardware Acceleration | CUDA<br/>DirectML<br/>IpexLLM | QNN <br/> ROCm | OpenVINO |
+| Hardware Acceleration | CUDA<br/>DirectML<br/>IpexLLM<br/>OpenVINO | QNN <br/> ROCm | |
 
 \* The Llama model architecture supports similar model families such as CodeLlama, Vicuna, Yi, and more.
 
@@ -21,9 +21,6 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 ## Table Content
 
 - [Supported Models](#supported-models-quick-start)
-  - [Onnxruntime DirectML Models](./docs/model/onnxruntime_directml_models.md)
-  - [Onnxruntime CPU Models](./docs/model/onnxruntime_cpu_models.md)
-  - [Ipex-LLM Models](./docs/model/ipex_models.md)
 - [Getting Started](#getting-started)
 - [Installation From Source](#installation)
 - [Launch OpenAI API Compatible Server](#launch-openai-api-compatible-server)
@@ -34,22 +31,10 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 - [Acknowledgements](#acknowledgements)
 
 ## Supported Models (Quick Start)
-
-| Models | Parameters | Context Length | Link |
-| --- | --- | --- | --- |
-| Gemma-2b-Instruct v1 | 2B | 8192 | [EmbeddedLLM/gemma-2b-it-onnx](https://huggingface.co/EmbeddedLLM/gemma-2b-it-onnx) |
-| Llama-2-7b-chat | 7B | 4096 | [EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml) |
-| Llama-2-13b-chat | 13B | 4096 | [EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml) |
-| Llama-3-8b-chat | 8B | 8192 | [luweigen/Llama-3-8B-Instruct-int4-onnx-directml](https://huggingface.co/luweigen/Llama-3-8B-Instruct-int4-onnx-directml) |
-| Mistral-7b-v0.3-instruct | 7B | 32768 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx) |
-| Phi-3-mini-4k-instruct-062024 | 3.8B | 4096 | [EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx/tree/main/onnx/directml/Phi-3-mini-4k-instruct-062024-int4) |
-| Phi3-mini-4k-instruct | 3.8B | 4096 | [microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) |
-| Phi3-mini-128k-instruct | 3.8B | 128k | [microsoft/Phi-3-mini-128k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx) |
-| Phi3-medium-4k-instruct | 17B | 4096 | [microsoft/Phi-3-medium-4k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml) |
-| Phi3-medium-128k-instruct | 17B | 128k | [microsoft/Phi-3-medium-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml) |
-| Openchat-3.6-8b | 8B | 8192 | [EmbeddedLLM/openchat-3.6-8b-20240522-onnx](https://huggingface.co/EmbeddedLLM/openchat-3.6-8b-20240522-onnx) |
-| Yi-1.5-6b-chat | 6B | 32k | [EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx](https://huggingface.co/EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx) |
-| Phi-3-vision-128k-instruct | | 128k | [EmbeddedLLM/Phi-3-vision-128k-instruct-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-vision-128k-instruct-onnx/tree/main/onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4) |
+* Onnxruntime DirectML Models [Link](./docs/model/onnxruntime_directml_models.md)
+* Onnxruntime CPU Models [Link](./docs/model/onnxruntime_cpu_models.md)
+* Ipex-LLM Models [Link](./docs/model/ipex_models.md)
+* OpenVINO-LLM Models [Link](./docs/model/openvino_models.md)
 
 ## Getting Started
 
@@ -122,7 +107,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 
 ### Launch Chatbot Web UI
 
-1. `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost`. **Note:** To find out more of the supported arguments. `ellm_chatbot --help`.
+1. `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost --model_name <served_model_name>`. **Note:** To find out more of the supported arguments. `ellm_chatbot --help`.
 
 ![asset/ellm_chatbot_vid.webp](asset/ellm_chatbot_vid.webp)

docs/model/ipex_models.md

Lines changed: 10 additions & 1 deletion
@@ -1,11 +1,18 @@
 # Model Powered by Ipex-LLM
 
 ## Verified Models
+Verified models can be found from EmbeddedLLM IpexLLM model collections
+* EmbeddedLLM IpexLLM Model collections: [link](https://huggingface.co/collections/EmbeddedLLM/ipex-llm-genai-66c85eadb05bb4dedd5e70ca)
+
 | Model | Model Link |
 | --- | --- |
-| Phi-3 | [link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
+| Phi-3-mini-4k-instruct | [link](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |
+| Phi-3-mini-128k-instruct | [link](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) |
+| Phi-3-medium-4k-instruct | [link](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) |
+| Phi-3-medium-128k-instruct | [link](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct) |
 
 ## Supported Models by Ipex-LLM
+Unverified models, but supported by Upstream IpexLLM could be found in the following model collections.
 
 | Model | Model Link |
 | --- | --- |
@@ -64,6 +71,8 @@
 
 Resources from: https://github.com/intel-analytics/ipex-llm/
 
+## Contribution
+We welcome contributions to the verified model list.
 
 ## Qwen2 Model (Experimental)
 1. Upgrade `transformers`. `pip install --upgrade transformers~=4.42.3`.
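For orientation, here is a minimal sketch of loading one of the verified models through ipex-llm's `transformers`-style API. This is not the repository's own loading path (EmbeddedLLM wraps this in its engine classes); it only assumes `ipex-llm` is installed and uses a model ID from the verified table above:

```python
# Minimal sketch: 4-bit loading via ipex-llm's drop-in transformers API.
# Assumes `pip install ipex-llm`; not the EmbeddedLLM engine code path.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, load_in_4bit=True, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("What is Ipex-LLM?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```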

docs/model/openvino_models.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Model Powered by OpenVINO-LLM
+
+Unverified models, but supported by Upstream OpenVINO could be found in the following model collections.
+* Intel OpenVINO Model collections: [link](https://huggingface.co/collections/OpenVINO/llm-6687aaa2abca3bbcec71a9bd)
+
+## Contribution
+We welcome contributions to the verified model list.
+
+## Verified Models
+Verified models can be found from EmbeddedLLM OpenVINO model collections
+* EmbeddedLLM OpenVINO Model collections: [link](https://huggingface.co/collections/EmbeddedLLM/openvino-genai-66be290e5b64185087d4b624)
+
+| Model | Model Link |
+| --- | --- |
+| Meta-Llama-3.1-8B-Instruct-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Meta-Llama-3.1-8B-Instruct-int4-sym-ov/tree/main/) |
+| Phi-3-mini-4k-instruct-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-int4-sym-ov/tree/main/) |
+| Phi-3-mini-4k-instruct-int8-ov | [Link](https://huggingface.co/OpenVINO/Phi-3-mini-4k-instruct-int8-ov/tree/main/) |
+| Phi-3-mini-128k-instruct-int4-ov-model | [Link](https://huggingface.co/EmbeddedLLM/Phi-3-mini-128k-instruct-int4-ov-model/tree/main/) |
+| Phi-3-mini-128k-instruct-int8-ov | [Link](https://huggingface.co/OpenVINO/Phi-3-mini-128k-instruct-int8-ov/tree/main/) |
+| Phi-3-medium-4k-instruct-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Phi-3-medium-4k-instruct-int4-sym-ov/tree/main/) |
+| Phi-3-medium-4k-instruct-int8-ov | [Link](https://huggingface.co/OpenVINO/Phi-3-medium-4k-instruct-int8-ov/tree/main/) |
+| Phi-3-medium-128k-instruct-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Phi-3-medium-128k-instruct-int4-sym-ov/tree/main/) |
+| Qwen2-7B-Instruct-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Qwen2-7B-Instruct-int4-sym-ov/tree/main/) |
+| Mistral-7B-Instruct-v0.2-int4-ov | [Link](https://huggingface.co/OpenVINO/Mistral-7B-Instruct-v0.2-int4-ov/tree/main/) |
+| Mistral-7B-Instruct-v0.2-int8-ov | [Link](https://huggingface.co/OpenVINO/Mistral-7B-Instruct-v0.2-int8-ov/tree/main/) |
+| Mistral-7B-Instruct-v0.3-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/Mistral-7B-Instruct-v0.3-int4-sym-ov/tree/main/) |
+| open_llama_3b_v2-int8-ov | [Link](https://huggingface.co/OpenVINO/open_llama_3b_v2-int8-ov/tree/main/) |
+| open_llama_7b_v2-int4-ov | [Link](https://huggingface.co/OpenVINO/open_llama_7b_v2-int4-ov/tree/main/) |
+| open_llama_7b_v2-int8-ov | [Link](https://huggingface.co/OpenVINO/open_llama_7b_v2-int8-ov/tree/main/) |
+| TinyLlama-1.1B-Chat-v1.0-int4-ov | [Link](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-int4-ov/tree/main/) |
+| TinyLlama-1.1B-Chat-v1.0-int8-ov | [Link](https://huggingface.co/OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov/tree/main/) |
+| neural-chat-7b-v1-1-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/neural-chat-7b-v1-1-int4-sym-ov/tree/main/) |
+| neural-chat-7b-v1-1-int8-ov | [Link](https://huggingface.co/OpenVINO/neural-chat-7b-v1-1-int8-ov/tree/main/) |
+| starcoder2-15b-int4-ov | [Link](https://huggingface.co/OpenVINO/starcoder2-15b-int4-ov/tree/main/) |
+| starcoder2-15b-int8-ov | [Link](https://huggingface.co/OpenVINO/starcoder2-15b-int8-ov/tree/main/) |
+| persimmon-8b-chat-int4-ov | [Link](https://huggingface.co/OpenVINO/persimmon-8b-chat-int4-ov/tree/main/) |
+| persimmon-8b-chat-int8-ov | [Link](https://huggingface.co/OpenVINO/persimmon-8b-chat-int8-ov/tree/main/) |
+| RedPajama-INCITE-Instruct-3B-v1-int4-sym-ov | [Link](https://huggingface.co/EmbeddedLLM/RedPajama-INCITE-Instruct-3B-v1-int4-sym-ov/tree/main/) |
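These repos ship pre-converted OpenVINO IR weights, so they can also be loaded outside this project with Hugging Face's optimum-intel — a minimal sketch, assuming `optimum[openvino]` is installed (the repo's own `OpenVinoEngine` is the integration this commit targets):

```python
# Minimal sketch: load a pre-converted OpenVINO IR checkpoint with optimum-intel.
# Assumes `pip install optimum[openvino]`; this is not the repo's OpenVinoEngine path.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "OpenVINO/Phi-3-mini-4k-instruct-int8-ov"
model = OVModelForCausalLM.from_pretrained(model_id)  # already IR, so no export step
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```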

src/embeddedllm/engine.py

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ def __init__(self, model_path: str, vision: bool, device: str = "xpu", backend:
                 self.device == "gpu" or self.device == "cpu"
             ), f"To run openvino on cpu, set `backend` to `openvino` and `device` to `cpu`. EmbeddedLLMEngine load model with openvino on Intel processor."
             self.engine = OpenVinoEngine(self.model_path, self.vision, self.device)
-            logger.info(f"Initializing openvino backend (GPU): OpenVinoEngine")
+            logger.info(f"Initializing openvino backend ({self.device.upper()}): OpenVinoEngine")
         elif self.backend in ("directml", "cuda"):
             from embeddedllm.backend.onnxruntime_engine import OnnxruntimeEngine
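From the hunk header above, `EmbeddedLLMEngine.__init__` takes `model_path`, `vision`, `device`, and `backend`. A minimal usage sketch of the OpenVINO path, with illustrative argument values (the constructor may take further parameters not visible in this diff):

```python
# Sketch based only on the signature and assertion visible in the diff;
# argument values are illustrative, not taken from the repository's docs.
from embeddedllm.engine import EmbeddedLLMEngine

engine = EmbeddedLLMEngine(
    model_path="EmbeddedLLM/Phi-3-mini-4k-instruct-int4-sym-ov",
    vision=False,
    device="cpu",        # the diff's assertion allows only "gpu" or "cpu" here
    backend="openvino",  # routes construction to OpenVinoEngine
)
```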