
Commit 25fcb76

Autoload + onnx model and ipexllm in the model ui (#25)

ADDED and CHANGES: #17 - #24

Contributors: @szeyu (szeyu.sim@embeddedllm.com)

1 parent 501ab10 commit 25fcb76

File tree

9 files changed: +260, -107 lines changed


README.md

Lines changed: 3 additions & 2 deletions
@@ -21,7 +21,8 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 ## Table Content
 
 - [Supported Models](#supported-models-quick-start)
-  - [Onnxruntime Models](./docs/model/onnxruntime_models.md)
+  - [Onnxruntime DirectML Models](./docs/model/onnxruntime_directml_models.md)
+  - [Onnxruntime CPU Models](./docs/model/onnxruntime_cpu_models.md)
   - [Ipex-LLM Models](./docs/model/ipex_models.md)
 - [Getting Started](#getting-started)
   - [Installation From Source](#installation)
@@ -39,7 +40,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 | Gemma-2b-Instruct v1 | 2B | 8192 | [EmbeddedLLM/gemma-2b-it-onnx](https://huggingface.co/EmbeddedLLM/gemma-2b-it-onnx) |
 | Llama-2-7b-chat | 7B | 4096 | [EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml) |
 | Llama-2-13b-chat | 13B | 4096 | [EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml) |
-| Llama-3-8b-chat | 8B | 8192 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx) |
+| Llama-3-8b-chat | 8B | 8192 | [luweigen/Llama-3-8B-Instruct-int4-onnx-directml](https://huggingface.co/luweigen/Llama-3-8B-Instruct-int4-onnx-directml) |
 | Mistral-7b-v0.3-instruct | 7B | 32768 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx) |
 | Phi-3-mini-4k-instruct-062024 | 3.8B | 4096 | [EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx/tree/main/onnx/directml/Phi-3-mini-4k-instruct-062024-int4) |
 | Phi3-mini-4k-instruct | 3.8B | 4096 | [microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) |
docs/model/onnxruntime_cpu_models.md

Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+# Model Powered by Onnxruntime CPU GenAI
+
+## Supported Models
+
+| Model Name | Parameters | Context Length | Size (GB) | Link |
+|---|---|---|---|---|
+| Phi-3-mini-4k-instruct-onnx-cpu-int4-rtn-block-32 | 3.8B | 4096 | 2.538 | [EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-cpu-int4-rtn-block-32](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-cpu-int4-rtn-block-32/tree/main) |
+| Phi-3-mini-4k-instruct-onnx-cpu-int4-rtn-block-32-acc-level-4 | 3.8B | 4096 | 2.538 | [EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-cpu-int4-rtn-block-32-acc-level-4](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-cpu-int4-rtn-block-32-acc-level-4/tree/main) |
+| Phi-3-mini-128k-instruct-onnx-cpu-int4-rtn-block-32 | 3.8B | 4096 | 2.585 | [EmbeddedLLM/Phi-3-mini-128k-instruct-onnx-cpu-int4-rtn-block-32](https://huggingface.co/EmbeddedLLM/Phi-3-mini-128k-instruct-onnx-cpu-int4-rtn-block-32/tree/main) |
+| Phi-3-mini-128k-instruct-onnx-cpu-int4-rtn-block-32-acc-level-4 | 3.8B | 4096 | 2.585 | [EmbeddedLLM/Phi-3-mini-128k-instruct-onnx-cpu-int4-rtn-block-32-acc-level-4](https://huggingface.co/EmbeddedLLM/Phi-3-mini-128k-instruct-onnx-cpu-int4-rtn-block-32-acc-level-4/tree/main) |
+| mistral-7b-instruct-v0.3-onnx-cpu-int4-rtn-block-32 | 7B | 32768 | 4.66 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx-cpu-int4-rtn-block-32](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx-cpu-int4-rtn-block-32/tree/main) |
+| mistral-7b-instruct-v0.3-onnx-cpu-int4-rtn-block-32-acc-level-4 | 7B | 32768 | 4.66 | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx-cpu-int4-rtn-block-32-acc-level-4](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx-cpu-int4-rtn-block-32-acc-level-4/tree/main) |
+| openchat-3.6-8b-20240522-onnx-cpu-int4-rtn-block-32 | 8B | 8192 | 6.339 | [EmbeddedLLM/openchat-3.6-8b-20240522-onnx-cpu-int4-rtn-block-32](https://huggingface.co/EmbeddedLLM/openchat-3.6-8b-20240522-onnx-cpu-int4-rtn-block-32/tree/main) |
+| openchat-3.6-8b-20240522-onnx-cpu-int4-rtn-block-32-acc-level-4 | 8B | 8192 | 6.339 | [EmbeddedLLM/openchat-3.6-8b-20240522-onnx-cpu-int4-rtn-block-32-acc-level-4](https://huggingface.co/EmbeddedLLM/openchat-3.6-8b-20240522-onnx-cpu-int4-rtn-block-32-acc-level-4/tree/main) |
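
The repos above are standard onnxruntime-genai model folders. A minimal generation sketch against one of them, assuming the repo has already been downloaded locally; the loop follows the onnxruntime-genai examples from around this release (the API has since changed), and the Phi-3 prompt template and folder path are assumptions, not anything this commit specifies:

```python
import onnxruntime_genai as og

# Assumed local download of one of the repos listed above.
model_dir = "models/Phi-3-mini-4k-instruct-onnx-cpu-int4-rtn-block-32"

model = og.Model(model_dir)         # reads genai_config.json and the ONNX weights
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()  # incremental detokenizer for streaming output

# Usual Phi-3 chat template (check the model card for the exact format).
prompt = "<|user|>\nWhy is the sky blue?<|end|>\n<|assistant|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Decode and print each token as it is produced.
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```
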
docs/model/onnxruntime_directml_models.md

Lines changed: 19 additions & 0 deletions

@@ -0,0 +1,19 @@
+# Model Powered by Onnxruntime DirectML GenAI
+
+## Supported Models
+
+| Model Name | Parameters | Context Length | Size (GB) | Link |
+|---|---|---|---|---|
+| Phi-3-mini-4k-instruct-onnx-directml | 3.8B | 4096 | 1.989 | [EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml) |
+| Phi-3-mini-128k-instruct-onnx-directml | 3.8B | 131072 | 2.018 | [EmbeddedLLM/Phi-3-mini-128k-instruct-onnx-directml](https://huggingface.co/EmbeddedLLM/Phi-3-mini-128k-instruct-onnx-directml) |
+| Phi-3-medium-4k-instruct-onnx-directml | 17B | 4096 | 6.987 | [EmbeddedLLM/Phi-3-medium-4k-instruct-onnx-directml](https://huggingface.co/EmbeddedLLM/Phi-3-medium-4k-instruct-onnx-directml) |
+| Phi-3-medium-128k-instruct-onnx-directml | 17B | 131072 | 7.025 | [EmbeddedLLM/Phi-3-medium-128k-instruct-onnx-directml](https://huggingface.co/EmbeddedLLM/Phi-3-medium-128k-instruct-onnx-directml) |
+| Phi-3-mini-4k-instruct-062024-int4-onnx-directml | 3.8B | 4096 | 2.137 | [EmbeddedLLM/Phi-3-mini-4k-instruct-062024-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/Phi-3-mini-4k-instruct-062024-int4-onnx-directml) |
+| mistralai_Mistral-7B-Instruct-v0.3-int4-onnx-directml | 7B | 32768 | 3.988 | [EmbeddedLLM/mistralai_Mistral-7B-Instruct-v0.3-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/mistralai_Mistral-7B-Instruct-v0.3-int4-onnx-directml) |
+| gemma-2b-it-int4-onnx-directml | 2B | 8192 | 2.314 | [EmbeddedLLM/gemma-2b-it-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/gemma-2b-it-int4-onnx-directml) |
+| gemma-7b-it-int4-onnx-directml | 7B | 8192 | 5.958 | [EmbeddedLLM/gemma-7b-it-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/gemma-7b-it-int4-onnx-directml) |
+| llama-2-7b-chat-int4-onnx-directml | 7B | 4096 | 3.708 | [EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml) |
+| Starling-LM-7b-beta-int4-onnx-directml | 7B | 8192 | 3.974 | [EmbeddedLLM/Starling-LM-7b-beta-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/Starling-LM-7b-beta-int4-onnx-directml) |
+| openchat-3.6-8b-20240522-int4-onnx-directml | 8B | 8192 | 4.922 | [EmbeddedLLM/openchat-3.6-8b-20240522-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/openchat-3.6-8b-20240522-int4-onnx-directml) |
+| Yi-1.5-6B-Chat-int4-onnx-directml | 6B | 32768 | 3.532 | [EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-int4-onnx-directml) |
+
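
The DirectML repos run with the same generation loop as the CPU sketch above; the difference is the execution provider. To the best of my knowledge these models ship a genai_config.json that selects the DirectML provider, and you need the DirectML build of the runtime installed rather than a runtime flag:

```python
# Same og.Model / og.Generator loop as the CPU sketch above. The provider is
# chosen by the model's genai_config.json together with the installed wheel:
#   pip install onnxruntime-genai            (CPU)
#   pip install onnxruntime-genai-directml   (DirectML: Windows, any DX12 GPU)
import onnxruntime_genai as og

model = og.Model("models/Phi-3-mini-4k-instruct-onnx-directml")  # assumed local path
```
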

docs/model/onnxruntime_models.md

Lines changed: 0 additions & 19 deletions
This file was deleted.

src/embeddedllm/backend/onnxruntime_engine.py

Lines changed: 11 additions & 0 deletions
@@ -1,9 +1,11 @@
 # from embeddedllm.transformers_utils.image_processing_phi3v import Phi3VImageProcessor
 import contextlib
 import time
+import os
 from pathlib import Path
 from tempfile import TemporaryDirectory
 from typing import AsyncIterator, List, Optional
+from huggingface_hub import snapshot_download
 
 import onnxruntime_genai as og
 from loguru import logger
@@ -39,6 +41,15 @@ def onnx_generator_context(model, params):
 class OnnxruntimeEngine(BaseLLMEngine):
     def __init__(self, model_path: str, vision: bool, device: str = "cpu"):
         self.model_path = model_path
+
+        if not os.path.exists(model_path):
+            snapshot_path = snapshot_download(
+                repo_id=model_path,
+                allow_patterns=None,
+                repo_type="model",
+            )
+            model_path = snapshot_path
+
         self.model_config = AutoConfig.from_pretrained(self.model_path, trust_remote_code=True)
         self.device = device
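
With this change, `model_path` doubles as a Hugging Face repo id: any path that does not exist on disk is fetched into the local Hugging Face cache before loading. A standalone sketch of the same autoload behavior (the helper name and example repo are illustrative):

```python
import os

from huggingface_hub import snapshot_download


def resolve_model_path(model_path: str) -> str:
    """Return a local directory for model_path, downloading it if needed.

    Mirrors the check added to OnnxruntimeEngine.__init__: an existing
    local path is used as-is; anything else is treated as a HF repo id.
    """
    if os.path.exists(model_path):
        return model_path
    return snapshot_download(repo_id=model_path, repo_type="model")


local_dir = resolve_model_path("EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml")
print(local_dir)  # e.g. ~/.cache/huggingface/hub/models--EmbeddedLLM--.../snapshots/...
```

Because snapshot_download resolves into the shared cache, a second launch with the same repo id reuses the downloaded files instead of fetching them again.
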

src/embeddedllm/engine.py

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ def __init__(self, model_path: str, vision: bool, device: str = "xpu", backend:
 
         else:
             raise ValueError(
-                f"EmbeddedLLMEngine only supports `cpu`, `ipex`, `cuda` and `directml`."
+                f"EmbeddedLLMEngine only supports `cpu`, `ipex`, `cuda`, `openvino` and `directml`."
            )
         self.tokenizer = self.engine.tokenizer
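
The hunk header shows the constructor signature, so the updated error can be exercised directly. A usage sketch (argument values are illustrative, and the happy-path backend wiring is inferred from the error message rather than shown in this diff):

```python
from embeddedllm.engine import EmbeddedLLMEngine

# Accepted backends after this change: cpu, ipex, cuda, openvino, directml.
engine = EmbeddedLLMEngine(
    model_path="EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml",
    vision=False,
    device="gpu",
    backend="directml",
)

# Any other backend string falls through to the else branch above.
try:
    EmbeddedLLMEngine("some-model", vision=False, device="cpu", backend="tpu")
except ValueError as err:
    print(err)
```
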

src/embeddedllm/entrypoints/api_server.py

Lines changed: 2 additions & 2 deletions
@@ -28,9 +28,9 @@ class Config(BaseSettings):
     )
     port: int = Field(default=6979, description="Server port.")
     host: str = Field(default="0.0.0.0", description="Server host.")
-    device: str = Field(default="cpu", description="Device type: `cpu`, `xpu`")
+    device: str = Field(default="cpu", description="Device type: `cpu`, `xpu`, `gpu`")
     backend: str = Field(
-        default="directml", description="Backend engine: `cpu`, `ipex` and `directml`"
+        default="directml", description="Backend engine: `cpu`, `ipex`, `openvino` and `directml`"
     )
     response_role: str = Field(default="assistant", description="Server response role.")
     uvicorn_log_level: str = Field(
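
Since Config subclasses pydantic's BaseSettings, each of these fields can also be supplied through the environment rather than keyword arguments. A sketch, assuming the settings class configures no env_prefix (not visible in this hunk):

```python
import os

# BaseSettings matches environment variables to field names
# case-insensitively (assuming no env_prefix is configured).
os.environ["PORT"] = "8080"
os.environ["DEVICE"] = "gpu"
os.environ["BACKEND"] = "openvino"

from embeddedllm.entrypoints.api_server import Config

config = Config()
print(config.port, config.device, config.backend)  # 8080 gpu openvino
```
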
