# EmbeddedLLM

Run local LLMs on iGPU, APU and CPU (AMD, Intel, and Qualcomm (Coming Soon))

| Support matrix        | Supported now                                      | Under Development | On the roadmap |
| --------------------- | -------------------------------------------------- | ----------------- | -------------- |
| Model architectures   | Gemma <br /> Llama \* <br /> Mistral \+ <br /> Phi |                   |                |
| Platform              | Linux <br /> Windows                               |                   |                |
| Architecture          | x86 <br /> x64                                     | Arm64             |                |
| Hardware Acceleration | CUDA <br /> DirectML                               | QNN <br /> ROCm   | OpenVINO       |

\* The Llama model architecture supports similar model families such as CodeLlama, Vicuna, Yi, and more.

\+ The Mistral model architecture supports similar model families such as Zephyr.

## 🚀 Latest News

- [2024/06] Support Phi-3 (mini, small, medium), Phi-3-Vision-Mini, Llama-2, Llama-3, Gemma (v1), Mistral v0.3, Starling-LM, and Yi-1.5.
- [2024/06] Support vision/chat inference on iGPU, APU, CPU, and CUDA.

## Supported Models (Quick Start)

| Models                    | Parameters | Context Length | Link                                                                                                                          |
| ------------------------- | ---------- | -------------- | ----------------------------------------------------------------------------------------------------------------------------- |
| Gemma-2b-Instruct v1      | 2B         | 8192           | [EmbeddedLLM/gemma-2b-it-onnx](https://huggingface.co/EmbeddedLLM/gemma-2b-it-onnx)                                           |
| Llama-2-7b-chat           | 7B         | 4096           | [EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml)       |
| Llama-2-13b-chat          | 13B        | 4096           | [EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml](https://huggingface.co/EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml)     |
| Llama-3-8b-chat           | 8B         | 8192           | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx)                 |
| Mistral-7b-v0.3-instruct  | 7B         | 32768          | [EmbeddedLLM/mistral-7b-instruct-v0.3-onnx](https://huggingface.co/EmbeddedLLM/mistral-7b-instruct-v0.3-onnx)                 |
| Phi3-mini-4k-instruct     | 3.8B       | 4096           | [microsoft/Phi-3-mini-4k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)                         |
| Phi3-mini-128k-instruct   | 3.8B       | 128k           | [microsoft/Phi-3-mini-128k-instruct-onnx](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx)                     |
| Phi3-medium-4k-instruct   | 14B        | 4096           | [microsoft/Phi-3-medium-4k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct-onnx-directml)   |
| Phi3-medium-128k-instruct | 14B        | 128k           | [microsoft/Phi-3-medium-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-directml) |

## Getting Started

### Installation

#### From Source

**Windows**

1. Install the embeddedllm package: `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .`. Note: `cpu`, `directml` and `cuda` are currently supported.
   - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml]`
   - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu]`
   - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]`
   - **With Web UI:**
     - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml,webui]`
     - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu,webui]`
     - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda,webui]`

**Linux**

1. Install the embeddedllm package: `ELLM_TARGET_DEVICE='directml' pip install -e .`. Note: `cpu`, `directml` and `cuda` are currently supported.
   - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml]`
   - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu]`
   - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]`
   - **With Web UI:**
     - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml,webui]`
     - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu,webui]`
     - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda,webui]`

### Launch OpenAI API Compatible Server

```
usage: ellm_server.exe [-h] [--port int] [--host str] [--response_role str] [--uvicorn_log_level str]
                       [--served_model_name str] [--model_path str] [--vision bool]
```

1. `ellm_server --model_path <path/to/model/weight>`.
2. Example code to connect to the API server can be found in `scripts/python`; a minimal sketch is also shown below.

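The snippet below is a minimal sketch of talking to the running server with the official `openai` Python client (`pip install openai`). The port, API key, and model name are placeholder assumptions rather than values defined by this project; match them to the `--port` and `--served_model_name` you passed to `ellm_server`, and treat the scripts in `scripts/python` as the authoritative examples.

```python
from openai import OpenAI

# Hypothetical values: point base_url at your running ellm_server instance,
# assuming it exposes the standard OpenAI-style /v1 routes.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # replace 8000 with your --port value
    api_key="EMPTY",                      # local server; any placeholder works if auth is not enforced
)

response = client.chat.completions.create(
    model="phi3-mini-4k-instruct",  # replace with your --served_model_name value
    messages=[{"role": "user", "content": "Summarize ONNX Runtime in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

For incremental output, the same call with `stream=True` returns chunks whose `choices[0].delta.content` can be printed as they arrive.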

## Launch Chatbot Web UI

1. `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost`.

## Acknowledgements

- Excellent open-source projects: [vLLM](https://github.com/vllm-project/vllm.git), [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai.git), and many others.