|
| 1 | + |
| 2 | +# Build Your Chatbot with Intel® Extension for Transformers neural-chat |
| 3 | + |
| 4 | +# 1 Set Up Environment |
| 5 | + |
| 6 | +## 1.1 Install intel-extension-for-transformers |
| 7 | + |
| 8 | +``` |
| 9 | +conda create -n itrex-chatbot python=3.9 |
| 10 | +conda activate itrex-chatbot |
| 11 | +pip install intel-extension-for-transformers==1.3.2 |
| 12 | +``` |
| 13 | +## 1.2 Install neural-chat dependencies |
| 14 | + |
| 15 | +``` |
| 16 | +pip install accelerate |
| 17 | +pip install transformers_stream_generator |
| 18 | +
|
| 19 | +git clone https://github.com/intel/intel-extension-for-transformers.git ~/itrex |
| 20 | +cd ~/itrex |
| 21 | +git checkout v1.3.2 |
| 22 | +
|
| 23 | +cd ~/itrex/intel_extension_for_transformers/neural_chat |
| 24 | +``` |
| 25 | + |
| 26 | +To set up a CPU platform, go to [1.2.1](#121-cpu-platform). |
| 27 | + |
| 28 | +To set up a GPU platform, go to [1.2.2](#122-gpu-platform). |
| 29 | + |
| 30 | +### 1.2.1 CPU Platform |
| 31 | +`pip install -r requirements_cpu.txt` |
| 32 | + |
| 33 | +Go to [Section 2](#2-run-the-chatbot-in-command-mode). |
| 34 | + |
| 35 | +### 1.2.2 GPU Platform |
| 36 | + |
| 37 | +#### Prerequisite |
| 38 | +A GPU driver and oneAPI 2024.0 are required. |
| 39 | + |
| 40 | +`pip install -r requirements_xpu.txt` |
| 41 | + |
| 42 | +# 2 Run the chatbot in command mode |
| 43 | + |
| 44 | +## Usage |
| 45 | + |
| 46 | +Go back to the quick_example folder and run the example: |
| 47 | + |
| 48 | +``` |
| 49 | +source /opt/intel/oneapi/setvars.sh |
| 50 | +python chatbot.py |
| 51 | +``` |
| 52 | + |
| 53 | +``` |
| 54 | +/home/xiguiwang/anaconda3/envs/test/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? |
| 55 | + warn( |
| 56 | +2024-03-20 11:22:33,191 - datasets - INFO - PyTorch version 2.1.0a0+cxx11.abi available. |
| 57 | +2024-03-20 11:22:33,191 - datasets - INFO - TensorFlow version 2.16.1 available. |
| 58 | +/home/xiguiwang/anaconda3/envs/test/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations |
| 59 | + warnings.warn( |
| 60 | +Loading model Intel/neural-chat-7b-v3-1 |
| 61 | +Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.77it/s] |
| 62 | +2024-03-20 11:22:38,398 - root - INFO - Model loaded. |
| 63 | +Once upon a time, a little girl lived in a quaint village nestled among rolling hills. She had a heart filled with curiosity and dreams of adventure. One day, she decided to leave her cozy home behind and set out on a journey to explore the world beyond her familiar surroundings. |
| 64 | +
|
| 65 | +As she ventured forth, she encountered many wondrous sights and met fascinating people along the way. The little girl learned about different cultures, customs, and traditions that broadened her perspective and enriched her life. Her experiences taught her valuable lessons about kindness, courage, and resilience. |
| 66 | +
|
| 67 | +Throughout her travels, she made lifelong friends who shared her passion for discovery. Together, they faced challenges and celebrated triumphs, forming unbreakable bonds that would last a lifetime. |
| 68 | +
|
| 69 | +Eventually, the little girl returned to her village, now a wise and compassionate young woman. She brought back knowledge and memories that inspired others to dream big and follow their hearts. As she grew older, she continued to share her stories and wisdom with those around her, inspiring future generations to embrace the beauty of the unknown and never stop seeking new horizons. |
| 70 | +
|
| 71 | +``` |
| 72 | + |
| 73 | +# 3 Run the chatbot in server mode with UI |
| 74 | + |
| 75 | +## 3.1 Start the service |
| 76 | + |
| 77 | +``` |
| 78 | +python chatbot_server.py |
| 79 | +``` |
| 80 | + |
| 81 | +Here is the complete output: |
| 82 | +``` |
| 83 | +(/home/xiguiwang/ws2/conda/itrex-rag) xiguiwang@icx02-tce-atsm:~/ws2/AI-examples/chatbot$ python chatbot_server.py /home/xiguiwang/ws2/conda/itrex-rag/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? |
| 84 | + warn( |
| 85 | +2024-03-18 16:28:01,454 - datasets - INFO - PyTorch version 2.1.0a0+cxx11.abi available. |
| 86 | +/home/xiguiwang/ws2/conda/itrex-rag/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations |
| 87 | + warnings.warn( |
| 88 | +Loading model Intel/neural-chat-7b-v3-1 |
| 89 | +Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.55it/s] |
| 90 | +2024-03-18 16:28:08,634 - root - INFO - Model loaded. |
| 91 | +Loading config settings from the environment... |
| 92 | +INFO: Started server process [1544268] |
| 93 | +INFO: Waiting for application startup. |
| 94 | +INFO: Application startup complete. |
| 95 | +INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit) |
| 96 | +
|
| 97 | +``` |
| 98 | + |
| 99 | + |
| 100 | +### 3.1.1 Verify the client connection to the server |
| 101 | + |
| 102 | +Open a new Linux console and run the following command: |
| 103 | + |
| 104 | +`curl -vv -X POST http://127.0.0.1:8000/v1/chat/completions` |
| 105 | + |
| 106 | +Check the output and make sure there are no network connection or proxy setting issues on the client side. |
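If `curl` is not available on the client, a plain TCP reachability check can stand in for the first half of this test. The sketch below is a minimal example using only the Python standard library; the helper name `can_connect` is hypothetical and not part of neural-chat. It only confirms that the port accepts connections, so it does not exercise HTTP proxy settings the way `curl` does.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (including timeouts and refused connections) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the server started above, `can_connect("127.0.0.1", 8000)` should return `True` once Uvicorn reports that it is running.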
| 107 | + |
| 108 | +### 3.1.2 Test request command at client side |
| 109 | + |
| 110 | +Send a request to the chatbot server from the client: |
| 111 | + |
| 112 | + |
| 113 | +``` |
| 114 | +curl http://127.0.0.1:8000/v1/chat/completions \ |
| 115 | + -H "Content-Type: application/json" \ |
| 116 | + -d '{ |
| 117 | + "model": "Intel/neural-chat-7b-v3-1", |
| 118 | + "messages": [ |
| 119 | + {"role": "system", "content": "You are a helpful assistant."}, |
| 120 | + {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."} |
| 121 | + ] |
| 122 | + }' |
| 123 | +``` |
| 124 | + |
| 125 | +On the server side, the following message appears: |
| 126 | +``` |
| 127 | +INFO: 127.0.0.1:52532 - "POST /v1/chat/completions HTTP/1.1" 200 OK |
| 128 | +``` |
| 129 | + |
| 130 | +On the client side, the response is similar to the following. |
| 131 | +The message contains the LLM answer and other information about the request. |
| 132 | +``` |
| 133 | +{"id":"chatcmpl-29GVLhfoSJHeHTgqL4HgxP","object":"chat.completion","created":1710750809,"model":"Intel/neural-chat-7b-v3-1","choices":[{"index":0,"message":{"role":"assistant","content":"Intel Xeon Scalable Processors are a series of high-performance central processing units (CPUs) designed for data centers, cloud computing, and other demanding computing environments. They are part of Intel's Xeon family of processors, which are specifically tailored for server and workstation applications.\n\nThe Xeon Scalable Processors were introduced in 2017 and are based on Intel's Skylake microarchitecture. They offer significant improvements in performance, efficiency, and scalability compared to their predecessors. These processors are available in various configurations, including single-socket, dual-socket, and multi-socket systems, catering to different workloads and requirements.\n\nSome key features of Intel Xeon Scalable Processors include:\n\n1. Scalable performance: The processors can be configured to meet specific workload needs, allowing for better resource utilization and improved performance.\n\n2. High-speed memory support: They support up to 6 channels of DDR4 memory, enabling faster data transfer and improved system performance.\n\n3. Advanced security features: The processors come with built-in security features, such as Intel Software Guard Extensions (SGX), which help protect sensitive data and applications from potential threats.\n\n4. Enhanced virtualization capabilities: The Xeon Scalable Processors are designed to support multiple virtual machines, making them suitable for virtualized environments.\n\n5. Improved energy efficiency: The processors are designed to optimize power consumption, reducing operational costs and minimizing environmental impact.\n\nOverall, Intel Xeon Scalable Processors are a powerful and versatile choice for organizations seeking high-performance computing solutions in data centers, cloud environments, and other demanding applications."},"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"total_tokens":0,"completion_tokens":0}} |
| 134 | +``` |
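The same request can also be sent from a script. The following is a minimal sketch using only the Python standard library, assuming the server is listening at the address used in the curl example and returns JSON shaped like the response shown above. The helper names `build_payload`, `extract_answer`, and `ask` are illustrative and not part of neural-chat.

```python
import json
import urllib.request

# Assumed endpoint: the same address as in the curl example.
URL = "http://127.0.0.1:8000/v1/chat/completions"


def build_payload(user_message: str, model: str = "Intel/neural-chat-7b-v3-1") -> bytes:
    """Encode the same JSON body that the curl example sends."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    return json.dumps(body).encode("utf-8")


def extract_answer(response_text: str) -> str:
    """Pull the assistant reply out of a chat.completion response."""
    data = json.loads(response_text)
    return data["choices"][0]["message"]["content"]


def ask(user_message: str, url: str = URL, timeout: float = 120.0) -> str:
    """Send one request to the running server and return the answer text."""
    request = urllib.request.Request(
        url,
        data=build_payload(user_message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_answer(response.read().decode("utf-8"))
```

With the server running, `print(ask("Tell me about Intel Xeon Scalable Processors."))` should print only the answer text. Note that `urllib` honors the `http_proxy` and `no_proxy` environment variables, so the proxy caveat from section 3.1.1 applies here as well.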
| 135 | + |
| 136 | +## 3.2 Set up Server mode UI |
| 137 | + |
| 138 | +Create the UI conda environment: |
| 139 | +``` |
| 140 | +conda create -n chatbot-ui python=3.9 |
| 141 | +conda activate chatbot-ui |
| 142 | +
|
| 143 | +cd ~/itrex/intel_extension_for_transformers/neural_chat/ui/gradio/basic |
| 144 | +pip install -r requirements.txt |
| 145 | +
|
| 146 | +pip install gradio==3.36.0 |
| 147 | +pip install pydantic==1.10.13 |
| 148 | +``` |
| 149 | + |
| 150 | +## 3.3 Start the web service |
| 151 | + |
| 152 | +Set the default service port. |
| 153 | +Edit line 745 of app.py to set the server port. For example, we set the port to 8008. |
| 154 | + |
| 155 | +``` |
| 156 | + demo.queue( |
| 157 | + concurrency_count=concurrency_count, status_update_rate=10, api_open=False |
| 158 | + ).launch( |
| 159 | + server_name=host, server_port=8008, share=share, max_threads=200 |
| 160 | + ) |
| 161 | +``` |
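As an alternative to hardcoding the port, the value could be read from the environment. The sketch below is only an illustration: the variable name `CHATBOT_UI_PORT` and the helper `resolve_port` are hypothetical, and app.py does not read this variable unless you add the call yourself.

```python
import os


def resolve_port(default: int = 8008, env_var: str = "CHATBOT_UI_PORT") -> int:
    """Return the port from env_var if it is set and valid, otherwise default."""
    raw = os.environ.get(env_var)
    if raw is None or not raw.strip():
        return default
    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"{env_var}={raw!r} is outside the valid port range 1-65535")
    return port
```

In the `launch()` call shown above, `server_port=8008` would then become `server_port=resolve_port()`, and the service could be started with, for example, `CHATBOT_UI_PORT=9000 python app.py`.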
| 162 | + |
| 163 | +Start the service: |
| 164 | +`python app.py` |
| 165 | + |
| 166 | +The output is as follows: |
| 167 | +``` |
| 168 | +/home/xiguiwang/ws2/conda/chatbot-ui/lib/python3.9/site-packages/gradio_client/documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Parallel'>: No known documentation group for module 'gradio.mix' |
| 169 | + warnings.warn(f"Could not get documentation group for {cls}: {exc}") |
| 170 | +/home/xiguiwang/ws2/conda/chatbot-ui/lib/python3.9/site-packages/gradio_client/documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Series'>: No known documentation group for module 'gradio.mix' |
| 171 | + warnings.warn(f"Could not get documentation group for {cls}: {exc}") |
| 172 | +2024-03-27 11:00:24 | INFO | gradio_web_server | Models: ['Intel/neural-chat-7b-v3-1'] |
| 173 | +2024-03-27 11:00:26 | ERROR | stderr | sys:1: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead. |
| 174 | +2024-03-27 11:00:26 | INFO | stdout | Running on local URL: http://0.0.0.0:8008 |
| 175 | +2024-03-27 11:00:26 | INFO | stdout | |
| 176 | +2024-03-27 11:00:26 | INFO | stdout | To create a public link, set `share=True` in `launch()`. |
| 177 | +``` |
| 178 | + |
| 179 | +The log shows that the service has started on port 8008. |
| 180 | +You can now access the chatbot through a web browser on port 8008. |