|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Chatbot Example with Self Query Retriever\n", |
| 8 | + "[](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb)\n", |
| 9 | + "\n", |
| 10 | + "This workbook demonstrates how to use the Elasticsearch-backed [Self-query retriever](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html) to convert a question into a structured query and apply that structured query to an Elasticsearch index. \n", |
| 11 | + "\n", |
| 12 | + "Before we begin, we first split the documents into chunks with `langchain` and then, using [`ElasticsearchStore.from_documents`](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents), we create a `vectorstore` and index the data into Elasticsearch.\n", |
| 13 | + "\n", |
| 14 | + "\n", |
| 15 | + "We will then walk through a few example queries demonstrating the full power of the Elasticsearch-powered self-query retriever.\n" |
| 16 | + ] |
| 17 | + }, |
| 18 | + { |
| 19 | + "cell_type": "markdown", |
| 20 | + "metadata": {}, |
| 21 | + "source": [ |
| 22 | + "## Install packages and import modules\n" |
| 23 | + ] |
| 24 | + }, |
| 25 | + { |
| 26 | + "cell_type": "code", |
| 27 | + "execution_count": 30, |
| 28 | + "metadata": {}, |
| 29 | + "outputs": [], |
| 40 | + "source": [ |
| 41 | + "!python3 -m pip install -qU lark elasticsearch langchain openai\n", |
| 42 | + "\n", |
| 43 | + "from langchain.schema import Document\n", |
| 44 | + "from langchain.embeddings.openai import OpenAIEmbeddings\n", |
| 45 | + "from langchain.vectorstores import ElasticsearchStore\n", |
| 46 | + "from langchain.llms import OpenAI\n", |
| 47 | + "from langchain.retrievers.self_query.base import SelfQueryRetriever\n", |
| 48 | + "from langchain.chains.query_constructor.base import AttributeInfo\n", |
| 49 | + "from getpass import getpass" |
| 50 | + ] |
| 51 | + }, |
| 52 | + { |
| 53 | + "cell_type": "markdown", |
| 54 | + "metadata": {}, |
| 55 | + "source": [ |
| 56 | + "## Create documents \n", |
| 57 | + "Next, we will create a list of documents containing movie summaries, using the [langchain Schema Document](https://api.python.langchain.com/en/latest/schema/langchain.schema.document.Document.html), where each document has `page_content` and `metadata`.\n", |
| 58 | + "\n" |
| 59 | + ] |
| 60 | + }, |
| 61 | + { |
| 62 | + "cell_type": "code", |
| 63 | + "execution_count": 67, |
| 64 | + "metadata": {}, |
| 65 | + "outputs": [], |
| 66 | + "source": [ |
| 67 | + "docs = [\n", |
| 68 | + " Document(\n", |
| 69 | + " page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n", |
| 70 | + " metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\", \"director\": \"Steven Spielberg\", \"title\": \"Jurassic Park\"},\n", |
| 71 | + " ),\n", |
| 72 | + " Document(\n", |
| 73 | + " page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n", |
| 74 | + " metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2, \"title\": \"Inception\"},\n", |
| 75 | + " ),\n", |
| 76 | + " Document(\n", |
| 77 | + " page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n", |
| 78 | + " metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6, \"title\": \"Paprika\"},\n", |
| 79 | + " ),\n", |
| 80 | + " Document(\n", |
| 81 | + " page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n", |
| 82 | + " metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3, \"title\": \"Little Women\"},\n", |
| 83 | + " ),\n", |
| 84 | + " Document(\n", |
| 85 | + " page_content=\"Toys come alive and have a blast doing so\",\n", |
| 86 | + " metadata={\"year\": 1995, \"genre\": \"animated\", \"director\": \"John Lasseter\", \"rating\": 8.3, \"title\": \"Toy Story\"},\n", |
| 87 | + " ),\n", |
| 88 | + " Document(\n", |
| 89 | + " page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n", |
| 90 | + " metadata={\n", |
| 91 | + " \"year\": 1979,\n", |
| 92 | + " \"rating\": 9.9,\n", |
| 93 | + " \"director\": \"Andrei Tarkovsky\",\n", |
| 94 | + " \"genre\": \"science fiction\",\n", |
| 96 | + " \"title\": \"Stalker\",\n", |
| 97 | + " },\n", |
| 98 | + " ),\n", |
| 99 | + "]" |
| 100 | + ] |
| 101 | + }, |
| 102 | + { |
| 103 | + "cell_type": "markdown", |
| 104 | + "metadata": {}, |
| 105 | + "source": [ |
| 106 | + "## Connect to Elasticsearch\n", |
| 107 | + "\n", |
| 108 | + "ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial. \n", |
| 109 | + "\n", |
| 110 | + "We'll use the **Cloud ID** to identify our deployment, since we are using Elastic Cloud. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n", |
| 111 | + "\n", |
| 112 | + "\n", |
| 113 | + "We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our Elastic Cloud deployment, which makes it easy to create the index and ingest the data. We also pass in the list of documents that we created in the previous step." |
| 114 | + ] |
| 115 | + }, |
| 116 | + { |
| 117 | + "cell_type": "code", |
| 118 | + "execution_count": 68, |
| 119 | + "metadata": {}, |
| 120 | + "outputs": [], |
| 121 | + "source": [ |
| 122 | + "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n", |
| 123 | + "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n", |
| 124 | + "\n", |
| 125 | + "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n", |
| 126 | + "ELASTIC_API_KEY = getpass(\"Elastic Api Key: \")\n", |
| 127 | + "\n", |
| 128 | + "# https://platform.openai.com/api-keys\n", |
| 129 | + "OPENAI_API_KEY = getpass(\"OpenAI API key: \")\n", |
| 130 | + "\n", |
| 131 | + "embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)\n", |
| 132 | + "\n", |
| 133 | + "\n", |
| 134 | + "vectorstore = ElasticsearchStore.from_documents(\n", |
| 135 | + " docs, \n", |
| 136 | + " embeddings, \n", |
| 137 | + " index_name=\"elasticsearch-self-query-demo\", \n", |
| 138 | + " es_cloud_id=ELASTIC_CLOUD_ID, \n", |
| 139 | + " es_api_key=ELASTIC_API_KEY\n", |
| 140 | + ")\n" |
| 141 | + ] |
| 142 | + }, |
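Conceptually, `ElasticsearchStore.from_documents` embeds each document's text and then indexes the text, its vector, and its metadata together. The sketch below is illustrative only: `fake_embed` and `to_index_actions` are hypothetical helpers standing in for the real embedding call and bulk-indexing logic, not the library's API.

```python
def fake_embed(text):
    # Stand-in embedding; real vectors would come from OpenAIEmbeddings.
    return [float(len(text) % 7), float(len(text) % 3)]

def to_index_actions(docs, index_name):
    # Build one bulk-index action per document: text + vector + metadata.
    actions = []
    for doc in docs:
        actions.append({
            "_index": index_name,
            "_source": {
                "text": doc["page_content"],
                "vector": fake_embed(doc["page_content"]),
                "metadata": doc["metadata"],
            },
        })
    return actions

docs_ = [{"page_content": "Toys come alive", "metadata": {"year": 1995}}]
actions = to_index_actions(docs_, "elasticsearch-self-query-demo")
```

Keeping `metadata` as a nested object is what later lets the self-query retriever filter on fields like `metadata.year`.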
| 143 | + { |
| 144 | + "cell_type": "markdown", |
| 145 | + "metadata": {}, |
| 146 | + "source": [ |
| 147 | + "## Setup query retriever\n", |
| 148 | + "\n", |
| 149 | + "Next, we will instantiate the self-query retriever, providing some information about our document attributes and a short description of the document contents. \n", |
| 150 | + "\n", |
| 151 | + "We will then instantiate the retriever with [SelfQueryRetriever.from_llm](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html)" |
| 152 | + ] |
| 153 | + }, |
| 154 | + { |
| 155 | + "cell_type": "code", |
| 156 | + "execution_count": 80, |
| 157 | + "metadata": {}, |
| 158 | + "outputs": [], |
| 159 | + "source": [ |
| 160 | + "# Add details about metadata fields\n", |
| 161 | + "metadata_field_info = [\n", |
| 162 | + " AttributeInfo(\n", |
| 163 | + " name=\"genre\",\n", |
| 164 | + " description=\"The genre of the movie. Can be either 'science fiction' or 'animated'.\",\n", |
| 165 | + " type=\"string or list[string]\",\n", |
| 166 | + " ),\n", |
| 167 | + " AttributeInfo(\n", |
| 168 | + " name=\"year\",\n", |
| 169 | + " description=\"The year the movie was released\",\n", |
| 170 | + " type=\"integer\",\n", |
| 171 | + " ),\n", |
| 172 | + " AttributeInfo(\n", |
| 173 | + " name=\"director\",\n", |
| 174 | + " description=\"The name of the movie director\",\n", |
| 175 | + " type=\"string\",\n", |
| 176 | + " ),\n", |
| 177 | + " AttributeInfo(\n", |
| 178 | + " name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n", |
| 179 | + " ),\n", |
| 180 | + "]\n", |
| 181 | + "\n", |
| 182 | + "document_content_description = \"Brief summary of a movie\"\n", |
| 183 | + "\n", |
| 184 | + "# Set up openAI llm with sampling temperature 0\n", |
| 185 | + "llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)\n", |
| 186 | + "\n", |
| 187 | + "# instantiate retriever\n", |
| 188 | + "retriever = SelfQueryRetriever.from_llm(\n", |
| 189 | + " llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n", |
| 190 | + ")\n" |
| 191 | + ] |
| 192 | + }, |
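To build intuition for what the retriever does with the attributes above, the sketch below is illustrative only (`filter_to_es` is a hypothetical helper, not LangChain's actual Elasticsearch translator): a question decomposes into a semantic query plus metadata comparisons, and each comparison could map onto a clause of an Elasticsearch `bool` query.

```python
def filter_to_es(field, op, value):
    """Translate one (field, operator, value) triple into an ES filter clause."""
    if op == "eq":
        return {"term": {f"metadata.{field}.keyword": value}}
    if op in ("gt", "gte", "lt", "lte"):
        return {"range": {f"metadata.{field}": {op: value}}}
    raise ValueError(f"unsupported operator: {op}")

# "movies about dreams released after 1992 but before 2007" might decompose into:
semantic_query = "dreams"
filters = [("year", "gt", 1992), ("year", "lt", 2007)]

es_query = {
    "bool": {
        "must": [{"match": {"text": semantic_query}}],
        "filter": [filter_to_es(f, op, v) for f, op, v in filters],
    }
}
```

The `must` clause scores documents against the semantic part of the question, while the `filter` clauses restrict results without affecting scoring.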
| 193 | + { |
| 194 | + "cell_type": "markdown", |
| 195 | + "metadata": {}, |
| 196 | + "source": [ |
| 197 | + "# Question Answering with Self-Query Retriever\n", |
| 198 | + "\n", |
| 199 | + "We will now demonstrate how to use the self-query retriever for retrieval-augmented generation (RAG)." |
| 200 | + ] |
| 201 | + }, |
| 202 | + { |
| 203 | + "cell_type": "code", |
| 204 | + "execution_count": 77, |
| 205 | + "metadata": {}, |
| 206 | + "outputs": [ |
| 207 | + { |
| 208 | + "data": { |
| 209 | + "text/plain": [ |
| 210 | + "AIMessage(content='Inception (2010)')" |
| 211 | + ] |
| 212 | + }, |
| 213 | + "execution_count": 77, |
| 214 | + "metadata": {}, |
| 215 | + "output_type": "execute_result" |
| 216 | + } |
| 217 | + ], |
| 218 | + "source": [ |
| 219 | + "from langchain.chat_models import ChatOpenAI\n", |
| 220 | + "from langchain.schema.runnable import RunnableParallel, RunnablePassthrough\n", |
| 221 | + "from langchain.prompts import ChatPromptTemplate, PromptTemplate\n", |
| 222 | + "from langchain.schema import format_document\n", |
| 223 | + "\n", |
| 224 | + "LLM_CONTEXT_PROMPT = ChatPromptTemplate.from_template(\"\"\"\n", |
| 225 | + "Use the movies below, which matched the user's question, as context. Only use these movies to answer the user's question.\n", |
| 226 | + "\n", |
| 227 | + "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n", |
| 228 | + "\n", |
| 229 | + "----\n", |
| 230 | + "{context}\n", |
| 231 | + "----\n", |
| 232 | + "Question: {question}\n", |
| 233 | + "Answer:\n", |
| 234 | + "\"\"\")\n", |
| 235 | + "\n", |
| 236 | + "DOCUMENT_PROMPT = PromptTemplate.from_template(\"\"\"\n", |
| 237 | + "---\n", |
| 238 | + "title: {title} \n", |
| 239 | + "year: {year} \n", |
| 240 | + "director: {director} \n", |
| 241 | + "---\n", |
| 242 | + "\"\"\")\n", |
| 243 | + "\n", |
| 244 | + "def _combine_documents(\n", |
| 245 | + " docs, document_prompt=DOCUMENT_PROMPT, document_separator=\"\\n\\n\"\n", |
| 246 | + "):\n", |
| 247 | + " doc_strings = [format_document(doc, document_prompt) for doc in docs]\n", |
| 248 | + " return document_separator.join(doc_strings)\n", |
| 249 | + "\n", |
| 250 | + "\n", |
| 251 | + "_context = RunnableParallel(\n", |
| 252 | + " context=retriever | _combine_documents,\n", |
| 253 | + " question=RunnablePassthrough(),\n", |
| 254 | + ")\n", |
| 255 | + "\n", |
| 256 | + "chain = _context | LLM_CONTEXT_PROMPT | ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)\n", |
| 257 | + "\n", |
| 258 | + "chain.invoke(\"What movies are about dreams and was released after the year 1992 but before 2007?\")" |
| 259 | + ] |
| 260 | + } |
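The LCEL chain above can be dense on first reading. Here is a plain-Python sketch of the same data flow, with the retriever and LLM replaced by stand-in functions (`fake_retriever` is hypothetical, for illustration only): the question fans out in parallel to the retriever and a passthrough, the retrieved documents are rendered with the document prompt and joined, and the result is formatted into the final prompt sent to the model.

```python
def fake_retriever(question):
    # Stand-in for the self-query retriever: returns matching documents.
    return [{"title": "Inception", "year": 2010, "director": "Christopher Nolan"}]

def combine_documents(docs):
    # Mirrors _combine_documents: render each doc with the document prompt.
    return "\n\n".join(
        f"---\ntitle: {d['title']}\nyear: {d['year']}\ndirector: {d['director']}\n---"
        for d in docs
    )

def build_prompt(question):
    # Mirrors RunnableParallel(context=..., question=...) feeding LLM_CONTEXT_PROMPT.
    context = combine_documents(fake_retriever(question))
    return (
        "Use the movies below to answer the user's question.\n"
        f"----\n{context}\n----\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt("What movies are about dreams?")
```

In the real chain, this prompt string is what the chat model receives before producing the `AIMessage` shown above.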
| 261 | + ], |
| 262 | + "metadata": { |
| 263 | + "kernelspec": { |
| 264 | + "display_name": "Python 3.11.4 64-bit", |
| 265 | + "language": "python", |
| 266 | + "name": "python3" |
| 267 | + }, |
| 268 | + "language_info": { |
| 269 | + "codemirror_mode": { |
| 270 | + "name": "ipython", |
| 271 | + "version": 3 |
| 272 | + }, |
| 273 | + "file_extension": ".py", |
| 274 | + "mimetype": "text/x-python", |
| 275 | + "name": "python", |
| 276 | + "nbconvert_exporter": "python", |
| 277 | + "pygments_lexer": "ipython3", |
| 278 | + "version": "3.10.3" |
| 279 | + }, |
| 280 | + "orig_nbformat": 4, |
| 281 | + "vscode": { |
| 282 | + "interpreter": { |
| 283 | + "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e" |
| 284 | + } |
| 285 | + } |
| 286 | + }, |
| 287 | + "nbformat": 4, |
| 288 | + "nbformat_minor": 2 |
| 289 | +} |