Commit 0843a2c

RAG: Self Query Retriever Notebook Example (#121)

* self query retriever
* updated self-query retriever
* move self-query retrievers examples to folder
1 parent 3b023f0 commit 0843a2c

5 files changed, 740 additions and 47 deletions

README.md

Lines changed: 3 additions & 1 deletion
@@ -30,7 +30,9 @@ The [`notebooks`](notebooks/README.md) folder contains a range of executable Pyt
 ### LangChain
 
 - [`question-answering.ipynb`](./notebooks/generative-ai/question-answering.ipynb)
-- [`langchain-self-query-retriever.ipynb`](./notebooks/langchain/langchain-self-query-retriever.ipynb)
+- [`langchain-self-query-retriever.ipynb`](./notebooks/langchain/self-query-retriever-examples/langchain-self-query-retriever.ipynb)
+- [`Question Answering with Self Query Retriever`](./notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb)
+- [`BM25 and Self-querying retriever with elasticsearch and LangChain`](./notebooks/langchain/self-query-retriever-examples/chatbot-with-bm25-only-example.ipynb)
 - [`langchain-vector-store.ipynb`](./notebooks/langchain/langchain-vector-store.ipynb)
 - [`langchain-vector-store-using-elser.ipynb`](./notebooks/langchain/langchain-vector-store-using-elser.ipynb)
 - [`langchain-using-own-model.ipynb`](./notebooks/langchain/langchain-using-own-model.ipynb)

notebooks/langchain/README.md

Lines changed: 0 additions & 26 deletions
This file was deleted.

notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb

Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Chatbot Example with Self Query Retriever\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb)\n",
+    "\n",
+    "This notebook demonstrates how to use LangChain's [self-query retriever](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html) with Elasticsearch to convert a question into a structured query and apply that query to an Elasticsearch index.\n",
+    "\n",
+    "We first create a list of documents with `langchain`, then use [`ElasticsearchStore.from_documents`](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents) to create a `vectorstore` and index the data into Elasticsearch.\n",
+    "\n",
+    "\n",
+    "We will then run an example query that uses the Elasticsearch-powered self-query retriever in a question-answering chain.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Install packages and import modules\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n",
+      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
+     ]
+    }
+   ],
+   "source": [
+    "!python3 -m pip install -qU lark elasticsearch langchain openai\n",
+    "\n",
+    "from langchain.schema import Document\n",
+    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
+    "from langchain.vectorstores import ElasticsearchStore\n",
+    "from langchain.llms import OpenAI\n",
+    "from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
+    "from langchain.chains.query_constructor.base import AttributeInfo\n",
+    "from getpass import getpass"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Create documents\n",
+    "Next, we will create a list of documents containing movie summaries, using LangChain's [`Document`](https://api.python.langchain.com/en/latest/schema/langchain.schema.document.Document.html) schema. Each document has a `page_content` and `metadata`.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 67,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "docs = [\n",
+    "    Document(\n",
+    "        page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n",
+    "        metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\", \"director\": \"Steven Spielberg\", \"title\": \"Jurassic Park\"},\n",
+    "    ),\n",
+    "    Document(\n",
+    "        page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n",
+    "        metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2, \"title\": \"Inception\"},\n",
+    "    ),\n",
+    "    Document(\n",
+    "        page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n",
+    "        metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6, \"title\": \"Paprika\"},\n",
+    "    ),\n",
+    "    Document(\n",
+    "        page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n",
+    "        metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3, \"title\": \"Little Women\"},\n",
+    "    ),\n",
+    "    Document(\n",
+    "        page_content=\"Toys come alive and have a blast doing so\",\n",
+    "        metadata={\"year\": 1995, \"genre\": \"animated\", \"director\": \"John Lasseter\", \"rating\": 8.3, \"title\": \"Toy Story\"},\n",
+    "    ),\n",
+    "    Document(\n",
+    "        page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n",
+    "        metadata={\n",
+    "            \"year\": 1979,\n",
+    "            \"rating\": 9.9,\n",
+    "            \"director\": \"Andrei Tarkovsky\",\n",
+    "            \"genre\": \"science fiction\",\n",
+    "            \"title\": \"Stalker\",\n",
+    "        },\n",
+    "    ),\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Connect to Elasticsearch\n",
+    "\n",
+    "ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial. \n",
+    "\n",
+    "Because we are using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify it. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n",
+    "\n",
+    "\n",
+    "We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our Elastic Cloud deployment, which makes it easy to create the index and index data. We will also pass in the list of documents that we created in the previous step."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 68,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n",
+    "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
+    "\n",
+    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n",
+    "ELASTIC_API_KEY = getpass(\"Elastic API Key: \")\n",
+    "\n",
+    "# https://platform.openai.com/api-keys\n",
+    "OPENAI_API_KEY = getpass(\"OpenAI API key: \")\n",
+    "\n",
+    "embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)\n",
+    "\n",
+    "\n",
+    "vectorstore = ElasticsearchStore.from_documents(\n",
+    "    docs,\n",
+    "    embeddings,\n",
+    "    index_name=\"elasticsearch-self-query-demo\",\n",
+    "    es_cloud_id=ELASTIC_CLOUD_ID,\n",
+    "    es_api_key=ELASTIC_API_KEY\n",
+    ")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Set up the self-query retriever\n",
+    "\n",
+    "Next, we describe the metadata attributes of our documents and give a short description of the document content.\n",
+    "\n",
+    "We will then instantiate the retriever with [`SelfQueryRetriever.from_llm`](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 80,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Add details about metadata fields\n",
+    "metadata_field_info = [\n",
+    "    AttributeInfo(\n",
+    "        name=\"genre\",\n",
+    "        description=\"The genre of the movie. Can be either 'science fiction' or 'animated'.\",\n",
+    "        type=\"string or list[string]\",\n",
+    "    ),\n",
+    "    AttributeInfo(\n",
+    "        name=\"year\",\n",
+    "        description=\"The year the movie was released\",\n",
+    "        type=\"integer\",\n",
+    "    ),\n",
+    "    AttributeInfo(\n",
+    "        name=\"director\",\n",
+    "        description=\"The name of the movie director\",\n",
+    "        type=\"string\",\n",
+    "    ),\n",
+    "    AttributeInfo(\n",
+    "        name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n",
+    "    ),\n",
+    "]\n",
+    "\n",
+    "document_content_description = \"Brief summary of a movie\"\n",
+    "\n",
+    "# Set up the OpenAI LLM with sampling temperature 0\n",
+    "llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)\n",
+    "\n",
+    "# Instantiate the retriever\n",
+    "retriever = SelfQueryRetriever.from_llm(\n",
+    "    llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n",
+    ")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Question Answering with Self-Query Retriever\n",
+    "\n",
+    "We will now demonstrate how to use the self-query retriever for retrieval-augmented generation (RAG)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 77,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "AIMessage(content='Inception (2010)')"
+      ]
+     },
+     "execution_count": 77,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from langchain.chat_models import ChatOpenAI\n",
+    "from langchain.schema.runnable import RunnableParallel, RunnablePassthrough\n",
+    "from langchain.prompts import ChatPromptTemplate, PromptTemplate\n",
+    "from langchain.schema import format_document\n",
+    "\n",
+    "LLM_CONTEXT_PROMPT = ChatPromptTemplate.from_template(\"\"\"\n",
+    "Use the following movies, which matched the user's question, as context. Use only the movies below to answer the user's question.\n",
+    "\n",
+    "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n",
+    "\n",
+    "----\n",
+    "{context}\n",
+    "----\n",
+    "Question: {question}\n",
+    "Answer:\n",
+    "\"\"\")\n",
+    "\n",
+    "DOCUMENT_PROMPT = PromptTemplate.from_template(\"\"\"\n",
+    "---\n",
+    "title: {title} \n",
+    "year: {year} \n",
+    "director: {director} \n",
+    "---\n",
+    "\"\"\")\n",
+    "\n",
+    "def _combine_documents(\n",
+    "    docs, document_prompt=DOCUMENT_PROMPT, document_separator=\"\\n\\n\"\n",
+    "):\n",
+    "    doc_strings = [format_document(doc, document_prompt) for doc in docs]\n",
+    "    return document_separator.join(doc_strings)\n",
+    "\n",
+    "\n",
+    "_context = RunnableParallel(\n",
+    "    context=retriever | _combine_documents,\n",
+    "    question=RunnablePassthrough(),\n",
+    ")\n",
+    "\n",
+    "chain = (_context | LLM_CONTEXT_PROMPT | llm)\n",
+    "\n",
+    "chain.invoke(\"What movies are about dreams and were released after the year 1992 but before 2007?\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3.11.4 64-bit",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.3"
+  },
+  "orig_nbformat": 4,
+  "vscode": {
+   "interpreter": {
+    "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
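
The self-query step in the notebook above can be illustrated without any external services. The following is a minimal sketch, not LangChain's or Elasticsearch's implementation: it assumes the LLM has already translated the question into a free-text query plus metadata comparisons, and it replaces vector similarity with a plain substring match. The names `Comparison`, `StructuredQuery`, `matches`, `retrieve`, and `combine` are hypothetical, and only a subset of the notebook's metadata fields is reproduced.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


# Hypothetical stand-ins for illustration; these are not LangChain classes.
@dataclass
class Comparison:
    attribute: str
    comparator: str  # one of "gt", "lt", "eq"
    value: Any


@dataclass
class StructuredQuery:
    query: str                 # free-text part of the question
    filters: List[Comparison]  # metadata constraints; all must hold


_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "gt": lambda a, b: a > b,
    "lt": lambda a, b: a < b,
    "eq": lambda a, b: a == b,
}

# Summaries and metadata copied from the notebook's `docs` list.
MOVIES = [
    {"title": "Jurassic Park", "year": 1993, "director": "Steven Spielberg",
     "summary": "A bunch of scientists bring back dinosaurs and mayhem breaks loose"},
    {"title": "Inception", "year": 2010, "director": "Christopher Nolan",
     "summary": "Leo DiCaprio gets lost in a dream within a dream within a dream within a ..."},
    {"title": "Paprika", "year": 2006, "director": "Satoshi Kon",
     "summary": "A psychologist / detective gets lost in a series of dreams within dreams "
                "within dreams and Inception reused the idea"},
    {"title": "Toy Story", "year": 1995, "director": "John Lasseter",
     "summary": "Toys come alive and have a blast doing so"},
]


def matches(metadata: Dict[str, Any], filters: List[Comparison]) -> bool:
    """Return True only if every comparison holds for this document."""
    for f in filters:
        if f.attribute not in metadata:
            return False
        if not _OPS[f.comparator](metadata[f.attribute], f.value):
            return False
    return True


def retrieve(sq: StructuredQuery, movies: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply the metadata filter, then a crude text match.

    The substring test is an assumption standing in for vector similarity.
    """
    candidates = [m for m in movies if matches(m, sq.filters)]
    return [m for m in candidates if sq.query.lower() in m["summary"].lower()]


def combine(movies: List[Dict[str, Any]]) -> str:
    """Format retrieved movies into a context string, one block per movie."""
    return "\n\n".join(
        f"title: {m['title']}\nyear: {m['year']}\ndirector: {m['director']}"
        for m in movies
    )


# One plausible translation of the notebook's question about dream movies
# released after 1992 and before 2007.
structured = StructuredQuery(
    query="dream",
    filters=[Comparison("year", "gt", 1992), Comparison("year", "lt", 2007)],
)
results = retrieve(structured, MOVIES)
context = combine(results)
print(context)
```

In this sketch the year filter is applied before the text match, so a document outside the requested range cannot reach the prompt context no matter how well its summary matches the query.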
