Commit 97392d9

updated screenshots
1 parent: e48691b

5 files changed (+51, -45 lines)

autovec-tutorial/autovec_langchain.ipynb

Lines changed: 51 additions & 45 deletions
@@ -18,7 +18,9 @@
  },
  "source": [
  "# 1. Create and Deploy Operational Cluster on Capella\n",
- " To get started with Couchbase Capella, create an account and use it to deploy a cluster. To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).\n",
+ "To get started with Couchbase Capella, create an account and use it to deploy a cluster. \n",
+ "\n",
+ "Make sure that you deploy a `Multi-node` cluster with `data`, `index`, `query` and `eventing` services enabled. To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).\n",
  " ### Couchbase Capella Configuration\n",
  " When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met:\n",
  " * Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the travel-sample bucket (Read and Write) used in the application.\n",
@@ -104,16 +106,16 @@
  " \n",
  "5. After choosing the type of mapping, it is required to either create an index on the new vector_embedding field or the creation of a vector index can be skipped, which is not recommended as the functionality of vector searching will be lost.\n",
  "\n",
- " <img src=\"./img/vector_index.png\" width=\"1200px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
+ " <img src=\"./img/vector_index.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
  "\n",
  "6. Below screenshot highlights the whole process which were mentioned above, and click next afterwards as shown below.\n",
  "\n",
- " <img src=\"./img/vector_index_page.png\" width=\"1200px\" height=\"1200px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
+ " <img src=\"./img/vector_index_page.png\" width=\"900px\" height=\"1200px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
  "\n",
  "\n",
  "7. Select the model which will be used to create the embeddings. There are two options to create the embeddings, `capella based` and `external model`.\n",
  " \n",
- " <img src=\"./img/Select_embedding_model.png\" width=\"650px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
+ " <img src=\"./img/Select_embedding_model.png\" width=\"500px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
  "\n",
  " - For this tutorial, capella based embedding model is used as can be seen in the image above. API credentials can be uploaded using the file downloaded in `step 2.2` or it can be entered manually as well.\n",
  " - Choices between private and insecure networking is available to choose.\n",
@@ -123,7 +125,7 @@
  "\n",
  "8. <B>`Workflow Summary`</B> will display all the necessary details of the workflow including `Data Source`, `Model Service` and `Billing Overview` as shown in image below.\n",
  "\n",
- " <img src=\"./img/workflow_summary.png\" width=\"800px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
+ " <img src=\"./img/workflow_summary.png\" width=\"500px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
  "\n",
  "\n",
  "\n",
@@ -142,7 +144,9 @@
  "source": [
  "# 5. Vector Search\n",
  "\n",
- "The following code cells implement semantic vector search against the embeddings generated by the AutoVectorization workflow. "
+ "The following code cells implement semantic vector search against the embeddings generated by the AutoVectorization workflow. These searches are powered by **Couchbase's Search service**.\n",
+ "\n",
+ "Before you proceed, make sure the following packages are installed by running:"
  ]
  },
  {
@@ -156,15 +160,14 @@
  },
  "outputs": [],
  "source": [
- "!pip install couchbase langchain-couchbase langchain-openai"
+ "!pip install langchain-couchbase langchain-openai"
  ]
  },
  {
  "cell_type": "markdown",
  "id": "a1854af3",
  "metadata": {},
  "source": [
- "`couchbase - Version: 4.4.0` \\\n",
  "`langchain-couchbase - Version: 0.4.0` \\\n",
  "`pip install langchain-openai - Version: 0.3.34`\n",
  "\n",
@@ -175,7 +178,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": null,
+ "execution_count": 6,
  "id": "30955126-0053-4cec-9dec-e4c05a8de7c3",
  "metadata": {},
  "outputs": [],
@@ -185,7 +188,9 @@
  "from couchbase.options import ClusterOptions\n",
  "\n",
  "from langchain_openai import OpenAIEmbeddings\n",
- "from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore"
+ "from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore\n",
+ "\n",
+ "from datetime import timedelta"
  ]
  },
  {
@@ -194,23 +199,22 @@
  "metadata": {},
  "source": [
  "# Cluster Connection Setup\n",
- " - Defines the secure connection string, user credentials, and creates a `Cluster` object.\n",
- " - Disables TLS verification by `options = ClusterOptions(auth, tls_verify='none')` ONLY for quick local testing (not recommended in production) and applies the `wan_development` profile to tune timeouts for higher-latency networks."
+ " - Defines the secure connection string, user credentials, and creates a `Cluster` object."
  ]
  },
  {
  "cell_type": "code",
- "execution_count": null,
+ "execution_count": 7,
  "id": "7e4c9e8d",
  "metadata": {},
  "outputs": [],
  "source": [
- "endpoint = \"CLUSTER_CONNECTION_STRING\" # Replace this with Connection String\n",
- "username = \"YOUR_USERNAME\" # Replace this with your username\n",
- "password = \"YOUR_PASSWORD\" # Replace this with your password\n",
+ "endpoint = \"couchbases://cb.f-znsfdbilcp-ja4.sandbox.nonprod-project-avengers.com\" # Replace this with Connection String\n",
+ "username = \"testing\" # Replace this with your username\n",
+ "password = \"Testing@1\" # Replace this with your password\n",
  "auth = PasswordAuthenticator(username, password)\n",
  "\n",
- "options = ClusterOptions(auth)\n",
+ "options = ClusterOptions(auth, tls_verify='none')\n",
  "cluster = Cluster(endpoint, options)\n",
  "\n",
  "cluster.wait_until_ready(timedelta(seconds=5))"
@@ -223,10 +227,11 @@
  "source": [
  "# Selection of Buckets / Scope / Collection / Index / Embedder\n",
  " - Sets the bucket, scope, and collection where the documents (with vector fields) live.\n",
- " - Specifies the Capella Search index name created (or selected) in Step 4.5.\n",
+ " - `index_name` specifies the **Capella Search index name**. This is the Search index created automatically during the workflow setup (step 4.5) or manually as described in the same step. You can find this index name in the **Search** tab of your Capella cluster.\n",
  " - `embedder` instantiates the NVIDIA embedding model that will transform the user's natural language query into a vector at search time.\n",
  " - `open_api_key` is the api key token created in `step 3.2 -3`.\n",
  " - `open_api_base` is the Capella model services endpoint found in the models section.\n",
+ " - for more details visit [openAIEmbeddings](https://docs.langchain.com/oss/python/integrations/text_embedding/openai).\n",
  "\n",
  "`Note that the Capella AI Endpoint also requires an additional /v1 from the endpoint if not shown on the UI`"
  ]
@@ -241,8 +246,8 @@
  "bucket_name = \"travel-sample\"\n",
  "scope_name = \"inventory\"\n",
  "collection_name = \"hotel\"\n",
- "index_name = \"hybrid_autovec_workflow_vec_addr_descr_id\" # This is the name of the search index that was created in step 4.5 and can also be seen in the search tab of the cluster.\n",
- " # It should be noted that hybrid_workflow_name_index_fieldname is the naming convention for the index created by AutoVectorization workflow where\n",
+ "index_name = \"hyperscale_autovec_workflow_vec_addr_descr_id\" # This is the name of the search index that was created in step 4.5 and can also be seen in the search tab of the cluster.\n",
+ " # It should be noted that hyperscale_workflow_name_index_fieldname is the naming convention for the index created by AutoVectorization workflow where\n",
  " # fieldname is the name of the field being indexed.\n",
  "\n",
  "# Using the OpenAI SDK for the embeddings with the capella model services and they are compatible with the OpenAIEmbeddings class in Langchain\n",
@@ -261,20 +266,37 @@
  "metadata": {},
  "source": [
  "# VectorStore Construction\n",
- " - Creates a `CouchbaseSearchVectorStore` instance that:\n",
+ " - Creates a [CouchbaseSearchVectorStore](https://couchbase-ecosystem.github.io/langchain-couchbase/langchain_couchbase.html#couchbase-search-vector-store) instance that interfaces with **Couchbase's Search service** to perform vector similarity searches.\n",
+ " - The vector store:\n",
  " * Knows where to read documents (`bucket/scope/collection`).\n",
- " * Knows the embedding field (the vector produced by the AutoVectorization workflow).\n",
- " * Uses the provided embedder to embed queries on-demand.\n",
+ " * References the Search index (`index_name`) that contains vector field mappings.\n",
+ " * Knows the embedding field (the vector produced by the Auto-Vectorization workflow).\n",
+ " * Uses the provided embedder to embed queries on-demand for similarity search.\n",
  " - If your AutoVectorization workflow produced a different vector field name, update `embedding_key` accordingly.\n",
  " - If you mapped multiple fields into a single vector, you can choose any representative field for `text_key`, or modify the VectorStore wrapper to concatenate fields."
  ]
  },
  {
  "cell_type": "code",
- "execution_count": null,
+ "execution_count": 21,
  "id": "50b85f78",
  "metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "ename": "ValueError",
+ "evalue": "Index hyperscale_autovec_workflow_vec_addr_descr_id does not exist. Please create the index before searching.",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
+ "\u001b[31mValueError\u001b[39m Traceback (most recent call last)",
+ "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[21]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m vector_store = \u001b[43mCouchbaseSearchVectorStore\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 2\u001b[39m \u001b[43m \u001b[49m\u001b[43mcluster\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcluster\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 3\u001b[39m \u001b[43m \u001b[49m\u001b[43mbucket_name\u001b[49m\u001b[43m=\u001b[49m\u001b[43mbucket_name\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 4\u001b[39m \u001b[43m \u001b[49m\u001b[43mscope_name\u001b[49m\u001b[43m=\u001b[49m\u001b[43mscope_name\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 5\u001b[39m \u001b[43m \u001b[49m\u001b[43mcollection_name\u001b[49m\u001b[43m=\u001b[49m\u001b[43mcollection_name\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 6\u001b[39m \u001b[43m \u001b[49m\u001b[43membedding\u001b[49m\u001b[43m=\u001b[49m\u001b[43membedder\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 7\u001b[39m \u001b[43m \u001b[49m\u001b[43mindex_name\u001b[49m\u001b[43m=\u001b[49m\u001b[43mindex_name\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 8\u001b[39m \u001b[43m \u001b[49m\u001b[43mtext_key\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43maddress\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;66;43;03m# Your document's text field\u001b[39;49;00m\n\u001b[32m 9\u001b[39m \u001b[43m \u001b[49m\u001b[43membedding_key\u001b[49m\u001b[43m=\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mvec_addr_descr_id\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;66;43;03m# This is the field in which your vector (embedding) is stored in the cluster.\u001b[39;49;00m\n\u001b[32m 10\u001b[39m \u001b[43m)\u001b[49m\n",
+ "\u001b[36mFile \u001b[39m\u001b[32m~/vector-search-cookbook/.venv/lib/python3.14/site-packages/langchain_couchbase/vectorstores/search_vector_store.py:267\u001b[39m, in \u001b[36mCouchbaseSearchVectorStore.__init__\u001b[39m\u001b[34m(self, cluster, bucket_name, scope_name, collection_name, embedding, index_name, text_key, embedding_key, scoped_index)\u001b[39m\n\u001b[32m 265\u001b[39m \u001b[38;5;28mself\u001b[39m._check_index_exists()\n\u001b[32m 266\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[32m--> \u001b[39m\u001b[32m267\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m e\n",
+ "\u001b[36mFile \u001b[39m\u001b[32m~/vector-search-cookbook/.venv/lib/python3.14/site-packages/langchain_couchbase/vectorstores/search_vector_store.py:265\u001b[39m, in \u001b[36mCouchbaseSearchVectorStore.__init__\u001b[39m\u001b[34m(self, cluster, bucket_name, scope_name, collection_name, embedding, index_name, text_key, embedding_key, scoped_index)\u001b[39m\n\u001b[32m 263\u001b[39m \u001b[38;5;66;03m# Check if the index exists. Throws ValueError if it doesn't\u001b[39;00m\n\u001b[32m 264\u001b[39m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[32m--> \u001b[39m\u001b[32m265\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_check_index_exists\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 266\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[32m 267\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m e\n",
+ "\u001b[36mFile \u001b[39m\u001b[32m~/vector-search-cookbook/.venv/lib/python3.14/site-packages/langchain_couchbase/vectorstores/search_vector_store.py:192\u001b[39m, in \u001b[36mCouchbaseSearchVectorStore._check_index_exists\u001b[39m\u001b[34m(self)\u001b[39m\n\u001b[32m 188\u001b[39m all_indexes = [\n\u001b[32m 189\u001b[39m index.name \u001b[38;5;28;01mfor\u001b[39;00m index \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m._scope.search_indexes().get_all_indexes()\n\u001b[32m 190\u001b[39m ]\n\u001b[32m 191\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m._index_name \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m all_indexes:\n\u001b[32m--> \u001b[39m\u001b[32m192\u001b[39m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[32m 193\u001b[39m \u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mIndex \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m._index_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m does not exist. \u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 194\u001b[39m \u001b[33m\"\u001b[39m\u001b[33m Please create the index before searching.\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 195\u001b[39m )\n\u001b[32m 196\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m 197\u001b[39m all_indexes = [\n\u001b[32m 198\u001b[39m index.name \u001b[38;5;28;01mfor\u001b[39;00m index \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m._cluster.search_indexes().get_all_indexes()\n\u001b[32m 199\u001b[39m ]\n",
+ "\u001b[31mValueError\u001b[39m: Index hyperscale_autovec_workflow_vec_addr_descr_id does not exist. Please create the index before searching."
+ ]
+ }
+ ],
  "source": [
  "vector_store = CouchbaseSearchVectorStore(\n",
  " cluster=cluster,\n",
@@ -306,19 +328,9 @@
  "execution_count": null,
  "id": "177fd6d5",
  "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "1. Glossop — Address: Woodhead Road\n",
- "2. Glossop — Address: 28 Woodhead Road\n",
- "3. Hadrian's Wall — Address: Greenhead, Brampton, Cumbria, CA8 7HB\n"
- ]
- }
- ],
+ "outputs": [],
  "source": [
- "query = \"Woodhead Road\"\n",
+ "query = \"What hotels are there in USA?\"\n",
  "results = vector_store.similarity_search(query, k=3)\n",
  "\n",
  "# Print out the top-k results\n",
@@ -350,17 +362,11 @@
  "\n",
  "> Your vector search pipeline is working if the returned documents feel meaningfully related to your natural language query—even when exact keywords do not match. Feel free to experiment with increasingly descriptive queries to observe the semantic power of the embeddings."
  ]
- },
- {
- "cell_type": "markdown",
- "id": "54b9ee43",
- "metadata": {},
- "source": []
  }
  ],
  "metadata": {
  "kernelspec": {
- "display_name": "autovec",
+ "display_name": "Python 3 (ipykernel)",
  "language": "python",
  "name": "python3"
  },
@@ -374,7 +380,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.13.7"
+ "version": "3.14.0"
  }
  },
  "nbformat": 4,
[4 binary image files (updated screenshots) changed: 1.85 KB, 5.77 KB, 55 KB, -120 KB; contents not shown]
