Added tutorial for Langchain Couchbase Vector Search #75
base: main
Conversation
Summary of Changes

Hello @azaddhirajkumar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers a new tutorial designed to walk developers through the creation of an AI-driven PDF chat application. The tutorial highlights how to effectively combine Couchbase Vector Search for document retrieval, LangChain for managing large language model interactions and data processing, and Streamlit for a user-friendly interface. It provides practical guidance on implementing Retrieval-Augmented Generation (RAG) to enhance LLM responses with document context and demonstrates performance improvements through LLM response caching.

Highlights
Code Review
This pull request adds a new tutorial for building a PDF chat application using Couchbase Vector Search with the Query Service, LangChain, and Python. The tutorial is well-structured and comprehensive. I've provided several review comments to enhance clarity, fix minor errors in code snippets and text, and improve formatting. Addressing these points will make the tutorial easier for users to follow and help them avoid potential issues.
```python
    DB_SCOPE,
    DB_COLLECTION,
    embedding,
    distance_strategy=DistanceStrategy.COSINE,
```
The code uses DistanceStrategy.COSINE, but it's not shown where DistanceStrategy is imported from. This will cause a NameError for users trying to run the code. Please add a note or an import statement to clarify that it should be imported from langchain_couchbase.vectorstores. A similar issue exists in the function signature for get_vector_store which uses DistanceStrategy as a type hint.
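The comment above concerns where `DistanceStrategy` is imported from; for readers unfamiliar with the metric itself, here is a minimal pure-Python sketch of what cosine distance computes (illustrative only, independent of the LangChain/Couchbase APIs):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```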
Would be nice to mention & link to the distance strategies
> The `index_description` parameter controls how Couchbase optimizes vector storage and search performance:
>
> **Format:** `'IVF[<centroids>],{PQ|SQ}<settings>'`
The format string for index_description is enclosed in double single-quotes (''...''), which will render literally in markdown. For better readability and to correctly represent it as a code format, it's recommended to use backticks.
```diff
- **Format:** `'IVF[<centroids>],{PQ|SQ}<settings>'`
+ **Format:** `IVF[<centroids>],{PQ|SQ}<settings>`
```
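To give readers a concrete anchor, values matching this grammar might look like the following (these specific settings are assumptions on my part; the supported centroid counts and quantization settings should be confirmed against the Couchbase documentation):

```
IVF,SQ8       -- centroid count chosen automatically, 8-bit scalar quantization
IVF1024,SQ8   -- 1024 centroids, 8-bit scalar quantization
```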
Check this
> Note: Unlike the Search-based approach, this method does NOT require `INDEX_NAME` as vector indexes are optional and automatically used when available.
> Login_Password of Streamlit app is a basic password to access the app. You can set the password here and while using the app, password will be required to access the app.
There's an inconsistency in the naming of the environment variable for the login password. The configuration block uses LOGIN_PASSWORD, but this explanatory text refers to it as Login_Password. Since environment variables are case-sensitive, this can lead to confusion and prevent the application from working correctly if the user follows the text. Please use the correct case (LOGIN_PASSWORD) consistently.
```diff
- > Login_Password of Streamlit app is a basic password to access the app. You can set the password here and while using the app, password will be required to access the app.
+ > `LOGIN_PASSWORD` of Streamlit app is a basic password to access the app. You can set the password here and while using the app, password will be required to access the app.
```
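To make the case-sensitivity point concrete, a small sketch (the password value here is hypothetical):

```python
import os

os.environ["LOGIN_PASSWORD"] = "s3cret"  # hypothetical value, as it would be set via .env

# On case-sensitive platforms (Linux/macOS), the wrong casing finds nothing:
print(os.environ.get("Login_Password"))  # None
print(os.environ.get("LOGIN_PASSWORD"))  # s3cret
```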
Relevant
> In the main area, there's a chat screen where you can ask questions about the uploaded PDF document. You will receive two responses: one with context from the PDF (Couchbase Logo - <img src="image.ico" alt="couchbase" width="16" style="display: inline;" /> ) , and one without the PDF context (Bot Logo - 🤖). This demonstrates how the Retrieval Augmented Generation (RAG) model enhances the answers provided by the language model using the PDF content.
The tutorial uses an <img> tag with a source pointing to an .ico file (image.ico). The ICO format is not a standard image format for web pages and may not render correctly in all browsers. It would be better to use a more common format like PNG or SVG for consistency with other images in the document (e.g., main_screen_default_view_query.png).
> - **Streaming Output**: LangChain supports [streaming](https://python.langchain.com/docs/expression_language/streaming/), allowing the app to stream the generated answer to the client in real-time.
> - **Caching**: LangChain's [caching layer](https://python.langchain.com/docs/modules/model_io/llms/llm_caching/) integrates with Couchbase to cache LLM responses, reducing costs and improving response times.
>
> By combining Vector Search with Couchbase Query Service, RAG, LLM Caching, and LangChain; the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, cache LLM responses for repeated queries, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.
There is a grammatical error in this sentence. A semicolon is used where a comma would be more appropriate to separate the items in the introductory list from the main clause.
```diff
- By combining Vector Search with Couchbase Query Service, RAG, LLM Caching, and LangChain; the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, cache LLM responses for repeated queries, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.
+ By combining Vector Search with Couchbase Query Service, RAG, LLM Caching, and LangChain, the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, cache LLM responses for repeated queries, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.
```
Relevant
```python
connect_string = connection_string
cluster = Cluster(connect_string, options)
```
The line connect_string = connection_string is redundant. The connection_string parameter can be used directly when creating the Cluster instance. Simplifying this makes the example code cleaner for learners.
```diff
- connect_string = connection_string
- cluster = Cluster(connect_string, options)
+ cluster = Cluster(connection_string, options)
```
This is different from the code snippet in the repo https://github.com/couchbase-examples/rag-demo/blob/main/chat_with_pdf_query.py#L91
> This function ensures that the uploaded PDF file is properly handled, loaded, and prepared for storage or processing in the vector store. It first checks if file was actually uploaded. Then the uploaded file is saved to a temporary file in `binary` format.
>
> From the temporary file, PDF is loaded in [PyPDFLoader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) from the LangChain library which loads the PDF into [LangChain Document](https://python.langchain.com/docs/modules/data_connection/document_loaders/) Format
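The temp-file step quoted above can be sketched in plain Python (the helper name and exact checks are assumptions, not the tutorial's actual code):

```python
import tempfile

def save_uploaded_pdf(uploaded_bytes: bytes) -> str:
    """Write uploaded bytes to a temporary .pdf file and return its path."""
    if not uploaded_bytes:
        raise ValueError("No file was uploaded")  # mirrors the upload check described above
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(uploaded_bytes)  # binary write, as the tutorial describes
        return tmp.name

path = save_uploaded_pdf(b"%PDF-1.4 minimal")
print(path.endswith(".pdf"))  # True
```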
The link for LangChain Document points to the document loaders page. To provide more accurate information for users who want to learn more, it would be better to link directly to the documentation for the Document class/concept.
```diff
- From the temporary file, PDF is loaded in [PyPDFLoader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) from the LangChain library which loads the PDF into [LangChain Document](https://python.langchain.com/docs/modules/data_connection/document_loaders/) Format
+ From the temporary file, PDF is loaded in [PyPDFLoader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) from the LangChain library which loads the PDF into [LangChain Document](https://python.langchain.com/docs/modules/data_connection/document/) Format
```
Dead link
nithishr left a comment
Please double check all the links as LangChain documentation has changed quite a bit.
Ideally, we should also update the FTS tutorial with the same changes.
```
@@ -0,0 +1,593 @@
---
# frontmatter
path: "/tutorial-python-langchain-pdf-chat-query"
```
The URL & title need to be updated to the new terminology.
```yaml
path: "/tutorial-python-langchain-pdf-chat-query"
# title and description do not need to be added to markdown, start with H2 (##)
title: Build PDF Chat App With Couchbase Python SDK, LangChain and Vector Search
short_title: Build PDF Chat App (Vector Search)
```
It is RAG rather than Vector Search.
```yaml
# frontmatter
path: "/tutorial-python-langchain-pdf-chat-query"
# title and description do not need to be added to markdown, start with H2 (##)
title: Build PDF Chat App With Couchbase Python SDK, LangChain and Vector Search
```
Build Chat with PDF App using LangChain and Couchbase Vector Search/Hyperscale and Composite Vector Search
```yaml
short_title: Build PDF Chat App (Vector Search)
description:
  - Construct a PDF Chat App with LangChain, Couchbase Python SDK, Couchbase Vector Search (Query Service), and Streamlit.
  - Learn to upload PDFs into Couchbase Vector Store with LangChain using Query-based Vector Search.
```
Query based Vector Store
```yaml
description:
  - Construct a PDF Chat App with LangChain, Couchbase Python SDK, Couchbase Vector Search (Query Service), and Streamlit.
  - Learn to upload PDFs into Couchbase Vector Store with LangChain using Query-based Vector Search.
  - Discover how to use RAG's for context-based Q&A's from PDFs with LLMs.
```
RAG's -> RAG
Q&A's -> Q&A
> ### Add Documents to Vector Store
>
> We will utilize the vector store created at [Initialize OpenAI and Couchbase Vector Store](#initialize-openai-and-couchbase-vector-store). In this we will add the documents using add_documents method of Couchbase vector store. This method will utilize the OpenAI embeddings to create embeddings(vectors) from text and add it to Couchbase documents in the specified collection.
Highlight add_documents
> ### LangChain Expression Language (LCEL)
>
> We will now utilize the power of LangChain Chains using the [LangChain Expression Language](https://python.langchain.com/docs/expression_language/) (LCEL). LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as streaming, parallelism, and logging.
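For intuition about what the excerpt above means by composing chains, the `|` style of LCEL can be mimicked in a few lines of plain Python (a toy sketch of the idea, not the LangChain API):

```python
class Runnable:
    """Toy stand-in for an LCEL runnable: a callable that composes with `|`."""
    def __init__(self, fn):
        self.fn = fn
    def __or__(self, other):
        # left | right: feed left's output into right
        return Runnable(lambda x: other.fn(self.fn(x)))
    def invoke(self, x):
        return self.fn(x)

prompt = Runnable(lambda q: f"Answer briefly: {q}")
llm = Runnable(lambda p: p.upper())         # stand-in for a model call
parser = Runnable(lambda s: s.rstrip("?"))  # stand-in for an output parser

chain = prompt | llm | parser
print(chain.invoke("what is RAG?"))  # ANSWER BRIEFLY: WHAT IS RAG
```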
Dead link
> ### Create Retriever Chain
>
> We also create the [retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore) of the couchbase vector store. This retriever will be used to retrieve the previously added documents which are similar to current query.
Dead link
> - Create a placeholder for streaming the assistant's response.
> - Use the chain.invoke(question) method to generate the response from the RAG chain.
> - The response is automatically cached by the CouchbaseCache layer.
> - [Stream](https://python.langchain.com/docs/use_cases/question_answering/streaming/) the response in real-time using the custom `stream_string` function.
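`stream_string` is the tutorial's own helper, so its exact body isn't shown here; a plausible sketch of such a word-by-word streaming generator (name from the excerpt, behavior assumed):

```python
import time
from typing import Iterator

def stream_string(text: str, delay: float = 0.0) -> Iterator[str]:
    """Yield the response word by word so the UI can render it incrementally."""
    for word in text.split():
        yield word + " "
        time.sleep(delay)  # pacing between chunks

chunks = list(stream_string("Hello from the RAG chain"))
print("".join(chunks).strip())  # Hello from the RAG chain
```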
> ## Introduction
>
> Welcome to this comprehensive guide on constructing an AI-enhanced Chat Application using Couchbase Vector Search. We will create a dynamic chat interface capable of delving into PDF documents to extract and provide summaries, key facts, and answers to your queries. By the end of this tutorial, you'll have a powerful tool at your disposal, transforming the way you interact with and utilize the information contained within PDFs.
Can you add a paragraph highlighting that this tutorial is using Query-based Vector Search? If you are looking for Vector Search using the Search (fka FTS) service, refer to that tutorial instead.
No description provided.