
Conversation

@azaddhirajkumar
Contributor

No description provided.

@azaddhirajkumar azaddhirajkumar requested a review from a team as a code owner November 11, 2025 07:28
@gemini-code-assist

Summary of Changes

Hello @azaddhirajkumar, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a new tutorial designed to walk developers through the creation of an AI-driven PDF chat application. The tutorial highlights how to effectively combine Couchbase Vector Search for document retrieval, LangChain for managing large language model interactions and data processing, and Streamlit for a user-friendly interface. It provides practical guidance on implementing Retrieval-Augmented Generation (RAG) to enhance LLM responses with document context and demonstrates performance improvements through LLM response caching.

Highlights

  • New Tutorial: Introduces a comprehensive guide for building a PDF Chat Application using Couchbase Vector Search.
  • Technology Stack: Demonstrates integration of Couchbase Vector Search (Query Service), LangChain, OpenAI LLMs, and Streamlit for an AI-powered chat experience.
  • Core Concepts: Explains and implements Retrieval-Augmented Generation (RAG) for contextual Q&A and LLM response caching with Couchbase.
  • Vector Indexing: Details the use of Hyperscale and Composite Vector Indexes in Couchbase and programmatic index creation.
  • LangChain Integration: Showcases LangChain's capabilities for PDF processing, embedding generation, vector store integration, and building complex chains with LCEL.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a new tutorial for building a PDF chat application using Couchbase Vector Search with the Query Service, LangChain, and Python. The tutorial is well-structured and comprehensive. I've provided several review comments to enhance clarity, fix minor errors in code snippets and text, and improve formatting. Addressing these points will make the tutorial easier for users to follow and help them avoid potential issues.

DB_SCOPE,
DB_COLLECTION,
embedding,
distance_strategy=DistanceStrategy.COSINE,

high

The code uses DistanceStrategy.COSINE, but it's not shown where DistanceStrategy is imported from. This will cause a NameError for users trying to run the code. Please add a note or an import statement to clarify that it should be imported from langchain_couchbase.vectorstores. A similar issue exists in the function signature for get_vector_store which uses DistanceStrategy as a type hint.
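To illustrate what the comment is asking for, here is a hedged sketch: the import path shown in the comment (`langchain_couchbase.vectorstores`) is the reviewer's suggestion and is not verified here, so it appears only as a comment, while the runnable part shows what `DistanceStrategy.COSINE` actually computes.

```python
# Per the review comment, DistanceStrategy is likely imported like this
# (assumed, not verified against the library):
#   from langchain_couchbase.vectorstores import DistanceStrategy
#
# As a stand-in, this shows the metric the COSINE strategy corresponds to:
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Identical vectors have distance 0, orthogonal vectors have distance 1, which is why cosine distance suits normalized embedding vectors.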


Would be nice to mention & link to the distance strategies


The `index_description` parameter controls how Couchbase optimizes vector storage and search performance:

**Format:** `'IVF[<centroids>],{PQ|SQ}<settings>'`

medium

The format string for index_description is enclosed in double single-quotes (''...''), which will render literally in markdown. For better readability and to correctly represent it as a code format, it's recommended to use backticks.

Suggested change
**Format:** `'IVF[<centroids>],{PQ|SQ}<settings>'`
**Format:** `IVF[<centroids>],{PQ|SQ}<settings>`
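As a side note on the format being discussed, a small validator makes the documented shape concrete. This is a hypothetical helper, not part of the tutorial, and it assumes the settings part looks like `PQ32x8` or `SQ8` as in common Couchbase examples; the real parser in Couchbase may accept more variants.

```python
import re

# IVF[<centroids>],{PQ|SQ}<settings>
# - centroid count after IVF is optional (e.g. "IVF,SQ8" or "IVF1024,PQ32x8")
# - PQ settings assumed to be <subquantizers>x<bits>, SQ settings a bit count
DESC_RE = re.compile(r"^IVF(\d+)?,(PQ\d+x\d+|SQ\d+)$")


def is_valid_description(desc: str) -> bool:
    """Return True if desc matches the documented index_description shape."""
    return DESC_RE.fullmatch(desc) is not None
```

For example, `"IVF1024,PQ32x8"` and `"IVF,SQ8"` pass, while a malformed string like `"IVF,PQ32"` does not.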


Check this


> Note: Unlike the Search-based approach, this method does NOT require `INDEX_NAME` as vector indexes are optional and automatically used when available.

> Login_Password of Streamlit app is a basic password to access the app. You can set the password here and while using the app, password will be required to access the app.

medium

There's an inconsistency in the naming of the environment variable for the login password. The configuration block uses LOGIN_PASSWORD, but this explanatory text refers to it as Login_Password. Since environment variables are case-sensitive, this can lead to confusion and prevent the application from working correctly if the user follows the text. Please use the correct case (LOGIN_PASSWORD) consistently.

Suggested change
> Login_Password of Streamlit app is a basic password to access the app. You can set the password here and while using the app, password will be required to access the app.
> `LOGIN_PASSWORD` of Streamlit app is a basic password to access the app. You can set the password here and while using the app, password will be required to access the app.


Relevant


![Main Screen Default View](main_screen_default_view_query.png)

In the main area, there's a chat screen where you can ask questions about the uploaded PDF document. You will receive two responses: one with context from the PDF (Couchbase Logo - <img src="image.ico" alt="couchbase" width="16" style="display: inline;" /> ) , and one without the PDF context (Bot Logo - 🤖). This demonstrates how the Retrieval Augmented Generation (RAG) model enhances the answers provided by the language model using the PDF content.

medium

The tutorial uses an <img> tag with a source pointing to an .ico file (image.ico). The ICO format is not a standard image format for web pages and may not render correctly in all browsers. It would be better to use a more common format like PNG or SVG for consistency with other images in the document (e.g., main_screen_default_view_query.png).

- **Streaming Output**: LangChain supports [streaming](https://python.langchain.com/docs/expression_language/streaming/), allowing the app to stream the generated answer to the client in real-time.
- **Caching**: LangChain's [caching layer](https://python.langchain.com/docs/modules/model_io/llms/llm_caching/) integrates with Couchbase to cache LLM responses, reducing costs and improving response times.

By combining Vector Search with Couchbase Query Service, RAG, LLM Caching, and LangChain; the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, cache LLM responses for repeated queries, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.

medium

There is a grammatical error in this sentence. A semicolon is used where a comma would be more appropriate to separate the items in the introductory list from the main clause.

Suggested change
By combining Vector Search with Couchbase Query Service, RAG, LLM Caching, and LangChain; the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, cache LLM responses for repeated queries, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.
By combining Vector Search with Couchbase Query Service, RAG, LLM Caching, and LangChain, the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, cache LLM responses for repeated queries, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.


Relevant

Comment on lines +314 to +315
connect_string = connection_string
cluster = Cluster(connect_string, options)

medium

The line connect_string = connection_string is redundant. The connection_string parameter can be used directly when creating the Cluster instance. Simplifying this makes the example code cleaner for learners.

Suggested change
connect_string = connection_string
cluster = Cluster(connect_string, options)
cluster = Cluster(connection_string, options)



This function ensures that the uploaded PDF file is properly handled, loaded, and prepared for storage or processing in the vector store. It first checks if file was actually uploaded. Then the uploaded file is saved to a temporary file in `binary` format.

From the temporary file, PDF is loaded in [PyPDFLoader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) from the LangChain library which loads the PDF into [LangChain Document](https://python.langchain.com/docs/modules/data_connection/document_loaders/) Format

medium

The link for LangChain Document points to the document loaders page. To provide more accurate information for users who want to learn more, it would be better to link directly to the documentation for the Document class/concept.

Suggested change
From the temporary file, PDF is loaded in [PyPDFLoader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) from the LangChain library which loads the PDF into [LangChain Document](https://python.langchain.com/docs/modules/data_connection/document_loaders/) Format
From the temporary file, PDF is loaded in [PyPDFLoader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) from the LangChain library which loads the PDF into [LangChain Document](https://python.langchain.com/docs/modules/data_connection/document/) Format
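The save-to-temporary-file step described in the excerpt can be sketched with the standard library alone (function name and signature are hypothetical; loaders like PyPDFLoader expect a file path, which is why the bytes are persisted first):

```python
import tempfile


def save_to_temp_file(uploaded_bytes: bytes, suffix: str = ".pdf") -> str:
    """Write uploaded bytes to a temp file and return its path.

    Mirrors the tutorial's described flow: check that a file was actually
    uploaded, then save it in binary mode so a path-based PDF loader can
    read it afterwards.
    """
    if not uploaded_bytes:
        raise ValueError("no file was uploaded")
    with tempfile.NamedTemporaryFile(mode="wb", suffix=suffix, delete=False) as tmp:
        tmp.write(uploaded_bytes)  # binary write, as the tutorial notes
        path = tmp.name
    return path  # file is closed here, ready for a loader to open by path
```

`delete=False` keeps the file on disk after the handle closes; the caller is responsible for removing it once the loader has finished.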


Dead link

@azaddhirajkumar azaddhirajkumar changed the title Added tutorial for Couchbase Vector Search Added tutorial for Langchain Couchbase Vector Search Nov 11, 2025

@nithishr nithishr left a comment


Please double check all the links as LangChain documentation has changed quite a bit.
Ideally, we should also update the FTS tutorial with the same changes.

@@ -0,0 +1,593 @@
---
# frontmatter
path: "/tutorial-python-langchain-pdf-chat-query"

The URL & title need to be updated to the new terminology

path: "/tutorial-python-langchain-pdf-chat-query"
# title and description do not need to be added to markdown, start with H2 (##)
title: Build PDF Chat App With Couchbase Python SDK, LangChain and Vector Search
short_title: Build PDF Chat App (Vector Search)

It is RAG rather than Vector Search.

# frontmatter
path: "/tutorial-python-langchain-pdf-chat-query"
# title and description do not need to be added to markdown, start with H2 (##)
title: Build PDF Chat App With Couchbase Python SDK, LangChain and Vector Search

Build Chat with PDF App using LangChain and Couchbase Vector Search/Hyperscale and Composite Vector Search

short_title: Build PDF Chat App (Vector Search)
description:
- Construct a PDF Chat App with LangChain, Couchbase Python SDK, Couchbase Vector Search (Query Service), and Streamlit.
- Learn to upload PDFs into Couchbase Vector Store with LangChain using Query-based Vector Search.

Query based Vector Store

description:
- Construct a PDF Chat App with LangChain, Couchbase Python SDK, Couchbase Vector Search (Query Service), and Streamlit.
- Learn to upload PDFs into Couchbase Vector Store with LangChain using Query-based Vector Search.
- Discover how to use RAG's for context-based Q&A's from PDFs with LLMs.

RAG's -> RAG
Q&A's -> Q&A


### Add Documents to Vector Store

We will utilize the vector store created at [Initialize OpenAI and Couchbase Vector Store](#initialize-openai-and-couchbase-vector-store). In this we will add the documents using add_documents method of Couchbase vector store. This method will utilize the OpenAI embeddings to create embeddings(vectors) from text and add it to Couchbase documents in the specified collection.

Highlight add_documents


### LangChain Expression Language (LCEL)

We will now utilize the power of LangChain Chains using the [LangChain Expression Language](https://python.langchain.com/docs/expression_language/) (LCEL). LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as streaming, parallelism, and logging.

Dead link


### Create Retriever Chain

We also create the [retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore) of the couchbase vector store. This retriever will be used to retrieve the previously added documents which are similar to current query.

Dead link

- Create a placeholder for streaming the assistant's response.
- Use the chain.invoke(question) method to generate the response from the RAG chain.
- The response is automatically cached by the CouchbaseCache layer.
- [Stream](https://python.langchain.com/docs/use_cases/question_answering/streaming/) the response in real-time using the custom `stream_string` function.
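The `stream_string` function referenced in the excerpt is custom to the tutorial; its shape is not shown here, but a generator along these lines would produce the described word-by-word streaming (an assumed sketch, not the tutorial's actual code):

```python
import time
from typing import Iterator


def stream_string(text: str, delay: float = 0.0) -> Iterator[str]:
    """Yield the response one word at a time so the UI can render it
    incrementally, simulating token-by-token LLM streaming."""
    for word in text.split():
        yield word + " "
        time.sleep(delay)  # small pause between chunks; 0.0 disables it
```

In a Streamlit app, such a generator could be passed to a write-stream style placeholder so the cached or freshly generated answer appears progressively.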


## Introduction

Welcome to this comprehensive guide on constructing an AI-enhanced Chat Application using Couchbase Vector Search. We will create a dynamic chat interface capable of delving into PDF documents to extract and provide summaries, key facts, and answers to your queries. By the end of this tutorial, you'll have a powerful tool at your disposal, transforming the way you interact with and utilize the information contained within PDFs.

Can you add a paragraph highlighting that this tutorial uses Query-based Vector Search? If you are looking for Vector Search using the Search (formerly FTS) service, refer to that tutorial instead.
