Added tutorial for Langchain Couchbase Vector Search #75
base: main
Conversation
Summary of Changes

Hello @azaddhirajkumar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers a new tutorial designed to walk developers through the creation of an AI-driven PDF chat application. The tutorial highlights how to effectively combine Couchbase Vector Search for document retrieval, LangChain for managing large language model interactions and data processing, and Streamlit for a user-friendly interface. It provides practical guidance on implementing Retrieval-Augmented Generation (RAG) to enhance LLM responses with document context and demonstrates performance improvements through LLM response caching.

Highlights
Code Review
This pull request adds a new tutorial for building a PDF chat application using Couchbase Vector Search with the Query Service, LangChain, and Python. The tutorial is well-structured and comprehensive. I've provided several review comments to enhance clarity, fix minor errors in code snippets and text, and improve formatting. Addressing these points will make the tutorial easier for users to follow and help them avoid potential issues.
```python
    DB_SCOPE,
    DB_COLLECTION,
    embedding,
    distance_strategy=DistanceStrategy.COSINE,
```
The code uses DistanceStrategy.COSINE, but it's not shown where DistanceStrategy is imported from. This will cause a NameError for users trying to run the code. Please add a note or an import statement to clarify that it should be imported from langchain_couchbase.vectorstores. A similar issue exists in the function signature for get_vector_store which uses DistanceStrategy as a type hint.
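The comment above concerns where `DistanceStrategy` is imported from; for readers unfamiliar with the metric itself, here is a minimal pure-Python sketch of what cosine distance computes (illustrative only, independent of the LangChain/Couchbase APIs):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```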
Would be nice to mention & link to the distance strategies
> The `index_description` parameter controls how Couchbase optimizes vector storage and search performance:
>
> **Format:** `'IVF[<centroids>],{PQ|SQ}<settings>'`
The format string for index_description is enclosed in double single-quotes (''...''), which will render literally in markdown. For better readability and to correctly represent it as a code format, it's recommended to use backticks.
```diff
- **Format:** `'IVF[<centroids>],{PQ|SQ}<settings>'`
+ **Format:** `IVF[<centroids>],{PQ|SQ}<settings>`
```
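To give readers a concrete anchor, values matching this grammar might look like the following (these specific settings are assumptions on my part; the supported centroid counts and quantization settings should be confirmed against the Couchbase documentation):

```
IVF,SQ8       -- centroid count chosen automatically, 8-bit scalar quantization
IVF1024,SQ8   -- 1024 centroids, 8-bit scalar quantization
```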
Check this
> Note: Unlike the Search-based approach, this method does NOT require `INDEX_NAME` as vector indexes are optional and automatically used when available.
> Login_Password of Streamlit app is a basic password to access the app. You can set the password here and while using the app, password will be required to access the app.
There's an inconsistency in the naming of the environment variable for the login password. The configuration block uses LOGIN_PASSWORD, but this explanatory text refers to it as Login_Password. Since environment variables are case-sensitive, this can lead to confusion and prevent the application from working correctly if the user follows the text. Please use the correct case (LOGIN_PASSWORD) consistently.
```diff
- > Login_Password of Streamlit app is a basic password to access the app. You can set the password here and while using the app, password will be required to access the app.
+ > `LOGIN_PASSWORD` of Streamlit app is a basic password to access the app. You can set the password here and while using the app, password will be required to access the app.
```
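To make the case-sensitivity point concrete, a small sketch (the password value here is hypothetical):

```python
import os

os.environ["LOGIN_PASSWORD"] = "s3cret"  # hypothetical value, as it would be set via .env

# On case-sensitive platforms (Linux/macOS), the wrong casing finds nothing:
print(os.environ.get("Login_Password"))  # None
print(os.environ.get("LOGIN_PASSWORD"))  # s3cret
```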
Relevant
> In the main area, there's a chat screen where you can ask questions about the uploaded PDF document. You will receive two responses: one with context from the PDF (Couchbase Logo - <img src="image.ico" alt="couchbase" width="16" style="display: inline;" /> ) , and one without the PDF context (Bot Logo - 🤖). This demonstrates how the Retrieval Augmented Generation (RAG) model enhances the answers provided by the language model using the PDF content.
The tutorial uses an <img> tag with a source pointing to an .ico file (image.ico). The ICO format is not a standard image format for web pages and may not render correctly in all browsers. It would be better to use a more common format like PNG or SVG for consistency with other images in the document (e.g., main_screen_default_view_query.png).
> - **Streaming Output**: LangChain supports [streaming](https://python.langchain.com/docs/expression_language/streaming/), allowing the app to stream the generated answer to the client in real-time.
> - **Caching**: LangChain's [caching layer](https://python.langchain.com/docs/modules/model_io/llms/llm_caching/) integrates with Couchbase to cache LLM responses, reducing costs and improving response times.
>
> By combining Vector Search with Couchbase Query Service, RAG, LLM Caching, and LangChain; the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, cache LLM responses for repeated queries, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.
There is a grammatical error in this sentence. A semicolon is used where a comma would be more appropriate to separate the items in the introductory list from the main clause.
```diff
- By combining Vector Search with Couchbase Query Service, RAG, LLM Caching, and LangChain; the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, cache LLM responses for repeated queries, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.
+ By combining Vector Search with Couchbase Query Service, RAG, LLM Caching, and LangChain, the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, cache LLM responses for repeated queries, and generate context-aware and informative responses using large language models. This approach provides users with a powerful and intuitive way to explore and interact with large PDF files.
```
Relevant
```python
connect_string = connection_string
cluster = Cluster(connect_string, options)
```
The line connect_string = connection_string is redundant. The connection_string parameter can be used directly when creating the Cluster instance. Simplifying this makes the example code cleaner for learners.
```diff
- connect_string = connection_string
- cluster = Cluster(connect_string, options)
+ cluster = Cluster(connection_string, options)
```
This is different from the code snippet in the repo https://github.com/couchbase-examples/rag-demo/blob/main/chat_with_pdf_query.py#L91
> This function ensures that the uploaded PDF file is properly handled, loaded, and prepared for storage or processing in the vector store. It first checks if file was actually uploaded. Then the uploaded file is saved to a temporary file in `binary` format.
>
> From the temporary file, PDF is loaded in [PyPDFLoader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) from the LangChain library which loads the PDF into [LangChain Document](https://python.langchain.com/docs/modules/data_connection/document_loaders/) Format
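The temp-file step quoted above can be sketched in plain Python (the helper name and exact checks are assumptions, not the tutorial's actual code):

```python
import tempfile

def save_uploaded_pdf(uploaded_bytes: bytes) -> str:
    """Write uploaded bytes to a temporary .pdf file and return its path."""
    if not uploaded_bytes:
        raise ValueError("No file was uploaded")  # mirrors the upload check described above
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(uploaded_bytes)  # binary write, as the tutorial describes
        return tmp.name

path = save_uploaded_pdf(b"%PDF-1.4 minimal")
print(path.endswith(".pdf"))  # True
```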
The link for LangChain Document points to the document loaders page. To provide more accurate information for users who want to learn more, it would be better to link directly to the documentation for the Document class/concept.
```diff
- From the temporary file, PDF is loaded in [PyPDFLoader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) from the LangChain library which loads the PDF into [LangChain Document](https://python.langchain.com/docs/modules/data_connection/document_loaders/) Format
+ From the temporary file, PDF is loaded in [PyPDFLoader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/) from the LangChain library which loads the PDF into [LangChain Document](https://python.langchain.com/docs/modules/data_connection/document/) Format
```
Dead link
nithishr left a comment
Please double check all the links as LangChain documentation has changed quite a bit.
Ideally, we should also update the FTS tutorial with the same changes.
```
@@ -0,0 +1,593 @@
---
# frontmatter
path: "/tutorial-python-langchain-pdf-chat-query"
```
The URL & title need to be updated to the new terminology.
```yaml
path: "/tutorial-python-langchain-pdf-chat-query"
# title and description do not need to be added to markdown, start with H2 (##)
title: Build PDF Chat App With Couchbase Python SDK, LangChain and Vector Search
short_title: Build PDF Chat App (Vector Search)
```
It is RAG rather than Vector Search.
```yaml
# frontmatter
path: "/tutorial-python-langchain-pdf-chat-query"
# title and description do not need to be added to markdown, start with H2 (##)
title: Build PDF Chat App With Couchbase Python SDK, LangChain and Vector Search
```
Build Chat with PDF App using LangChain and Couchbase Vector Search/Hyperscale and Composite Vector Search
```yaml
short_title: Build PDF Chat App (Vector Search)
description:
  - Construct a PDF Chat App with LangChain, Couchbase Python SDK, Couchbase Vector Search (Query Service), and Streamlit.
  - Learn to upload PDFs into Couchbase Vector Store with LangChain using Query-based Vector Search.
```
Query based Vector Store
```yaml
description:
  - Construct a PDF Chat App with LangChain, Couchbase Python SDK, Couchbase Vector Search (Query Service), and Streamlit.
  - Learn to upload PDFs into Couchbase Vector Store with LangChain using Query-based Vector Search.
  - Discover how to use RAG's for context-based Q&A's from PDFs with LLMs.
```
RAG's -> RAG
Q&A's -> Q&A
> ### Add Documents to Vector Store
>
> We will utilize the vector store created at [Initialize OpenAI and Couchbase Vector Store](#initialize-openai-and-couchbase-vector-store). In this we will add the documents using add_documents method of Couchbase vector store. This method will utilize the OpenAI embeddings to create embeddings(vectors) from text and add it to Couchbase documents in the specified collection.
Highlight add_documents
> ### LangChain Expression Language (LCEL)
>
> We will now utilize the power of LangChain Chains using the [LangChain Expression Language](https://python.langchain.com/docs/expression_language/) (LCEL). LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as streaming, parallelism, and logging.
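For intuition about what the excerpt above means by composing chains, the `|` style of LCEL can be mimicked in a few lines of plain Python (a toy sketch of the idea, not the LangChain API):

```python
class Runnable:
    """Toy stand-in for an LCEL runnable: a callable that composes with `|`."""
    def __init__(self, fn):
        self.fn = fn
    def __or__(self, other):
        # left | right: feed left's output into right
        return Runnable(lambda x: other.fn(self.fn(x)))
    def invoke(self, x):
        return self.fn(x)

prompt = Runnable(lambda q: f"Answer briefly: {q}")
llm = Runnable(lambda p: p.upper())         # stand-in for a model call
parser = Runnable(lambda s: s.rstrip("?"))  # stand-in for an output parser

chain = prompt | llm | parser
print(chain.invoke("what is RAG?"))  # ANSWER BRIEFLY: WHAT IS RAG
```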
Dead link
> ### Create Retriever Chain
>
> We also create the [retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore) of the couchbase vector store. This retriever will be used to retrieve the previously added documents which are similar to current query.
Dead link
> - Create a placeholder for streaming the assistant's response.
> - Use the chain.invoke(question) method to generate the response from the RAG chain.
> - The response is automatically cached by the CouchbaseCache layer.
> - [Stream](https://python.langchain.com/docs/use_cases/question_answering/streaming/) the response in real-time using the custom `stream_string` function.
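`stream_string` is the tutorial's own helper, so its exact body isn't shown here; a plausible sketch of such a word-by-word streaming generator (name from the excerpt, behavior assumed):

```python
import time
from typing import Iterator

def stream_string(text: str, delay: float = 0.0) -> Iterator[str]:
    """Yield the response word by word so the UI can render it incrementally."""
    for word in text.split():
        yield word + " "
        time.sleep(delay)  # pacing between chunks

chunks = list(stream_string("Hello from the RAG chain"))
print("".join(chunks).strip())  # Hello from the RAG chain
```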
> ## Introduction
>
> Welcome to this comprehensive guide on constructing an AI-enhanced Chat Application using Couchbase Vector Search. We will create a dynamic chat interface capable of delving into PDF documents to extract and provide summaries, key facts, and answers to your queries. By the end of this tutorial, you'll have a powerful tool at your disposal, transforming the way you interact with and utilize the information contained within PDFs.
Can you add a paragraph highlighting that this tutorial is using Query-based Vector Search? If you are looking for Vector Search using the Search (fka FTS) service, refer to that tutorial instead.
No description provided.