
Commit 221f689

Merge pull request #2 from mogith-pn/Added-notebook-example
Added notebook example
2 parents 8c206d3 + f7de8bc commit 221f689

2 files changed: +361 -1 lines changed

Lines changed: 361 additions & 0 deletions
@@ -0,0 +1,361 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# DSPy-Clarifai LM and retriever example notebook\n",
+    "\n",
+    "This notebook walks you through the Clarifai integration for DSPy, which lets DSPy users call LLMs hosted on the Clarifai platform and use a Clarifai app as a retriever for vector-search use cases."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setup"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install clarifai"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Import the necessary packages."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import dspy\n",
+    "from dspy.retrieve.clarifai_rm import ClarifaiRM"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Initialize the Clarifai app ID, user ID, and PAT.\n",
+    "\n",
+    "You can browse the [Clarifai community](https://clarifai.com/explore/models) to obtain the model URL for any of its models."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# For this demo we use llama2-70b-chat.\n",
+    "MODEL_URL = \"https://clarifai.com/meta/Llama-2/models/llama2-70b-chat\"\n",
+    "PAT = \"YOUR_CLARIFAI_PAT\"\n",
+    "USER_ID = \"YOUR_USER_ID\"\n",
+    "APP_ID = \"YOUR_APP_ID\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Data ingestion into the Clarifai vector database\n",
+    "\n",
+    "To use Clarifai as a retriever, all you have to do is ingest your documents into a Clarifai app; the app serves as the vector database from which similar documents are retrieved.\n",
+    "To simplify ingestion, we use the Clarifai vector store integration from LangChain."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this block to ingest the documents into the Clarifai app as chunks.\n",
+    "# If you encounter an import error, make sure to run `pip install langchain`.\n",
+    "\n",
+    "from langchain.text_splitter import CharacterTextSplitter\n",
+    "from langchain.document_loaders import TextLoader\n",
+    "from langchain.vectorstores import Clarifai as clarifaivectorstore\n",
+    "\n",
+    "loader = TextLoader(\"YOUR_TEXT_FILE_PATH\")  # replace with your file path\n",
+    "documents = loader.load()\n",
+    "text_splitter = CharacterTextSplitter(chunk_size=1024, chunk_overlap=200)\n",
+    "docs = text_splitter.split_documents(documents)\n",
+    "\n",
+    "clarifai_vector_db = clarifaivectorstore.from_documents(\n",
+    "    user_id=USER_ID,\n",
+    "    app_id=APP_ID,\n",
+    "    documents=docs,\n",
+    "    pat=PAT\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Initialize the LLM class"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Make sure to pass all model parameters in the `inference_params` field of the `dspy.Clarifai` class."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "llm = dspy.Clarifai(model=MODEL_URL, api_key=PAT, n=2, inference_params={\"max_tokens\": 100, \"temperature\": 0.6})"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Initialize the Clarifai retriever model class."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "retriever_model = ClarifaiRM(clarifai_user_id=USER_ID, clarfiai_app_id=APP_ID, clarifai_pat=PAT)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Configure DSPy with the LM and RM."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dspy.settings.configure(lm=llm, rm=retriever_model)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Example: `dspy.Signature` and `dspy.Module` with the Clarifai LM"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "NEGATIVE\n",
+      "\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "sentence = \"disney again ransacks its archives for a quick-buck sequel .\"  # example from the SST-2 dataset\n",
+    "\n",
+    "classify = dspy.Predict('sentence -> sentiment')\n",
+    "print(classify(sentence=sentence).sentiment)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Example: a quick glimpse into how our retriever works when a query is passed to the `dspy.Retrieve` class\n",
+    "\n",
+    "Here we use a guideline manual for the procurement of works.\n",
+    "\n",
+    "Link: https://doe.gov.in/sites/default/files/Manual%20for%20Procurement%20of%20Works_0.pdf"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "retrieve = dspy.Retrieve(k=1)\n",
+    "topK_passages = retrieve(\"what are the stages in planning, sanctioning and execution of public works\").passages"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "['1.11\\n\\nProcessing of Public Works\\n\\nFollowing are the stages in planning, sanctioning and execution of work.\\ni)\\n\\nPerspective Planning for works;\\n\\nii)\\n\\nPreparation of Preliminary Project Report (PPR) or Rough Cost Estimate;\\n\\niii) Acceptance of necessity and issue of in-Principle Approval;\\niv) Preparation of Detailed Project Report (DPR) or Preliminary Estimate (PE);\\n\\n15']\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(topK_passages)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## RAG DSPy module using Clarifai as retriever"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To construct a module in DSPy, you generally define:\n",
+    "\n",
+    "Signature:\n",
+    "declares the input and output fields in an intuitive way, with just a few words\n",
+    "(\"question -> answer\").\n",
+    "\n",
+    "Module:\n",
+    "puts the signature into action; it compiles and generates a response for the given query."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Construct a signature class that defines the input and output fields needed.\n",
+    "Also give verbose docstrings and field descriptions, so that DSPy can understand the context and compile the best prompt for the use case."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class GenerateAnswer(dspy.Signature):\n",
+    "    \"\"\"Answer questions with short factoid answers.\"\"\"\n",
+    "\n",
+    "    context = dspy.InputField(desc=\"may contain relevant facts\")\n",
+    "    question = dspy.InputField()\n",
+    "    answer = dspy.OutputField(desc=\"often between 1 and 5 words\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Define the module with the actions to be performed. Here we show a small RAG use case: we retrieve similar contexts using our retriever class and generate a response grounded in that factual context using the DSPy module `ChainOfThought`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class RAG(dspy.Module):\n",
+    "    def __init__(self, num_passages=3):\n",
+    "        super().__init__()\n",
+    "\n",
+    "        self.retrieve = dspy.Retrieve(k=num_passages)\n",
+    "        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)\n",
+    "\n",
+    "    def forward(self, question):\n",
+    "        context = self.retrieve(question).passages\n",
+    "        prediction = self.generate_answer(context=context, question=question)\n",
+    "        return dspy.Prediction(context=context, answer=prediction.answer)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we pass our query, retrieve relevant chunks with the Clarifai retriever, and let the model generate a response based on that factual evidence."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Question: Which bid will be termed as L1 ?\n",
+      "Predicted Answer: The bidder who quotes the lowest price among\n",
+      "Retrieved Contexts (truncated): ['If L1 bid is not a ‘Class-I local supplier’, 50 (fifty) percent of the order quantity\\nshall be awarded to L1. Thereafter, the lowest bidder among the ‘Class-I local\\nsupplier’ will be invited to match ...', 'If L1 bid is not a ‘Class-I local supplier’, 50 (fifty) percent of the order quantity\\nshall be awarded to L1. Thereafter, the lowest bidder among the ‘Class-I local\\nsupplier’ will be invited to match ...', 'If L1 bid is not a ‘Class-I local supplier’, 50 (fifty) percent of the order quantity\\nshall be awarded to L1. Thereafter, the lowest bidder among the ‘Class-I local\\nsupplier’ will be invited to match ...']\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Ask any question you like to this RAG program.\n",
+    "my_question = \"Which bid will be termed as L1?\"\n",
+    "\n",
+    "# Get the prediction. This contains `pred.context` and `pred.answer`.\n",
+    "rag = RAG()\n",
+    "pred = rag(my_question)\n",
+    "\n",
+    "# Print the contexts and the answer.\n",
+    "print(f\"Question: {my_question}\")\n",
+    "print(f\"Predicted Answer: {pred.answer}\")\n",
+    "print(f\"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
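The ingestion cell above splits documents into 1024-character chunks with a 200-character overlap before sending them to the Clarifai app. As a rough, dependency-free illustration of what that splitting step does, here is a minimal sketch; the `chunk_text` helper is hypothetical and only approximates LangChain's `CharacterTextSplitter`, which also splits on separators:

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks, where each chunk
    overlaps the previous one by `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks

docs = chunk_text("x" * 3000, chunk_size=1024, chunk_overlap=200)
print(len(docs))      # 4
print(len(docs[0]))   # 1024
```

The overlap means the tail of each chunk is repeated at the head of the next, so a sentence falling on a chunk boundary is still retrievable as one piece.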

dspy/retrieve/clarifai_rm.py

Lines changed: 0 additions & 1 deletion
@@ -81,7 +81,6 @@ def forward(
             if isinstance(query_or_queries, str)
             else query_or_queries
         )
-        self.clarifai_search.top_k = k if k is not None else self.clarifai_search.top_k
         passages = []
         queries = [q for q in queries if q]
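The deleted line used a common "override with fallback" idiom: take the per-call `k` if the caller supplied one, otherwise keep the configured default. A standalone sketch of that idiom follows; `SearchConfig` is a hypothetical stand-in, not the actual Clarifai search client:

```python
from typing import Optional

class SearchConfig:
    """Hypothetical stand-in for a retriever's search settings."""

    def __init__(self, top_k: int = 3):
        self.top_k = top_k

    def resolve_top_k(self, k: Optional[int] = None) -> int:
        # Same idiom as the removed line: a per-call k wins,
        # otherwise the existing default is kept.
        self.top_k = k if k is not None else self.top_k
        return self.top_k

cfg = SearchConfig(top_k=3)
print(cfg.resolve_top_k())    # 3  (default kept)
print(cfg.resolve_top_k(10))  # 10 (per-call override)
print(cfg.resolve_top_k())    # 10 (override persisted)
```

Note that writing the override back onto the shared object makes it persist across later calls, a side effect worth keeping in mind when using this idiom on mutable state.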

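The `RAG` module added by this commit boils down to "retrieve, then generate." A framework-free sketch of that composition, with stubbed retrieval and generation in place of `ClarifaiRM` and `ChainOfThought` (every name below is hypothetical, none of it is DSPy API):

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    context: list
    answer: str

def make_rag(retrieve, generate, num_passages: int = 3):
    """Compose a retriever and a generator into a simple RAG callable."""
    def forward(question: str) -> Prediction:
        context = retrieve(question)[:num_passages]  # top-k passages
        answer = generate(context, question)         # answer grounded in context
        return Prediction(context=context, answer=answer)
    return forward

# Stub components standing in for the retriever and the LM:
corpus = ["L1 is the lowest-priced bid.", "DPR follows the PPR stage."]
retrieve = lambda q: [p for p in corpus if any(w in p.lower() for w in q.lower().split())]
generate = lambda ctx, q: ctx[0] if ctx else "no answer"

rag = make_rag(retrieve, generate, num_passages=1)
print(rag("Which bid is L1?").answer)  # L1 is the lowest-priced bid.
```

Keeping retrieval and generation behind a single callable, as the notebook's `forward` does, lets either half be swapped (a different retriever, a different prompting module) without touching the caller.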