
Commit 3c57671

added ingestion in notebook
1 parent b99673a


1 file changed (+38, -2 lines)


clarifai_llm_retriever_example.ipynb

Lines changed: 38 additions & 2 deletions
@@ -46,7 +46,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Initialize clarifai app id, user id and PAT.\n",
+    "#### Initialize Clarifai app ID, user ID and PAT.\n",
    "\n",
    "You can browse the portal to obtain [MODEL URL](https://clarifai.com/explore/models) for different models in clarifai community."
   ]
@@ -68,7 +68,43 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Initialize LLM class"
+    "### Data ingestion into Clarifai vector database\n",
+    "\n",
+    "To use Clarifai as a retriever, all you have to do is ingest your documents into a Clarifai app, which serves as the vector database for retrieving similar documents.\n",
+    "To simplify ingestion, we use LangChain's Clarifai vector store integration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this block to ingest the documents into the Clarifai app as chunks.\n",
+    "# If you encounter an import error, run `pip install langchain clarifai`.\n",
+    "\n",
+    "from langchain.text_splitter import CharacterTextSplitter\n",
+    "from langchain.document_loaders import TextLoader\n",
+    "from langchain.vectorstores import Clarifai\n",
+    "\n",
+    "loader = TextLoader(\"YOUR_TEXT_FILE_PATH\")  # replace with your file path\n",
+    "documents = loader.load()\n",
+    "text_splitter = CharacterTextSplitter(chunk_size=1024, chunk_overlap=200)\n",
+    "docs = text_splitter.split_documents(documents)\n",
+    "\n",
+    "clarifai_vector_db = Clarifai.from_documents(\n",
+    "    user_id=USER_ID,\n",
+    "    app_id=APP_ID,\n",
+    "    documents=docs,\n",
+    "    pat=PAT\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Initialize LLM class"
   ]
  },
  {
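The ingestion cell splits the loaded text with `CharacterTextSplitter(chunk_size=1024, chunk_overlap=200)` before sending the chunks to the Clarifai app. The sketch below uses a hypothetical `chunk_text` helper to show what those two parameters mean. Each chunk holds at most `chunk_size` characters, and consecutive chunks share `chunk_overlap` characters, so a sentence cut at one boundary is more likely to survive intact in the neighboring chunk. This is a simplified fixed-window chunker and not LangChain's algorithm. `CharacterTextSplitter` first splits on a separator and then merges the pieces up to `chunk_size`, so its real chunk boundaries and overlap will differ.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; LangChain's CharacterTextSplitter
    splits on a separator first, so its boundaries differ from these.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new window starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# With the notebook's settings each window advances 1024 - 200 = 824 characters.
sample = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(sample)
print([len(p) for p in pieces])  # [1024, 1024, 852]
```

With these settings, a 2,500-character text produces three chunks. The last 200 characters of each chunk reappear as the first 200 characters of the next one.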

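The notebook describes the Clarifai app as a vector database that retrieves documents similar to a query. The toy sketch below shows the general idea behind embedding-based retrieval: stored chunks are represented as vectors and ranked by similarity to a query vector. The three-dimensional vectors, the chunk IDs, and the choice of cosine similarity are all assumptions made for illustration. In the notebook, Clarifai computes the embeddings and ranks results on its own side, and this sketch does not reproduce its internals.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored chunks whose vectors are most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical pre-computed chunk embeddings (real embeddings have far more dimensions).
index = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.8, 0.6, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], index, k=2)
print([doc_id for doc_id, _ in results])  # ['chunk-a', 'chunk-b']
```

A real vector database replaces the linear scan with an index built for fast approximate search. The ranking principle stays the same.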