Commit b5ad7c9

Add Ragie video-RAG
1 parent 4b76b5a commit b5ad7c9

9 files changed: +1310 -0 lines changed


mcp-video-rag/.env.example

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
RAGIE_API_KEY=<YOUR_RAGIE_API_KEY>

mcp-video-rag/.gitignore

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
# Python-generated files
__pycache__/
*.py[cod]
build/
dist/
wheels/
*.egg-info/
*.egg
.eggs/
.Python
develop-eggs/
downloads/
lib/
lib64/
parts/
sdist/
var/
.installed.cfg

# Virtual environments
.venv
venv/
ENV/
env/
.env

# IDE specific files
.idea/
.vscode/
*.swp
*.swo
.DS_Store
.project
.pydevproject
.settings/
*.sublime-workspace
*.sublime-project

# Testing and coverage
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
htmlcov/

# Documentation
docs/_build/
site/

# Jupyter Notebook
.ipynb_checkpoints

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Logs and databases
*.log
*.sqlite
*.db

# Environment variables
.env
.env.local
.env.*.local

mcp-video-rag/.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
3.12

mcp-video-rag/README.md

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
# MCP-powered video-RAG using Ragie

This project demonstrates how to build a video-based Retrieval-Augmented Generation (RAG) system powered by the Model Context Protocol (MCP). It uses [Ragie's](https://www.ragie.ai/) video ingestion and retrieval capabilities to enable semantic search and Q&A over video content, and exposes them as MCP tools in the Cursor IDE.

We use the following tech stack:
- Ragie for video ingestion + retrieval (video-RAG)
- Cursor as the MCP host

---
## Setup and Installation

Ensure you have Python 3.12 or later installed on your system.

### Install uv
First, install uv, which manages the Python project and its environment:
```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

### Install dependencies
```bash
# Move into the project directory
# (the one that contains pyproject.toml)
cd mcp-video-rag

# Create virtual environment and activate it
uv venv
source .venv/bin/activate # macOS/Linux

.venv\Scripts\activate # Windows

# Install dependencies
uv sync
```

### Configure environment variables

Copy `.env.example` to `.env` and set the following environment variable:
```
RAGIE_API_KEY=your_ragie_api_key
```
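
As a quick sanity check, the snippet below sketches how you might verify that the key is actually set before starting the server. It is only an illustration: `require_env` is a hypothetical helper that is not part of this project, and it reads from a plain mapping so it can be tried without touching your real environment.

```python
import os
from collections.abc import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of `name`, or raise if it is missing or still a placeholder.

    Hypothetical helper for illustration; the project itself simply calls
    os.getenv('RAGIE_API_KEY').
    """
    value = env.get(name, "").strip()
    # Treat the template values from .env.example and this README as "unset".
    placeholders = {"", "your_ragie_api_key", "<YOUR_RAGIE_API_KEY>"}
    if value in placeholders:
        raise RuntimeError(f"{name} is not set; copy .env.example to .env and fill it in")
    return value
```

Calling `require_env("RAGIE_API_KEY")` after `load_dotenv()` would fail early with a readable message instead of surfacing later as an authentication error.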

## Run the project

First, set up your MCP server as follows:
- Go to Cursor settings
- Select MCP Tools
- Add a new global MCP server

In the JSON file, add the following, replacing the project path and API key with your own values:
```json
{
  "mcpServers": {
    "ragie": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/project_root",
        "run",
        "server.py"
      ],
      "env": {
        "RAGIE_API_KEY": "YOUR_RAGIE_API_KEY"
      }
    }
  }
}
```
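
If you would rather generate this entry than edit the path by hand, a small script along the following lines could do it. This is a sketch under assumptions: `build_mcp_config` is a hypothetical helper, and it only reproduces the JSON structure shown above with the project path resolved to an absolute one.

```python
import json
from pathlib import Path


def build_mcp_config(project_root: str, api_key: str) -> dict:
    """Build the `mcpServers` entry shown above for a given project directory."""
    # The example config uses an absolute path, so resolve whatever was passed in.
    root = Path(project_root).expanduser().resolve()
    return {
        "mcpServers": {
            "ragie": {
                "command": "uv",
                "args": ["--directory", str(root), "run", "server.py"],
                "env": {"RAGIE_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the MCP settings file.
    print(json.dumps(build_mcp_config(".", "YOUR_RAGIE_API_KEY"), indent=2))
```

Run it from the project root so that `.` resolves to the directory containing `server.py`.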

You should now see the MCP server listed in Cursor's MCP settings. Make sure to toggle the button there to connect the server to the host.

Done! Your server is now up and running.

The custom MCP server has 3 tools:
- `ingest_data_tool`: Clears the Ragie index, then ingests the video data from a directory into it
- `retrieve_data_tool`: Retrieves relevant data from the videos based on the user query
- `show_video_tool`: Creates a short video chunk from the specified segment of the original video
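
To make the hand-off between the tools concrete, here is a rough sketch of how one retrieved chunk maps onto the arguments of `show_video_tool`. The sample chunk is made-up data in the shape that `retrieve_data_tool` documents (`text`, `document_name`, `start_time`, `end_time`), and `clip_request` is a hypothetical helper, not part of the server.

```python
def clip_request(chunk: dict) -> dict:
    """Turn one retrieved chunk into keyword arguments for show_video_tool."""
    start, end = chunk.get("start_time"), chunk.get("end_time")
    # Chunks without timestamps cannot be cut into a clip.
    if start is None or end is None:
        raise ValueError("chunk has no timestamps, so no clip can be cut")
    if end <= start:
        raise ValueError(f"invalid time range: {start} to {end}")
    return {
        "document_name": chunk["document_name"],
        "start_time": float(start),
        "end_time": float(end),
    }


# Made-up example in the documented result shape.
sample = {"text": "...", "document_name": "demo.mp4", "start_time": 12, "end_time": 27.5}
print(clip_request(sample))
```

In practice the Cursor Agent performs this mapping itself when it chains the two tools; the sketch only spells out which fields carry over.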

You can now ingest your videos, retrieve relevant data, and query it, all using the Cursor Agent.
The agent can even create the desired chunks from your video with a single query.

---

## 📬 Stay Updated with Our Newsletter!
**Get a FREE Data Science eBook** 📖 with 150+ essential lessons in Data Science when you subscribe to our newsletter! Stay in the loop with the latest tutorials, insights, and exclusive resources. [Subscribe now!](https://join.dailydoseofds.com)

[![Daily Dose of Data Science Newsletter](https://github.com/patchy631/ai-engineering/blob/main/resources/join_ddods.png)](https://join.dailydoseofds.com)

---

## Contribution

Contributions are welcome! Please fork the repository and submit a pull request with your improvements.

mcp-video-rag/main.py

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
import os
import time
import logging
from pathlib import Path

from dotenv import load_dotenv
from ragie import Ragie
from moviepy import VideoFileClip

load_dotenv()

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# initialize ragie client
ragie = Ragie(
    auth=os.getenv('RAGIE_API_KEY'),
)

# Remove previous docs from index
def clear_index():
    while True:
        try:
            # List the first page of documents; documents deleted below are
            # expected to drop out of the listing on the next pass
            response = ragie.documents.list()
            documents = response.result.documents

            # Process each document
            for document in documents:
                try:
                    ragie.documents.delete(
                        document_id=document.id
                    )
                    logger.info(f"Deleted document {document.id}")
                except Exception as e:
                    logger.error(f"Failed to delete document {document.id}: {str(e)}")
                    raise

            # Check if there are more documents
            if not response.result.pagination.next_cursor:
                logger.info("No more documents")
                break

        except Exception as e:
            logger.error(f"Failed to retrieve or process documents: {str(e)}")
            raise

# Ingest data from a directory into the Ragie index
def ingest_data(directory):
    # Get list of files in directory
    directory_path = Path(directory)
    files = os.listdir(directory_path)

    for file in files:
        try:
            file_path = directory_path / file
            # Read file content
            with open(file_path, mode='rb') as f:
                file_content = f.read()
            # Create document in Ragie
            response = ragie.documents.create(request={
                "file": {
                    "file_name": file,
                    "content": file_content,
                },
                "mode": {
                    "video": "audio_video",
                    "audio": True
                }
            })
            # Wait for document to be ready
            while True:
                res = ragie.documents.get(document_id=response.id)
                if res.status == "ready":
                    break
                # Stop polling if processing failed, instead of looping forever
                if res.status == "failed":
                    raise RuntimeError(f"Ragie reported status 'failed' for {file}")

                time.sleep(2)

            logger.info(f"Successfully uploaded {file}")

        except Exception as e:
            # Log the failure and move on to the next file
            logger.error(f"Failed to process file {file}: {str(e)}")
            continue

# Retrieve data from the Ragie index
def retrieve_data(query):
    try:
        logger.info(f"Retrieving data for query: {query}")
        retrieval_response = ragie.retrievals.retrieve(request={
            "query": query
        })

        content = [
            {
                **chunk.document_metadata,
                "text": chunk.text,
                "document_name": chunk.document_name,
                "start_time": chunk.metadata.get("start_time"),
                "end_time": chunk.metadata.get("end_time")
            }
            for chunk in retrieval_response.scored_chunks
        ]

        logger.info(f"Successfully retrieved {len(content)} chunks")
        return content

    except Exception as e:
        logger.error(f"Failed to retrieve data: {str(e)}")
        raise

def chunk_video(document_name, start_time, end_time, directory="videos"):
    # Create output directory
    output_dir = Path("video_chunks")
    output_dir.mkdir(parents=True, exist_ok=True)

    with VideoFileClip(str(Path(directory) / document_name)) as video:
        # Clamp the end time to the video duration (or use the duration if no end time is given)
        video_duration = video.duration
        actual_end_time = min(end_time, video_duration) if end_time is not None else video_duration

        # Name the chunk after the range that is actually written;
        # formatting end_time directly would fail when it is None
        chunk_filename = f"video_chunk_{start_time:.1f}_{actual_end_time:.1f}.mp4"
        output_path = output_dir / chunk_filename

        video_chunk = video.subclipped(start_time, actual_end_time)
        video_chunk.write_videofile(str(output_path))

    return output_path


if __name__ == "__main__":
    clear_index()
    ingest_data("videos")
    print(retrieve_data("What is the main topic of the video?"))
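
The readiness loop in `ingest_data` is easier to reason about as a standalone helper. The sketch below is one possible shape, assuming a `get_status` callable that wraps `ragie.documents.get(...).status`. The helper name, the timeout default, and the treatment of `"failed"` as a terminal status are assumptions and should be checked against Ragie's documentation.

```python
import time
from collections.abc import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status() until it returns "ready".

    Raises RuntimeError on a "failed" status (assumed to be terminal) and
    TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError("document processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout:.0f}s")
        # Wait before asking again, so the API is not hammered.
        sleep(interval)
```

Inside `ingest_data` this could be called as `wait_until_ready(lambda: ragie.documents.get(document_id=response.id).status)`. Passing `sleep` as a parameter keeps the helper testable without real delays.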

mcp-video-rag/pyproject.toml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
[project]
name = "mcp-video-rag"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "ipykernel>=6.29.5",
    "mcp>=1.9.4",
    "moviepy>=2.2.1",
    "python-dotenv>=1.1.0",
    "ragie>=1.9.0",
]

mcp-video-rag/server.py

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
from mcp.server.fastmcp import FastMCP
from main import clear_index, ingest_data, retrieve_data, chunk_video

mcp = FastMCP("ragie")

@mcp.tool()
def ingest_data_tool(directory: str) -> str:
    """
    Clears the Ragie index, then loads data from a directory into it. Wait until the data is fully ingested before continuing.

    Args:
        directory (str): The directory to load data from.

    Returns:
        str: A message indicating whether the data was loaded successfully.
    """
    try:
        clear_index()
        ingest_data(directory)
        return "Data loaded successfully"
    except Exception as e:
        return f"Failed to load data: {str(e)}"

@mcp.tool()
def retrieve_data_tool(query: str) -> list[dict] | str:
    """
    Retrieves data from the Ragie index based on the query. The data is returned as a list of dictionaries, each containing the following keys:
    - text: The text of the retrieved chunk
    - document_name: The name of the document the chunk belongs to
    - start_time: The start time of the chunk
    - end_time: The end time of the chunk

    Args:
        query (str): The query to retrieve data from the Ragie index.

    Returns:
        list[dict] | str: The retrieved data, or an error message if retrieval fails.
    """
    try:
        content = retrieve_data(query)
        return content
    except Exception as e:
        return f"Failed to retrieve data: {str(e)}"

@mcp.tool()
def show_video_tool(document_name: str, start_time: float, end_time: float) -> str:
    """
    Creates and saves a video chunk based on the document name, start time, and end time of the chunk.
    Returns a message indicating that the video chunk was created successfully.

    Args:
        document_name (str): The name of the document the chunk belongs to
        start_time (float): The start time of the chunk
        end_time (float): The end time of the chunk

    Returns:
        str: A message indicating that the video chunk was created successfully
    """
    try:
        chunk_video(document_name, start_time, end_time)
        return "Video chunk created successfully"
    except Exception as e:
        return f"Failed to create video chunk: {str(e)}"

# Run the server locally
if __name__ == "__main__":
    mcp.run(transport='stdio')
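
All three tools repeat the same pattern: run the underlying function and turn any exception into a "Failed to ..." string. One way that pattern could be factored out is sketched below. `report_errors` and `fake_retrieve` are hypothetical names for illustration, and whether such a wrapper composes cleanly with `@mcp.tool()` has not been verified here.

```python
import functools


def report_errors(action: str):
    """Wrap a function so any exception becomes a "Failed to <action>: ..." string."""
    def decorator(func):
        @functools.wraps(func)  # keep the name and docstring of the wrapped function
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                return f"Failed to {action}: {e}"
        return wrapper
    return decorator


@report_errors("retrieve data")
def fake_retrieve(query: str) -> list[dict]:
    # Stand-in for retrieve_data; raises on an empty query to show the error path.
    if not query:
        raise ValueError("query is empty")
    return [{"text": "hello", "document_name": "demo.mp4"}]
```

The trade-off is the same as in the tools themselves: the caller receives either the normal result or an error string, so the effective return type is a union of the two.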
