mongodb-university · parkerfoshay · Nov 11, 2025
diff --git a/Atlas-Vector-Search-Fundamentals-Skill/README.md b/Atlas-Vector-Search-Fundamentals-Skill/README.md
@@ -0,0 +1,144 @@
+# Atlas Vector Search Fundamentals
+
+Learn how to perform semantic search on movie plots using MongoDB Atlas Vector Search! This example demonstrates finding movies similar to your query by comparing plot descriptions using AI-generated embeddings.
+
+## What This Demo Does
+
+🎬 **Search movies by plot description**: Ask for "movies about escaping prison" and find relevant films  
+🔍 **Semantic understanding**: Finds movies with similar themes, not just matching keywords  
+⚡ **Fast vector search**: Uses MongoDB's optimized vector search capabilities  
+
+> **📝 Note for Video Learners**  
+> This code example uses pre-generated embeddings from the `sample_mflix` dataset, which differs slightly from the real-time embedding generation shown in the skill video.
+
+## What You'll Need
+
+Before getting started, make sure you have:
+
+- ✅ **MongoDB Atlas Cluster** with connection string
+- ✅ **Voyage AI API Key** (free tier available)
+- ✅ **Python 3.7+** installed on your machine
+
+## Step-by-Step Setup
+
+### Step 1: Set Up Your Python Environment
+
+Create an isolated environment for this project:
+
+**Windows:**
+```bash
+python -m venv venv
+venv\Scripts\activate
+```
+
+**macOS/Linux:**
+```bash
+python -m venv venv
+source venv/bin/activate
+```
+
+### Step 2: Install Required Packages
+
+```bash
+pip install pymongo requests
+```
+
+### Step 3: Configure Your API Keys
+
+Open the `key_param.py` file and add your credentials:
+
+```python
+VOYAGE_API_KEY="your_voyage_api_key_here"
+MONGODB_URI="your_mongodb_connection_string_here"
+```
+
+💡 **Getting your keys:**
+- **MongoDB URI**: Copy from your Atlas cluster's "Connect" button
+- **Voyage API Key**: Sign up at [voyageai.com](https://voyageai.com) for a free API key
+
+### Step 4: Load Sample Data
+
+1. In your Atlas cluster, go to **Load Sample Dataset**
+2. Load the **Sample Mflix Dataset** (contains movie data with pre-generated embeddings)
+
+### Step 5: Create Vector Search Index
+
+In your Atlas cluster's **Search & Vector Search** tab (On the left sidebar), create a new **Atlas Vector Search** index on the `movies` collection:
+
+**Index Name:** `vectorPlotIndex`
+
+**Index Definition:**
+```json
+{
+  "fields": [
+    {
+      "type": "vector",
+      "path": "plot_embedding_voyage_3_large",
+      "numDimensions": 2048,
+      "similarity": "dotProduct"
+    },
+    {
+      "type": "filter", 
+      "path": "year"
+    }
+  ]
+}
+```
+
+## How to Use
+
+### 1. Customize Your Search
+
+Edit the `query` variable in `vector_search.py`:
+
+```python
+query = "A movie about people trying to escape from prison"
+# Try other queries like:
+# "romantic comedy in New York" 
+# "space adventure with aliens"
+# "detective solving a murder mystery"
+```
+
+### 2. Run the Search
+
+```bash
+python vector_search.py
+```
+
+### 3. View Results
+
+The script will output the top 10 most similar movies with:
+- 🎬 **Movie Title**
+- 📝 **Plot Summary** 
+- 🎯 **Similarity Score** (higher = more similar)
+
+## Example Output
+
+```
+Title: The Shawshank Redemption
+Plot: Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.
+Score: 0.892
+
+Title: Escape from Alcatraz  
+Plot: A group of inmates attempt the impossible - escape from the island prison of Alcatraz.
+Score: 0.847
+```
+
+## How It Works
+
+1. **Query Processing**: Your text query gets converted to a vector embedding using Voyage AI
+2. **Vector Search**: MongoDB compares your query vector against movie plot embeddings  
+3. **Similarity Ranking**: Results are ranked by semantic similarity, not keyword matching
+4. **Fast Results**: Vector indexes enable millisecond search across thousands of movies
+
+## Troubleshooting
+
+**🚫 "No results found"**: Check that your vector search index is built and active  
+**🔑 "Authentication failed"**: Verify your API keys in `key_param.py`  
+**📦 "Module not found"**: Make sure you activated your virtual environment  
+
+## Learn More
+
+- 📚 [MongoDB Atlas Vector Search Documentation](https://docs.atlas.mongodb.com/atlas-vector-search/)
+- 🎓 [Earn the Vector Search Fundamentals Badge](https://learn.mongodb.com/courses/vector-search-fundamentals)
+- 🤖 [Voyage AI](https://voyageai.com/)
diff --git a/Atlas-Vector-Search-Fundamentals-Skill/embeddings.py b/Atlas-Vector-Search-Fundamentals-Skill/embeddings.py
@@ -0,0 +1,21 @@
+import os
+import requests
+import json
+
+def get_embeddings(text, model, api_key, input_type):
+    url = 'https://api.voyageai.com/v1/embeddings'
+    headers = {
+        'Content-Type': 'application/json',
+        'Authorization': 'Bearer ' +  api_key
+    }
+    data = {
+        'input': text,
+        'model': model,
+        'input_type': input_type,
+        'output_dimension': 2048
+    }
+
+    response = requests.post(url, headers=headers, data=json.dumps(data))
+    responseData = response.json()
+
+    return responseData['data'][0]['embedding']
diff --git a/Atlas-Vector-Search-Fundamentals-Skill/key_param.py b/Atlas-Vector-Search-Fundamentals-Skill/key_param.py
@@ -0,0 +1,2 @@
+VOYAGE_API_KEY=""
+MONGODB_URI=""
diff --git a/Atlas-Vector-Search-Fundamentals-Skill/vector_search.py b/Atlas-Vector-Search-Fundamentals-Skill/vector_search.py
@@ -0,0 +1,48 @@
+from pymongo import MongoClient
+from embeddings import get_embeddings
+
+import key_param
+
+client = MongoClient(key_param.MONGODB_URI)
+db_name = "sample_mflix"
+collection_name = "embedded_movies"
+model = "voyage-3-large"
+collection = client[db_name][collection_name]
+
+query = "A movie about people who are trying to escape from a maximum security facility."
+input_type = "query"
+embedding = get_embeddings(query, model, key_param.VOYAGE_API_KEY, input_type)
+
+pipeline = [
+    {
+        '$vectorSearch': {
+            'exact': False, # Set to True to use ENN
+            'index': 'vectorPlotIndex',
+            'path': 'plot_embedding_voyage_3_large',
+            'queryVector': embedding,
+            'numCandidates': 200,
+            'limit': 10,
+            # 'filter': {
+            #     'year': {
+            #         '$gt’: 2010'
+            #     }
+            # }
+        }
+    },
+    {
+      '$project': {
+          'title': 1,
+          'plot': 1,
+          'score': {
+              '$meta': 'vectorSearchScore'
+          }
+      }
+    }
+]
+
+results = collection.aggregate(pipeline)
+for doc in results:
+    print(f"Title: {doc['title']}")
+    print(f"Plot: {doc['plot']}")
+    print(f"Score: {doc['score']}")
+