Commit b4f5054: Added Llama 3.3 Nemotron Super 49B examples (#280)
1 parent: d8882c5

File tree: 5 files changed (+243, -0 lines)
Lines changed: 1 addition & 0 deletions

3.13
Lines changed: 193 additions & 0 deletions

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Detailed Thinking Mode with Llama 3.3 Nemotron Super 49B\n",
    "\n",
    "In this notebook, we'll explore how simple it is to toggle detailed thinking mode on and off using the Llama 3.3 Nemotron Super 49B NIM.\n",
    "\n",
    "If you'd like to learn more about this model, please check out our [blog](https://developer.nvidia.com/blog/build-enterprise-ai-agents-with-advanced-open-nvidia-llama-nemotron-reasoning-models/), which explains exactly how this model was produced.\n",
    "\n",
    "> NOTE: Before proceeding with this notebook, please ensure you've followed the instructions in the [README.md](./README.md)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Detailed Thinking Mode On\n",
    "\n",
    "For the first example, we'll look at the model with detailed thinking \"on\". With the system prompt set to enable detailed thinking, the model behaves as a long-thinking reasoning model and is most effective for complex reasoning tasks. Before producing its final response, the model generates a number of tokens enclosed in \"thinking\" tags."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<think>\n",
      "Okay, so I need to solve the equation x*(sin(x) + 2) = 0. Hmm, let me think. I remember from algebra that if a product of two things is zero, then at least one of them has to be zero. That's the zero product property. So, in this case, either x = 0 or sin(x) + 2 = 0. Let me write that down:\n",
      "\n",
      "1. x = 0\n",
      "2. sin(x) + 2 = 0\n",
      "\n",
      "Alright, starting with the first one, x = 0. That seems straightforward. If x is 0, then plugging back into the original equation: 0*(sin(0) + 2) = 0*(0 + 2) = 0*2 = 0. Yep, that works. So x = 0 is definitely a solution.\n",
      "\n",
      "Now, the second part: sin(x) + 2 = 0. Let me solve for sin(x). Subtract 2 from both sides: sin(x) = -2. Wait a minute, sin(x) equals -2? But the sine function has a range of [-1, 1]. That means sin(x) can never be less than -1 or greater than 1. So sin(x) = -2 is impossible. Therefore, there are no solutions from the second equation.\n",
      "\n",
      "So, putting it all together, the only solution is x = 0. Let me double-check. If x is 0, then the equation holds true. If x is any other number, sin(x) + 2 would be at least -1 + 2 = 1, so the product x*(something at least 1) would only be zero if x is zero. Yeah, that makes sense.\n",
      "\n",
      "Wait, but what if x is a complex number? The problem didn't specify, but usually when solving equations like this without context, we assume real numbers. So I think it's safe to stick with real solutions here. In the complex plane, sine can take on values outside [-1, 1], but solving sin(x) = -2 in complex analysis is more complicated and probably beyond the scope of this problem. The question seems to be expecting real solutions.\n",
      "\n",
      "Therefore, the only real solution is x = 0.\n",
      "\n",
      "**Final Answer**\n",
      "The solution is \\boxed{0}.\n",
      "</think>\n",
      "\n",
      "To solve the equation \\( x(\\sin(x) + 2) = 0 \\), we use the zero product property, which states that if a product of two factors is zero, then at least one of the factors must be zero. This gives us two cases to consider:\n",
      "\n",
      "1. \\( x = 0 \\)\n",
      "2. \\( \\sin(x) + 2 = 0 \\)\n",
      "\n",
      "For the first case, \\( x = 0 \\):\n",
      "- Substituting \\( x = 0 \\) into the original equation, we get \\( 0(\\sin(0) + 2) = 0 \\), which is true. Therefore, \\( x = 0 \\) is a solution.\n",
      "\n",
      "For the second case, \\( \\sin(x) + 2 = 0 \\):\n",
      "- Solving for \\( \\sin(x) \\), we get \\( \\sin(x) = -2 \\). However, the sine function has a range of \\([-1, 1]\\), so \\( \\sin(x) = -2 \\) is impossible. Therefore, there are no solutions from this case.\n",
      "\n",
      "Since the second case yields no solutions, the only solution is \\( x = 0 \\).\n",
      "\n",
      "\\[\n",
      "\\boxed{0}\n",
      "\\]"
     ]
    }
   ],
   "source": [
    "from openai import OpenAI\n",
    "\n",
    "# Point the OpenAI-compatible client at the locally running NIM\n",
    "client = OpenAI(\n",
    "    base_url=\"http://0.0.0.0:8000/v1\",\n",
    "    api_key=\"not used\"\n",
    ")\n",
    "\n",
    "completion = client.chat.completions.create(\n",
    "    model=\"nvidia/llama-3.3-nemotron-super-49b-v1\",\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"detailed thinking on\"},\n",
    "        {\"role\": \"user\", \"content\": \"Solve x*(sin(x)+2)=0\"}\n",
    "    ],\n",
    "    temperature=0.6,\n",
    "    top_p=0.95,\n",
    "    max_tokens=32768,\n",
    "    frequency_penalty=0,\n",
    "    presence_penalty=0,\n",
    "    stream=True\n",
    ")\n",
    "\n",
    "# Print the streamed response token-by-token as it arrives\n",
    "for chunk in completion:\n",
    "    if chunk.choices[0].delta.content is not None:\n",
    "        print(chunk.choices[0].delta.content, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Detailed Thinking Mode Off\n",
    "\n",
    "For our second example, we'll look at the model with detailed thinking \"off\". With the system prompt set to disable detailed thinking, the model behaves as a typical instruction-tuned model: it immediately begins generating the final response, with no thinking tokens produced. This mode is most effective for tool calling, chat applications, and other use cases where a direct response is preferred."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NVIDIA is a leading American technology company known for designing and manufacturing a wide range of products, but most notably for its graphics processing units (GPUs), which have become indispensable in various fields. Here's a breakdown of what NVIDIA is and what it does across its main areas of focus:\n",
      "\n",
      "### 1. **Graphics Processing Units (GPUs) for Gaming**\n",
      "- **Primary Use**: Enhancing gaming experiences by accelerating graphics rendering.\n",
      "- **Products**: GeForce series (e.g., GeForce RTX 30 series) for consumers and enthusiasts.\n",
      "- **Key Features**: High-resolution gaming, ray tracing, artificial intelligence (AI) enhanced graphics, and more.\n",
      "\n",
      "### 2. **Professional Graphics (Quadro)**\n",
      "- **Primary Use**: For professionals requiring high-end graphics capabilities (e.g., 3D modeling, video editing, engineering).\n",
      "- **Products**: Quadro series, designed for reliability and performance in professional applications.\n",
      "\n",
      "### 3. **Datacenter and AI Computing**\n",
      "- **Primary Use**: Accelerating compute-intensive workloads in data centers, including AI, deep learning, and high-performance computing (HPC).\n",
      "- **Products**: Tesla series (for data centers and cloud computing), H100 for AI and HPC.\n",
      "- **Key Technologies**: NVIDIA's CUDA platform for parallel computing, Tensor Cores for AI acceleration.\n",
      "\n",
      "### 4. **Automotive**\n",
      "- **Primary Use**: Developing and supplying technologies for autonomous vehicles, including computer vision, sensor fusion, and AI processing.\n",
      "- **Products/Platforms**: DRIVE series, including hardware (e.g., DRIVE PX) and software (e.g., DRIVE OS) for autonomous vehicle development.\n",
      "\n",
      "### 5. **Other Areas**\n",
      "- **NVIDIA Shield**: A series of Android-based devices for gaming and streaming media.\n",
      "- **OEM (Original Equipment Manufacturer) Supply**: NVIDIA chips and technologies are integrated into various devices by other manufacturers.\n",
      "- **Research and Development**: Actively involved in advancing fields like AI, robotics, and healthcare through technological innovations.\n",
      "\n",
      "### Key Technologies and Initiatives:\n",
      "- **CUDA**: A parallel computing platform and programming model for NVIDIA GPUs.\n",
      "- **TensorRT**: For optimizing and deploying AI models in production environments.\n",
      "- **NVIDIA Research**: Focused on future technologies, including AI, computer vision, and more.\n",
      "- **Acquisitions and Partnerships**: NVIDIA engages in strategic acquisitions (e.g., Arm Ltd. acquisition attempt) and partnerships to expand its ecosystem and capabilities.\n",
      "\n",
      "### Summary:\n",
      "NVIDIA is a multifaceted technology company that:\n",
      "- **Drives Gaming Innovation** with consumer and enthusiast GPUs.\n",
      "- **Empowers Professionals** with high-end graphics solutions.\n",
      "- **Accelerates Datacenter and AI Workloads** with specialized GPUs and software.\n",
      "- **Pioneers Autonomous Vehicle Technologies**.\n",
      "- **Continuously Innovates** across various technological fronts."
     ]
    }
   ],
   "source": [
    "from openai import OpenAI\n",
    "\n",
    "# Point the OpenAI-compatible client at the locally running NIM\n",
    "client = OpenAI(\n",
    "    base_url=\"http://0.0.0.0:8000/v1\",\n",
    "    api_key=\"not used\"\n",
    ")\n",
    "\n",
    "completion = client.chat.completions.create(\n",
    "    model=\"nvidia/llama-3.3-nemotron-super-49b-v1\",\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"detailed thinking off\"},\n",
    "        {\"role\": \"user\", \"content\": \"What is NVIDIA?\"}\n",
    "    ],\n",
    "    temperature=0,\n",
    "    max_tokens=32768,\n",
    "    frequency_penalty=0,\n",
    "    presence_penalty=0,\n",
    "    stream=True\n",
    ")\n",
    "\n",
    "# Print the streamed response token-by-token as it arrives\n",
    "for chunk in completion:\n",
    "    if chunk.choices[0].delta.content is not None:\n",
    "        print(chunk.choices[0].delta.content, end=\"\")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
Lines changed: 39 additions & 0 deletions

# Detailed Thinking Mode with Llama 3.3 Nemotron Super 49B

In the notebook in this directory, we'll explore how simple it is to toggle thinking mode on and off using the Llama 3.3 Nemotron Super 49B NIM: a single model that can change how it generates responses through a simple toggle in the system prompt.

If you'd like to learn more about this model, please check out our [blog](https://developer.nvidia.com/blog/build-enterprise-ai-agents-with-advanced-open-nvidia-llama-nemotron-reasoning-models/), which explains exactly how this model was produced.

To begin, we'll first need to download our NIM, following the detailed instructions on the Llama 3.3 Nemotron Super 49B [model card on build.nvidia.com](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1).

## Downloading Our NIM

First, we'll need to generate an API key. You can find this by navigating to the "Deploy" tab on the build.nvidia.com website.

![image](./images/api_key.png)

Next, let's log in to the NVIDIA Container Registry using the following command:

```bash
docker login nvcr.io
```

Next, all we need to do is run the following command and wait for our NIM to spin up:

```bash
export NGC_API_KEY=<PASTE_API_KEY_HERE>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1:latest
```
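The model weights take a while to download and load, so it can help to confirm the server is ready before opening the notebook. The sketch below is one way to poll the NIM readiness route from Python; it assumes the default port mapping above and the standard NIM health endpoint `/v1/health/ready`:

```python
import urllib.error
import urllib.request


def nim_ready(base_url: str = "http://0.0.0.0:8000") -> bool:
    """Return True once the NIM readiness endpoint responds with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/health/ready", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Server not up yet (connection refused or timed out)
        return False
```

Calling `nim_ready()` in a loop with a short sleep is a simple way to block until the container finishes loading the model.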
## Using Our NIM!

We'll follow [this notebook](./Detailed%20Thinking%20Mode%20with%20Llama%203.3%20Nemotron%20Super%2049B.ipynb) for examples of how to use the Llama 3.3 Nemotron Super 49B NIM in both Detailed Thinking On and Off modes.
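As the notebook shows, the only difference between the two modes is the system prompt string. A minimal helper (the name `build_messages` is hypothetical, not part of any NIM API) that builds the message list for either mode might look like:

```python
def build_messages(prompt: str, thinking: bool = True) -> list:
    """Build a chat message list; the system prompt is the only toggle
    between "detailed thinking on" and "detailed thinking off"."""
    mode = "on" if thinking else "off"
    return [
        {"role": "system", "content": f"detailed thinking {mode}"},
        {"role": "user", "content": prompt},
    ]
```

The result can be passed directly as the `messages=` argument to `client.chat.completions.create(...)`, so the same call site serves both reasoning and direct-response workloads.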
Lines changed: 10 additions & 0 deletions

[project]
name = "nimexample"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "jupyter>=1.1.1",
    "openai>=1.66.3",
]
