* docs: improve LLM tool calling introduction
* docs: add an example of tool calling request with and without fine-tuning
* docs: fix example titles
* docs: fix gemma3 model name
* docs: added notes on example queries
nemo/data-flywheel/tool-calling/README.md (137 additions, 2 deletions)
# Fine-tuning, Inference, and Evaluation for LLM tool calling with NVIDIA NeMo Microservices
## Introduction
Tool calling enables Large Language Models (LLMs) to interact with external systems, execute programs, and access real-time information unavailable in their training data. This capability allows LLMs to process natural language queries, map them to specific functions or APIs, and populate the required parameters from user inputs. It is essential for building AI agents capable of tasks like checking inventory, retrieving weather data, managing workflows, and more, and it generally improves agents' decision making by grounding them in real-time information.
### How LLM Tool Calling Works
- Tools
A **function** or a **tool** refers to external functionality provided by the user to the model. As the model generates a response to the prompt, it may decide, or can be told, to use the functionality provided by a tool. In a real-world use case, the user can provide tools to get the weather for a location, access account details for a given user id, or issue refunds for lost orders. Note that these are scenarios where the LLM needs to respond with real-time information beyond its pretrained knowledge alone.
- Tool calls
A **function call** or a **tool call** refers to a model's response when it decides, or has been told, that it needs to call one of the tools made available to it. Continuing the examples above, if a user sends a prompt like "What's the weather in Paris?", the model responds to that prompt with a tool call for the **get_weather** tool with **Paris** as the **location** argument.
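For illustration, in an OpenAI-compatible chat completions schema (the style used by the request examples later in this document), the assistant message carrying this tool call might look like the following sketch; the `id` value is a placeholder:

```json
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"Paris\"}"
      }
    }
  ]
}
```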
- Tool call outputs
A **function call output** or a **tool call output** refers to the response a tool generates using the input from the model's tool call. Tool call outputs can be structured JSON or plain text. Continuing the example above, in response to a prompt like "What's the weather in Paris?", the model returns a tool call that contains the **location** argument with a value of Paris. The tool call output might be a JSON object (e.g., {"temperature": "25", "unit": "C"}, indicating a current temperature of 25 degrees Celsius). The model then receives the original prompt, the tool definition, its own tool call, and the tool call output to generate a text response like:
```bash
The weather in Paris today is 25C.
```
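Putting these pieces together, the follow-up request that yields this final answer carries the whole exchange in its `messages` list. A minimal sketch, again assuming OpenAI-style `tool` role and `tool_call_id` conventions (the id is a placeholder):

```json
[
  { "role": "user", "content": "What's the weather in Paris?" },
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_abc123",
        "type": "function",
        "function": { "name": "get_weather", "arguments": "{\"location\": \"Paris\"}" }
      }
    ]
  },
  {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "{\"temperature\": \"25\", \"unit\": \"C\"}"
  }
]
```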
**Note:** A common failure pattern is a malformed tool call that ends up in the model's text response instead of in the dedicated tool call field, so no actual tool call output is ever produced.
<div style="text-align: center;">
<img src="./img/tool-use.png" alt="Example of a single-turn function call" width="60%" />
<p><strong>Figure 1:</strong> Example of a single-turn function call. The user prompts the model to get the weather in Santa Clara on 15th March 2025. The model has also received a function description called get_weather() with location and date as arguments. The model outputs a function call by extracting the location (Santa Clara) and the date (15 March) from the user's prompt. The application then receives the function call and generates the function call output. Note that, in addition to the illustrated blocks, the model also receives the function call output along with the original prompt to generate the final text response.</p>
</div>
### Customizing LLMs for Function Calling
To effectively perform function calling, an LLM must:
- Extract and populate the appropriate parameters for each chosen tool from a user's natural language query.
- In multi-turn (back-and-forth interaction with users) and multi-step (breaking its response into smaller parts) use cases, the LLM may need to plan and be able to chain multiple actions together, as sketched below.
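For example, a multi-step exchange might chain two calls, where the second call's arguments depend on the first call's output. The tool names here (`find_order`, `issue_refund`) are hypothetical, echoing the refund scenario above:

```json
[
  { "role": "user", "content": "Please refund the order I lost last week." },
  { "role": "assistant", "content": null, "tool_calls": [ {
      "id": "call_1", "type": "function",
      "function": { "name": "find_order", "arguments": "{\"period\": \"last_week\"}" } } ] },
  { "role": "tool", "tool_call_id": "call_1", "content": "{\"order_id\": \"A123\"}" },
  { "role": "assistant", "content": null, "tool_calls": [ {
      "id": "call_2", "type": "function",
      "function": { "name": "issue_refund", "arguments": "{\"order_id\": \"A123\"}" } } ] }
]
```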
#### Tool calling with a base model
REQUEST
```bash
# Note: the tool "parameters" schema below is a plausible completion; the
# original snippet was truncated in the diff view.
curl "$NEMO_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3.2-1b",
    "messages": [
      {
        "role": "user",
        "content": "What will the weather be in Berlin on November 7, 2025?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the weather for a given location and date.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": { "type": "string", "description": "City name, e.g. Berlin" },
              "date": { "type": "string", "description": "Date in YYYY-MM-DD format" }
            },
            "required": ["location", "date"]
          }
        }
      }
    ]
  }'
```
**Note:** Tool calls are not part of the response content; they appear in a separate field called **tool_calls**. The model generates the function arguments, **location** and **date**, and the function name, **get_weather**, based on the user query and the tool definition.
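Concretely, in an OpenAI-compatible response the call appears under `choices[0].message.tool_calls` rather than in `content`; a trimmed sketch (metadata fields omitted, values illustrative):

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Berlin\", \"date\": \"2025-11-07\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
```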
As the number of tools and their complexity increases, customization becomes critical for maintaining accuracy and efficiency. Moreover, smaller models can achieve performance comparable to larger ones through parameter-efficient techniques like [Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685). LoRA is compute- and data-efficient: it involves a small one-time investment to train the LoRA adapter, after which you reap inference-time benefits from a more efficient, "bespoke" model.
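As a hedged sketch of what starting such a customization could look like, the body of a LoRA job request to NeMo Customizer might resemble the following; the config name, dataset, and hyperparameter values are placeholders, and the exact field names should be checked against the current NeMo Microservices customization API docs:

```json
{
  "config": "gemma-3.2-1b@v1.0.0",
  "dataset": { "name": "tool-calling-train", "namespace": "default" },
  "hyperparameters": {
    "training_type": "sft",
    "finetuning_type": "lora",
    "epochs": 2,
    "batch_size": 16,
    "learning_rate": 0.0001,
    "lora": { "adapter_dim": 32 }
  }
}
```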