Commit f962f60

docs: improve LLM tool calling introduction (#390)

* docs: improve LLM tool calling introduction
* docs: add an example of a tool calling request with and without fine-tuning
* docs: fix example titles
* docs: fix gemma3 model name
* docs: added notes on example queries
1 parent f3d38dd commit f962f60

File tree

1 file changed: +137 −2 lines changed

nemo/data-flywheel/tool-calling/README.md

# Fine-Tuning, Inference, and Evaluation for LLM Tool Calling with NVIDIA NeMo Microservices

## Introduction

Tool calling enables Large Language Models (LLMs) to interact with external systems, execute programs, and access real-time information unavailable in their training data. This capability allows LLMs to process natural language queries, map them to specific functions or APIs, and populate the required parameters from user inputs. It's essential for building AI agents capable of tasks like checking inventory, retrieving weather data, managing workflows, and more. Access to real-time information also generally improves an agent's decision making.

### How LLM Tool Calling Works

- Tools

  A **function** or a **tool** refers to an external functionality that the user provides to the model. As the model generates a response to a prompt, it may decide, or be told, to use the functionality provided by a tool. In a real-world use case, the user can provide tools that get the weather for a location, access account details for a given user ID, issue refunds for lost orders, and so on. Note that these are scenarios where the LLM needs to respond with real-time information that lies outside its pretrained knowledge.

- Tool calls

  A **function call** or a **tool call** refers to a model's response when it decides, or has been told, that it needs to call one of the tools made available to it. Continuing the examples above, if a user sends a prompt like "What's the weather in Paris?", the model responds to that prompt with a tool call for the **get_weather** tool with **Paris** as the **location** argument.

- Tool call outputs

  A **function call output** or **tool call output** refers to the response a tool generates using the input from a model's tool call. Tool call outputs can be structured JSON or plain text. Continuing the example above, in response to the prompt "What's the weather in Paris?", the model returns a tool call containing the **location** argument with a value of Paris. The tool call output might be a JSON object (e.g., {"temperature": "25", "unit": "C"}, indicating a current temperature of 25 degrees Celsius). The model then receives the original prompt, the tool definition, its own tool call, and the tool call output, and uses them to generate a text response like:

  ```bash
  The weather in Paris today is 25C.
  ```

**Note:** A common failure pattern is a malformed tool call that ends up as plain text in the model's response instead of producing an actual tool call output.
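
To make the full loop concrete, the follow-up request below sends the model's tool call and the tool call output back so the model can write its final answer. This is a minimal sketch using the OpenAI-compatible chat completions format from the examples below; the model name matches the fine-tuned model used later in this guide, and "call_abc123" is a placeholder tool-call id.

```bash
# Hypothetical follow-up request: the model receives the original prompt,
# its own tool call, and the tool call output, then writes the final answer.
curl "$NEMO_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3-1b-lora-sft@v1",
    "messages": [
      { "role": "user", "content": "What is the weather in Paris?" },
      {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": { "name": "get_weather", "arguments": "{\"location\": \"Paris\"}" }
          }
        ]
      },
      {
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": "{\"temperature\": \"25\", \"unit\": \"C\"}"
      }
    ]
  }' | jq
```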

<div style="text-align: center;">
  <img src="./img/tool-use.png" alt="Example of a single-turn function call" width="60%" />
  <p><strong>Figure 1:</strong> Example of a single-turn function call. The user prompts the model to get the weather in Santa Clara on March 15, 2025. The model has also received a function description, get_weather(), with location and date as arguments. The model outputs a function call by extracting the location (Santa Clara) and date (March 15) from the user's prompt. The application then receives the function call and generates the function call output. Note that, in addition to the illustrated blocks, the model also receives the function call output along with the original prompt to generate the final text response.</p>
</div>

### Customizing LLMs for Function Calling
To effectively perform function calling, an LLM must:

- Extract and populate the appropriate parameters for each chosen tool from a user's natural language query.
- In multi-turn (back-and-forth interaction with users) and multi-step (breaking a response into smaller parts) use cases, the LLM may need to plan and chain multiple actions together.

#### Tool calling with the base model

REQUEST

```bash
curl "$NEMO_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3-1b",
    "messages": [
      {
        "role": "user",
        "content": "What will the weather be in Berlin on November 7, 2025?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the weather for a given location and date.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": { "type": "string" },
              "date": { "type": "string" }
            },
            "required": ["location", "date"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }' | jq
```

RESPONSE

```bash
"choices": [
  {
    "index": 0,
    "message": {
      "content": "{\"name\": \"get_weather\", \"parameters\": {}}",
      "role": "assistant",
      "reasoning_content": null
    },
    "finish_reason": "stop"
  }
],
```

**Note:** The base model emits the tool call as a plain string in the **content** field (with **finish_reason** set to **stop**) and omits the required function arguments, namely **location** and **date**.
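
Failures like this can be caught programmatically. A quick check, assuming the full response above was saved to a shell variable `$RESPONSE`:

```bash
# A well-formed tool call has finish_reason "tool_calls" and a populated
# .message.tool_calls array; here the call leaks into .message.content instead.
echo "$RESPONSE" | jq '.choices[0] | {finish_reason, content: .message.content, tool_calls: .message.tool_calls}'
```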

#### Tool calling with LoRA fine-tuning

REQUEST

```bash
curl "$NEMO_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3-1b-lora-sft@v1",
    "messages": [
      {
        "role": "user",
        "content": "What will the weather be in Berlin on November 7, 2025?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the weather for a given location and date.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": { "type": "string" },
              "date": { "type": "string" }
            },
            "required": ["location", "date"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }' | jq
```

RESPONSE

```bash
"choices": [
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\": \"Berlin\", \"date\": \"2025-11-07\"}"
          }
        }
      ]
    },
    "finish_reason": "tool_calls"
  }
],
```

**Note:** Tool calls are not part of the response **content**; they appear in a separate **tool_calls** field, and **finish_reason** is **tool_calls**. The model generates the function name, **get_weather**, and its arguments, **location** and **date**, from the user query and the tool definition.
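
On the client side, dispatching the call is then a matter of reading the **tool_calls** field. For example, assuming the full response above was saved to `$RESPONSE`, this jq one-liner extracts each function name and its JSON-encoded arguments:

```bash
# Prints e.g.: get_weather {"location": "Berlin", "date": "2025-11-07"}
echo "$RESPONSE" | jq -r '.choices[0].message.tool_calls[]? | .function | "\(.name) \(.arguments)"'
```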

As the number of tools and their complexity increases, customization becomes critical for maintaining accuracy and efficiency. Also, smaller models can achieve performance comparable to larger ones through parameter-efficient techniques like [Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685). LoRA is compute- and data-efficient: it requires a small one-time investment to train the LoRA adapter, after which you reap inference-time benefits from a more efficient "bespoke" model.
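
For orientation, adapters like the one used above are typically created through the NeMo Customizer jobs API. The sketch below shows the general shape of such a request; the config name, dataset name, and hyperparameter fields are illustrative assumptions and will vary with your deployment and microservices version.

```bash
# Hypothetical LoRA customization job; field names follow common
# NeMo Customizer examples and may differ in your deployment.
curl "$NEMO_URL/v1/customization/jobs" \
  -H "Content-Type: application/json" \
  -d '{
    "config": "gemma-3-1b@v1.0.0",
    "dataset": { "name": "xlam-ft-dataset" },
    "hyperparameters": {
      "training_type": "sft",
      "finetuning_type": "lora",
      "epochs": 2,
      "batch_size": 16,
      "learning_rate": 0.0001,
      "lora": { "adapter_dim": 32 }
    }
  }' | jq
```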
### About the xLAM dataset
