
Commit 5339e8c

fix(experiment): Finetune experiment docs (#103)
1 parent 2a0c952

16 files changed: +668, -189 lines

datasets/quick-start.mdx

Lines changed: 63 additions & 0 deletions
---
title: "Quick Start"
---

<Frame>
  <img
    className="block dark:hidden"
    src="/img/dataset/dataset-list-light.png"
  />
  <img className="hidden dark:block" src="/img/dataset/dataset-list-dark.png" />
</Frame>

Datasets are simple data tables that you can use to manage your data for experiments and evaluation of your AI applications.
Datasets are available in the SDK, and they enable you to create versioned snapshots for reproducible testing.

<Steps>
  <Step title="Create a new dataset">

  Click **New Dataset** to create a dataset. Give it a descriptive name that reflects its purpose or use case, add a description to help your team understand its context, and provide a slug so you can reference the dataset in the SDK.

  </Step>

  <Step title="Add your data">

  Add rows and columns to structure your dataset. You can add different column types:

  - **Text**: For prompts, model responses, or any textual data
  - **Number**: For numerical values, scores, or metrics
  - **Boolean**: For true/false flags or binary classifications

  <Tip>
    Use meaningful column names that clearly describe what each field contains.
    This makes it easier to work with your dataset in code, keeps evaluator
    inputs clear, and helps team members collaborate.
  </Tip>

  </Step>

  <Step title="Publish your dataset version">

  <Frame>
    <img
      className="block dark:hidden"
      src="/img/dataset/dataset-view-light.png"
    />
    <img className="hidden dark:block" src="/img/dataset/dataset-view-dark.png" />
  </Frame>

  Once you're satisfied with your dataset structure and data:

  1. Click **Publish Version** to create a stable snapshot
  2. Published versions are immutable
  3. Published versions are accessible in the SDK

  </Step>

  <Step title="View your version history">

  You can access all published versions of your dataset by opening the version history modal. This allows you to:

  - Compare different versions of your dataset
  - Track changes over time
  - Switch between versions

  </Step>
</Steps>
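To make the column types concrete, here is a small, purely local sketch (plain Python, not a Traceloop API) of a row conforming to a schema with one Text, one Number, and one Boolean column; the column names are illustrative examples:

```python
# Purely illustrative: a local model of a dataset row and its schema.
# Column names here are examples, not part of any real dataset.
schema = {"prompt": str, "satisfaction_score": float, "is_helpful": bool}

row = {
    "prompt": "What is React?",   # Text column
    "satisfaction_score": 4.0,    # Number column
    "is_helpful": True,           # Boolean column
}

def validate_row(row: dict, schema: dict) -> bool:
    """Check that every schema column is present with the declared type."""
    return all(isinstance(row.get(col), typ) for col, typ in schema.items())

print(validate_row(row, schema))  # True
```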

datasets/sdk-usage.mdx

Lines changed: 226 additions & 0 deletions
---
title: "SDK usage"
description: "Access your managed datasets with the Traceloop SDK"
---

## SDK Initialization

First, initialize the Traceloop SDK.

<CodeGroup>

```python Python
from traceloop.sdk import Traceloop

# Initialize the SDK and get a client for dataset operations
client = Traceloop.init()
```

```js Typescript
import * as traceloop from "@traceloop/node-server-sdk";

// Initialize with dataset sync enabled
traceloop.initialize({
  appName: "your-app-name",
  apiKey: process.env.TRACELOOP_API_KEY,
  disableBatch: true,
  traceloopSyncEnabled: true,
});

// Wait for initialization to complete
await traceloop.waitForInitialization();

// Get the client instance for dataset operations
const client = traceloop.getClient();
```

</CodeGroup>

<Note>
  Make sure you've created an API key and set it as the environment variable
  `TRACELOOP_API_KEY` before you start. Check out the SDK's [getting started
  guide](/openllmetry/getting-started-python) for more information.
</Note>

The SDK fetches your datasets from Traceloop servers. Changes made to a draft dataset version are immediately available in the UI.

## Dataset Operations

### Create a dataset

You can create datasets in different ways depending on your data source:

- **Python**: Import from a CSV file or a pandas DataFrame
- **TypeScript**: Import from CSV data or create manually

<CodeGroup>

```python Python
import pandas as pd
from traceloop.sdk import Traceloop

client = Traceloop.init()

# Create dataset from CSV file
dataset_csv = client.datasets.from_csv(
    file_path="path/to/your/data.csv",
    slug="medical-questions",
    name="Medical Questions",
    description="Dataset with patients' medical questions",
)

# Create dataset from pandas DataFrame
data = {
    "product": ["Laptop", "Mouse", "Keyboard", "Monitor"],
    "price": [999.99, 29.99, 79.99, 299.99],
    "in_stock": [True, True, False, True],
    "category": ["Electronics", "Accessories", "Accessories", "Electronics"],
}
df = pd.DataFrame(data)

dataset_df = client.datasets.from_dataframe(
    df=df,
    slug="product-inventory",
    name="Product Inventory",
    description="Sample product inventory data",
)
```

```js Typescript
const client = traceloop.getClient();

// Option 1: Create dataset manually
const myDataset = await client.datasets.create({
  name: "Medical Questions",
  slug: "medical-questions",
  description: "Dataset with patients' medical questions",
});

// Option 2: Create and import from CSV data
const csvData = `user_id,prompt,response,model,satisfaction_score
user_001,"What is React?","React is a JavaScript library...","gpt-3.5-turbo",4
user_002,"Explain Docker","Docker is a containerization platform...","gpt-3.5-turbo",5`;

await myDataset.fromCSV(csvData, { hasHeader: true });
```

</CodeGroup>

### Get a dataset

A dataset can be retrieved using its slug, which is shown on the dataset page in the UI.

<CodeGroup>

```python Python
# Get dataset by slug - current draft version
my_dataset = client.datasets.get_by_slug("medical-questions")

# Get a specific published version as CSV
dataset_csv = client.datasets.get_version_csv(
    slug="medical-questions",
    version="v2",
)
```

```js Typescript
// Get dataset by slug - current draft version
const myDataset = await client.datasets.get("medical-questions");

// Get a specific published version as CSV
const datasetCsv = await client.datasets.getVersionCSV("medical-questions", "v1");
```

</CodeGroup>

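If you work with the CSV form of a version, standard-library parsing is enough. A local sketch, assuming (for illustration only, not a documented guarantee) that the version export is CSV text with a header row:

```python
import csv
import io

# Hypothetical CSV payload shaped like a version export; the actual format
# is whatever get_version_csv / getVersionCSV returns for your dataset.
csv_text = """product,price,in_stock
Laptop,999.99,True
Mouse,29.99,True"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["product"], len(rows))  # Laptop 2
```

Note that `csv.DictReader` yields every value as a string, so numeric and boolean columns need explicit conversion.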
### Adding a Column

<CodeGroup>

```python Python
from traceloop.sdk.dataset import ColumnType

# Add a new column to your dataset
new_column = my_dataset.add_column(
    slug="confidence_score",
    name="Confidence Score",
    col_type=ColumnType.NUMBER,
)
```

```js Typescript
// Define a schema by adding multiple columns at once
const columnsToAdd = [
  {
    name: "User ID",
    slug: "user-id",
    type: "string" as const,
    description: "Unique identifier for the user",
  },
  {
    name: "Satisfaction score",
    slug: "satisfaction-score",
    type: "number" as const,
    description: "User satisfaction rating (1-5)",
  },
];

await myDataset.addColumn(columnsToAdd);
```

</CodeGroup>

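Before submitting column definitions, it can help to sanity-check them locally. A small sketch (plain Python, not an SDK helper) that catches duplicate slugs; the field names mirror the TypeScript example above:

```python
# Local sanity check for column definitions before sending them to the SDK.
columns = [
    {"name": "User ID", "slug": "user-id", "type": "string"},
    {"name": "Satisfaction score", "slug": "satisfaction-score", "type": "number"},
]

def duplicate_slugs(columns: list[dict]) -> set[str]:
    """Return the set of slugs that appear more than once."""
    seen, dupes = set(), set()
    for col in columns:
        slug = col["slug"]
        (dupes if slug in seen else seen).add(slug)
    return dupes

print(duplicate_slugs(columns))  # set()
```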
### Adding Rows

When adding rows, map each column slug to its value.

<CodeGroup>

```python Python
# Add new rows to your dataset
row_data = {
    "product": "TV Screen",
    "price": 1500.0,
    "in_stock": True,
    "category": "Electronics",
}

my_dataset.add_rows([row_data])
```

```js Typescript
// Add an individual row to the dataset
const rowData = {
  user_id: "user_001",
  prompt: "Explain machine learning in simple terms",
  response: "This is the model response",
  model: "gpt-3.5-turbo",
  satisfaction_score: 1,
};

await myDataset.addRow(rowData);
```

</CodeGroup>

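For more than a handful of rows, it can be cleaner to build the batch locally and submit it in one `add_rows` call. A sketch using the product schema from the earlier examples:

```python
# Build a batch of rows (column slug -> value) from source records.
products = [
    ("TV Screen", 1500.0, True),
    ("Webcam", 49.99, False),
]

rows = [
    {"product": name, "price": price, "in_stock": in_stock, "category": "Electronics"}
    for name, price, in_stock in products
]

print(len(rows), rows[0]["price"])  # 2 1500.0
# my_dataset.add_rows(rows)  # then one call for the whole batch
```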
## Dataset Versions

### Publish a dataset

Dataset versions and history can be viewed in the UI. Versioning lets you run the same evaluations and experiments across different dataset versions, making meaningful comparisons possible.

<CodeGroup>

```python Python
# Publish the current dataset state as a new version
published_version = my_dataset.publish()
```

```js Typescript
// Publish the current dataset state as a new version
const publishedVersion = await myDataset.publish();
```

</CodeGroup>

experiments/introduction.mdx

Lines changed: 31 additions & 0 deletions
---
title: "Introduction"
---

Building reliable LLM applications means knowing whether a new prompt, model, or flow change actually makes things better.

<Frame>
  <img
    className="block dark:hidden"
    src="/img/experiment/exp-list-light.png"
  />
  <img className="hidden dark:block" src="/img/experiment/exp-list-dark.png" />
</Frame>

Experiments in Traceloop give teams a structured workflow for testing and comparing results across different prompts, models, and evaluator checks, all against real datasets.

## What You Can Do with Experiments

<CardGroup cols={2}>
  <Card title="Run Multiple Evaluators" icon="list-check">
    Execute multiple evaluation checks against your dataset
  </Card>
  <Card title="View Complete Results" icon="table">
    See all experiment run outputs in a comprehensive table view with relevant indicators and detailed reasoning
  </Card>
  <Card title="Compare Experiment Runs" icon="code-compare">
    Run the same experiment across different dataset versions to see how changes affect your workflow
  </Card>
  <Card title="Custom Task Pipelines" icon="code">
    Add a tailored task to the experiment to produce evaluator input, for example LLM calls or semantic search
  </Card>
</CardGroup>
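Conceptually, an experiment run loops a task and its evaluators over dataset rows. A deliberately simplified, hypothetical sketch of that flow (plain Python, not the Traceloop experiment API):

```python
# Hypothetical experiment flow: run a task per row, then evaluate each output.
def task(row: dict) -> str:
    # Stand-in for a real task such as an LLM call or semantic search.
    return f"answer to: {row['prompt']}"

def non_empty_evaluator(output: str) -> bool:
    # Stand-in evaluator: passes if the task produced any output.
    return len(output) > 0

dataset = [{"prompt": "What is React?"}, {"prompt": "Explain Docker"}]

results = []
for row in dataset:
    output = task(row)
    results.append({"output": output, "passed": non_empty_evaluator(output)})

passed = sum(r["passed"] for r in results)
print(f"{passed} of {len(results)} rows passed")  # 2 of 2 rows passed
```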

experiments/result-overview.mdx

Lines changed: 44 additions & 0 deletions
---
title: "Result Overview"
---

Each experiment is executed through the SDK, and all experiments are logged in the Traceloop platform.

<Frame>
  <img
    className="block dark:hidden"
    src="/img/experiment/exp-list-light.png"
  />
  <img className="hidden dark:block" src="/img/experiment/exp-list-dark.png" />
</Frame>

## Experiment Runs

An experiment can be run multiple times against different datasets and tasks. All runs are logged in the Traceloop platform to enable easy comparison.

<Frame>
  <img
    className="block dark:hidden"
    src="/img/experiment/exp-run-list-light.png"
  />
  <img className="hidden dark:block" src="/img/experiment/exp-run-list-dark.png" />
</Frame>

## Experiment Tasks

An experiment run is made up of multiple tasks, where each task represents the experiment flow applied to a single dataset row.

The task logging captures:

- **Task input** – the data taken from the dataset row.
- **Task outputs** – the results produced by running the task, which are then passed as input to the evaluator.
- **Evaluator results** – the evaluator's assessment based on the task outputs.

<Frame>
  <img
    className="block dark:hidden"
    src="/img/experiment/exp-run-light.png"
  />
  <img className="hidden dark:block" src="/img/experiment/exp-run-dark.png" />
</Frame>
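The three captured pieces above can be pictured as one record per task. An illustrative shape (field names are assumptions for illustration, not the platform's schema):

```python
# Illustrative record of what logging captures for a single task.
task_record = {
    "input": {"prompt": "Explain Docker"},                      # dataset row
    "outputs": "Docker is a containerization platform...",      # task result
    "evaluator_results": {"score": 5, "reasoning": "Accurate"}, # assessment
}

print(sorted(task_record))  # ['evaluator_results', 'input', 'outputs']
```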
