|
| 1 | +--- |
| 2 | +title: "SDK usage" |
| 3 | +description: "Access your managed datasets with the Traceloop SDK" |
| 4 | +--- |
| 5 | + |
| 6 | +## SDK Initialization |
| 7 | + |
| 8 | +First, initialize the Traceloop SDK. |
| 9 | + |
| 10 | +<CodeGroup> |
| 11 | + |
| 12 | +```python Python |
| 13 | +from traceloop.sdk import Traceloop |
| 14 | + |
| 15 | +# Initialize the SDK; the returned client is used for dataset operations
| 16 | +client = Traceloop.init() |
| 17 | +``` |
| 18 | + |
| 19 | +```js TypeScript
| 20 | +import * as traceloop from "@traceloop/node-server-sdk"; |
| 21 | + |
| 22 | +// Initialize with an app name, an API key, and dataset sync enabled
| 23 | +traceloop.initialize({ |
| 24 | + appName: "your-app-name", |
| 25 | + apiKey: process.env.TRACELOOP_API_KEY, |
| 26 | + disableBatch: true, |
| 27 | + traceloopSyncEnabled: true, |
| 28 | +}); |
| 29 | + |
| 30 | +// Wait for initialization to complete |
| 31 | +await traceloop.waitForInitialization(); |
| 32 | + |
| 33 | +// Get the client instance for dataset operations |
| 34 | +const client = traceloop.getClient(); |
| 35 | +``` |
| 36 | + |
| 37 | +</CodeGroup> |
| 38 | + |
| 39 | +<Note> |
| 40 | + Make sure you've created an API key and set it as an environment variable |
| 41 | + `TRACELOOP_API_KEY` before you start. Check out the SDK's [getting started |
| 42 | + guide](/openllmetry/getting-started-python) for more information. |
| 43 | +</Note> |
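
A quick pre-flight check can catch a missing key before the SDK is initialized. The following is a minimal sketch using only the Python standard library; the helper name `has_api_key` is hypothetical and not part of the Traceloop SDK.

```python
import os


def has_api_key(env=None, name="TRACELOOP_API_KEY"):
    """Return True if the API key variable is set to a non-empty value.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env
    return bool(env.get(name, "").strip())


if not has_api_key():
    # Hypothetical guard: warn early instead of failing later during init
    print("TRACELOOP_API_KEY is not set; export it before calling Traceloop.init()")
```

Running this check at startup is optional; it only makes a missing key easier to notice.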
| 44 | + |
| 45 | +The SDK fetches your datasets from Traceloop servers. Changes made to a draft dataset version are immediately available in the UI. |
| 46 | + |
| 47 | +## Dataset Operations |
| 48 | + |
| 49 | +### Create a dataset |
| 50 | + |
| 51 | +You can create datasets in different ways depending on your data source: |
| 52 | +- **Python**: Import from a CSV file or a pandas DataFrame
| 53 | +- **TypeScript**: Import from CSV data or create manually |
| 54 | + |
| 55 | +<CodeGroup> |
| 56 | + |
| 57 | +```python Python |
| 58 | +import pandas as pd |
| 59 | +from traceloop.sdk import Traceloop |
| 60 | + |
| 61 | +client = Traceloop.init() |
| 62 | + |
| 63 | +# Create dataset from CSV file |
| 64 | +dataset_csv = client.datasets.from_csv( |
| 65 | + file_path="path/to/your/data.csv", |
| 66 | + slug="medical-questions", |
| 67 | + name="Medical Questions", |
| 68 | + description="Dataset of patients' medical questions"
| 69 | +) |
| 70 | + |
| 71 | +# Build a sample pandas DataFrame
| 72 | +data = { |
| 73 | + "product": ["Laptop", "Mouse", "Keyboard", "Monitor"], |
| 74 | + "price": [999.99, 29.99, 79.99, 299.99], |
| 75 | + "in_stock": [True, True, False, True], |
| 76 | + "category": ["Electronics", "Accessories", "Accessories", "Electronics"], |
| 77 | +} |
| 78 | +df = pd.DataFrame(data) |
| 79 | + |
| 80 | +# Create dataset from DataFrame |
| 81 | +dataset_df = client.datasets.from_dataframe( |
| 82 | + df=df, |
| 83 | + slug="product-inventory", |
| 84 | + name="Product Inventory", |
| 85 | + description="Sample product inventory data", |
| 86 | +) |
| 87 | +``` |
| 88 | + |
| 89 | +```js TypeScript
| 90 | +const client = traceloop.getClient(); |
| 91 | + |
| 92 | +// Step 1: Create an empty dataset
| 93 | +const myDataset = await client.datasets.create({
| 94 | + name: "Medical Questions",
| 95 | + slug: "medical-questions",
| 96 | + description: "Dataset of patients' medical questions"
| 97 | +});
| 98 | +
| 99 | +// Step 2 (optional): Import rows into the dataset from CSV data
| 100 | +const csvData = `user_id,prompt,response,model,satisfaction_score |
| 101 | +user_001,"What is React?","React is a JavaScript library...","gpt-3.5-turbo",4 |
| 102 | +user_002,"Explain Docker","Docker is a containerization platform...","gpt-3.5-turbo",5`; |
| 103 | + |
| 104 | +await myDataset.fromCSV(csvData, { hasHeader: true }); |
| 105 | +``` |
| 106 | + |
| 107 | +</CodeGroup> |
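
If you do not have a CSV file on hand, you can generate a small one to try the CSV import. This is a sketch using only the Python standard library; the helper name `write_sample_csv`, the file name, and the sample rows are hypothetical, and the column names simply mirror the product example above.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csv(path):
    """Write a two-row sample CSV with a header row and return its path."""
    fieldnames = ["product", "price", "in_stock", "category"]
    rows = [
        {"product": "Laptop", "price": 999.99, "in_stock": True, "category": "Electronics"},
        {"product": "Mouse", "price": 29.99, "in_stock": True, "category": "Accessories"},
    ]
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return Path(path)


# Hypothetical location; any writable path works
csv_path = write_sample_csv(Path(tempfile.gettempdir()) / "product_inventory.csv")
```

The resulting path can then be passed as the `file_path` argument of the CSV import shown above.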
| 108 | + |
| 109 | +### Get a dataset |
| 110 | +Retrieve a dataset by its slug, which is shown on the dataset page in the UI.
| 111 | +<CodeGroup> |
| 112 | + |
| 113 | +```python Python |
| 114 | +# Get dataset by slug - current draft version |
| 115 | +my_dataset = client.datasets.get_by_slug("medical-questions") |
| 116 | + |
| 117 | +# Get specific version as CSV |
| 118 | +dataset_csv = client.datasets.get_version_csv( |
| 119 | + slug="medical-questions", |
| 120 | + version="v2" |
| 121 | +) |
| 122 | +``` |
| 123 | + |
| 124 | +```js TypeScript
| 125 | +// Get dataset by slug - current draft version |
| 126 | +const myDataset = await client.datasets.get("medical-questions"); |
| 127 | + |
| 128 | +// Get specific version as CSV |
| 129 | +const datasetCsv = await client.datasets.getVersionCSV("medical-questions", "v1"); |
| 130 | + |
| 131 | +``` |
| 132 | + |
| 133 | +</CodeGroup> |
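
Once you have a version exported as CSV, you may want to work with it as a list of records. The sketch below assumes the CSV is available as a plain string, which this page does not confirm; the helper name `parse_csv_text` and the sample data are hypothetical.

```python
import csv
import io


def parse_csv_text(csv_text):
    """Parse CSV text with a header row into a list of dicts.

    All values come back as strings; convert types yourself as needed.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))


# Stand-in for the exported CSV text (assumed to be a string)
sample = "product,price\nLaptop,999.99\nMouse,29.99\n"
rows = parse_csv_text(sample)
print(rows[0]["product"])
```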
| 134 | + |
| 135 | +### Add a column
| 136 | + |
| 137 | +<CodeGroup> |
| 138 | + |
| 139 | +```python Python |
| 140 | +from traceloop.sdk.dataset import ColumnType |
| 141 | + |
| 142 | +# Add a new column to your dataset |
| 143 | +new_column = my_dataset.add_column( |
| 144 | + slug="confidence_score", |
| 145 | + name="Confidence Score", |
| 146 | + col_type=ColumnType.NUMBER |
| 147 | +) |
| 148 | +``` |
| 149 | + |
| 150 | +```js TypeScript
| 151 | +// Define schema by adding multiple columns |
| 152 | +const columnsToAdd = [ |
| 153 | + { |
| 154 | + name: "User ID", |
| 155 | + slug: "user-id", |
| 156 | + type: "string" as const, |
| 157 | + description: "Unique identifier for the user" |
| 158 | + }, |
| 159 | + { |
| 160 | + name: "Satisfaction score", |
| 161 | + slug: "satisfaction-score", |
| 162 | + type: "number" as const, |
| 163 | + description: "User satisfaction rating (1-5)" |
| 164 | + } |
| 165 | +]; |
| 166 | + |
| 167 | +await myDataset.addColumn(columnsToAdd); |
| 168 | +console.log("Schema defined with multiple columns"); |
| 169 | +``` |
| 170 | + |
| 171 | +</CodeGroup> |
| 172 | + |
| 173 | +### Add rows
| 174 | +
| 175 | +Each row is an object that maps column slugs to their values.
| 176 | +<CodeGroup> |
| 177 | + |
| 178 | +```python Python |
| 179 | +# Add new rows to your dataset |
| 180 | +row_data = { |
| 181 | + "product": "TV Screen", |
| 182 | + "price": 1500.0, |
| 183 | + "in_stock": True, |
| 184 | + "category": "Electronics" |
| 185 | +} |
| 186 | + |
| 187 | +my_dataset.add_rows([row_data]) |
| 188 | +``` |
| 189 | + |
| 190 | +```js TypeScript
| 191 | +// Add individual rows to dataset |
| 192 | +const userId = "user_001"; |
| 193 | +const prompt = "Explain machine learning in simple terms"; |
| 194 | +const startTime = Date.now(); |
| 195 | + |
| 196 | +const rowData = { |
| 197 | + user_id: userId, |
| 198 | + prompt: prompt, |
| 199 | + response: `This is the model response`, |
| 200 | + model: "gpt-3.5-turbo", |
| 201 | + satisfaction_score: 1, |
| 202 | +}; |
| 203 | + |
| 204 | +await myDataset.addRow(rowData); |
| 205 | +``` |
| 206 | + |
| 207 | +</CodeGroup> |
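
Because rows are keyed by column slug, a typo in a key is easy to miss. The following sketch checks rows against a list of expected slugs before you send them; it is plain Python, the helper name `find_row_problems` is hypothetical, and it does not reflect any validation the SDK itself may perform.

```python
def find_row_problems(rows, column_slugs):
    """Return a list of human-readable problems, one per mismatched row.

    A row is flagged if it lacks an expected slug or uses an unknown one.
    """
    expected = set(column_slugs)
    problems = []
    for index, row in enumerate(rows):
        keys = set(row)
        missing = sorted(expected - keys)
        unknown = sorted(keys - expected)
        if missing or unknown:
            problems.append(f"row {index}: missing={missing} unknown={unknown}")
    return problems


slugs = ["product", "price", "in_stock", "category"]
good_row = {"product": "TV Screen", "price": 1500.0, "in_stock": True, "category": "Electronics"}
bad_row = {"product": "Cable", "prize": 9.99, "in_stock": True, "category": "Accessories"}

for problem in find_row_problems([good_row, bad_row], slugs):
    print(problem)
```

An empty result means every row uses exactly the expected slugs.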
| 208 | + |
| 209 | +## Dataset Versions |
| 210 | + |
| 211 | +### Publish a dataset |
| 212 | +Publishing saves the current state of a dataset as a new version. Versions and their history can be viewed in the UI. Versioning lets you run the same evaluations and experiments against different versions of a dataset and compare the results.
| 213 | +<CodeGroup> |
| 214 | + |
| 215 | +```python Python |
| 216 | +# Publish the current dataset state as a new version |
| 217 | +published_version = my_dataset.publish() |
| 218 | +``` |
| 219 | + |
| 220 | +```js TypeScript
| 221 | +// Publish the current dataset state as a new version
| 222 | +const publishedVersion = await myDataset.publish(); |
| 223 | +``` |
| 224 | + |
| 225 | +</CodeGroup> |
| 226 | + |