|
| 1 | +OpenTelemetry OpenAI Agents Instrumentation |
| 2 | +========================================== |
| 3 | + |
| 4 | +This package instruments OpenAI agent frameworks ("openai-agents") and forwards spans |
| 5 | +into OpenTelemetry using the draft GenAI semantic conventions proposed upstream in |
| 6 | +OpenTelemetry Semantic Conventions PR 2528_ (subject to change until stabilized). |
| 7 | + |
| 8 | +Status: Experimental / Alpha |
| 9 | + |
| 10 | +What You Get |
| 11 | +------------ |
| 12 | +This instrumentation wires the OpenAI Agents (``openai-agents``) tracing hook into |
| 13 | +OpenTelemetry and maps agent / tool / response span data to the draft GenAI |
| 14 | +semantic conventions shipped alongside this package. It currently provides: |
| 15 | + |
| 16 | +* Span kinds & names for ``create_agent``, ``invoke_agent``, ``execute_tool``, ``chat`` (responses), ``embeddings`` (when available) |
| 17 | +* Core request attributes: model, max_tokens, temperature, top_p, top_k, penalties, stop sequences, seed, encoding formats |
| 18 | +* Choice count (``gen_ai.request.choice.count``) when ``n``/``choice_count`` > 1 |
| 19 | +* Output type (``gen_ai.output.type``) from ``response_format.type`` |
| 20 | +* Response attributes: id, model, aggregated finish reasons |
| 21 | +* Token usage (input/prompt + output/completion) from both span data or response object |
| 22 | +* OpenAI specific: request / response service tier, system fingerprint |
| 23 | +* Server endpoint host/port extraction (``server.address`` / ``server.port``) |
| 24 | +* Conversation / thread id (``gen_ai.conversation.id``) |
| 25 | +* Agent metadata: id, name, description |
| 26 | +* Tool metadata: name, id, type, description; tool call arguments & results (opt‑in) |
| 27 | +* Tool definitions (opt‑in) & orchestrator agent definitions |
| 28 | +* Data source id for retrieval / RAG scenarios |
| 29 | +* Optional input / output message capture with truncation |
| 30 | + - Messages are serialized to JSON strings (instead of raw lists of dicts) to comply with OpenTelemetry attribute type requirements and avoid exporter warnings. |
| 31 | +* Events: user / assistant / tool messages, tool call + tool result, per-choice events |
| 32 | +* Metrics (duration + token usage histograms) behind an env toggle |
| 33 | +* Size guarding via JSON serialization + max length truncation (default 20 KB) |
| 34 | + |
| 35 | +Usage |
| 36 | +----- |
| 37 | +.. code:: python |
| 38 | +
|
| 39 | + from openai import OpenAI |
| 40 | + from opentelemetry.instrumentation.openai_agents import OpenAIAgentsInstrumentor |
| 41 | +
|
| 42 | + OpenAIAgentsInstrumentor().instrument() |
| 43 | + client = OpenAI() |
| 44 | + # run your agent framework (openai-agents) code |
| 45 | +
|
| 46 | +Configuration (Environment Variables) |
| 47 | +------------------------------------- |
| 48 | +Set these before importing / enabling instrumentation: |
| 49 | + |
| 50 | +* ``OTEL_INSTRUMENTATION_OPENAI_AGENTS_CAPTURE_CONTENT`` (default ``false``) |
| 51 | + - When ``true`` records ``gen_ai.input.messages`` + ``gen_ai.output.messages`` and emits message events. |
| 52 | +* ``OTEL_INSTRUMENTATION_OPENAI_AGENTS_CAPTURE_TOOL_DEFINITIONS`` (default ``false``) |
| 53 | + - When ``true`` records ``gen_ai.tool.definitions`` (can be large & sensitive). |
| 54 | +* ``OTEL_INSTRUMENTATION_OPENAI_AGENTS_CAPTURE_TOOL_IO`` (default ``false``) |
| 55 | + - When ``true`` records tool call arguments / results & emits assistant/tool message events. |
| 56 | +* ``OTEL_INSTRUMENTATION_OPENAI_AGENTS_CAPTURE_METRICS`` (default ``true``) |
| 57 | + - Set to ``false`` to disable ``gen_ai.operation.duration`` and ``gen_ai.token.usage`` histograms. |
| 58 | +* ``OTEL_INSTRUMENTATION_OPENAI_AGENTS_MAX_VALUE_LENGTH`` (default ``20480``) |
| 59 | + - Max serialized length for large attribute values (tool defs, messages, arguments, results). Values exceeding are truncated with ``...``. |
| 60 | + |
| 61 | +Security & PII Note: Content, tool IO, and tool definitions may contain sensitive data. Leave toggles off in production unless required. |
| 62 | + |
| 63 | +Metrics |
| 64 | +------- |
| 65 | +When metrics are enabled the following histograms are recorded: |
| 66 | + |
| 67 | +* ``gen_ai.operation.duration`` (seconds) - per span operation duration. |
| 68 | +* ``gen_ai.token.usage`` (tokens) - one data point per input/output token count with attribute ``gen_ai.token.type`` = ``input`` or ``output``. |
| 69 | + |
| 70 | +Both include low‑cardinality attributes: ``gen_ai.provider.name``, ``gen_ai.operation.name``, and ``gen_ai.request.model`` (if available). Errors add ``error.type``. |
| 71 | + |
| 72 | +Exporting to Azure Application Insights |
| 73 | +--------------------------------------- |
| 74 | +Use the OpenTelemetry Azure Monitor exporter in your app: |
| 75 | + |
| 76 | +.. code:: python |
| 77 | +
|
| 78 | + from azure.monitor.opentelemetry import configure_azure_monitor |
| 79 | +
|
| 80 | + configure_azure_monitor( |
| 81 | + connection_string="InstrumentationKey=...;IngestionEndpoint=..." |
| 82 | + ) |
| 83 | +
|
| 84 | + from opentelemetry.instrumentation.openai_agents import OpenAIAgentsInstrumentor |
| 85 | + OpenAIAgentsInstrumentor().instrument() |
| 86 | +
|
| 87 | + # start your agent workflow |
| 88 | +
|
| 89 | +The exporter will ship the agent / tool / model invocation spans with attributes |
| 90 | +outlined in the upstream semantic conventions (see PR 2528_) (create_agent, invoke_agent, |
| 91 | +execute_tool, etc.). |
| 92 | + |
| 93 | +Truncation Strategy |
| 94 | +------------------- |
| 95 | +Large structured values are JSON serialized and truncated to the configured byte length (default 20480). This affects: tool definitions, tool arguments/results, input/output messages, orchestrator agent definitions. Truncation preserves valid UTF‑8 boundaries by operating on Python strings (code points) before final attribute assignment. |
| 96 | + |
| 97 | +Limitations & Future Work |
| 98 | +------------------------- |
| 99 | +* No explicit instrumentation for streaming chunk events yet (would require upstream hooks) — intentionally deferred. |
| 100 | +* Draft semantic conventions may evolve; attribute names could change before stabilization. When the schema version updates, re‑validate against the upstream semantic conventions (see PR 2528_). |
| 101 | +* Extended test coverage is growing; please add cases when introducing new span data types or attributes (see existing tests for patterns). |
| 102 | + |
| 103 | +Removed Previous Limitations |
| 104 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 105 | +The following earlier limitations have been addressed: |
| 106 | + |
| 107 | +* Deactivation: ``uninstrument()`` now calls processor ``shutdown()`` which prevents new spans and ends any open ones. |
| 108 | +* Operation naming: ``create_agent`` vs ``invoke_agent`` now prefers an explicit upstream flag (``is_creation``) and falls back to a refined heuristic (no parent + description implies creation). |
| 109 | +* Additional span types (embeddings, transcription, speech, guardrail, handoff) are recognized with stable operation names. |
| 110 | +* Metrics, truncation, tool IO/content gating all implemented and configurable. |
| 111 | +* Time handling uses timezone-aware UTC timestamps; no deprecated ``datetime.utcnow`` usage. |
| 112 | + |
| 113 | +Contributing |
| 114 | +------------ |
| 115 | +Please add tests when extending attribute coverage or events. Keep attribute cardinality low for metrics. Avoid capturing content unless explicitly gated by an env var. |
| 116 | + |
| 117 | +License |
| 118 | +------- |
| 119 | +Apache 2.0 |
| 120 | + |
| 121 | +.. _2528: https://github.com/open-telemetry/semantic-conventions/pull/2528 |
0 commit comments