
Commit 1fc6c02

author
you
committed
Add gen_ai.agent.invocation.duration and gen_ai.tool.execution.duration metrics
Adds two new GenAI semantic convention metrics for agent and tool latency, modeled on the recently-added gen_ai.workflow.duration metric:

* gen_ai.agent.invocation.duration (histogram, seconds): end-to-end duration of a single agent invocation, aligned with the existing gen_ai.invoke_agent.{client,internal} spans.
* gen_ai.tool.execution.duration (histogram, seconds): duration of a single tool execution, aligned with the existing gen_ai.execute_tool.internal span.

Also adds the gen_ai.tool.version attribute, used as a dimension on gen_ai.tool.execution.duration (mirrors the existing gen_ai.agent.version).

NOTE: docs/registry/ and schema-snapshot/ regeneration via 'make generate-all' has NOT been run in this commit (no Docker available in the authoring environment). Run it locally before pushing for review.
1 parent 8508fbf commit 1fc6c02
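
For reviewers who want to see how the new tool metric might be recorded, here is a minimal sketch. It uses only the Python standard library: `InMemoryHistogram` and `timed_tool_execution` are hypothetical names invented for illustration and are not part of any OpenTelemetry API. The metric name, unit, attribute keys, and bucket boundaries are taken from this commit; everything else is an assumption.

```python
import bisect
import time
from collections import Counter
from contextlib import contextmanager

# Bucket boundaries from the metric docs: 0.01 s, doubling up to 81.92 s.
BOUNDARIES = [round(0.01 * 2**k, 2) for k in range(14)]


class InMemoryHistogram:
    """Hypothetical stand-in for an SDK histogram (not an OpenTelemetry class)."""

    def __init__(self, name, unit, boundaries):
        self.name = name
        self.unit = unit
        self.boundaries = list(boundaries)
        self.counts = Counter()  # (sorted attribute tuple, bucket index) -> count

    def record(self, value, attributes):
        # bisect_left yields upper-inclusive buckets: value <= boundaries[i].
        bucket = bisect.bisect_left(self.boundaries, value)
        key = tuple(sorted(attributes.items()))
        self.counts[(key, bucket)] += 1


@contextmanager
def timed_tool_execution(histogram, tool_name, tool_version=None, clock=time.monotonic):
    """Time one tool call and record it with the attributes named in this commit."""
    attributes = {"gen_ai.tool.name": tool_name}  # Required
    if tool_version is not None:
        attributes["gen_ai.tool.version"] = tool_version  # when available
    start = clock()
    try:
        yield
    except Exception as exc:
        # error.type is set only when the operation ended in an error.
        attributes["error.type"] = type(exc).__qualname__
        raise
    finally:
        histogram.record(clock() - start, attributes)


tool_duration = InMemoryHistogram("gen_ai.tool.execution.duration", "s", BOUNDARIES)

with timed_tool_execution(tool_duration, "Flights", tool_version="1.0.0"):
    pass  # the tool call would run here
```

The duration is recorded in a `finally` block so that failed executions are still measured, with `error.type` attached. A real instrumentation would likely record through an OpenTelemetry histogram instrument and configure the boundaries there instead of bucketing by hand.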

6 files changed

Lines changed: 525 additions & 16 deletions


CHANGELOG.md

Lines changed: 5 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -19,6 +19,11 @@
1919
([#97](https://github.com/open-telemetry/semantic-conventions-genai/pull/97))
2020
- Add `gen_ai.workflow.duration` metric to track duration of a workflow.
2121
([#126](https://github.com/open-telemetry/semantic-conventions-genai/pull/126))
22+
- Add `gen_ai.agent.invocation.duration` metric to track the end-to-end duration
23+
of a single agent invocation, and `gen_ai.tool.execution.duration` metric to
24+
track the duration of a single tool execution. Add the `gen_ai.tool.version`
25+
attribute used as a dimension on the tool execution metric.
26+
([#XXX](https://github.com/open-telemetry/semantic-conventions-genai/pull/XXX))
2227

2328
### 🧰 Bug fixes 🧰
2429

docs/gen-ai/gen-ai-metrics.md

Lines changed: 135 additions & 0 deletions
@@ -19,6 +19,10 @@ linkTitle: Metrics
1919
- [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiservertime_to_first_token)
2020
- [Generative AI workflow metrics](#generative-ai-workflow-metrics)
2121
- [Metric: `gen_ai.workflow.duration`](#metric-gen_aiworkflowduration)
22+
- [Generative AI agent metrics](#generative-ai-agent-metrics)
23+
- [Metric: `gen_ai.agent.invocation.duration`](#metric-gen_aiagentinvocationduration)
24+
- [Generative AI tool metrics](#generative-ai-tool-metrics)
25+
- [Metric: `gen_ai.tool.execution.duration`](#metric-gen_aitoolexecutionduration)
2226

2327
<!-- tocstop -->
2428

@@ -901,6 +905,137 @@ If there is no low-cardinality workflow name available for a given framework, th
901905
<!-- END AUTOGENERATED TEXT -->
902906
<!-- endweaver -->
903907

908+
## Generative AI agent metrics
909+
910+
Individual systems may include additional system-specific attributes.
911+
It is recommended to check system-specific documentation, if available.
912+
913+
`gen_ai.agent.invocation.duration` represents the end-to-end duration of a
914+
single agent invocation, measured from the point where the agent is invoked
915+
to the point where it produces its final response (or terminates with an
916+
error). It is intended for instrumentations of agent frameworks (for example,
917+
ADK, LangChain agents, CrewAI agents) that can reliably bound a single agent
918+
invocation.
919+
920+
If instrumentation can only measure a single provider-facing client operation
921+
(for example, one model API call), `gen_ai.client.operation.duration` SHOULD
922+
be used instead. If instrumentation can reliably bound a higher-level workflow
923+
that coordinates multiple agents, `gen_ai.workflow.duration` SHOULD be used
924+
for that workflow. Instrumentation MAY emit several of these metrics for the
925+
same request path when more than one boundary is available.
926+
927+
### Metric: `gen_ai.agent.invocation.duration`
928+
929+
This metric is [required][MetricRequired] when the instrumented component
930+
implements agent invocation operations.
931+
932+
When this metric is reported alongside a `gen_ai.invoke_agent` span, the
933+
metric value SHOULD be the same as the span duration.
934+
935+
This metric SHOULD be specified with [ExplicitBucketBoundaries] of
936+
[0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92].
937+
938+
<!-- weaver .registry.metrics[] | select(.name == "gen_ai.agent.invocation.duration") -->
939+
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
940+
<!-- see templates/registry/markdown/snippet.md.j2 -->
941+
<!-- prettier-ignore-start -->
942+
943+
| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
944+
| -------- | --------------- | ----------- | -------------- | --------- | ------ |
945+
| `gen_ai.agent.invocation.duration` | Histogram | `s` | GenAI agent invocation duration. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
946+
947+
**[1]:** This metric measures the end-to-end duration of a single agent invocation, from the moment the agent is invoked to the moment it produces its final response (or terminates with an error).
948+
When this metric is reported alongside a `gen_ai.invoke_agent` span, the metric value SHOULD be the same as the span duration.
949+
950+
**Attributes:**
951+
952+
| Key | Stability | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Value Type | Description | Example Values |
953+
| --- | --- | --- | --- | --- | --- |
954+
| [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/v1.41.0/docs/registry/attributes/error.md) | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | `Conditionally Required` if the operation ended in an error | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` |
955+
| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
956+
| [`gen_ai.agent.version`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | The version of the GenAI agent. | `1.0.0`; `2025-05-01` |
957+
958+
**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library,
959+
the canonical name of exception that occurred, or another low-cardinality error identifier.
960+
Instrumentations SHOULD document the list of errors they report.
961+
962+
---
963+
964+
`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
965+
966+
| Value | Description | Stability |
967+
| --- | --- | --- |
968+
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
969+
970+
<!-- prettier-ignore-end -->
971+
<!-- END AUTOGENERATED TEXT -->
972+
<!-- endweaver -->
973+
974+
## Generative AI tool metrics
975+
976+
Individual systems may include additional system-specific attributes.
977+
It is recommended to check system-specific documentation, if available.
978+
979+
`gen_ai.tool.execution.duration` represents the duration of a single tool
980+
execution performed by or on behalf of a GenAI agent. It is intended for
981+
instrumentations of agent frameworks (or of application code that executes
982+
tools on behalf of an agent) that can reliably bound a single tool call.
983+
984+
### Metric: `gen_ai.tool.execution.duration`
985+
986+
This metric is [recommended][MetricRecommended] for instrumentations that can
987+
observe tool executions performed by or on behalf of a GenAI agent.
988+
989+
When this metric is reported alongside a `gen_ai.execute_tool` span, the
990+
metric value SHOULD be the same as the span duration.
991+
992+
This metric SHOULD be specified with [ExplicitBucketBoundaries] of
993+
[0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92].
994+
995+
<!-- weaver .registry.metrics[] | select(.name == "gen_ai.tool.execution.duration") -->
996+
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
997+
<!-- see templates/registry/markdown/snippet.md.j2 -->
998+
<!-- prettier-ignore-start -->
999+
1000+
| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
1001+
| -------- | --------------- | ----------- | -------------- | --------- | ------ |
1002+
| `gen_ai.tool.execution.duration` | Histogram | `s` | GenAI tool execution duration. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
1003+
1004+
**[1]:** This metric measures the duration of a single tool execution performed by or on behalf of a GenAI agent.
1005+
When this metric is reported alongside a `gen_ai.execute_tool` span, the metric value SHOULD be the same as the span duration.
1006+
1007+
**Attributes:**
1008+
1009+
| Key | Stability | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Value Type | Description | Example Values |
1010+
| --- | --- | --- | --- | --- | --- |
1011+
| [`gen_ai.tool.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Required` | string | Name of the tool utilized by the agent. | `Flights` |
1012+
| [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/v1.41.0/docs/registry/attributes/error.md) | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | `Conditionally Required` if the operation ended in an error | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` |
1013+
| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
1014+
| [`gen_ai.agent.version`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | The version of the GenAI agent. | `1.0.0`; `2025-05-01` |
1015+
| [`gen_ai.tool.version`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | The version of the tool utilized by the agent. [2] | `1.0.0`; `2025-05-01` |
1016+
1017+
**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library,
1018+
the canonical name of exception that occurred, or another low-cardinality error identifier.
1019+
Instrumentations SHOULD document the list of errors they report.
1020+
1021+
**[2] `gen_ai.tool.version`:** The tool version is usually provided by the application that defines the
1022+
tool. It is typically a static value (for example, a release tag of the
1023+
tool's package) and is expected to have low cardinality.
1024+
1025+
`gen_ai.tool.version` MUST have low cardinality.
1026+
1027+
---
1028+
1029+
`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
1030+
1031+
| Value | Description | Stability |
1032+
| --- | --- | --- |
1033+
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
1034+
1035+
<!-- prettier-ignore-end -->
1036+
<!-- END AUTOGENERATED TEXT -->
1037+
<!-- endweaver -->
1038+
9041039
[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
9051040
[MetricRequired]: https://github.com/open-telemetry/semantic-conventions/blob/v1.40.0/docs/general/metric-requirement-level.md#required
9061041
[MetricRecommended]: https://github.com/open-telemetry/semantic-conventions/blob/v1.40.0/docs/general/metric-requirement-level.md#recommended

docs/registry/attributes/gen-ai.md

Lines changed: 23 additions & 16 deletions
@@ -51,12 +51,13 @@
5151
| <a id="gen-ai-tool-description" href="#gen-ai-tool-description">`gen_ai.tool.description`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | The tool description. | `Multiply two numbers` |
5252
| <a id="gen-ai-tool-name" href="#gen-ai-tool-name">`gen_ai.tool.name`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | Name of the tool utilized by the agent. | `Flights` |
5353
| <a id="gen-ai-tool-type" href="#gen-ai-tool-type">`gen_ai.tool.type`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | Type of the tool utilized by the agent [15] | `function`; `extension`; `datastore` |
54-
| <a id="gen-ai-usage-cache-creation-input-tokens" href="#gen-ai-usage-cache-creation-input-tokens">`gen_ai.usage.cache_creation.input_tokens`</a> | ![Development](https://img.shields.io/badge/-development-blue) | int | The number of input tokens written to a provider-managed cache. [16] | `25` |
55-
| <a id="gen-ai-usage-cache-read-input-tokens" href="#gen-ai-usage-cache-read-input-tokens">`gen_ai.usage.cache_read.input_tokens`</a> | ![Development](https://img.shields.io/badge/-development-blue) | int | The number of input tokens served from a provider-managed cache. [17] | `50` |
56-
| <a id="gen-ai-usage-input-tokens" href="#gen-ai-usage-input-tokens">`gen_ai.usage.input_tokens`</a> | ![Development](https://img.shields.io/badge/-development-blue) | int | The number of tokens used in the GenAI input (prompt). [18] | `100` |
54+
| <a id="gen-ai-tool-version" href="#gen-ai-tool-version">`gen_ai.tool.version`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | The version of the tool utilized by the agent. [16] | `1.0.0`; `2025-05-01` |
55+
| <a id="gen-ai-usage-cache-creation-input-tokens" href="#gen-ai-usage-cache-creation-input-tokens">`gen_ai.usage.cache_creation.input_tokens`</a> | ![Development](https://img.shields.io/badge/-development-blue) | int | The number of input tokens written to a provider-managed cache. [17] | `25` |
56+
| <a id="gen-ai-usage-cache-read-input-tokens" href="#gen-ai-usage-cache-read-input-tokens">`gen_ai.usage.cache_read.input_tokens`</a> | ![Development](https://img.shields.io/badge/-development-blue) | int | The number of input tokens served from a provider-managed cache. [18] | `50` |
57+
| <a id="gen-ai-usage-input-tokens" href="#gen-ai-usage-input-tokens">`gen_ai.usage.input_tokens`</a> | ![Development](https://img.shields.io/badge/-development-blue) | int | The number of tokens used in the GenAI input (prompt). [19] | `100` |
5758
| <a id="gen-ai-usage-output-tokens" href="#gen-ai-usage-output-tokens">`gen_ai.usage.output_tokens`</a> | ![Development](https://img.shields.io/badge/-development-blue) | int | The number of tokens used in the GenAI response (completion). | `180` |
58-
| <a id="gen-ai-usage-reasoning-output-tokens" href="#gen-ai-usage-reasoning-output-tokens">`gen_ai.usage.reasoning.output_tokens`</a> | ![Development](https://img.shields.io/badge/-development-blue) | int | The number of output tokens used for reasoning (e.g. chain-of-thought, extended thinking). [19] | `50` |
59-
| <a id="gen-ai-workflow-name" href="#gen-ai-workflow-name">`gen_ai.workflow.name`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | Human-readable name of the GenAI workflow provided by the application. [20] | `multi_agent_rag`; `customer_support_pipeline` |
59+
| <a id="gen-ai-usage-reasoning-output-tokens" href="#gen-ai-usage-reasoning-output-tokens">`gen_ai.usage.reasoning.output_tokens`</a> | ![Development](https://img.shields.io/badge/-development-blue) | int | The number of output tokens used for reasoning (e.g. chain-of-thought, extended thinking). [20] | `50` |
60+
| <a id="gen-ai-workflow-name" href="#gen-ai-workflow-name">`gen_ai.workflow.name`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | Human-readable name of the GenAI workflow provided by the application. [21] | `multi_agent_rag`; `customer_support_pipeline` |
6061

6162

6263
**[1] `gen_ai.data_source.id`:** Data sources are used by AI agents and RAG applications to store grounding data. A data source may be an external database, object store, document collection, website, or any other storage system used by the GenAI agent or application. The `gen_ai.data_source.id` SHOULD match the identifier used by the GenAI system rather than a name specific to the external storage, such as a database or object store. Semantic conventions referencing `gen_ai.data_source.id` MAY also leverage additional attributes, such as `db.*`, to further identify and describe the data source.
@@ -193,18 +194,24 @@ Function: A tool executed on the client-side, where the agent generates paramete
193194
Client-side operations are actions taken on the user's end or within the client application.
194195
Datastore: A tool used by the agent to access and query structured or unstructured external data for retrieval-augmented tasks or knowledge updates.
195196

196-
**[16] `gen_ai.usage.cache_creation.input_tokens`:** The value SHOULD be included in `gen_ai.usage.input_tokens`.
197+
**[16] `gen_ai.tool.version`:** The tool version is usually provided by the application that defines the
198+
tool. It is typically a static value (for example, a release tag of the
199+
tool's package) and is expected to have low cardinality.
197200

198-
**[17] `gen_ai.usage.cache_read.input_tokens`:** The value SHOULD be included in `gen_ai.usage.input_tokens`.
201+
`gen_ai.tool.version` MUST have low cardinality.
199202

200-
**[18] `gen_ai.usage.input_tokens`:** This value SHOULD include all types of input tokens, including cached tokens.
203+
**[17] `gen_ai.usage.cache_creation.input_tokens`:** The value SHOULD be included in `gen_ai.usage.input_tokens`.
204+
205+
**[18] `gen_ai.usage.cache_read.input_tokens`:** The value SHOULD be included in `gen_ai.usage.input_tokens`.
206+
207+
**[19] `gen_ai.usage.input_tokens`:** This value SHOULD include all types of input tokens, including cached tokens.
201208
Instrumentations SHOULD make a best effort to populate this value, using a total
202209
provided by the provider when available or, depending on the provider API,
203210
by summing different token types parsed from the provider output.
204211

205-
**[19] `gen_ai.usage.reasoning.output_tokens`:** The value SHOULD be included in `gen_ai.usage.output_tokens`.
212+
**[20] `gen_ai.usage.reasoning.output_tokens`:** The value SHOULD be included in `gen_ai.usage.output_tokens`.
206213

207-
**[20] `gen_ai.workflow.name`:** This attribute can be populated in different frameworks; for example, as the name of the first chain in LangChain or the name of the crew in CrewAI.
214+
**[21] `gen_ai.workflow.name`:** This attribute can be populated in different frameworks; for example, as the name of the first chain in LangChain or the name of the crew in CrewAI.
208215
The workflow name is usually provided by the application in a way that is specific to the generative AI framework or library that orchestrates the workflow.
209216
It is usually a static name that is expected to be unique within an application.
210217

@@ -252,21 +259,21 @@ If there is no low-cardinality workflow name available for a given framework, th
252259
| `azure.ai.openai` | [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/overview) | ![Development](https://img.shields.io/badge/-development-blue) |
253260
| `cohere` | [Cohere](https://cohere.com/) | ![Development](https://img.shields.io/badge/-development-blue) |
254261
| `deepseek` | [DeepSeek](https://www.deepseek.com/) | ![Development](https://img.shields.io/badge/-development-blue) |
255-
| `gcp.gemini` | [Gemini](https://cloud.google.com/products/gemini) [21] | ![Development](https://img.shields.io/badge/-development-blue) |
256-
| `gcp.gen_ai` | Any Google generative AI endpoint [22] | ![Development](https://img.shields.io/badge/-development-blue) |
257-
| `gcp.vertex_ai` | [Vertex AI](https://cloud.google.com/vertex-ai) [23] | ![Development](https://img.shields.io/badge/-development-blue) |
262+
| `gcp.gemini` | [Gemini](https://cloud.google.com/products/gemini) [22] | ![Development](https://img.shields.io/badge/-development-blue) |
263+
| `gcp.gen_ai` | Any Google generative AI endpoint [23] | ![Development](https://img.shields.io/badge/-development-blue) |
264+
| `gcp.vertex_ai` | [Vertex AI](https://cloud.google.com/vertex-ai) [24] | ![Development](https://img.shields.io/badge/-development-blue) |
258265
| `groq` | [Groq](https://groq.com/) | ![Development](https://img.shields.io/badge/-development-blue) |
259266
| `ibm.watsonx.ai` | [IBM Watsonx AI](https://www.ibm.com/products/watsonx-ai) | ![Development](https://img.shields.io/badge/-development-blue) |
260267
| `mistral_ai` | [Mistral AI](https://mistral.ai/) | ![Development](https://img.shields.io/badge/-development-blue) |
261268
| `openai` | [OpenAI](https://openai.com/) | ![Development](https://img.shields.io/badge/-development-blue) |
262269
| `perplexity` | [Perplexity](https://www.perplexity.ai/) | ![Development](https://img.shields.io/badge/-development-blue) |
263270
| `x_ai` | [xAI](https://x.ai/) | ![Development](https://img.shields.io/badge/-development-blue) |
264271

265-
**[21]:** Used when accessing the 'generativelanguage.googleapis.com' endpoint. Also known as the AI Studio API.
272+
**[22]:** Used when accessing the 'generativelanguage.googleapis.com' endpoint. Also known as the AI Studio API.
266273

267-
**[22]:** May be used when specific backend is unknown.
274+
**[23]:** May be used when specific backend is unknown.
268275

269-
**[23]:** Used when accessing the 'aiplatform.googleapis.com' endpoint.
276+
**[24]:** Used when accessing the 'aiplatform.googleapis.com' endpoint.
270277

271278
---
272279
