Address review comments on PR #201 (round 2)

pvlsirotkin · pvlsirotkin · commit f6a297897709 · 2026-06-11T14:41:15.000Z
Following lmolkova's review and trask's towncrier reminder:

* Rename per #249 (lmolkova confirmed on PR #201 SIG call discussion):
  - gen_ai.agent.invocation.duration -> gen_ai.invoke_agent.duration
  - gen_ai.tool.execution.duration -> gen_ai.execute_tool.duration

  Metric names now align with the operation name on spans
  (gen_ai.invoke_agent, gen_ai.execute_tool).
* Move CHANGELOG entry to a Towncrier fragment per trask's reminder about
  #275: changelog.d/201.enhancement.md.
* Bump gen_ai.agent.name from 'recommended' back to
  'conditionally_required: When available.' (lmolkova: keep consistent with
  the internal invoke_agent span; entity work in #270 will reshape this
  later anyway).
* Capitalize 'When available' / 'If available' per #245 sentence-case
  convention on every requirement_level note.
* Apply lmolkova's suggested rewrites on metric briefs/notes:
  - Agent: more concise brief about the invocation start/end.
  - Tool: drop 'performed by or on behalf of a GenAI agent' since generic
    apps (not just agents) can execute tools.
  - Tool note: simplify the requirement statement (drops the explicit
    'required vs recommended' framing; semconv is moving away from those
    labels for metrics per open-telemetry/semantic-conventions#3278).
* Add a few more low-cardinality attributes on invoke_agent.duration per
  lmolkova: gen_ai.agent.id, gen_ai.agent.version, gen_ai.request.model
  (all conditionally_required When/If available). They mirror what the
  invoke_agent span carries and will be reshaped once #270 introduces
  agent entities.
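
To illustrate how an instrumentation might follow the renamed metrics and the "When available" / "If available" requirement levels, here is a minimal stdlib-only sketch. It is not taken from the PR: `invoke_agent_attributes`, `timed`, and the `record` callback are hypothetical names standing in for whatever histogram API the instrumentation really uses, and only the attributes visible in this commit's hunks are covered. Using the exception class name for `error.type` is one possible low-cardinality choice, not a requirement.

```python
import time
from contextlib import contextmanager

# Explicit bucket boundaries (seconds), copied from the docs in this commit.
INVOKE_AGENT_BUCKETS = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8,
                        25.6, 51.2, 102.4, 204.8, 409.6]
EXECUTE_TOOL_BUCKETS = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28,
                        2.56, 5.12, 10.24, 20.48, 40.96, 81.92]


def invoke_agent_attributes(agent_name=None, agent_id=None,
                            agent_version=None, request_model=None):
    """Hypothetical helper: build attributes for gen_ai.invoke_agent.duration.

    Conditionally required attributes are emitted only when a value is
    available; missing values are dropped rather than sent as placeholders.
    """
    candidates = {
        "gen_ai.agent.name": agent_name,
        "gen_ai.agent.id": agent_id,
        "gen_ai.agent.version": agent_version,
        "gen_ai.request.model": request_model,
    }
    return {key: value for key, value in candidates.items() if value is not None}


@contextmanager
def timed(record, metric_name, attributes):
    """Hypothetical helper: time the wrapped block in seconds.

    `record(metric_name, seconds, attributes)` is an assumed callback that
    stands in for a real histogram's record call.
    """
    start = time.monotonic()
    try:
        yield attributes
    except Exception as exc:
        # error.type is set only if the operation ended in an error.
        attributes["error.type"] = type(exc).__qualname__
        raise
    finally:
        record(metric_name, time.monotonic() - start, attributes)


# Usage: collect measurements in a list instead of a real metrics backend.
measurements = []


def collect(name, seconds, attrs):
    measurements.append((name, seconds, dict(attrs)))


with timed(collect, "gen_ai.invoke_agent.duration",
           invoke_agent_attributes(agent_name="Math Tutor")):
    with timed(collect, "gen_ai.execute_tool.duration",
               {"gen_ai.tool.name": "Flights"}):
        pass  # the tool call would run here
```

Because the tool timer is nested inside the agent timer, the tool measurement is recorded first. Both use the same monotonic clock that a span would use for its duration, which is one way to keep the metric value equal to the span duration when both are reported.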
diff --git a/changelog.d/201.enhancement.md b/changelog.d/201.enhancement.md
@@ -0,0 +1 @@
+Add `gen_ai.invoke_agent.duration` metric to track the end-to-end duration of a single agent invocation, and `gen_ai.execute_tool.duration` metric to track the duration of a single tool execution.
diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md
@@ -20,9 +20,9 @@ linkTitle: Metrics
 - [Generative AI workflow metrics](#generative-ai-workflow-metrics)
   - [Metric: `gen_ai.workflow.duration`](#metric-gen_aiworkflowduration)
 - [Generative AI agent metrics](#generative-ai-agent-metrics)
-  - [Metric: `gen_ai.agent.invocation.duration`](#metric-gen_aiagentinvocationduration)
+  - [Metric: `gen_ai.invoke_agent.duration`](#metric-gen_aiinvoke_agentduration)
 - [Generative AI tool metrics](#generative-ai-tool-metrics)
-  - [Metric: `gen_ai.tool.execution.duration`](#metric-gen_aitoolexecutionduration)
+  - [Metric: `gen_ai.execute_tool.duration`](#metric-gen_aiexecute_toolduration)
 
 <!-- tocstop -->
 
@@ -940,22 +940,22 @@ If there is no low-cardinality workflow name available for a given framework, th
 Individual systems may include additional system-specific attributes.
 It is recommended to check system-specific documentation, if available.
 
-### Metric: `gen_ai.agent.invocation.duration`
+### Metric: `gen_ai.invoke_agent.duration`
 
 This metric is [required][MetricRequired] when the instrumented component
 implements agent invocation operations.
 
 This metric SHOULD be specified with [ExplicitBucketBoundaries] of
 [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, 25.6, 51.2, 102.4, 204.8, 409.6].
 
-<!-- weaver .registry.metrics[] | select(.name == "gen_ai.agent.invocation.duration") -->
+<!-- weaver .registry.metrics[] | select(.name == "gen_ai.invoke_agent.duration") -->
 <!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
 <!-- see templates/registry/markdown/snippet.md.j2 -->
 <!-- prettier-ignore-start -->
 
 | Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
 | -------- | --------------- | ----------- | -------------- | --------- | ------ |
-| `gen_ai.agent.invocation.duration` | Histogram | `s` | The end-to-end duration of a single agent invocation, from the moment the agent is invoked to the moment it produces its final response (or terminates with an error). [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
+| `gen_ai.invoke_agent.duration` | Histogram | `s` | The end-to-end duration of a single agent invocation, from the moment the invocation starts until the agent emits the last chunk of its final response or terminates with an error. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
 
 **[1]:** Intended for instrumentations of agent frameworks (for example, ADK,
 LangChain agents, CrewAI agents) that can reliably bound a single
@@ -978,12 +978,18 @@ the metric value SHOULD be the same as the span duration.
 | Key | Stability | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Value Type | Description | Example Values |
 | --- | --- | --- | --- | --- | --- |
 | [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/v1.41.1/docs/registry/attributes/error.md) | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | `Conditionally Required` If the operation ended in an error. | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` |
-| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
+| [`gen_ai.agent.id`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` When available. | string | The unique and stable identifier of the GenAI hosted agent resource. [2] | `asst_5j66UpCpwteGg4YSxUnt7lPY`; `arn:aws:bedrock:us-east-1:123:agent/42`; `urn:agent:projects-123:projects:123:locations:us-east1:aiplatform:reasoningEngines:456` |
+| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` When available. | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
+| [`gen_ai.agent.version`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` When available. | string | The version of the GenAI agent. | `1.0.0`; `2025-05-01` |
+| [`gen_ai.request.model`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` If available. | string | The name of the GenAI model a request is being made to. | `gpt-4` |
 
 **[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library,
 the canonical name of exception that occurred, or another low-cardinality error identifier.
 Instrumentations SHOULD document the list of errors they report.
 
+**[2] `gen_ai.agent.id`:** For hosted agents, this SHOULD be the provider-assigned stable identifier of the agent resource such as [AWS Bedrock agent ARN](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_Agent.html) or [GCP Agent Registry identifier](https://docs.cloud.google.com/agent-registry/concepts#agent-identifier).
+It's NOT RECOMMENDED to record in-memory agent instance ids on this attribute due to their transient nature.
+
 ---
 
 `error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
@@ -1001,34 +1007,25 @@ Instrumentations SHOULD document the list of errors they report.
 Individual systems may include additional system-specific attributes.
 It is recommended to check system-specific documentation, if available.
 
-### Metric: `gen_ai.tool.execution.duration`
+### Metric: `gen_ai.execute_tool.duration`
 
 This metric is [recommended][MetricRecommended] for instrumentations that can
-observe tool executions performed by or on behalf of a GenAI agent.
+observe tool executions.
 
 This metric SHOULD be specified with [ExplicitBucketBoundaries] of
 [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92].
 
-<!-- weaver .registry.metrics[] | select(.name == "gen_ai.tool.execution.duration") -->
+<!-- weaver .registry.metrics[] | select(.name == "gen_ai.execute_tool.duration") -->
 <!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
 <!-- see templates/registry/markdown/snippet.md.j2 -->
 <!-- prettier-ignore-start -->
 
 | Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
 | -------- | --------------- | ----------- | -------------- | --------- | ------ |
-| `gen_ai.tool.execution.duration` | Histogram | `s` | The duration of a single tool execution performed by or on behalf of a GenAI agent. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
-
-**[1]:** Intended for instrumentations of agent frameworks (or of application
-code that executes tools on behalf of an agent) that can reliably
-bound a single tool call.
+| `gen_ai.execute_tool.duration` | Histogram | `s` | The duration of a single tool execution. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
 
-Unlike `gen_ai.agent.invocation.duration` (which is required), this
-metric is only recommended because tools may be executed through
-paths that the agent framework does not observe — for example,
-external MCP servers or application-managed dispatch.
-Instrumentations SHOULD record this metric for every tool execution
-they observe but are not required to capture all tool calls across
-the agentic system.
+**[1]:** Instrumentation that can reliably bound a single tool call SHOULD
+record this metric for every tool execution they can observe.
 
 When this metric is reported alongside a `gen_ai.execute_tool` span,
 the metric value SHOULD be the same as the span duration.
@@ -1039,7 +1036,7 @@ the metric value SHOULD be the same as the span duration.
 | --- | --- | --- | --- | --- | --- |
 | [`gen_ai.tool.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Required` | string | Name of the tool utilized by the agent. | `Flights` |
 | [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/v1.41.1/docs/registry/attributes/error.md) | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | `Conditionally Required` If the operation ended in an error. | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` |
-| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
+| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` When available. | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
 | [`gen_ai.tool.type`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Recommended` | string | Type of the tool utilized by the agent [2] | `function`; `extension`; `datastore` |
 
 **[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library,
diff --git a/model/gen-ai/metrics.yaml b/model/gen-ai/metrics.yaml
@@ -116,14 +116,14 @@ metrics:
       - ref: gen_ai.workflow.name
         requirement_level:
           conditionally_required: If available.
-  - name: gen_ai.agent.invocation.duration
+  - name: gen_ai.invoke_agent.duration
     annotations:
       code_generation:
         metric_value_type: double
     brief: >
       The end-to-end duration of a single agent invocation,
-      from the moment the agent is invoked to the moment it produces its final
-      response (or terminates with an error).
+      from the moment the invocation starts until the agent emits
+      the last chunk of its final response or terminates with an error.
     note: |
       Intended for instrumentations of agent frameworks (for example, ADK,
       LangChain agents, CrewAI agents) that can reliably bound a single
@@ -147,26 +147,25 @@ metrics:
       - ref_group: attributes.gen_ai.error
       - ref: gen_ai.agent.name
         requirement_level:
-          conditionally_required: when available
-  - name: gen_ai.tool.execution.duration
+          conditionally_required: When available.
+      - ref: gen_ai.agent.id
+        requirement_level:
+          conditionally_required: When available.
+      - ref: gen_ai.agent.version
+        requirement_level:
+          conditionally_required: When available.
+      - ref: gen_ai.request.model
+        requirement_level:
+          conditionally_required: If available.
+  - name: gen_ai.execute_tool.duration
     annotations:
       code_generation:
         metric_value_type: double
     brief: >
-      The duration of a single tool execution performed by or on behalf of a
-      GenAI agent.
+      The duration of a single tool execution.
     note: |
-      Intended for instrumentations of agent frameworks (or of application
-      code that executes tools on behalf of an agent) that can reliably
-      bound a single tool call.
-
-      Unlike `gen_ai.agent.invocation.duration` (which is required), this
-      metric is only recommended because tools may be executed through
-      paths that the agent framework does not observe — for example,
-      external MCP servers or application-managed dispatch.
-      Instrumentations SHOULD record this metric for every tool execution
-      they observe but are not required to capture all tool calls across
-      the agentic system.
+      Instrumentation that can reliably bound a single tool call SHOULD
+      record this metric for every tool execution they can observe.
 
       When this metric is reported alongside a `gen_ai.execute_tool` span,
       the metric value SHOULD be the same as the span duration.
@@ -181,7 +180,7 @@ metrics:
         requirement_level: recommended
       - ref: gen_ai.agent.name
         requirement_level:
-          conditionally_required: when available
+          conditionally_required: When available.
 
 metric_refinements:
   - id: openai.client.token.usage
diff --git a/schema-snapshot/registry.yaml b/schema-snapshot/registry.yaml
