Address review comments on PR open-telemetry#202 (lmolkova, Mike, trask)

pvlsirotkin · pvlsirotkin · commit 92cc049da95f · 2026-06-10T14:50:36.000Z
* Rename metrics: gen_ai.agent.request.size -> gen_ai.agent.input.content.size and gen_ai.agent.response.size -> gen_ai.agent.output.content.size. The new names don't imply a physical HTTP/gRPC request and are explicit that the metric is about content bytes (lmolkova's feedback).
* Drop the per-invocation-increment framing. Metric semantics now: byte size of content the agent receives/produces at its entrypoint, whatever the framework sees natively. Addresses lmolkova's point that 'what's new' is framework-dependent and ambiguous, and trask's question about defining in terms of gen_ai.input.messages (which would force frameworks to serialize full chat history).
* Spell out the byte-counting algorithm concretely: UTF-8 byte length for text parts, raw byte length for binary parts, framing bytes (JSON keys, role/metadata) not counted. Matches what the ADK reference implementation does. Addresses both Mike's and lmolkova's precision requests.
* Bump gen_ai.agent.name from 'conditionally_required: when available' to 'recommended'. Same compromise as PR open-telemetry#201 - stronger than current but doesn't break unnamed-agent frameworks.
* Add error.type via attributes.gen_ai.error ref_group (Mike's suggestion); held off on metric_attributes.gen_ai since address/port/provider/model don't add much for an in-process content-size metric.
* Drop gen_ai.agent.version from attribute lists (same reasoning as PR open-telemetry#201 - service.version covers it).
* Remove cross-reference to gen_ai.agent.invocation.duration since open-telemetry#201 has not landed yet. Will re-add later.
* Restructure the docs/gen-ai/gen-ai-metrics.md section to follow the thin MD wrapper + rich YAML note pattern (same as PR open-telemetry#201 revision).
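
For reviewers, the byte-counting rule described above (UTF-8 byte length for text parts, raw byte length for binary parts, framing bytes not counted) could look roughly like the sketch below. This is an illustration only, not the ADK reference implementation: `TextPart`, `BinaryPart`, and `content_size_bytes` are hypothetical names, and a real framework would substitute its own typed content parts.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical part types for illustration; real frameworks define their own.
@dataclass
class TextPart:
    text: str


@dataclass
class BinaryPart:
    data: bytes


Part = Union[TextPart, BinaryPart]


def content_size_bytes(parts: list[Part]) -> int:
    """Sum the content bytes of typed parts.

    Text parts count as their UTF-8 encoded length, binary parts as their
    raw length. Framing (JSON keys, role, metadata) is never seen here, so
    it is not counted.
    """
    total = 0
    for part in parts:
        if isinstance(part, TextPart):
            total += len(part.text.encode("utf-8"))
        elif isinstance(part, BinaryPart):
            total += len(part.data)
    return total


# A non-ASCII character takes more than one byte in UTF-8, so the result
# is a byte count, not a character count.
size = content_size_bytes([TextPart("héllo"), BinaryPart(b"\x00\x01\x02")])
```

An instrumentation would presumably record a value like `size` on the `gen_ai.agent.input.content.size` or `gen_ai.agent.output.content.size` histogram once per invocation. Frameworks that only see a serialized payload may count differently, which is why the note asks instrumentations to document what they count.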
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -31,8 +31,9 @@
   ([#126](https://github.com/open-telemetry/semantic-conventions-genai/pull/126))
 - Add `moonshot_ai` to `gen_ai.provider.name` well-known values.
   ([#99](https://github.com/open-telemetry/semantic-conventions-genai/pull/99))
-- Add `gen_ai.agent.request.size` and `gen_ai.agent.response.size` metrics to
-  track the byte size of GenAI agent input and output payloads.
+- Add `gen_ai.agent.input.content.size` and `gen_ai.agent.output.content.size`
+  metrics to track the byte size of content the GenAI agent receives and
+  produces at the agent boundary.
   ([#202](https://github.com/open-telemetry/semantic-conventions-genai/pull/202))
 
 ### 🧰 Bug fixes 🧰
diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md
@@ -19,9 +19,9 @@ linkTitle: Metrics
   - [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiservertime_to_first_token)
 - [Generative AI workflow metrics](#generative-ai-workflow-metrics)
   - [Metric: `gen_ai.workflow.duration`](#metric-gen_aiworkflowduration)
-- [Generative AI agent payload size metrics](#generative-ai-agent-payload-size-metrics)
-  - [Metric: `gen_ai.agent.request.size`](#metric-gen_aiagentrequestsize)
-  - [Metric: `gen_ai.agent.response.size`](#metric-gen_aiagentresponsesize)
+- [Generative AI agent content size metrics](#generative-ai-agent-content-size-metrics)
+  - [Metric: `gen_ai.agent.input.content.size`](#metric-gen_aiagentinputcontentsize)
+  - [Metric: `gen_ai.agent.output.content.size`](#metric-gen_aiagentoutputcontentsize)
 
 <!-- tocstop -->
 
@@ -934,117 +934,122 @@ If there is no low-cardinality workflow name available for a given framework, th
 <!-- END AUTOGENERATED TEXT -->
 <!-- endweaver -->
 
-## Generative AI agent payload size metrics
+## Generative AI agent content size metrics
 
 Individual systems may include additional system-specific attributes.
 It is recommended to check system-specific documentation, if available.
 
-`gen_ai.agent.request.size` and `gen_ai.agent.response.size` measure the
-serialized byte size of the **per-invocation** input and output of a single
-GenAI agent invocation:
-
-* `gen_ai.agent.request.size` is the byte size of the new content provided to
-  the agent for this invocation (typically the latest user message), not the
-  cumulative chat history that might be assembled into a model call.
-* `gen_ai.agent.response.size` is the byte size of the final response the
-  agent produces for this invocation, excluding intermediate tool calls or
-  reasoning steps.
-
-They are intended for instrumentations of agent frameworks (for example,
-ADK, LangChain agents, CrewAI agents) that can reliably observe the agent's
-input and final output.
-
-These metrics complement `gen_ai.agent.invocation.duration` by giving
-operators visibility into payload volume — useful for capacity planning,
-spotting unusually large per-turn requests or responses, and correlating
-size with latency or error rate.
-
-### Metric: `gen_ai.agent.request.size`
+### Metric: `gen_ai.agent.input.content.size`
 
 This metric is [recommended][MetricRecommended] for instrumentations that
-can observe the input payload provided to an agent at invocation time.
-
-Instrumentations SHOULD record the size as the byte length of the serialized
-request content as the agent receives it. For multi-part content (for example,
-text plus inline binary data), the size SHOULD be the sum of the byte lengths
-of each part.
+can observe the content passed to a GenAI agent at its entrypoint.
 
 This metric SHOULD be specified with [ExplicitBucketBoundaries] of
 [1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864].
 
-<!-- weaver .registry.metrics[] | select(.name == "gen_ai.agent.request.size") -->
+<!-- weaver .registry.metrics[] | select(.name == "gen_ai.agent.input.content.size") -->
 <!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
 <!-- see templates/registry/markdown/snippet.md.j2 -->
 <!-- prettier-ignore-start -->
 
 | Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
 | -------- | --------------- | ----------- | -------------- | --------- | ------ |
-| `gen_ai.agent.request.size` | Histogram | `By` | GenAI agent request size. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
-
-**[1]:** This metric measures the size, in bytes, of the input payload provided to
-a GenAI agent at invocation time (for example, the user message that
-triggered the agent).
-
-Instrumentations SHOULD compute the size as the byte length of the
-serialized request content as the agent receives it. For multi-part
-content (for example, text plus inline binary data), the size SHOULD be
-the sum of the byte lengths of each part.
-
-This metric is intended for instrumentations of agent frameworks that
-can reliably observe an agent's input payload (for example, ADK,
-LangChain agents, CrewAI agents).
+| `gen_ai.agent.input.content.size` | Histogram | `By` | The byte size of the content the GenAI agent receives at the agent boundary for a single invocation. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
+
+**[1]:** Intended for instrumentations of agent frameworks (for example, ADK,
+LangChain agents, CrewAI agents) that can observe the content passed
+to the agent at its entrypoint. Useful for capacity planning,
+anomaly detection (for example, a user pasting a very large prompt),
+and sizing downstream services (token budgets, vector DB inputs,
+storage).
+
+Instrumentations SHOULD record the byte size of the content the
+agent receives, as observed at the framework's entrypoint. The exact
+encoding is framework-defined (for example, a framework that exposes
+content as typed parts MAY sum the UTF-8 byte length of text parts
+and the raw byte length of binary parts; a framework that handles
+content as a serialized payload MAY use the byte length of that
+serialization). Instrumentations SHOULD document what they count so
+operators can interpret the values correctly within a given
+framework.
 
 **Attributes:**
 
 | Key | Stability | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Value Type | Description | Example Values |
 | --- | --- | --- | --- | --- | --- |
-| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
-| [`gen_ai.agent.version`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | The version of the GenAI agent. | `1.0.0`; `2025-05-01` |
+| [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/v1.41.1/docs/registry/attributes/error.md) | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | `Conditionally Required` If the operation ended in an error. | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` |
+| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Recommended` | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
+
+**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library,
+the canonical name of exception that occurred, or another low-cardinality error identifier.
+Instrumentations SHOULD document the list of errors they report.
+
+---
+
+`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
+
+| Value | Description | Stability |
+| --- | --- | --- |
+| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 
 <!-- prettier-ignore-end -->
 <!-- END AUTOGENERATED TEXT -->
 <!-- endweaver -->
 
-### Metric: `gen_ai.agent.response.size`
+### Metric: `gen_ai.agent.output.content.size`
 
 This metric is [recommended][MetricRecommended] for instrumentations that
-can observe the final response produced by an agent for a single invocation.
-
-Instrumentations SHOULD record the size as the byte length of the serialized
-response content as it leaves the agent. For multi-part content (for example,
-text plus inline binary data), the size SHOULD be the sum of the byte lengths
-of each part.
+can observe the final response produced by a GenAI agent.
 
 This metric SHOULD be specified with [ExplicitBucketBoundaries] of
 [1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864].
 
-<!-- weaver .registry.metrics[] | select(.name == "gen_ai.agent.response.size") -->
+<!-- weaver .registry.metrics[] | select(.name == "gen_ai.agent.output.content.size") -->
 <!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
 <!-- see templates/registry/markdown/snippet.md.j2 -->
 <!-- prettier-ignore-start -->
 
 | Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
 | -------- | --------------- | ----------- | -------------- | --------- | ------ |
-| `gen_ai.agent.response.size` | Histogram | `By` | GenAI agent response size. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
-
-**[1]:** This metric measures the size, in bytes, of the final response payload
-produced by a GenAI agent for a single invocation.
-
-Instrumentations SHOULD compute the size as the byte length of the
-serialized response content as it leaves the agent. For multi-part
-content (for example, text plus inline binary data), the size SHOULD be
-the sum of the byte lengths of each part.
-
-This metric is intended for instrumentations of agent frameworks that
-can reliably observe an agent's final response (for example, ADK,
-LangChain agents, CrewAI agents).
+| `gen_ai.agent.output.content.size` | Histogram | `By` | The byte size of the content the GenAI agent produces at the agent boundary for a single invocation. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
+
+**[1]:** Intended for instrumentations of agent frameworks (for example, ADK,
+LangChain agents, CrewAI agents) that can observe the agent's final
+output. Useful for capacity planning, spotting unusually large
+responses, and correlating size with latency or error rate.
+
+Includes only the agent's final response content. Intermediate
+content produced inside the invocation (tool calls, tool results,
+reasoning steps) SHOULD NOT be counted.
+
+Instrumentations SHOULD record the byte size of the content the
+agent produces, as observed at the framework's exit point. The exact
+encoding is framework-defined (for example, a framework that exposes
+content as typed parts MAY sum the UTF-8 byte length of text parts
+and the raw byte length of binary parts; a framework that handles
+content as a serialized payload MAY use the byte length of that
+serialization). Instrumentations SHOULD document what they count so
+operators can interpret the values correctly within a given
+framework.
 
 **Attributes:**
 
 | Key | Stability | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Value Type | Description | Example Values |
 | --- | --- | --- | --- | --- | --- |
-| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
-| [`gen_ai.agent.version`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | The version of the GenAI agent. | `1.0.0`; `2025-05-01` |
+| [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/v1.41.1/docs/registry/attributes/error.md) | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | `Conditionally Required` If the operation ended in an error. | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` |
+| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Recommended` | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
+
+**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library,
+the canonical name of exception that occurred, or another low-cardinality error identifier.
+Instrumentations SHOULD document the list of errors they report.
+
+---
+
+`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
+
+| Value | Description | Stability |
+| --- | --- | --- |
+| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
 
 <!-- prettier-ignore-end -->
 <!-- END AUTOGENERATED TEXT -->
diff --git a/model/gen-ai/metrics.yaml b/model/gen-ai/metrics.yaml
@@ -116,61 +116,70 @@ metrics:
       - ref: gen_ai.workflow.name
         requirement_level:
           conditionally_required: If available.
-  - name: gen_ai.agent.request.size
+  - name: gen_ai.agent.input.content.size
     annotations:
       code_generation:
         metric_value_type: int
-    brief: 'GenAI agent request size.'
+    brief: >
+      The byte size of the content the GenAI agent receives at the agent
+      boundary for a single invocation.
     note: |
-      This metric measures the size, in bytes, of the input payload provided to
-      a GenAI agent at invocation time (for example, the user message that
-      triggered the agent).
-
-      Instrumentations SHOULD compute the size as the byte length of the
-      serialized request content as the agent receives it. For multi-part
-      content (for example, text plus inline binary data), the size SHOULD be
-      the sum of the byte lengths of each part.
+      Intended for instrumentations of agent frameworks (for example, ADK,
+      LangChain agents, CrewAI agents) that can observe the content passed
+      to the agent at its entrypoint. Useful for capacity planning,
+      anomaly detection (for example, a user pasting a very large prompt),
+      and sizing downstream services (token budgets, vector DB inputs,
+      storage).
 
-      This metric is intended for instrumentations of agent frameworks that
-      can reliably observe an agent's input payload (for example, ADK,
-      LangChain agents, CrewAI agents).
+      Instrumentations SHOULD record the byte size of the content the
+      agent receives, as observed at the framework's entrypoint. The exact
+      encoding is framework-defined (for example, a framework that exposes
+      content as typed parts MAY sum the UTF-8 byte length of text parts
+      and the raw byte length of binary parts; a framework that handles
+      content as a serialized payload MAY use the byte length of that
+      serialization). Instrumentations SHOULD document what they count so
+      operators can interpret the values correctly within a given
+      framework.
     instrument: histogram
     unit: "By"
     stability: development
     attributes:
+      - ref_group: attributes.gen_ai.error
       - ref: gen_ai.agent.name
-        requirement_level:
-          conditionally_required: when available
-      - ref: gen_ai.agent.version
-        requirement_level:
-          conditionally_required: when available
-  - name: gen_ai.agent.response.size
+        requirement_level: recommended
+  - name: gen_ai.agent.output.content.size
     annotations:
       code_generation:
         metric_value_type: int
-    brief: 'GenAI agent response size.'
+    brief: >
+      The byte size of the content the GenAI agent produces at the agent
+      boundary for a single invocation.
     note: |
-      This metric measures the size, in bytes, of the final response payload
-      produced by a GenAI agent for a single invocation.
+      Intended for instrumentations of agent frameworks (for example, ADK,
+      LangChain agents, CrewAI agents) that can observe the agent's final
+      output. Useful for capacity planning, spotting unusually large
+      responses, and correlating size with latency or error rate.
 
-      Instrumentations SHOULD compute the size as the byte length of the
-      serialized response content as it leaves the agent. For multi-part
-      content (for example, text plus inline binary data), the size SHOULD be
-      the sum of the byte lengths of each part.
+      Includes only the agent's final response content. Intermediate
+      content produced inside the invocation (tool calls, tool results,
+      reasoning steps) SHOULD NOT be counted.
 
-      This metric is intended for instrumentations of agent frameworks that
-      can reliably observe an agent's final response (for example, ADK,
-      LangChain agents, CrewAI agents).
+      Instrumentations SHOULD record the byte size of the content the
+      agent produces, as observed at the framework's exit point. The exact
+      encoding is framework-defined (for example, a framework that exposes
+      content as typed parts MAY sum the UTF-8 byte length of text parts
+      and the raw byte length of binary parts; a framework that handles
+      content as a serialized payload MAY use the byte length of that
+      serialization). Instrumentations SHOULD document what they count so
+      operators can interpret the values correctly within a given
+      framework.
     instrument: histogram
     unit: "By"
     stability: development
     attributes:
+      - ref_group: attributes.gen_ai.error
       - ref: gen_ai.agent.name
-        requirement_level:
-          conditionally_required: when available
-      - ref: gen_ai.agent.version
-        requirement_level:
-          conditionally_required: when available
+        requirement_level: recommended
 
 metric_refinements:
   - id: openai.client.token.usage
diff --git a/schema-snapshot/registry.yaml b/schema-snapshot/registry.yaml