Commit 92cc049

Address review comments on PR open-telemetry#202 (lmolkova, Mike, trask)
* Rename metrics: gen_ai.agent.request.size -> gen_ai.agent.input.content.size and gen_ai.agent.response.size -> gen_ai.agent.output.content.size. The new names don't imply a physical HTTP/gRPC request and are explicit that the metric is about content bytes (lmolkova's feedback).
* Drop the per-invocation-increment framing. Metric semantics now: byte size of content the agent receives/produces at its entrypoint, whatever the framework sees natively. Addresses lmolkova's point that 'what's new' is framework-dependent and ambiguous, and trask's question about defining in terms of gen_ai.input.messages (which would force frameworks to serialize full chat history).
* Spell out the byte-counting algorithm concretely: UTF-8 byte length for text parts, raw byte length for binary parts, framing bytes (JSON keys, role/metadata) not counted. Matches what the ADK reference implementation does. Addresses both Mike's and lmolkova's precision requests.
* Bump gen_ai.agent.name from 'conditionally_required: when available' to 'recommended'. Same compromise as PR open-telemetry#201 - stronger than current but doesn't break unnamed-agent frameworks.
* Add error.type via attributes.gen_ai.error ref_group (Mike's suggestion); held off on metric_attributes.gen_ai since address/port/provider/model don't add much for an in-process content-size metric.
* Drop gen_ai.agent.version from attribute lists (same reasoning as PR open-telemetry#201 - service.version covers it).
* Remove cross-reference to gen_ai.agent.invocation.duration since open-telemetry#201 has not landed yet. Will re-add later.
* Restructure the docs/gen-ai/gen-ai-metrics.md section to follow the thin MD wrapper + rich YAML note pattern (same as PR open-telemetry#201 revision).
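
The byte-counting rule described in the commit message (UTF-8 byte length for text parts, raw byte length for binary parts, framing bytes not counted) could be sketched roughly as follows. This is an illustration only: `TextPart`, `BinaryPart`, and `content_size` are hypothetical names invented for this example, not types from ADK or any other framework, and a real instrumentation would read whatever part types its framework exposes.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class TextPart:
    """Hypothetical text part; not a type from any real framework."""
    text: str


@dataclass(frozen=True)
class BinaryPart:
    """Hypothetical binary part (for example, inline image bytes)."""
    data: bytes


Part = Union[TextPart, BinaryPart]


def content_size(parts: Iterable[Part]) -> int:
    """Return the content size in bytes for a sequence of parts.

    Only the content itself is counted. Framing such as JSON keys,
    role names, or other metadata never enters the sum because the
    function only looks at the text or data carried by each part.
    """
    total = 0
    for part in parts:
        if isinstance(part, TextPart):
            # Text is measured as its UTF-8 encoded length, not len(str).
            total += len(part.text.encode("utf-8"))
        elif isinstance(part, BinaryPart):
            # Binary content is measured as its raw byte length.
            total += len(part.data)
        else:
            raise TypeError(f"unsupported part type: {type(part).__name__}")
    return total
```

Measuring the encoded length rather than the character count matters for non-ASCII text: a five-character string containing one two-byte character contributes six bytes.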
1 parent 03b1c50 commit 92cc049

4 files changed

Lines changed: 296 additions & 215 deletions

CHANGELOG.md

Lines changed: 3 additions & 2 deletions
@@ -31,8 +31,9 @@
   ([#126](https://github.com/open-telemetry/semantic-conventions-genai/pull/126))
 - Add `moonshot_ai` to `gen_ai.provider.name` well-known values.
   ([#99](https://github.com/open-telemetry/semantic-conventions-genai/pull/99))
-- Add `gen_ai.agent.request.size` and `gen_ai.agent.response.size` metrics to
-  track the byte size of GenAI agent input and output payloads.
+- Add `gen_ai.agent.input.content.size` and `gen_ai.agent.output.content.size`
+  metrics to track the byte size of content the GenAI agent receives and
+  produces at the agent boundary.
   ([#202](https://github.com/open-telemetry/semantic-conventions-genai/pull/202))

 ### 🧰 Bug fixes 🧰

docs/gen-ai/gen-ai-metrics.md

Lines changed: 76 additions & 71 deletions
@@ -19,9 +19,9 @@ linkTitle: Metrics
   - [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiservertime_to_first_token)
 - [Generative AI workflow metrics](#generative-ai-workflow-metrics)
   - [Metric: `gen_ai.workflow.duration`](#metric-gen_aiworkflowduration)
-- [Generative AI agent payload size metrics](#generative-ai-agent-payload-size-metrics)
-  - [Metric: `gen_ai.agent.request.size`](#metric-gen_aiagentrequestsize)
-  - [Metric: `gen_ai.agent.response.size`](#metric-gen_aiagentresponsesize)
+- [Generative AI agent content size metrics](#generative-ai-agent-content-size-metrics)
+  - [Metric: `gen_ai.agent.input.content.size`](#metric-gen_aiagentinputcontentsize)
+  - [Metric: `gen_ai.agent.output.content.size`](#metric-gen_aiagentoutputcontentsize)

 <!-- tocstop -->


@@ -934,117 +934,122 @@ If there is no low-cardinality workflow name available for a given framework, th
 <!-- END AUTOGENERATED TEXT -->
 <!-- endweaver -->

-## Generative AI agent payload size metrics
+## Generative AI agent content size metrics

 Individual systems may include additional system-specific attributes.
 It is recommended to check system-specific documentation, if available.

-`gen_ai.agent.request.size` and `gen_ai.agent.response.size` measure the
-serialized byte size of the **per-invocation** input and output of a single
-GenAI agent invocation:
-
-* `gen_ai.agent.request.size` is the byte size of the new content provided to
-  the agent for this invocation (typically the latest user message), not the
-  cumulative chat history that might be assembled into a model call.
-* `gen_ai.agent.response.size` is the byte size of the final response the
-  agent produces for this invocation, excluding intermediate tool calls or
-  reasoning steps.
-
-They are intended for instrumentations of agent frameworks (for example,
-ADK, LangChain agents, CrewAI agents) that can reliably observe the agent's
-input and final output.
-
-These metrics complement `gen_ai.agent.invocation.duration` by giving
-operators visibility into payload volume — useful for capacity planning,
-spotting unusually large per-turn requests or responses, and correlating
-size with latency or error rate.
-
-### Metric: `gen_ai.agent.request.size`
+### Metric: `gen_ai.agent.input.content.size`

 This metric is [recommended][MetricRecommended] for instrumentations that
-can observe the input payload provided to an agent at invocation time.
-
-Instrumentations SHOULD record the size as the byte length of the serialized
-request content as the agent receives it. For multi-part content (for example,
-text plus inline binary data), the size SHOULD be the sum of the byte lengths
-of each part.
+can observe the content passed to a GenAI agent at its entrypoint.

 This metric SHOULD be specified with [ExplicitBucketBoundaries] of
 [1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864].

-<!-- weaver .registry.metrics[] | select(.name == "gen_ai.agent.request.size") -->
+<!-- weaver .registry.metrics[] | select(.name == "gen_ai.agent.input.content.size") -->
 <!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
 <!-- see templates/registry/markdown/snippet.md.j2 -->
 <!-- prettier-ignore-start -->

 | Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
 | -------- | --------------- | ----------- | -------------- | --------- | ------ |
-| `gen_ai.agent.request.size` | Histogram | `By` | GenAI agent request size. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
-
-**[1]:** This metric measures the size, in bytes, of the input payload provided to
-a GenAI agent at invocation time (for example, the user message that
-triggered the agent).
-
-Instrumentations SHOULD compute the size as the byte length of the
-serialized request content as the agent receives it. For multi-part
-content (for example, text plus inline binary data), the size SHOULD be
-the sum of the byte lengths of each part.
-
-This metric is intended for instrumentations of agent frameworks that
-can reliably observe an agent's input payload (for example, ADK,
-LangChain agents, CrewAI agents).
+| `gen_ai.agent.input.content.size` | Histogram | `By` | The byte size of the content the GenAI agent receives at the agent boundary for a single invocation. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
+
+**[1]:** Intended for instrumentations of agent frameworks (for example, ADK,
+LangChain agents, CrewAI agents) that can observe the content passed
+to the agent at its entrypoint. Useful for capacity planning,
+anomaly detection (for example, a user pasting a very large prompt),
+and sizing downstream services (token budgets, vector DB inputs,
+storage).
+
+Instrumentations SHOULD record the byte size of the content the
+agent receives, as observed at the framework's entrypoint. The exact
+encoding is framework-defined (for example, a framework that exposes
+content as typed parts MAY sum the UTF-8 byte length of text parts
+and the raw byte length of binary parts; a framework that handles
+content as a serialized payload MAY use the byte length of that
+serialization). Instrumentations SHOULD document what they count so
+operators can interpret the values correctly within a given
+framework.

 **Attributes:**

 | Key | Stability | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Value Type | Description | Example Values |
 | --- | --- | --- | --- | --- | --- |
-| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
-| [`gen_ai.agent.version`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | The version of the GenAI agent. | `1.0.0`; `2025-05-01` |
+| [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/v1.41.1/docs/registry/attributes/error.md) | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | `Conditionally Required` If the operation ended in an error. | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` |
+| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Recommended` | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
+
+**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library,
+the canonical name of exception that occurred, or another low-cardinality error identifier.
+Instrumentations SHOULD document the list of errors they report.
+
+---
+
+`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
+
+| Value | Description | Stability |
+| --- | --- | --- |
+| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

 <!-- prettier-ignore-end -->
 <!-- END AUTOGENERATED TEXT -->
 <!-- endweaver -->

-### Metric: `gen_ai.agent.response.size`
+### Metric: `gen_ai.agent.output.content.size`

 This metric is [recommended][MetricRecommended] for instrumentations that
-can observe the final response produced by an agent for a single invocation.
-
-Instrumentations SHOULD record the size as the byte length of the serialized
-response content as it leaves the agent. For multi-part content (for example,
-text plus inline binary data), the size SHOULD be the sum of the byte lengths
-of each part.
+can observe the final response produced by a GenAI agent.

 This metric SHOULD be specified with [ExplicitBucketBoundaries] of
 [1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864].

-<!-- weaver .registry.metrics[] | select(.name == "gen_ai.agent.response.size") -->
+<!-- weaver .registry.metrics[] | select(.name == "gen_ai.agent.output.content.size") -->
 <!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
 <!-- see templates/registry/markdown/snippet.md.j2 -->
 <!-- prettier-ignore-start -->

 | Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
 | -------- | --------------- | ----------- | -------------- | --------- | ------ |
-| `gen_ai.agent.response.size` | Histogram | `By` | GenAI agent response size. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
-
-**[1]:** This metric measures the size, in bytes, of the final response payload
-produced by a GenAI agent for a single invocation.
-
-Instrumentations SHOULD compute the size as the byte length of the
-serialized response content as it leaves the agent. For multi-part
-content (for example, text plus inline binary data), the size SHOULD be
-the sum of the byte lengths of each part.
-
-This metric is intended for instrumentations of agent frameworks that
-can reliably observe an agent's final response (for example, ADK,
-LangChain agents, CrewAI agents).
+| `gen_ai.agent.output.content.size` | Histogram | `By` | The byte size of the content the GenAI agent produces at the agent boundary for a single invocation. [1] | ![Development](https://img.shields.io/badge/-development-blue) | |
+
+**[1]:** Intended for instrumentations of agent frameworks (for example, ADK,
+LangChain agents, CrewAI agents) that can observe the agent's final
+output. Useful for capacity planning, spotting unusually large
+responses, and correlating size with latency or error rate.
+
+Includes only the agent's final response content. Intermediate
+content produced inside the invocation (tool calls, tool results,
+reasoning steps) SHOULD NOT be counted.
+
+Instrumentations SHOULD record the byte size of the content the
+agent produces, as observed at the framework's exit point. The exact
+encoding is framework-defined (for example, a framework that exposes
+content as typed parts MAY sum the UTF-8 byte length of text parts
+and the raw byte length of binary parts; a framework that handles
+content as a serialized payload MAY use the byte length of that
+serialization). Instrumentations SHOULD document what they count so
+operators can interpret the values correctly within a given
+framework.

 **Attributes:**

 | Key | Stability | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Value Type | Description | Example Values |
 | --- | --- | --- | --- | --- | --- |
-| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
-| [`gen_ai.agent.version`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Conditionally Required` when available | string | The version of the GenAI agent. | `1.0.0`; `2025-05-01` |
+| [`error.type`](https://github.com/open-telemetry/semantic-conventions/blob/v1.41.1/docs/registry/attributes/error.md) | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | `Conditionally Required` If the operation ended in an error. | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` |
+| [`gen_ai.agent.name`](/docs/registry/attributes/gen-ai.md) | ![Development](https://img.shields.io/badge/-development-blue) | `Recommended` | string | Human-readable name of the GenAI agent provided by the application. | `Math Tutor`; `Fiction Writer` |
+
+**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library,
+the canonical name of exception that occurred, or another low-cardinality error identifier.
+Instrumentations SHOULD document the list of errors they report.
+
+---
+
+`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
+
+| Value | Description | Stability |
+| --- | --- | --- |
+| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

 <!-- prettier-ignore-end -->
 <!-- END AUTOGENERATED TEXT -->
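
The explicit bucket boundaries recommended for both metrics are the powers of four from 4^0 (1 byte) to 4^13 (64 MiB). The sketch below regenerates that list and shows which bucket a given size would fall into. It assumes upper-inclusive bucket bounds, which is how the OpenTelemetry metrics data model describes explicit-bucket histograms; `BOUNDARIES` and `bucket_index` are names made up for this example and are not part of any SDK.

```python
from bisect import bisect_left

# Powers of four from 4**0 to 4**13, matching the boundary list in the docs.
BOUNDARIES = [4 ** k for k in range(14)]


def bucket_index(size_bytes: int) -> int:
    """Return the index of the bucket that would hold size_bytes.

    Assumes upper-inclusive bounds: bucket 0 is (-inf, 1], bucket i is
    (BOUNDARIES[i - 1], BOUNDARIES[i]], and the last bucket (index 14)
    holds everything above the largest boundary.
    """
    return bisect_left(BOUNDARIES, size_bytes)
```

With 14 boundaries there are 15 buckets in total, so a 1 KiB prompt lands in the same bucket as anything between 257 and 1024 bytes, while a 1025-byte prompt moves up one bucket.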

model/gen-ai/metrics.yaml

Lines changed: 43 additions & 34 deletions
@@ -116,61 +116,70 @@ metrics:
       - ref: gen_ai.workflow.name
         requirement_level:
           conditionally_required: If available.
-  - name: gen_ai.agent.request.size
+  - name: gen_ai.agent.input.content.size
     annotations:
       code_generation:
         metric_value_type: int
-    brief: 'GenAI agent request size.'
+    brief: >
+      The byte size of the content the GenAI agent receives at the agent
+      boundary for a single invocation.
     note: |
-      This metric measures the size, in bytes, of the input payload provided to
-      a GenAI agent at invocation time (for example, the user message that
-      triggered the agent).
-
-      Instrumentations SHOULD compute the size as the byte length of the
-      serialized request content as the agent receives it. For multi-part
-      content (for example, text plus inline binary data), the size SHOULD be
-      the sum of the byte lengths of each part.
+      Intended for instrumentations of agent frameworks (for example, ADK,
+      LangChain agents, CrewAI agents) that can observe the content passed
+      to the agent at its entrypoint. Useful for capacity planning,
+      anomaly detection (for example, a user pasting a very large prompt),
+      and sizing downstream services (token budgets, vector DB inputs,
+      storage).

-      This metric is intended for instrumentations of agent frameworks that
-      can reliably observe an agent's input payload (for example, ADK,
-      LangChain agents, CrewAI agents).
+      Instrumentations SHOULD record the byte size of the content the
+      agent receives, as observed at the framework's entrypoint. The exact
+      encoding is framework-defined (for example, a framework that exposes
+      content as typed parts MAY sum the UTF-8 byte length of text parts
+      and the raw byte length of binary parts; a framework that handles
+      content as a serialized payload MAY use the byte length of that
+      serialization). Instrumentations SHOULD document what they count so
+      operators can interpret the values correctly within a given
+      framework.
     instrument: histogram
     unit: "By"
     stability: development
     attributes:
+      - ref_group: attributes.gen_ai.error
       - ref: gen_ai.agent.name
-        requirement_level:
-          conditionally_required: when available
-      - ref: gen_ai.agent.version
-        requirement_level:
-          conditionally_required: when available
-  - name: gen_ai.agent.response.size
+        requirement_level: recommended
+  - name: gen_ai.agent.output.content.size
     annotations:
       code_generation:
         metric_value_type: int
-    brief: 'GenAI agent response size.'
+    brief: >
+      The byte size of the content the GenAI agent produces at the agent
+      boundary for a single invocation.
     note: |
-      This metric measures the size, in bytes, of the final response payload
-      produced by a GenAI agent for a single invocation.
+      Intended for instrumentations of agent frameworks (for example, ADK,
+      LangChain agents, CrewAI agents) that can observe the agent's final
+      output. Useful for capacity planning, spotting unusually large
+      responses, and correlating size with latency or error rate.

-      Instrumentations SHOULD compute the size as the byte length of the
-      serialized response content as it leaves the agent. For multi-part
-      content (for example, text plus inline binary data), the size SHOULD be
-      the sum of the byte lengths of each part.
+      Includes only the agent's final response content. Intermediate
+      content produced inside the invocation (tool calls, tool results,
+      reasoning steps) SHOULD NOT be counted.

-      This metric is intended for instrumentations of agent frameworks that
-      can reliably observe an agent's final response (for example, ADK,
-      LangChain agents, CrewAI agents).
+      Instrumentations SHOULD record the byte size of the content the
+      agent produces, as observed at the framework's exit point. The exact
+      encoding is framework-defined (for example, a framework that exposes
+      content as typed parts MAY sum the UTF-8 byte length of text parts
+      and the raw byte length of binary parts; a framework that handles
+      content as a serialized payload MAY use the byte length of that
+      serialization). Instrumentations SHOULD document what they count so
+      operators can interpret the values correctly within a given
+      framework.
     instrument: histogram
     unit: "By"
     stability: development
     attributes:
+      - ref_group: attributes.gen_ai.error
       - ref: gen_ai.agent.name
-        requirement_level:
-          conditionally_required: when available
-      - ref: gen_ai.agent.version
-        requirement_level:
-          conditionally_required: when available
+        requirement_level: recommended

 metric_refinements:
   - id: openai.client.token.usage
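
After this change each metric carries two attributes: `gen_ai.agent.name` (recommended) and `error.type` (conditionally required if the operation ended in an error). A minimal sketch of how an instrumentation might assemble that attribute set is shown below. The function name `content_size_attributes` is hypothetical, and using the exception class's qualified name as `error.type` is one assumed choice among those the convention allows (provider error code, canonical exception name, or another low-cardinality identifier).

```python
from typing import Dict, Optional


def content_size_attributes(
    agent_name: Optional[str],
    error: Optional[BaseException] = None,
) -> Dict[str, str]:
    """Build the attribute set for the agent content size metrics.

    gen_ai.agent.name is only set when the framework knows a name, so
    unnamed-agent frameworks simply omit it. error.type is only set
    when the invocation ended in an error.
    """
    attributes: Dict[str, str] = {}
    if agent_name:
        attributes["gen_ai.agent.name"] = agent_name
    if error is not None:
        # Assumption: report the exception class name, which stays
        # low-cardinality because it excludes the error message.
        attributes["error.type"] = type(error).__qualname__
    return attributes
```

Keeping the error message out of `error.type` is deliberate in this sketch, since free-form messages would create a new time series for every distinct failure text.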
