Add experimental GenAI context selection event by caioribeiroclw-pixel · Pull Request #190 · open-telemetry/semantic-conventions-genai

caioribeiroclw-pixel · 2026-05-22T21:06:37Z

Summary

Adds an experimental gen_ai.context.selection.evaluated event for privacy-preserving context selection counts.

This is the smallest shape from the discussion in #181 that can answer the operator question raised there: "did this agent run load too much context before we know which context was decision-relevant?"

The event captures counts only:

- candidate context inputs considered
- selected inputs
- suppressed inputs
- delivered-context hash count, when available
- low-cardinality selection reason/policy

It intentionally does not define a full relevance/evaluator layer and should not capture raw prompt text, raw context text, tool outputs, memory bodies, or repository excerpts.
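To make the count-only shape concrete, here is a minimal sketch of how a harness might assemble the event attributes. It is an illustration under assumptions, not the generated registry or an OpenTelemetry SDK API: the `ContextSelection` class and `to_attributes` helper are hypothetical, and the attribute names are assumed to follow the `gen_ai.context.selection.*` prefix.

```python
from dataclasses import dataclass
from typing import Optional

EVENT_NAME = "gen_ai.context.selection.evaluated"
PREFIX = "gen_ai.context.selection"


@dataclass(frozen=True)
class ContextSelection:
    """Count-only summary of one context selection step (hypothetical helper)."""

    candidate_count: int
    selected_count: int
    suppressed_count: int
    policy: str = "unknown"
    delivered_hash_count: Optional[int] = None  # set only when hashes are available

    def to_attributes(self) -> dict:
        # Assumed attribute names. Only integers and one short label are emitted,
        # never prompt text, context bodies, or tool outputs.
        attrs = {
            f"{PREFIX}.candidate.count": self.candidate_count,
            f"{PREFIX}.selected.count": self.selected_count,
            f"{PREFIX}.suppressed.count": self.suppressed_count,
            f"{PREFIX}.policy": self.policy,
        }
        if self.delivered_hash_count is not None:
            attrs[f"{PREFIX}.delivered_hash.count"] = self.delivered_hash_count
        return attrs


# Illustrative numbers only: 12 candidates, 4 selected, 8 suppressed.
attributes = ContextSelection(12, 4, 8, policy="budget").to_attributes()
print(EVENT_NAME, attributes)
```

Leaving the hash count out when it is unknown, rather than sending zero, keeps "not measured" distinguishable from "nothing delivered".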

Why

Agent harnesses increasingly do retrieval, memory lookup, skills/rules loading, tool search, compaction, and other context selection steps before the model call. Token and cost data on the model span show the final bill, but not whether the harness over-selected context upstream.

A cheap count-only event gives operators an early waste signal without requiring raw content capture or a decision-relevance evaluator.

Validation

make generate-all WEAVER=/home/azureuser/.openclaw/workspace/bin/weaver
make check-policies WEAVER=/home/azureuser/.openclaw/workspace/bin/weaver

Both passed locally with the existing definition/2 stability warnings.

Related discussion: #181

linux-foundation-easycla · 2026-05-22T21:06:44Z

❌ - login: @caioribeiroclw-pixel / name: Caio Ribeiro. The commits (0b43c55, 0e44d6b, 1120e1d, 324291e, 3a08feb, 4e0c746, 68177d3, 9d6f7fd, d72826b, e34a9d2) are not authorized under a signed CLA. Please click here to be authorized. For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new GenAI semantic convention event to report privacy-preserving context selection telemetry (candidate/selected/suppressed and delivered-hash counts), along with the supporting attributes and documentation.

Changes:

- Add gen_ai.context.selection.evaluated event to the GenAI events registry.
- Introduce new gen_ai.context.selection.* attributes for context selection counts and reasoning.
- Regenerate schema snapshot + update registry docs and changelog.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| schema-snapshot/registry.yaml | Updates the generated snapshot to include the new event/attributes. |
| model/gen-ai/registry.yaml | Adds new `gen_ai.context.selection.*` attribute definitions. |
| model/gen-ai/events.yaml | Defines the new `gen_ai.context.selection.evaluated` event and its attribute refs/requirements. |
| docs/registry/attributes/gen-ai.md | Documents the new attributes and renumbers footnotes. |
| docs/gen-ai/gen-ai-events.md | Documents the new event in the events reference page. |
| CHANGELOG.md | Notes the new event in the Unreleased section. |

Comments suppressed due to low confidence (1)

docs/registry/attributes/gen-ai.md:1

In the description column, the footnote marker is appended without punctuation (agent [24]). For consistency with other rows (which generally use . [n]), consider updating this to agent. [24].

<!-- NOTE: THIS FILE IS AUTOGENERATED. DO NOT EDIT BY HAND. -->

+    brief: The implementation-specific reason or policy that selected and suppressed context inputs.
+    note: >
+      The value SHOULD have low cardinality. Examples include `budget`, `relevance`, `dedupe`,
+      `target_agent`, `policy`, and `unknown`.
+    examples: ["budget", "relevance"]



 ## Unreleased

+- Add experimental `gen_ai.context.selection.evaluated` event for privacy-preserving context selection counts.


hippoley · 2026-05-26T02:55:03Z

This is exactly the kind of observability primitive we need for RAG-heavy agent benchmarks.

In our EnterpriseAgentBench setup we run agents on RAG tasks (document QA, knowledge retrieval) and one of the hardest questions to answer post-hoc is: "did the agent retrieve too many candidates and then silently drop the relevant ones, or did it never retrieve them at all?" The candidate.count / selected.count / suppressed.count triad answers that directly without exposing raw content.

A few thoughts from the benchmark instrumentation angle:

- gen_ai.context.selection.policy: very useful. In our setup we have named retrieval strategies (BM25, dense, hybrid). Would it make sense to allow multiple values here (e.g., when a hybrid retriever applies both BM25 and dense in sequence), or is the intent to capture the top-level policy name only?
- gen_ai.context.selection.delivered_hash.count: smart privacy-preserving design. One question: is this a count of unique hashes, or total hash entries? For deduplication tracking the distinction matters.
- The event name gen_ai.context.selection.evaluated reads naturally. One alternative worth considering: gen_ai.retrieval.context.evaluated. This would sit alongside the existing gen_ai.retrieval.client span and make the retrieval subsystem more cohesive. But I can see the argument for keeping it under gen_ai.context.* as a cross-cutting concern.

Overall strongly supportive of this direction. Happy to add a reference scenario for a RAG-style retrieval pipeline if that would help validate the event shape.

caioribeiroclw-pixel · 2026-05-26T16:06:32Z

Thanks — this is a very useful validation point, especially the EnterpriseAgentBench RAG case. I pushed a small update in e34a9d2 to make the two ambiguous bits explicit:

- gen_ai.context.selection.policy is now the recommended attribute name. I intended it as the top-level selection policy/strategy, not a list of every internal retrieval stage. For a hybrid pipeline I would use one stable low-cardinality value like hybrid_bm25_dense or rag_hybrid_v2, and keep per-stage details on retrieval spans/events if needed.
- gen_ai.context.selection.delivered_hash.count now says it counts unique delivered-context hashes, not total observations. That should make the dedupe signal computable by comparing selected.count with delivered_hash.count.
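The comparison between selected.count and delivered_hash.count can be sketched as follows. This is an illustration under assumptions: the PR does not name a hash algorithm, so SHA-256 is a stand-in, and both function names are hypothetical.

```python
import hashlib


def delivered_hash_count(delivered_items: list[str]) -> int:
    """Count unique SHA-256 digests of the delivered context items."""
    # Hashing happens locally; only the resulting count would leave the process.
    digests = {
        hashlib.sha256(item.encode("utf-8")).hexdigest()
        for item in delivered_items
    }
    return len(digests)


def duplicate_selected(selected_count: int, unique_hash_count: int) -> int:
    """Selected inputs beyond the unique delivered hashes (assumed dedupe signal)."""
    return max(selected_count - unique_hash_count, 0)


# Illustrative data: three selected items, two of which are byte-identical.
delivered = ["doc A body", "doc B body", "doc A body"]
unique = delivered_hash_count(delivered)
print(unique, duplicate_selected(len(delivered), unique))
```

A nonzero difference suggests that some selected inputs collapsed to the same delivered content, which a total-observations count would hide.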

On the event namespace: I kept gen_ai.context.selection.evaluated for now because the same selection boundary shows up outside classic RAG too: memory records, tool/search results, repository excerpts, skill bodies, and agent handoff context. The retrieval span can still be the producer; this event is meant to describe the cross-cutting boundary where candidates become actual model/agent context.

A reference scenario for a RAG-style pipeline would be great. The concrete benchmark question you gave — “retrieved too many and dropped the relevant one vs never retrieved it” — is exactly the acceptance case I think this event should cover.

Local validation I could run here:

- YAML parse for model/gen-ai/events.yaml, model/gen-ai/registry.yaml, schema-snapshot/registry.yaml
- git diff --check

I could not run make generate-all locally because this environment does not have Docker/Podman available; I updated the generated docs/schema snapshot consistently and will watch CI for the authoritative check.

caioribeiroclw-pixel · 2026-05-26T17:16:11Z

Follow-up to make the EnterpriseAgentBench/RAG case executable rather than only described in prose:

- added an Anthropic reference scenario that emits gen_ai.context.selection.evaluated for a RAG-style boundary (18 candidates → 5 selected / delivered hashes → 13 suppressed) using policy=hybrid_bm25_dense
- extended the reference matrix tooling so this event is recognized by EVENT_SPECS
- kept the event privacy-safe: only counts + low-cardinality policy, no raw prompt, query, document, tool output, memory body, or repository excerpt
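The numbers in that scenario (18 candidates, 5 selected, 13 suppressed) can be sanity-checked with a small sketch. The invariants below are assumptions inferred from those numbers, not requirements stated in the convention, and `check_selection_counts` is a hypothetical helper rather than part of the reference matrix tooling.

```python
def check_selection_counts(candidate: int, selected: int, suppressed: int,
                           delivered_hashes: int) -> list[str]:
    """Return a list of consistency problems; an empty list means none found."""
    problems = []
    if min(candidate, selected, suppressed, delivered_hashes) < 0:
        problems.append("counts must be non-negative")
    # Assumed invariant: every candidate is either selected or suppressed.
    if selected + suppressed != candidate:
        problems.append("selected + suppressed should equal candidate")
    # Assumed invariant: unique delivered hashes cannot outnumber selected inputs.
    if delivered_hashes > selected:
        problems.append("unique delivered hashes should not exceed selected")
    return problems


# Counts taken from the RAG-style scenario described above.
fixture = {"candidate": 18, "selected": 5, "suppressed": 13, "delivered_hashes": 5}
problems = check_selection_counts(**fixture)
print(problems or "consistent")
```

A check like this could run in a test harness before the event is emitted, so that inconsistent counts are caught at the source.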

CI is green again on 68177d3 (including the Anthropic reference scenario and generated-docs/status checks). This should give reviewers a concrete fixture for the exact “retrieved candidates vs actually delivered context” benchmark question.


trask · 2026-05-30T21:37:23Z

hi @caioribeiroclw-pixel, can you please fill out the PR template (https://raw.githubusercontent.com/open-telemetry/semantic-conventions-genai/refs/heads/main/.github/PULL_REQUEST_TEMPLATE.md) and sign the CLA?

otelbot · 2026-06-05T22:25:16Z

Copilot has reviewed this PR. Copilot's suggestions aren't always correct or applicable, so please evaluate each comment on its merits and then handle it in one of these ways:

- click GitHub's "Apply suggestion" button (auto-resolves the thread);
- reply that it was applied (ideally linking to the commit); or
- reply that it was not applied, with the reason; or
- reply with a question for reviewers.

Automation flags a PR for human review once every Copilot comment has a reply or is marked as resolved, so keeping these threads up to date helps reviewers know when the PR is ready.

Status across open PRs is visible on the pull request dashboard.

trask · 2026-06-10T19:01:58Z

Hi! As part of #275, this repository switched to Towncrier changelog fragments to reduce merge conflicts in CHANGELOG.md.

Please move this PR's changelog entry out of CHANGELOG.md and into this Towncrier fragment:

Create changelog.d/190.enhancement.md containing the change log entry, e.g.:

Add experimental `gen_ai.context.selection.evaluated` event for privacy-preserving context selection counts.

After adding the fragment, please remove this PR's direct edit to CHANGELOG.md. Towncrier will add the PR link from the fragment filename when release notes are generated.

Thanks!

caioribeiroclw-pixel · 2026-06-13T09:29:08Z

Updated for the Towncrier migration request:

- moved the changelog text into changelog.d/190.enhancement.md
- removed the direct CHANGELOG.md entry

Local checks:

git diff --check
python3 - <<'PY'
from pathlib import Path
frag = Path('changelog.d/190.enhancement.md')
assert frag.exists()
assert 'gen_ai.context.selection.evaluated' in frag.read_text()
assert '- Add experimental `gen_ai.context.selection.evaluated` event' not in Path('CHANGELOG.md').read_text().split('### 🛑 Breaking changes 🛑')[0]
print('fragment-ok')
PY

Copilot AI review requested due to automatic review settings May 22, 2026 21:06

caioribeiroclw-pixel requested a review from a team as a code owner May 22, 2026 21:06

github-actions Bot mentioned this pull request May 22, 2026

Pull Request Dashboard #102

Closed

Copilot AI reviewed May 22, 2026


This was referenced May 22, 2026

[RFC] Semantic Conventions for AI Agent Observability traceloop/openllmetry#3460

Open

SEP-1576: Mitigating Token Bloat in MCP: Reducing Schema Redundancy and Optimizing Tool Selection modelcontextprotocol/modelcontextprotocol#1576

Open

lmolkova added this to GenAI Semantic Conventions and Instrumentation libraries May 26, 2026

lmolkova moved this to In Progress in GenAI Semantic Conventions and Instrumentation libraries May 26, 2026

github-actions Bot mentioned this pull request May 26, 2026

Pull Request Dashboard #196

Closed

caioribeiroclw-pixel added 2 commits May 26, 2026 16:04

Add GenAI context selection event

4e0c746

Clarify context selection policy and hash counts

e34a9d2

caioribeiroclw-pixel force-pushed the caio/context-selection-receipt branch from 4f8429c to e34a9d2 May 26, 2026 16:06

caioribeiroclw-pixel added 6 commits May 26, 2026 16:07

Regenerate context selection schema snapshot

324291e

Sync context selection policy examples

0b43c55

test: add context selection reference scenario

1120e1d

chore: keep generated reference index stable

9d6f7fd

test: cover context selection in reference matrix

d72826b

chore: leave reference index generated

68177d3

Merge remote-tracking branch 'origin/main' into caio/context-selectio…

3a08feb

…n-receipt

github-actions Bot mentioned this pull request May 27, 2026

Pull Request Dashboard #204

Open

trask mentioned this pull request Jun 5, 2026

Add Reviewers column to PR dashboard #251

Merged

chore: move changelog entry to towncrier fragment

0e44d6b

caioribeiroclw-pixel wants to merge 10 commits into open-telemetry:main from caioribeiroclw-pixel:caio/context-selection-receipt


5 participants

