
Add experimental GenAI context selection event #190

Open
caioribeiroclw-pixel wants to merge 10 commits into open-telemetry:main from caioribeiroclw-pixel:caio/context-selection-receipt

Conversation

@caioribeiroclw-pixel


Summary

Adds an experimental gen_ai.context.selection.evaluated event for privacy-preserving context selection counts.

This is the smallest shape from the discussion in #181 that can answer the operator question raised there: "did this agent run load too much context before we know which context was decision-relevant?"

The event captures counts only:

  • candidate context inputs considered
  • selected inputs
  • suppressed inputs
  • delivered-context hash count, when available
  • low-cardinality selection reason/policy

It intentionally does not define a full relevance/evaluator layer and should not capture raw prompt text, raw context text, tool outputs, memory bodies, or repository excerpts.
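A minimal sketch of what a count-only payload could look like, written as a plain dictionary so it does not depend on any SDK. The helper name `build_selection_attributes` is hypothetical, the full attribute keys are inferred from the `gen_ai.context.selection.*` names used in this thread, and deriving the suppressed count as candidates minus selected is an assumption rather than something the convention text confirms.

```python
from typing import Dict, Optional, Union

EVENT_NAME = "gen_ai.context.selection.evaluated"
# Assumption: every attribute lives under this prefix, as the thread suggests.
PREFIX = "gen_ai.context.selection"

AttrValue = Union[int, str]


def build_selection_attributes(
    candidate_count: int,
    selected_count: int,
    policy: str = "unknown",
    delivered_hash_count: Optional[int] = None,
) -> Dict[str, AttrValue]:
    """Build count-only attributes; no prompt or context text is accepted."""
    if not 0 <= selected_count <= candidate_count:
        raise ValueError("selected_count must be between 0 and candidate_count")

    attrs: Dict[str, AttrValue] = {
        f"{PREFIX}.candidate.count": candidate_count,
        f"{PREFIX}.selected.count": selected_count,
        # Assumption: every candidate is either selected or suppressed.
        f"{PREFIX}.suppressed.count": candidate_count - selected_count,
        # Low-cardinality label such as "budget" or "relevance".
        f"{PREFIX}.policy": policy,
    }
    # The hash count is optional and only recorded when the harness has it.
    if delivered_hash_count is not None:
        attrs[f"{PREFIX}.delivered_hash.count"] = delivered_hash_count
    return attrs


attrs = build_selection_attributes(18, 5, policy="relevance", delivered_hash_count=5)
```

Because the helper only accepts integers and a short policy label, raw content cannot leak into the payload by accident.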

Why

Agent harnesses increasingly do retrieval, memory lookup, skills/rules loading, tool search, compaction, and other context selection steps before the model call. Token/cost on the model span can show the final bill, but not whether the harness over-selected context upstream.

A cheap count-only event gives operators an early waste signal without requiring raw content capture or a decision-relevance evaluator.

Validation

  • make generate-all WEAVER=/home/azureuser/.openclaw/workspace/bin/weaver
  • make check-policies WEAVER=/home/azureuser/.openclaw/workspace/bin/weaver

Both passed locally with the existing definition/2 stability warnings.

Related discussion: #181

Copilot AI review requested due to automatic review settings May 22, 2026 21:06
@caioribeiroclw-pixel caioribeiroclw-pixel requested a review from a team as a code owner May 22, 2026 21:06
@linux-foundation-easycla

linux-foundation-easycla Bot commented May 22, 2026


CLA Not Signed

@github-actions github-actions Bot mentioned this pull request May 22, 2026

Copilot AI left a comment

Contributor


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new GenAI semantic convention event to report privacy-preserving context selection telemetry (candidate/selected/suppressed and delivered-hash counts), along with the supporting attributes and documentation.

Changes:

  • Add gen_ai.context.selection.evaluated event to the GenAI events registry.
  • Introduce new gen_ai.context.selection.* attributes for context selection counts and reasoning.
  • Regenerate schema snapshot + update registry docs and changelog.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| schema-snapshot/registry.yaml | Updates the generated snapshot to include the new event/attributes. |
| model/gen-ai/registry.yaml | Adds new gen_ai.context.selection.* attribute definitions. |
| model/gen-ai/events.yaml | Defines the new gen_ai.context.selection.evaluated event and its attribute refs/requirements. |
| docs/registry/attributes/gen-ai.md | Documents the new attributes and renumbers footnotes. |
| docs/gen-ai/gen-ai-events.md | Documents the new event in the events reference page. |
| CHANGELOG.md | Notes the new event in the Unreleased section. |

Comments suppressed due to low confidence (1)

docs/registry/attributes/gen-ai.md:1

  • In the description column, the footnote marker is appended without punctuation (agent [24]). For consistency with other rows (which generally use . [n]), consider updating this to agent. [24].
<!-- NOTE: THIS FILE IS AUTOGENERATED. DO NOT EDIT BY HAND. -->

Comment thread schema-snapshot/registry.yaml
Comment thread model/gen-ai/registry.yaml Outdated
Comment on lines +529 to +533
brief: The implementation-specific reason or policy that selected and suppressed context inputs.
note: >
The value SHOULD have low cardinality. Examples include `budget`, `relevance`, `dedupe`,
`target_agent`, `policy`, and `unknown`.
examples: ["budget", "relevance"]
Comment thread CHANGELOG.md Outdated

## Unreleased

- Add experimental `gen_ai.context.selection.evaluated` event for privacy-preserving context selection counts.
@hippoley

Contributor

This is exactly the kind of observability primitive we need for RAG-heavy agent benchmarks.

In our EnterpriseAgentBench setup we run agents on RAG tasks (document QA, knowledge retrieval) and one of the hardest questions to answer post-hoc is: "did the agent retrieve too many candidates and then silently drop the relevant ones, or did it never retrieve them at all?" The candidate.count / selected.count / suppressed.count triad answers that directly without exposing raw content.

A few thoughts from the benchmark instrumentation angle:

  1. gen_ai.context.selection.policy — very useful. In our setup we have named retrieval strategies (BM25, dense, hybrid). Would it make sense to allow multiple values here (e.g., when a hybrid retriever applies both BM25 and dense in sequence), or is the intent to capture the top-level policy name only?

  2. gen_ai.context.selection.delivered_hash.count — smart privacy-preserving design. One question: is this a count of unique hashes, or total hash entries? For deduplication tracking the distinction matters.

  3. The event name gen_ai.context.selection.evaluated reads naturally. One alternative worth considering: gen_ai.retrieval.context.evaluated — this would sit alongside the existing gen_ai.retrieval.client span and make the retrieval subsystem more cohesive. But I can see the argument for keeping it under gen_ai.context.* as a cross-cutting concern.

Overall strongly supportive of this direction. Happy to add a reference scenario for a RAG-style retrieval pipeline if that would help validate the event shape.

@caioribeiroclw-pixel caioribeiroclw-pixel force-pushed the caio/context-selection-receipt branch from 4f8429c to e34a9d2 May 26, 2026 16:06
@caioribeiroclw-pixel

Author

Thanks — this is a very useful validation point, especially the EnterpriseAgentBench RAG case. I pushed a small update in e34a9d2 to make the two ambiguous bits explicit:

  • gen_ai.context.selection.policy is now the recommended attribute name. I intended it as the top-level selection policy/strategy, not a list of every internal retrieval stage. For a hybrid pipeline I would use one stable low-cardinality value like hybrid_bm25_dense or rag_hybrid_v2, and keep per-stage details on retrieval spans/events if needed.
  • gen_ai.context.selection.delivered_hash.count now says it counts unique delivered-context hashes, not total observations. That should make the dedupe signal computable by comparing selected.count with delivered_hash.count.
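The dedupe comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the function names are hypothetical, SHA-256 is an arbitrary choice of hash, and each selected input is assumed to be delivered exactly once, so any gap between the two counts comes from duplicate content.

```python
import hashlib
from typing import Iterable


def delivered_hash_count(delivered_items: Iterable[str]) -> int:
    """Count unique content hashes; only the count leaves this function."""
    unique_hashes = {
        hashlib.sha256(item.encode("utf-8")).hexdigest()
        for item in delivered_items
    }
    return len(unique_hashes)


def duplicate_count(selected_count: int, unique_hash_count: int) -> int:
    """Selected inputs whose content duplicated another delivered input."""
    # Clamp at zero in case a producer reports inconsistent counts.
    return max(selected_count - unique_hash_count, 0)


# Three selected chunks, two of which carry identical text.
delivered = ["chunk A", "chunk B", "chunk A"]
unique = delivered_hash_count(delivered)
duplicates = duplicate_count(len(delivered), unique)
```

Here `unique` is 2 and `duplicates` is 1, which is the signal an operator would read from `selected.count` minus `delivered_hash.count`.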

On the event namespace: I kept gen_ai.context.selection.evaluated for now because the same selection boundary shows up outside classic RAG too: memory records, tool/search results, repository excerpts, skill bodies, and agent handoff context. The retrieval span can still be the producer; this event is meant to describe the cross-cutting boundary where candidates become actual model/agent context.

A reference scenario for a RAG-style pipeline would be great. The concrete benchmark question you gave — “retrieved too many and dropped the relevant one vs never retrieved it” — is exactly the acceptance case I think this event should cover.

Local validation I could run here:

  • YAML parse for model/gen-ai/events.yaml, model/gen-ai/registry.yaml, schema-snapshot/registry.yaml
  • git diff --check

I could not run make generate-all locally because this environment does not have Docker/Podman available; I updated the generated docs/schema snapshot consistently and will watch CI for the authoritative check.

@caioribeiroclw-pixel

Author

Follow-up to make the EnterpriseAgentBench/RAG case executable rather than only described in prose:

  • added an Anthropic reference scenario that emits gen_ai.context.selection.evaluated for a RAG-style boundary (18 candidates → 5 selected / delivered hashes → 13 suppressed) using policy=hybrid_bm25_dense
  • extended the reference matrix tooling so this event is recognized by EVENT_SPECS
  • kept the event privacy-safe: only counts + low-cardinality policy, no raw prompt, query, document, tool output, memory body, or repository excerpt

CI is green again on 68177d3 (including the Anthropic reference scenario and generated-docs/status checks). This should give reviewers a concrete fixture for the exact “retrieved candidates vs actually delivered context” benchmark question.
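For reviewers who want to sanity-check a fixture like this by hand, here is a rough sketch of two checks: that the payload carries only count and policy attributes, and that the counts add up. It is not the reference matrix tooling from this PR. The full attribute keys, the allowlist, and both function names are assumptions made for illustration, as is the rule that candidates equal selected plus suppressed.

```python
from typing import Dict, List, Union

AttrValue = Union[int, str]

# Assumption: the only attributes the event may carry, with expected types.
ALLOWED_KEYS = {
    "gen_ai.context.selection.candidate.count": int,
    "gen_ai.context.selection.selected.count": int,
    "gen_ai.context.selection.suppressed.count": int,
    "gen_ai.context.selection.delivered_hash.count": int,
    "gen_ai.context.selection.policy": str,
}


def privacy_violations(attrs: Dict[str, AttrValue]) -> List[str]:
    """Return a problem description for each unexpected or mistyped attribute."""
    problems = []
    for key, value in attrs.items():
        expected = ALLOWED_KEYS.get(key)
        if expected is None:
            problems.append(f"unexpected attribute: {key}")
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems


def counts_are_consistent(attrs: Dict[str, AttrValue]) -> bool:
    """Assumed invariant: candidates = selected + suppressed."""
    prefix = "gen_ai.context.selection"
    return attrs[f"{prefix}.candidate.count"] == (
        attrs[f"{prefix}.selected.count"] + attrs[f"{prefix}.suppressed.count"]
    )


# Values taken from the scenario described above.
fixture = {
    "gen_ai.context.selection.candidate.count": 18,
    "gen_ai.context.selection.selected.count": 5,
    "gen_ai.context.selection.suppressed.count": 13,
    "gen_ai.context.selection.delivered_hash.count": 5,
    "gen_ai.context.selection.policy": "hybrid_bm25_dense",
}
assert privacy_violations(fixture) == []
assert counts_are_consistent(fixture)
```

An allowlist fails closed: any attribute added later, such as a raw query string, is flagged until someone deliberately permits it.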

@trask

trask commented May 30, 2026

Member

@otelbot

otelbot Bot commented Jun 5, 2026

Contributor

Copilot has reviewed this PR. Copilot's suggestions aren't always correct or applicable, so please evaluate each comment on its merits and then handle it in one of these ways:

  • click GitHub's "Apply suggestion" button (auto-resolves the thread);
  • reply that it was applied (ideally linking to the commit);
  • reply that it was not applied, with the reason; or
  • reply with a question for reviewers.

Automation flags a PR for human review once every Copilot comment has a reply or is marked as resolved, so keeping these threads up to date helps reviewers know when the PR is ready.

Status across open PRs is visible on the pull request dashboard.

@trask

trask commented Jun 10, 2026

Member

Hi! As part of #275, this repository switched to Towncrier changelog fragments to reduce merge conflicts in CHANGELOG.md.

Please move this PR's changelog entry out of CHANGELOG.md and into this Towncrier fragment:

Create changelog.d/190.enhancement.md containing the change log entry, e.g.:

Add experimental `gen_ai.context.selection.evaluated` event for privacy-preserving context selection counts.

After adding the fragment, please remove this PR's direct edit to CHANGELOG.md. Towncrier will add the PR link from the fragment filename when release notes are generated.

Thanks!

@caioribeiroclw-pixel

Author

Updated for the Towncrier migration request:

  • moved the changelog text into changelog.d/190.enhancement.md
  • removed the direct CHANGELOG.md entry

Local checks:

git diff --check
python3 - <<'PY'
from pathlib import Path
frag = Path('changelog.d/190.enhancement.md')
assert frag.exists()
assert 'gen_ai.context.selection.evaluated' in frag.read_text()
assert '- Add experimental `gen_ai.context.selection.evaluated` event' not in Path('CHANGELOG.md').read_text().split('### 🛑 Breaking changes 🛑')[0]
print('fragment-ok')
PY



5 participants