KEP-2837: Add instrumentation section and update GA criteria #6180

Open
ndixita wants to merge 1 commit into kubernetes:master from ndixita:plr-ga

@ndixita ndixita commented Jun 8, 2026

Contributor
  • One-line PR description: Add instrumentation and update GA criteria

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 8, 2026
@k8s-ci-robot k8s-ci-robot requested review from dchen1107 and mrunalp June 8, 2026 23:31
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 8, 2026
ndixita commented Jun 8, 2026

Contributor Author

/assign @tallclair

@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ndixita
Once this PR has been reviewed and has the lgtm label, please assign mrunalp, soltysh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 9, 2026
#### Admission & API Validation
These metrics track feature adoption, user intent, and validation friction within the control plane.

##### `pod_level_resources_admission_total`

Member

Suggested change
- ##### `pod_level_resources_admission_total`
+ ##### `kubelet_pod_level_resources_admission_total`

Contributor

+1. This should be the main adoption metric (ALPHA, removed in a future release). I recommend dropping the other adoption-related metrics listed below.

Let's make a note here that this metric is ALPHA, temporary, and will be removed in 2-3 releases.

These metrics track feature adoption, user intent, and validation friction within the control plane.

##### `pod_level_resources_admission_total`
Total number of pods processed during Kubelet admission, categorized by resource configuration strategy.

Member

How do you anticipate using this? I wonder whether kube-state-metrics could meet the use case?

Contributor Author

This is to monitor adoption and to see whether there is any friction with the new validation.


This metric is recorded as a counter.

##### `pod_level_resources_validation_errors_total`

Member

Do we have any generic metrics tracking validation errors? I wonder whether we could count validation errors by field path (stripping out specific identifiers like index or key), or if the cardinality would be too high?

/cc @jpbetz @yongruilin

Contributor

Good question — we don't have a generic one today; the existing apiserver_validation_* metrics only track declarative-vs-handwritten parity.

I think the catch is cardinality: raw field paths are unbounded. We could strip subscripts (spec.containers[].resources.limits[]) or label by field.Error.Origin (the rule kind) instead.

But it feels more like declarative-validation-framework infra though, not really this KEP — maybe a separate issue under sig/api-machinery? Keeping pod_level_resources_validation_errors_total here for now.
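
The subscript-stripping idea mentioned above could look roughly like the following Go sketch. It is only an illustration: `normalizeFieldPath` is a hypothetical helper, not an existing Kubernetes function, and it assumes field paths arrive as plain strings such as `spec.containers[0].resources.limits[cpu]`.

```go
package main

import "regexp"

// subscriptPattern matches any bracketed index or key, e.g. [0] or [cpu].
var subscriptPattern = regexp.MustCompile(`\[[^\]]*\]`)

// normalizeFieldPath (hypothetical helper) replaces every subscript with an
// empty "[]" so that paths differing only by index or key collapse into one
// label value.
func normalizeFieldPath(path string) string {
	return subscriptPattern.ReplaceAllString(path, "[]")
}
```

With this, `normalizeFieldPath("spec.containers[0].resources.limits[cpu]")` yields `spec.containers[].resources.limits[]`. This bounds the per-path variation, but it does not address the cross-API cardinality concern raised later in this thread.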

@jpbetz jpbetz Jun 11, 2026

Contributor

Cardinality is already high just for the individual fields across all versions of all APIs. If we then try to multiply that by any other labels, it gets completely out of control (even resources are risky this way... consider the CRD cases). Better for feature owners to identify what is actually needed and add metrics for just those needs.


This metric is recorded as a counter.

##### `pod_level_resources_defaulting_total`

Member

AFAIK we don't have any metrics measuring defaulting or validation decisions. I don't know if there's a technical reason for this or if we just haven't had a use case for it, but I'm hesitant to introduce the metrics here.

Contributor Author

Yeah, I think we can skip this.

Tracks operation failures during the container lifecycle that are specific to the shared pod-level resource pool.

##### `kubelet_pod_level_oom_kills_total`
Total number of OOM kills triggered specifically because the shared pod-level memory pool was exhausted. This metric is crucial for identifying cases where a container was killed even if it was under its own individual limit, but the pod's aggregate limit was reached.

Member

How will this be measured / detected?

Contributor Author

If you agree that this is worth tracking, we could measure it by:

  1. fetching memory.events for all containers and the pod cgroup using cadvisorStatsProvider
  2. computing pod limit OOMs = pod OOMs - sum(container OOMs)

IMO it is worth tracking because pod-level resources allow resource sharing among the containers, so this metric could help us understand whether the pod-level limit is set correctly.
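
The subtraction in step 2 can be sketched in Go as follows. This is a minimal illustration under stated assumptions: `podLimitOOMs` is a hypothetical name, the inputs are assumed to be counters already read from each cgroup's memory.events, and which counter in that file is the right input is left open here.

```go
package main

// podLimitOOMs (hypothetical helper) estimates how many OOM events are
// attributable to the pod-level limit rather than to any single container
// limit: pod OOMs minus the sum of container OOMs.
// The result is clamped at zero because the counters are sampled at
// different moments and the container sum can briefly exceed the pod count.
func podLimitOOMs(podOOMs uint64, containerOOMs []uint64) uint64 {
	var sum uint64
	for _, c := range containerOOMs {
		sum += c
	}
	if sum >= podOOMs {
		return 0
	}
	return podOOMs - sum
}
```

For example, a pod cgroup reporting 5 events with containers reporting 1 and 2 would give 2 events attributed to the pod-level limit.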

#### Resource State (State Metrics)
These metrics expose the current state of Pod-level resource requests and limits, primarily for consumption by kube-state-metrics and observability dashboards.

##### `kube_pod_level_resources_requests`

Member

I think we should drop the `level` from these. Either kube_pod_resources_requests or kube_pod_spec_resources_requests would be more consistent with the other metrics.

- `namespace` - Namespace of the pod.
- `uid` - Kubernetes UID of the pod.
- `resource` - The resource type (e.g., `cpu`, `memory`).
- `unit` - The unit of the resource (e.g., `core`, `bytes`).

Member

The container level metrics also include a node label.

@k8s-ci-robot k8s-ci-robot requested review from jpbetz and yongruilin June 9, 2026 16:36
Comment on lines +1331 to +1333
- `pod` - Name of the pod.
- `namespace` - Namespace of the pod.
- `uid` - Kubernetes UID of the pod.

Contributor

These labels strike me as having extremely high cardinality. Or is there a reason we don't need to worry about that here?

Contributor

+1. See my comment above about kube-state-metrics. I don't think we need these metrics at all.

This section outlines the final list of metrics for the Pod-Level Resources feature, excluding Resource Manager extensions. These metrics are designed to provide deep observability into admission control, scheduling efficiency, and Kubelet-level execution.

#### Admission & API Validation
These metrics track feature adoption, user intent, and validation friction within the control plane.

Contributor

@richabanker how are metrics for feature adoption handled? Do we keep them in alpha and then deprecate them after 2-3 releases? This would be my preference. If so, let's make notes here about that plan.

Contributor

K8s already exposes the kubernetes_feature_enabled metric, which shows whether a feature is enabled or disabled in a component. Can that be used to track "feature adoption", if all we care to know is whether this feature was on or off?

Contributor Author

We want to track how many pods have pod-level resources set. Even after enabling the feature gate, it is required to explicitly set resources at pod-level in the spec to use this functionality. Does that make sense?
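
To make the distinction concrete, here is a small Go sketch of "pods that actually set pod-level resources" as opposed to "feature gate enabled". The types `podSpec` and `resourceRequirements` are simplified, hypothetical stand-ins for the real API types, and the helper names are illustrative only.

```go
package main

// resourceRequirements is a simplified, hypothetical stand-in for the
// real API type; values are kept as strings for brevity.
type resourceRequirements struct {
	Requests map[string]string
	Limits   map[string]string
}

// podSpec is a hypothetical stand-in that models only the pod-level
// resources field; Resources is nil when the field is unset.
type podSpec struct {
	Resources *resourceRequirements
}

// usesPodLevelResources reports whether the spec explicitly sets any
// pod-level request or limit.
func usesPodLevelResources(spec podSpec) bool {
	r := spec.Resources
	return r != nil && (len(r.Requests) > 0 || len(r.Limits) > 0)
}

// countPodLevelResources counts the specs that opt in to pod-level resources.
func countPodLevelResources(specs []podSpec) int {
	n := 0
	for _, s := range specs {
		if usesPodLevelResources(s) {
			n++
		}
	}
	return n
}
```

A pod with an unset or empty pod-level resources field does not count, even when the feature gate is on, which is the gap that a gate-level metric cannot show.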

@richabanker richabanker Jun 11, 2026

Contributor

Then we would never want to deprecate this metric even when the feature graduates to GA right?

Contributor

Can you track this with kube-state-metrics? Does it have to be a metric? If we can use kube-state-metrics, that's usually better for this sort of thing.

@richabanker richabanker Jun 11, 2026

Contributor

Oh, taking a step back: adding a permanent metric just to parse a resource's spec and capture the presence or absence of a field would be an anti-pattern. We generally want metrics that help determine the operational health (health / availability / latency) of features, basically something actionable for cluster admins.

And regarding KSM, agreed: if we want to expose metrics about a resource's spec/status, KSM would be the way to go. But since pod.spec.resources is not natively tracked in existing KSM metrics today, you'd have to add a new metric in the KSM repo (similar to the kube_pod_level_resources_requests being proposed in this PR) that parses this new field and converts it to a metric. Currently KSM only exposes these metrics for pod resources: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/workload/pod-metrics.md

Another way to use KSM for this would be to add a label to the pod metadata with the value you want to track, then configure KSM with --metric-labels-allowlist=pods=[<label>], which makes the existing kube_pod_labels KSM metric show how many pods have this label set. Also cc @dgrisonnet to confirm whether that's the right way to go about feature adoption metrics that depend on the resource spec.

This metric is recorded as a counter.

#### Resource State (State Metrics)
These metrics expose the current state of Pod-level resource requests and limits, primarily for consumption by kube-state-metrics and observability dashboards.

Contributor

kube-state-metrics is informer based, not metric based.

Let's drop the metrics listed here for that purpose. If we need this information, we should open a PR against kube-state-metrics after this feature merges.

Specifically, I recommend dropping:

  • kube_pod_level_resource_requests
  • kube_pod_level_resource_limits

#### The Kubelet (Execution Phase)
Tracks operation failures during the container lifecycle that are specific to the shared pod-level resource pool.

##### `kubelet_pod_level_oom_kills_total`

Contributor

Suggested change
- ##### `kubelet_pod_level_oom_kills_total`
+ ##### `kubelet_pod_oom_kills_total`

? (I don't think we need the feature name here, the fact that it's a pod oom is enough)


This metric is recorded as a counter.

#### The Kubelet (Execution Phase)

Contributor

Recommend adding a "pod CPU throttling metric". Maybe pod_cpu_cfs_throttled_seconds_total ?
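
As one possible shape for such a metric's data source, here is a Go sketch that derives throttled seconds from the contents of a cgroup v2 `cpu.stat` file, which reports a `throttled_usec` counter. `throttledSeconds` is a hypothetical helper; how the kubelet would locate and read the pod cgroup's file is not shown and is an assumption left to the implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// throttledSeconds (hypothetical helper) extracts the throttled_usec counter
// from cgroup v2 cpu.stat content and converts it from microseconds to
// seconds. It returns an error if the key is missing or malformed.
func throttledSeconds(cpuStat string) (float64, error) {
	for _, line := range strings.Split(cpuStat, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 || fields[0] != "throttled_usec" {
			continue
		}
		usec, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("parsing throttled_usec: %w", err)
		}
		return float64(usec) / 1e6, nil
	}
	return 0, fmt.Errorf("throttled_usec not found in cpu.stat content")
}
```

Because the value is a monotonically increasing total, it would map naturally onto a counter named along the lines suggested above.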

@whtssub whtssub mentioned this pull request Jun 11, 2026
23 tasks
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 12, 2026
Signed-off-by: Dixita <ndixita@google.com>

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

7 participants