
[BUG] Misconfigured Port for webhook #276

@vitorfloriano

What happened?

When trying to create a NodeReadinessRule after a full install, the API server fails to call the webhook because service port 443 is not found on the service nrr-webhook-service.

After fixing the port in the ValidatingWebhookConfiguration, the API server still fails to call the webhook, this time because of a certificate mismatch.

After further investigation, it seems that the webhook service resolves to the same endpoint (same pod IP, port 8443) as the metrics service, so webhook calls reach the metrics server, which presents the metrics certificate.
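The certificate mismatch comes down to whether the DNS name the API server dials appears among the subject alternative names (SANs) of the certificate it receives. A minimal sketch of that check follows. The helper names `san_covers` and `show_sans` are hypothetical, the comparison is exact-match only (wildcard SANs are ignored), and the DNS names are copied from the x509 error in this report.

```shell
#!/bin/sh
# san_covers HOST SAN...
# Succeeds if HOST exactly equals one of the listed DNS names.
# Wildcard SANs are deliberately not handled in this sketch.
san_covers() {
  host=$1
  shift
  for san in "$@"; do
    [ "$san" = "$host" ] && return 0
  done
  return 1
}

# show_sans HOST:PORT prints the SANs of the certificate served there.
# Assumes openssl 1.1.1 or newer and network access to the endpoint,
# for example through `kubectl port-forward`. Not invoked here.
show_sans() {
  openssl s_client -connect "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -ext subjectAltName
}

# Names copied from the x509 error in this report.
if san_covers nrr-webhook-service.nrr-system.svc \
    nrr-metrics-service.nrr-system.svc \
    nrr-metrics-service.nrr-system.svc.cluster.local; then
  echo "certificate covers the webhook service name"
else
  echo "certificate does not cover the webhook service name"
fi
```

With the names from the error message, the script takes the second branch, which matches the failure the API server reports.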

Steps to Reproduce

  1. Install the CRD using the instructions in the book, changing only the version to v0.3.0:
VERSION=v0.3.0
kubectl apply -f https://github.com/kubernetes-sigs/node-readiness-controller/releases/download/${VERSION}/crds.yaml
kubectl wait --for condition=established --timeout=30s crd/nodereadinessrules.readiness.node.x-k8s.io
  2. Do the full install of the controller using the instructions in the book:
kubectl apply -f https://github.com/kubernetes-sigs/node-readiness-controller/releases/download/${VERSION}/install-full.yaml
  3. Save the example custom resource presented in the book to a file and try to apply it:
$ kubectl apply -f nrr-cr.yaml
Error from server (InternalError): error when creating "nrr-cr.yaml": Internal error occurred: failed calling webhook "vnodereadinessrule.kb.io": failed to call webhook: Post "https://nrr-webhook-service.nrr-system.svc:443/validate-readiness-node-x-k8s-io-v1alpha1-nodereadinessrule?timeout=10s": no service port 443 found for service "nrr-webhook-service"
  4. Verify that the webhook service is using port 8443, not 443:
$ kubectl get svc -n nrr-system nrr-webhook-service
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
nrr-webhook-service   ClusterIP   10.96.225.215   <none>        8443/TCP   135m
  5. Edit nrr-validating-webhook-configuration to use port 8443 instead (as a quick fix).
  6. Try to apply the custom resource in the file again:
$ kubectl apply -f nrr-cr.yaml
Error from server (InternalError): error when creating "nrr-cr.yaml": Internal error occurred: failed calling webhook "vnodereadinessrule.kb.io": failed to call webhook: Post "https://nrr-webhook-service.nrr-system.svc:8443/validate-readiness-node-x-k8s-io-v1alpha1-nodereadinessrule?timeout=10s": tls: failed to verify certificate: x509: certificate is valid for nrr-metrics-service.nrr-system.svc, nrr-metrics-service.nrr-system.svc.cluster.local, not nrr-webhook-service.nrr-system.svc
  7. Verify that both the metrics service and the webhook service are using the same endpoint:
$ kubectl get endpointslices.discovery.k8s.io -n nrr-system
NAME                        ADDRESSTYPE   PORTS   ENDPOINTS    AGE
nrr-metrics-service-n2bk7   IPv4          8443    10.244.3.6   3h15m
nrr-webhook-service-gmk67   IPv4          8443    10.244.3.6   144m
  8. Verify that the webhook server in the controller pod listens on port 9443, not 8443, so the webhook service should target 9443:
$ kubectl describe pods -n nrr-system nrr-controller-manager-99964d6bc-kl298 | grep Port
    Port:          9443/TCP (webhook-server)
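The manual checks in the steps above could be collected into one script that compares the ports along the webhook path. This is a sketch under stated assumptions. The function names are hypothetical. The pod lookup assumes the controller pod is the first pod in the namespace and the manager is its first container. The service's targetPort is assumed to be numeric (a named targetPort would need to be resolved first). The value 8443 used for the targetPort in the sample call is inferred from the endpoint slice output, not read from the Service spec.

```shell
#!/bin/sh
# Namespace and object names are taken from this report.
NS=nrr-system
SVC=nrr-webhook-service
VWC=nrr-validating-webhook-configuration

# compare_ports CFG_PORT SVC_PORT TARGET_PORT CONTAINER_PORT
# Prints one line per mismatch and returns non-zero if any is found.
compare_ports() {
  rc=0
  if [ "$1" != "$2" ]; then
    echo "webhook config port $1 != service port $2"
    rc=1
  fi
  if [ "$3" != "$4" ]; then
    echo "service targetPort $3 != container port $4"
    rc=1
  fi
  return "$rc"
}

# collect_and_compare reads the live values; it needs kubectl access
# to the cluster and is therefore only defined here, not invoked.
collect_and_compare() {
  cfg_port=$(kubectl get validatingwebhookconfiguration "$VWC" \
    -o jsonpath='{.webhooks[0].clientConfig.service.port}')
  svc_port=$(kubectl get svc -n "$NS" "$SVC" \
    -o jsonpath='{.spec.ports[0].port}')
  target_port=$(kubectl get svc -n "$NS" "$SVC" \
    -o jsonpath='{.spec.ports[0].targetPort}')
  ctr_port=$(kubectl get pods -n "$NS" \
    -o jsonpath='{.items[0].spec.containers[0].ports[?(@.name=="webhook-server")].containerPort}')
  compare_ports "$cfg_port" "$svc_port" "$target_port" "$ctr_port"
}

# Values observed in this report before the quick fix in step 5.
compare_ports 443 8443 8443 9443 || echo "port chain is inconsistent"
```

Running `collect_and_compare` against a cluster prints nothing and returns zero only when the webhook configuration, the Service, and the container agree on ports.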

Expected Behavior

The controller should validate and create the resource without any issues.
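Until the release manifests are corrected, one possible workaround is to repoint the webhook service at the webhook server. The sketch below has not been verified against v0.3.0. It assumes the Service has a single port entry at index 0, that the ValidatingWebhookConfiguration still uses port 443 (revert the quick fix from the reproduction steps if it was applied), and that the webhook serving certificate covers the webhook service name. The function names are hypothetical.

```shell
#!/bin/sh
# webhook_service_patch PORT TARGET_PORT
# Prints a JSON Patch that rewrites the first port entry of a Service.
webhook_service_patch() {
  printf '[{"op":"replace","path":"/spec/ports/0/port","value":%d},' "$1"
  printf '{"op":"replace","path":"/spec/ports/0/targetPort","value":%d}]\n' "$2"
}

# apply_webhook_service_patch sends the patch to the cluster.
# It needs kubectl access and is only defined here, not invoked.
apply_webhook_service_patch() {
  kubectl patch svc -n nrr-system nrr-webhook-service --type=json \
    -p "$(webhook_service_patch 443 9443)"
}

# Show the patch that would be applied: service port 443, targetPort 9443.
webhook_service_patch 443 9443
```

Re-applying install-full.yaml would presumably overwrite this change, so it is only a stopgap for testing.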

Controller Version / Image Tag

v0.3.0

Kubernetes Version

Client Version: v1.35.0
Kustomize Version: v5.7.1
Server Version: v1.35.0

Controller Logs

folded logs

2026-06-11T23:39:33Z INFO version: unknown
2026-06-11T23:39:33Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "readiness.node.x-k8s.io/v1alpha1, Kind=NodeReadinessRule", "path": "/validate-readiness-node-x-k8s-io-v1alpha1-nodereadinessrule"}
2026-06-11T23:39:33Z INFO controller-runtime.webhook Registering webhook {"path": "/validate-readiness-node-x-k8s-io-v1alpha1-nodereadinessrule"}
2026-06-11T23:39:33Z INFO setup webhook enabled
2026-06-11T23:39:33Z INFO setup starting manager
2026-06-11T23:39:33Z INFO controller-runtime.metrics Starting metrics server
2026-06-11T23:39:33Z INFO starting server {"name": "health probe", "addr": "[::]:8081"}
2026-06-11T23:39:33Z INFO controller-runtime.webhook Starting webhook server
2026-06-11T23:39:33Z INFO controller-runtime.certwatcher Updated current TLS certificate {"cert": "/tmp/k8s-metrics-server/metrics-certs/tls.crt", "key": "/tmp/k8s-metrics-server/metrics-certs/tls.key"}
2026-06-11T23:39:33Z INFO controller-runtime.metrics Serving metrics server {"bindAddress": ":8443", "secure": true}
2026-06-11T23:39:33Z INFO controller-runtime.certwatcher Updated current TLS certificate {"cert": "/tmp/k8s-webhook-server/serving-certs/tls.crt", "key": "/tmp/k8s-webhook-server/serving-certs/tls.key"}
2026-06-11T23:39:33Z INFO controller-runtime.webhook Serving webhook server {"host": "", "port": 9443}
2026-06-11T23:39:33Z INFO controller-runtime.certwatcher Starting certificate poll+watcher {"cert": "/tmp/k8s-metrics-server/metrics-certs/tls.crt", "key": "/tmp/k8s-metrics-server/metrics-certs/tls.key", "interval": "10s"}
2026-06-11T23:39:33Z INFO controller-runtime.certwatcher Starting certificate poll+watcher {"cert": "/tmp/k8s-webhook-server/serving-certs/tls.crt", "key": "/tmp/k8s-webhook-server/serving-certs/tls.key", "interval": "10s"}
I0611 23:39:33.857687 1 leaderelection.go:257] attempting to acquire leader lease nrr-system/ba65f13e.readiness.node.x-k8s.io...
I0611 23:39:50.640748 1 leaderelection.go:271] successfully acquired lease nrr-system/ba65f13e.readiness.node.x-k8s.io
2026-06-11T23:39:50Z DEBUG events nrr-controller-manager-99964d6bc-kl298_b0f177bd-876f-459e-8329-2a1b927def97 became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"nrr-system","name":"ba65f13e.readiness.node.x-k8s.io","uid":"123ad03a-e750-4834-9c59-70cea85b5513","apiVersion":"coordination.k8s.io/v1","resourceVersion":"8706"}, "reason": "LeaderElection"}
2026-06-11T23:39:50Z INFO Starting EventSource {"controller": "nodereadiness-controller", "controllerGroup": "readiness.node.x-k8s.io", "controllerKind": "NodeReadinessRule", "source": "kind source: *v1alpha1.NodeReadinessRule"}
2026-06-11T23:39:50Z INFO Starting EventSource {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "source": "kind source: *v1.Node"}
2026-06-11T23:39:50Z INFO Starting Controller {"controller": "nodereadiness-controller", "controllerGroup": "readiness.node.x-k8s.io", "controllerKind": "NodeReadinessRule"}
2026-06-11T23:39:50Z INFO Starting workers {"controller": "nodereadiness-controller", "controllerGroup": "readiness.node.x-k8s.io", "controllerKind": "NodeReadinessRule", "worker count": 1}
2026-06-11T23:39:50Z INFO Starting Controller {"controller": "node", "controllerGroup": "", "controllerKind": "Node"}
2026-06-11T23:39:50Z INFO Starting workers {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "worker count": 1}
2026-06-11T23:39:50Z INFO Reconciling node {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"multinode-control-plane"}, "namespace": "", "name": "multinode-control-plane", "reconcileID": "1a388efd-bf7d-4533-a207-2ca98f7e4259", "node": "multinode-control-plane"}
2026-06-11T23:39:50Z INFO Processing node against rules {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"multinode-control-plane"}, "namespace": "", "name": "multinode-control-plane", "reconcileID": "1a388efd-bf7d-4533-a207-2ca98f7e4259", "node": "multinode-control-plane", "ruleCount": 0}
2026-06-11T23:39:50Z INFO Reconciling node {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"multinode-worker"}, "namespace": "", "name": "multinode-worker", "reconcileID": "276249dc-70fc-48a9-9279-24f5e4126cf5", "node": "multinode-worker"}
2026-06-11T23:39:50Z INFO Processing node against rules {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"multinode-worker"}, "namespace": "", "name": "multinode-worker", "reconcileID": "276249dc-70fc-48a9-9279-24f5e4126cf5", "node": "multinode-worker", "ruleCount": 0}
2026-06-11T23:39:50Z INFO Reconciling node {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"multinode-worker2"}, "namespace": "", "name": "multinode-worker2", "reconcileID": "7b320adf-8420-405a-bd51-73535d99baed", "node": "multinode-worker2"}
2026-06-11T23:39:50Z INFO Processing node against rules {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"multinode-worker2"}, "namespace": "", "name": "multinode-worker2", "reconcileID": "7b320adf-8420-405a-bd51-73535d99baed", "node": "multinode-worker2", "ruleCount": 0}
2026-06-11T23:39:50Z INFO Reconciling node {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"multinode-worker3"}, "namespace": "", "name": "multinode-worker3", "reconcileID": "56996778-5e98-40ee-9b0a-029f7175f99e", "node": "multinode-worker3"}
2026-06-11T23:39:50Z INFO Processing node against rules {"controller": "node", "controllerGroup": "", "controllerKind": "Node", "Node": {"name":"multinode-worker3"}, "namespace": "", "name": "multinode-worker3", "reconcileID": "56996778-5e98-40ee-9b0a-029f7175f99e", "node": "multinode-worker3", "ruleCount": 0}
2026/06/12 01:34:31 http: TLS handshake error from 172.18.0.3:44250: remote error: tls: bad certificate
2026/06/12 01:36:44 http: TLS handshake error from 172.18.0.3:57437: remote error: tls: bad certificate
2026/06/12 02:02:10 http: TLS handshake error from 172.18.0.3:38203: remote error: tls: bad certificate

Additional Environment Details

cert-manager-controller image: "quay.io/jetstack/cert-manager-controller:v1.20.2"


Labels: kind/bug, triage/needs-information
