What happened
Running TestGatewayAPIConformance more than once in a single go test process (e.g. -count=2) fails immediately on the second iteration:
gateways.gateway.networking.k8s.io "..." is forbidden: unable to create new content in namespace gateway-conformance-infra because it is being terminated
Why
The base-resource cleanup in Applier.MustApplyWithCleanup (conformance/utils/kubernetes/apply.go) deletes each created object — including the gateway-conformance-infra Namespace — with a plain c.Delete and does not wait for deletion to finish:
    t.Cleanup(func() {
        ctx, cancel = context.WithTimeout(context.Background(), timeoutConfig.DeleteTimeout)
        defer cancel()
        err = c.Delete(ctx, uObj) // returns once deletion is requested, not complete
        ...
    })
Delete on a Namespace only moves it to Terminating; finalizers drain asynchronously. The cleanup returns while the namespace is still terminating, and the next iteration's setup re-applies the base manifests into it before it is gone.
Why it matters
This blocks running any conformance subtest with -count greater than 1 in a single process, for example stress-looping a flaky subtest to characterize it, which is how the RequestMirror timing flake (#4940) is best reproduced. Today each run needs a fresh go test invocation.
Expected
A re-run in the same process should succeed: either the cleanup waits for the namespace to be fully deleted, or setup waits out a Terminating namespace before applying.
Proposal
Either have the namespace cleanup poll until the namespace is actually gone (a WaitForDeletion after Delete), or have setup tolerate a Terminating namespace by waiting for it to clear before applying base resources.
Willing to contribute
Happy to send a PR once the preferred direction is clear.
A quick process question, since I've been turning up a number of these conformance-machinery issues lately: at this volume, what workflow do you prefer? File an issue and wait for a maintainer go-ahead before each PR, open the issue and PR together, or just send PRs directly for small, self-evident fixes? Happy to follow whatever keeps your review queue manageable.
/kind bug
/area conformance-machinery