best-effort-atomic-scale-up ProvisioningRequest wrongly reports CapacityIsNotFound when capacity is actually available

**Which component are you using?**:

/area cluster-autoscaler

**What version of the component are you using?**:

v1.35.3-gke.2190000; the issue is also present in the latest code (commit 18bcb5e03300469c9b0639638dc19a3bb3f44cc2)


**What k8s version are you using (`kubectl version`)?**:

<details><summary><code>kubectl version</code> Output</summary><br><pre>
$ kubectl version
Client Version: v1.31.2
Kustomize Version: v5.4.2
Server Version: v1.35.3-gke.2190000
</pre></details>

**What environment is this in?**:

* GKE Standard cluster
* an autoscaling GPU node pool with min 0 and max 8 nodes, each node with 8 GPUs
* Kueue configured with Topology-Aware Scheduling (TAS)
* a stream of mixed 8-GPU and 1-GPU jobs, managed by Kueue
* the job stream varies over time, so the node pool scales up and down
* AdmissionCheck and ProvisioningRequest configured with the class `best-effort-atomic-scale-up.autoscaling.x-k8s.io`, so that TAS works correctly with autoscaling
* at the start of the scenario, the cluster has 7 nodes ready, with 8 GPUs in use and 48 GPUs free
* a batch of 256 1-GPU jobs arrives

**What did you expect to happen?**:

1. Kueue reserves quota for 48 Workloads in `cluster-queue` and admits 48 jobs.
2. Within a few minutes, 48 new jobs are running and no GPUs remain unused.

**What happened instead?**:

1. After 3 minutes, only 26 jobs are running and 22 GPUs remain unused.
2. Kueue reserved quota for 48 Workloads in `cluster-queue` and created 48 ProvisioningRequests (all with Accepted=True).
3. 26 ProvisioningRequests have Provisioned=True, while 22 have Provisioned=False with reason `CapacityIsNotFound` and message "Capacity is not found, CA will try to find it later."
4. After 14 minutes: 38 PRs Provisioned=True, 10 Provisioned=False.
5. After 22 minutes: 43 PRs Provisioned=True, 5 Provisioned=False.
6. After 35 minutes: all 48 PRs Provisioned=True, all 48 jobs are running, and no GPUs remain unused.
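The timeline is internally consistent if each PR still stuck on `CapacityIsNotFound` corresponds to exactly one idle GPU (48 free GPUs, one 1-GPU job per PR); at the 3-minute mark this matches the reported 22 unused GPUs. A minimal Go sketch of that bookkeeping, where `snapshot` and `idleGPUs` are hypothetical helpers and the idle-GPU figures after 3 minutes are derived under that assumption rather than observed:

```go
package main

import "fmt"

// totalPRs is the number of ProvisioningRequests created in the report,
// one per admitted 1-GPU Workload.
const totalPRs = 48

// snapshot is one observation from the timeline: minutes after the batch
// arrived and how many PRs had Provisioned=True at that moment.
type snapshot struct {
	minutes     int
	provisioned int
}

// idleGPUs is a hypothetical helper. It assumes every PR still reporting
// CapacityIsNotFound holds back exactly one 1-GPU job whose GPU is already
// free on an existing node.
func idleGPUs(provisioned int) int {
	return totalPRs - provisioned
}

func main() {
	// Numbers copied from the observations in this report.
	timeline := []snapshot{{3, 26}, {14, 38}, {22, 43}, {35, 48}}
	for _, s := range timeline {
		fmt.Printf("t=%2d min: Provisioned=True %2d, CapacityIsNotFound %2d, idle GPUs %2d\n",
			s.minutes, s.provisioned, totalPRs-s.provisioned, idleGPUs(s.provisioned))
	}
}
```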

**How to reproduce it (as minimally and precisely as possible)**:

1. Start with a node that has room for 2 CPUs.
2. Create a best-effort-atomic PR for one 1-CPU pod and let it reach Provisioned=True.
3. Create a 1-CPU pod that consumes this PR.
4. Create another best-effort-atomic PR for one 1-CPU pod.
5. The new PR reports that capacity is not found, even though one CPU is still idle.
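One reading of this reproduction (a hypothesis, not confirmed against the cluster-autoscaler source) is that the capacity booked for the first PR is not released once its pod is running, so the same CPU is subtracted twice: once for the running pod and once for the still-held booking. The Go sketch below models only that arithmetic; `provReq`, `freeCapacity`, and the `releaseConsumed` switch are hypothetical names and do not correspond to cluster-autoscaler code.

```go
package main

import "fmt"

// nodeCapacityMilliCPU is the room on the single node in the reproduction
// (2 CPUs), in millicores.
const nodeCapacityMilliCPU = 2000

// provReq is a hypothetical, simplified stand-in for a ProvisioningRequest
// that is already Provisioned=True and books capacity on the node.
type provReq struct {
	name        string
	milliCPU    int64
	podConsumed bool // true once the pod consuming this PR is running
}

// freeCapacity returns the CPU left for a new request. With
// releaseConsumed=false, a provisioned PR keeps its booking after its pod is
// running, so that CPU is subtracted twice (the hypothesized bug).
func freeCapacity(runningPodsMilliCPU int64, prs []provReq, releaseConsumed bool) int64 {
	free := int64(nodeCapacityMilliCPU) - runningPodsMilliCPU
	for _, pr := range prs {
		if releaseConsumed && pr.podConsumed {
			continue // the booking has already turned into a running pod
		}
		free -= pr.milliCPU
	}
	return free
}

func main() {
	// Steps 2-3: PR "pr-1" is provisioned and its 1-CPU pod is running.
	prs := []provReq{{name: "pr-1", milliCPU: 1000, podConsumed: true}}
	running := int64(1000)

	// Step 4: a second PR asks for one more 1-CPU pod.
	const want = 1000
	buggy := freeCapacity(running, prs, false)
	fixed := freeCapacity(running, prs, true)
	fmt.Printf("booking kept:     free=%dm fits=%v\n", buggy, buggy >= want)
	fmt.Printf("booking released: free=%dm fits=%v\n", fixed, fixed >= want)
}
```

If this hypothesis holds, it would also be consistent with the slow recovery in the GPU scenario, where PRs succeed gradually over roughly 35 minutes rather than as soon as free GPUs exist.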

best-effort-atomic-scale-up ProvisioningRequest wrongly reports CapacityIsNotFound when capacity is really present #9805