cluster-autoscaler: allow MachinePool scale-down when selected node has no matching Machine #9681

**Which component are you using?**:

cluster-autoscaler


**What version of the component are you using?**:

cluster-autoscaler v1.33.4

**What k8s version are you using (`kubectl version`)?**:

Client Version: v1.33.0
Server Version: v1.33.3

**What environment is this in?**:

- Cluster API-based environment using MachinePools on Azure
- cloudProvider: clusterapi
- Azure AKS v1.33.3

**What did you expect to happen?**:
When cluster-autoscaler selects a node that belongs to a Cluster API MachinePool for scale-down, I expect scale-down to succeed even if no backing Machine object can be resolved for that node, by falling back to decrementing the MachinePool replica count.

In other words:

- if a matching Machine exists, use the normal targeted deletion path
- if no matching Machine exists, but the node belongs to a MachinePool, decrease the MachinePool replica count instead of blocking scale-down
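
A minimal Go sketch of the decision described above, to make the expected behavior concrete. It is not the autoscaler's actual code: `machinePool`, `planScaleDown`, the action constants, and the minimum-size guard are hypothetical names and assumptions introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// machinePool is a hypothetical, simplified stand-in for a MachinePool-backed
// node group. It is NOT the type used by the clusterapi cloud provider.
type machinePool struct {
	name     string
	replicas int
	minSize  int
	// machinesByProviderID maps a node's provider ID to the name of its
	// backing Machine, for the nodes where such a Machine could be resolved.
	machinesByProviderID map[string]string
}

// scaleDownAction names the path the sketch would take for a selected node.
type scaleDownAction string

const (
	deleteMachine     scaleDownAction = "delete-machine"     // normal targeted deletion
	decrementReplicas scaleDownAction = "decrement-replicas" // proposed fallback
)

var errAtMinSize = errors.New("node group is already at its minimum size")

// planScaleDown picks the targeted deletion path when a Machine is known for
// the node, and otherwise falls back to a replica decrement instead of
// failing. The minimum-size check is an assumption of this sketch.
func planScaleDown(pool *machinePool, providerID string) (scaleDownAction, string, error) {
	if pool.replicas <= pool.minSize {
		return "", "", errAtMinSize
	}
	if machine, ok := pool.machinesByProviderID[providerID]; ok {
		return deleteMachine, machine, nil
	}
	return decrementReplicas, "", nil
}

func main() {
	pool := &machinePool{
		name:                 "pool-a",
		replicas:             3,
		minSize:              1,
		machinesByProviderID: map[string]string{"provider-id-1": "machine-1"},
	}
	for _, id := range []string{"provider-id-1", "provider-id-2"} {
		action, machine, err := planScaleDown(pool, id)
		fmt.Println(id, action, machine, err)
	}
}
```

The sketch only covers the choice between the two paths. It does not address which instance the infrastructure provider removes after a replica decrement.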

**What happened instead?**:

Without the fallback, scale-down was blocked when the selected node belonged to a MachinePool but no corresponding Machine could be found for it.

After applying a local patch implementing a fallback to replica decrement, scale-down succeeded.

Relevant logs:
```
I... nodegroup <pool-name> has 3 nodes: [<provider-id-1> <provider-id-2> <provider-id-3>]
W... No Machine found for node "<provider-id-2>" in MachinePool "MachinePool/<namespace>/<pool-name>", falling back to replica decrement only
I... Event(...): type: 'Normal' reason: 'ScaleDown' Scale-down: node <node-name> removed with drain
```
This suggests that for some MachinePool-based implementations, requiring a resolvable Machine for the selected node prevents a valid scale-down operation.

**How to reproduce it (as minimally and precisely as possible)**:

1. Deploy cluster-autoscaler with the Cluster API provider enabled.
2. Use a Cluster API MachinePool-backed node group.
3. Ensure the autoscaler can discover the MachinePool and read its /scale subresource successfully.
4. Create a situation where:
    - a node in the MachinePool becomes a valid scale-down candidate
    - but cluster-autoscaler cannot resolve that specific node to a backing Machine object
5. Observe that scale-down is blocked unless a fallback to MachinePool replica decrement is implemented.

In our case, once fallback-to-replica-decrement logic was added, the autoscaler was able to:

  - drain the selected node
  - reduce the MachinePool size
  - complete scale-down successfully

**Anything else we need to know?**:

Additional anonymized observations:

- Management cluster access was working correctly.
- MachinePool/scale GET requests succeeded.
- Node group discovery succeeded.
- The issue was not caused by authentication failures.
- Earlier in the investigation, some scale-down attempts were also legitimately prevented by workload constraints (CPU requests / PodDisruptionBudget), but once a node became removable, the remaining issue was specifically the missing Node -> Machine mapping.
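
To check whether a given pool is affected by the missing Node -> Machine mapping, the node provider IDs can be compared against the provider IDs reported by Machine objects. The Go sketch below does this with plain string slices. `nodesWithoutMachine` is a hypothetical helper, and how the two lists are collected (for example from the node group log line and from the management cluster) is left as an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// nodesWithoutMachine returns, in sorted order, the node provider IDs for
// which no Machine reports the same provider ID. It is a hypothetical
// diagnostic helper, not part of cluster-autoscaler.
func nodesWithoutMachine(nodeProviderIDs, machineProviderIDs []string) []string {
	known := make(map[string]struct{}, len(machineProviderIDs))
	for _, id := range machineProviderIDs {
		known[id] = struct{}{}
	}
	var missing []string
	for _, id := range nodeProviderIDs {
		if _, ok := known[id]; !ok {
			missing = append(missing, id)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Placeholder IDs in the style of the anonymized logs in this issue.
	nodes := []string{"provider-id-1", "provider-id-2", "provider-id-3"}
	machines := []string{"provider-id-1", "provider-id-3"}
	fmt.Println("nodes without a Machine:", nodesWithoutMachine(nodes, machines))
}
```

Any provider ID returned here corresponds to a node that would hit the blocked scale-down path described in this issue.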

Example logs showing management-cluster access and successful nodegroup discovery:
```
I... discovered node group: MachinePool/<namespace>/<pool-name> (min: 1, max: 3, replicas: 3)
I... GET ... /apis/cluster.x-k8s.io/v1beta1/namespaces/<namespace>/machinepools/<pool-name>/scale
```
