-with-ci-artifacts-dra e2e consistently failing since ~2026-06-02 #6352

@mboersma

The pull-cluster-api-provider-azure-conformance-with-ci-artifacts-dra presubmit job has been failing consistently across unrelated PRs since around 2026-06-02.

Which test broke

A single spec fails. The other 98 of the 99 conformance specs pass, and the cluster itself comes up healthy:

[FAIL] [sig-node] [DRA] [FeatureGate:DRAExtendedResource] [Feature:DynamicResourceAllocation]
  must run pods with extended resource on dra nodes and device plugin nodes
  [KubeletMinVersion:1.35] [Serial]
  [FAILED] start pod1: Timed out after 300.000s.

This is an alpha feature-gated test (KubeletMinVersion:1.35), so the regression is almost certainly upstream (in the CI k8s build or the DRA extended-resource feature), not in CAPZ: no CAPZ change touches this flavor.

When it broke

  • Last green run: 2026-06-01
  • First failing run: 2026-06-02. The job has failed consistently since then, across multiple unrelated PRs.

Failure pattern on TestGrid: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-azure#capz-pr-conformance-k8s-ci-dra-main

Example failed runs (different PRs, same failure)

Since it's failing on unrelated PRs, it shouldn't block merges for changes that don't touch this area, but the job needs attention (likely a temporary skip of the DRAExtendedResource [Serial] test, or tracking the upstream fix).
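
If a temporary skip is the chosen route, the skip expression should be narrow enough to exclude only the failing spec. The sketch below checks a candidate pattern against the spec name from the failure output above. The pattern, the second spec name, and the helper `is_skipped` are all hypothetical and are not taken from the CAPZ test configuration. It also assumes the skip regex is matched anywhere within the full spec name, and it uses only regex constructs that behave the same in Python and in Go's RE2 syntax.

```python
import re

# Full spec name as reported in the failure output above, joined onto one line.
FAILING_SPEC = (
    "[sig-node] [DRA] [FeatureGate:DRAExtendedResource] "
    "[Feature:DynamicResourceAllocation] "
    "must run pods with extended resource on dra nodes and device plugin nodes "
    "[KubeletMinVersion:1.35] [Serial]"
)

# Hypothetical spec name, invented here only to represent a DRA spec
# that should keep running while the skip is in place.
OTHER_DRA_SPEC = (
    "[sig-node] [DRA] [Feature:DynamicResourceAllocation] "
    "hypothetical spec that must keep running"
)

# Hypothetical skip pattern (an assumption, not the actual job config):
# match specs tagged with the DRAExtendedResource feature gate AND [Serial].
SKIP_PATTERN = r"\[FeatureGate:DRAExtendedResource\].*\[Serial\]"


def is_skipped(spec_name: str, pattern: str = SKIP_PATTERN) -> bool:
    """Return True if the pattern matches anywhere in the spec name."""
    return re.search(pattern, spec_name) is not None


if __name__ == "__main__":
    for name in (FAILING_SPEC, OTHER_DRA_SPEC):
        print(is_skipped(name), name)
```

Requiring both the feature-gate tag and `[Serial]` keeps the skip from hiding other DRA specs, so the job would still report regressions elsewhere in the DRA suite. Whatever pattern is actually used should be removed once the upstream fix lands.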
