hack: add workflow-stats tool for analyzing CI step durations by nirs · Pull Request #23042 · kubernetes/minikube

nirs · 2026-05-25T18:43:49Z

Analyzes GitHub Actions workflow step durations to help set per-step timeouts based on historical data. Computes min, avg, P50, P90, P95, max, and a suggested timeout (3x P95, rounded to minutes) across completed workflow runs.

Features:

SQLite cache at ~/.cache/workflow-stats/<owner>/<repo>/stats.db avoids redundant API calls; only new runs are fetched
Incremental updates using the latest cached run date
Filter by job name (-job), conclusion (-conclusion), branch (-branch)
Output as table (default), markdown, CSV, or JSON

Dependencies:

google/go-github/v85: GitHub Actions API client for fetching workflow runs and job details
modernc.org/sqlite: pure-Go SQLite driver (transpiled from C), chosen over mattn/go-sqlite3 to avoid CGO build dependencies

Example usage

$ go run workflow-stats/workflow_stats.go -workflow "Functional Test"
Fetching runs since 2026-05-24 ... 6 runs (0.7s)

  Step                                                N     Min     Avg     P95     Max  Timeout
  Run Functional Test                               728   2m50s   4m05s   5m21s   9m27s   17m00s
  Build minikube and e2e test binaries              156   1m07s   1m39s   2m09s   2m16s    7m00s
  Set up Rootless Docker (rootless)                  67     42s     48s     56s   1m21s    3m00s
  Update apt-get package index (ubuntu)             373      5s     12s     23s     42s    2m00s
  ...

Related-to #23041
Related-to #23043

k8s-ci-robot · 2026-05-25T18:43:55Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nirs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [nirs]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

medyagh

@nirs this is actually super cool.
An idea (not for this PR, of course):

It would be cool to have an automation job that gathers these logs every 14 days, makes a chart out of them, and adds it to our site:

https://minikube.sigs.k8s.io/docs/benchmarks/

We could potentially have something like this, for example for "Functional Test on Docker":

Export the data we get (how long the Run integration step took) into a CSV
Add it to the CSV in our hack folder (timestamp, env name, value)
Generate a chart with Google Charts like this one:
https://minikube.sigs.k8s.io/docs/benchmarks/timetok8s/weekly_benchmark/

Then make a PR to add it to our site once a month (update functional test benchmarking).

nirs · 2026-05-29T21:58:36Z

@medyagh Comments addressed:

Removed the "Building" section so we don't need to handle ignoring tools
Use go run in all the examples
Unified flag usage (-workflow instead of --workflow)

k8s-ci-robot · 2026-05-29T23:04:13Z

@nirs: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-minikube-docker-crio-linux-x86 | `a4f3909` | link | false | `/test pull-minikube-docker-crio-linux-x86` |
| pull-minikube-kvm-crio-linux-x86 | `a4f3909` | link | false | `/test pull-minikube-kvm-crio-linux-x86` |
| pull-minikube-kvm-containerd-linux-x86 | `a4f3909` | link | true | `/test pull-minikube-kvm-containerd-linux-x86` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

nirs · 2026-05-29T23:40:42Z

/retest-required

Copilot

Pull request overview

Adds a new hack/workflow-stats Go tool that fetches completed GitHub Actions runs via the GitHub API, caches jobs/steps in a local SQLite database (modernc.org/sqlite, pure Go — no CGO), and reports per-step duration statistics (min/avg/p50/p90/p95/max) plus a suggested timeout (p95 × multiplier, rounded up to whole minutes, floor 1 min). Output is available as table/markdown/CSV/JSON to support both human review and an upcoming automated timeout-tuning workflow (#23043).

Changes:

New hack/workflow-stats/workflow_stats.go CLI with SQLite-backed incremental fetch and four output formats.
New hack/workflow-stats/README.md documenting usage and a worked example of tuning Functional Test timeouts.
hack/go.mod and hack/go.sum updates pulling in google/go-github/v85, modernc.org/sqlite, and their transitive deps.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| hack/workflow-stats/workflow_stats.go | Implements the CLI: option parsing, GitHub API fetching, SQLite cache schema and queries, stats computation, and formatted output. |
| hack/workflow-stats/README.md | User-facing documentation with usage examples and a sample workflow-edit diff. |
| hack/go.mod | Adds `modernc.org/sqlite` direct dep and several indirect deps. |
| hack/go.sum | Checksums for new direct/indirect modules. |

+	// run_id PRIMARY KEY: uniqueness + O(1) lookup by run ID (dbCachedRunIDs).
+	if _, err = db.Exec(`
+		CREATE TABLE IF NOT EXISTS runs (
+			run_id        INTEGER PRIMARY KEY,
+			workflow_name TEXT NOT NULL DEFAULT '',
+			created_at    TEXT NOT NULL DEFAULT ''
+		)`); err != nil {
+		log.Fatalf("Creating runs table: %v", err)
+	}
+
+	// Optimizes dbLatestRunDate (MAX(created_at) per workflow) and
+	// dbRunIDsSince (run IDs for a workflow within a date range).
+	if _, err = db.Exec("CREATE INDEX IF NOT EXISTS idx_runs_workflow_created ON runs (workflow_name, created_at)"); err != nil {
+		log.Fatalf("Creating index: %v", err)
+	}
+
+	// PRIMARY KEY (run_id, job_name, step_number): uniqueness + fast lookup
+	// by run_id prefix for dbCollectDurations (all steps for a set of runs).


+			jobs := fetchJobsForRun(ctx, client, opts.Owner, opts.Repo, r.ID)
+			if jobs == nil {
+				continue
+			}
+			insertRun(db, r.ID, opts.Workflow, r.CreatedAt, jobs)


nirs · 2026-05-30T00:32:28Z

+func updateDB(ctx context.Context, client *github.Client, db *sql.DB, opts options) {
+	fetchSince := latestRunDate(db, opts.Workflow)
+	requestedSince := time.Now().UTC().AddDate(0, 0, -opts.Since)
+	if fetchSince.Before(requestedSince) {
+		fetchSince = requestedSince
+	}
+
+	fmt.Fprintf(os.Stderr, "Fetching runs since %s ...", fetchSince.Format("2006-01-02"))
+	t := time.Now()
+	wfID := findWorkflowID(ctx, client, opts.Owner, opts.Repo, opts.Workflow)
+	runs := fetchRuns(ctx, client, opts.Owner, opts.Repo, wfID, opts.Branch, fetchSince)
+	fmt.Fprintf(os.Stderr, " %d runs (%.1fs)\n", len(runs), time.Since(t).Seconds())


We need to reconsider the branch flag; I think we can remove it. This tool should be used on the master branch. We can add a branch option later if we have a real need.

+		name := s.Name
+		if len(name) > 50 {
+			name = name[:49] + "…"
+		}


nirs · 2026-06-12T17:55:14Z

+Build minikube and e2e test binaries                  200     1m07s     1m39s     1m38s     2m07s     2m09s     2m21s      7m00s
+Set up Rootless Docker (rootless)                      87       42s       47s       46s       53s       55s     1m01s      3m00s
+Run actions/setup-go@4a3601121dd01d1626a1e23e3721…   1132        7s       20s       21s       26s       29s     1m01s      2m00s
+Update apt-get package index (ubuntu)                 475        5s       11s        8s       20s       23s       48s      2m00s


We need to add the job name to the table, so we can see the best timeout for each job/step combination.

When using a matrix, we may be able to set the timeout in the matrix and use:

timeout-minutes: ${{ matrix.timeout-minutes }}

When the code updating the timeouts finds timeout-minutes: ${{ matrix.timeout-minutes }}, it can look up timeout-minutes in the matrix and update the value there. This way we can fail fast jobs (ubuntu) quickly and wait longer only for slow builds (macos-15-intel).
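
One way the updater could recognize such a line is sketched below. It assumes the workflow file is scanned line by line; the regular expression and the helper `matrixTimeoutKey` are hypothetical, and a real implementation would more likely parse the YAML.

```go
package main

import (
	"fmt"
	"regexp"
)

// matrixTimeoutRE matches a line such as
//   timeout-minutes: ${{ matrix.timeout-minutes }}
// and captures the matrix key that holds the actual value.
var matrixTimeoutRE = regexp.MustCompile(
	`^\s*timeout-minutes:\s*\$\{\{\s*matrix\.([A-Za-z0-9_-]+)\s*\}\}\s*$`)

// matrixTimeoutKey reports whether line delegates its timeout to the matrix
// and, if so, which matrix key to look up.
func matrixTimeoutKey(line string) (string, bool) {
	m := matrixTimeoutRE.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, line := range []string{
		"    timeout-minutes: ${{ matrix.timeout-minutes }}",
		"    timeout-minutes: 17",
	} {
		key, ok := matrixTimeoutKey(line)
		fmt.Printf("%q -> key=%q matrix=%v\n", line, key, ok)
	}
}
```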

nirs requested a review from medyagh May 25, 2026 18:43

nirs added the area/testing label May 25, 2026

k8s-ci-robot requested review from ComradeProgrammer and prezha May 25, 2026 18:43

k8s-ci-robot added the cncf-cla: yes, approved, and size/XL labels May 25, 2026

This was referenced May 25, 2026

ci: Ensure workflow timeouts - runs can get stuck for hours #23041

Open

Auto-tune workflow step timeouts based on historical CI performance #23043

Open

medyagh requested changes May 26, 2026

Comment thread hack/workflow-stats/workflow_stats.go

nirs force-pushed the workflow-stats branch from e46e3ec to 8bbd61b Compare May 29, 2026 17:07

nirs requested a review from medyagh May 29, 2026 17:08

medyagh reviewed May 29, 2026

Comment thread hack/workflow-stats/README.md Outdated

medyagh reviewed May 29, 2026

nirs force-pushed the workflow-stats branch from 8bbd61b to a4f3909 Compare May 29, 2026 21:56

nirs requested a review from medyagh May 29, 2026 21:58

medyagh requested a review from Copilot May 30, 2026 00:26

Copilot started reviewing on behalf of medyagh May 30, 2026 00:26

Copilot AI reviewed May 30, 2026

nirs marked this pull request as draft May 30, 2026 00:33

k8s-ci-robot added the do-not-merge/work-in-progress label May 30, 2026

nirs force-pushed the workflow-stats branch from a4f3909 to 1806d06 Compare June 12, 2026 16:35

nirs commented Jun 12, 2026

nirs wants to merge 1 commit into kubernetes:master from nirs:workflow-stats

4 participants
