HTTPRouteRequestPercentageMirror: the 20% subcase tolerance is only ~1.7σ wide, causing ~8–9% per-attempt distribution-check flake by sampling alone #4933

@lexfrei

What happened:

Running the conformance suite (v1.5.1, GATEWAY-HTTP profile) against cloudflare-tunnel-gateway-controller (a Gateway API implementation for Cloudflare Tunnel, where each request and its mirror copy both traverse the Cloudflare edge), the HTTPRouteRequestPercentageMirror test intermittently logs a failed distribution check before passing within its retry budget. The failing subcase is consistently /percent-mirror (the 20% mirror); I have observed mirrored counts as low as 78 and as high as 116 against the [85, 115] band. The suite ultimately passes because the test makes up to numDistributionChecks = 5 attempts and returns on the first success, so this is log noise rather than a hard failure. The cause, however, is that the test's tolerance is statistically too tight at the lowest mirror percentage, independent of the implementation under test.

What you expected to happen:

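An implementation that mirrors at exactly the configured percentage should pass the distribution check reliably, with a per-attempt flake rate around 0.1% or lower. By the analysis below, the 50% subcase is already there (~0.08%) and the 35% subcase is within an order of magnitude (~1.4%); only the 20% subcase is far off.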

How to reproduce it (as minimally and precisely as possible):

This reproduces by analysis, independent of any implementation, because the check is a binomial sampling test.

The test sends totalRequests = 500 requests per check and compares the mirrored count to a relative band of ±tolerancePercentage = 15% around the expected count (conformance/tests/httproute-request-percentage-mirror.go). If each request is mirrored independently with probability p (Bernoulli sampling), the count is Binomial(n, p) with n = 500, mean μ = n·p and σ = √(n·p·(1−p)). The band's half-width, 0.15·n·p, scales linearly with p, but σ scales with √(p(1−p)), so the band narrows in σ-units as p drops:

| Subcase | p | Expected μ | σ | Band (±15%) | Half-width | P(outside) per check |
|---|---|---|---|---|---|---|
| /percent-mirror | 20% | 100 | 8.94 | [85, 115] | ±1.68σ | ~8–9% |
| /percent-mirror-and-modify-headers | 35% | 175 | 10.67 | [148.75, 201.25] | ±2.46σ | ~1.4% |
| /percent-mirror-fraction | 50% | 250 | 11.18 | [212.5, 287.5] | ±3.35σ | ~0.08% |
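
For anyone who wants to check the figures, here is a minimal Python sketch that recomputes σ, the band half-width in σ-units, and the exact binomial probability of landing outside the band. It is not code from the conformance suite. The function names are made up for illustration, and it assumes the band ends are inclusive (a count of exactly 85 or 115 passes), which is my reading of the test rather than something verified here.

```python
from fractions import Fraction
from math import ceil, comb, floor, sqrt

TOTAL_REQUESTS = 500            # totalRequests in the conformance test
TOLERANCE = Fraction(15, 100)   # tolerancePercentage = 15%


def band(n, p, tol=TOLERANCE):
    """Integer counts inside mu*(1 - tol) .. mu*(1 + tol), ends inclusive (assumed)."""
    mu = n * p  # exact, because p and tol are Fractions
    return ceil(mu * (1 - tol)), floor(mu * (1 + tol))


def p_outside(n, p, tol=TOLERANCE):
    """Exact probability that a Binomial(n, p) count falls outside the band."""
    lo, hi = band(n, p, tol)
    pf = float(p)
    inside = sum(comb(n, k) * pf**k * (1 - pf) ** (n - k) for k in range(lo, hi + 1))
    return 1.0 - inside


for percent in (20, 35, 50):
    p = Fraction(percent, 100)
    sigma = sqrt(float(TOTAL_REQUESTS * p * (1 - p)))
    half_width = float(TOTAL_REQUESTS * p * TOLERANCE) / sigma
    print(f"{percent}%: sigma={sigma:.2f}, half-width={half_width:.2f} sigma, "
          f"P(outside)={p_outside(TOTAL_REQUESTS, p):.4f}")
```

Fractions are used for the band edges so that values such as 148.75 and 212.5 are rounded to integers without floating-point surprises.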

So even a perfect implementation flakes ~8–9% per distribution check on the 20% subcase. The 5-attempt budget hides it at the suite level (P(all 5 fail) ≈ 0.085⁵ ≈ 4·10⁻⁶), but roughly one run in 12 logs a failed attempt and re-sends another 500 requests per retry, up to 2500 in total for that subcase.
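
The retry arithmetic can be written out as a short sketch. It assumes the attempts are independent and uses 0.085 as a midpoint of the ~8–9% estimate; the variable names are illustrative, not taken from the test.

```python
NUM_DISTRIBUTION_CHECKS = 5   # numDistributionChecks in the test
TOTAL_REQUESTS = 500          # totalRequests in the test
PER_CHECK_FLAKE = 0.085       # assumed midpoint of the ~8-9% per-check estimate

# Probability that every attempt fails, assuming independent attempts.
p_all_fail = PER_CHECK_FLAKE ** NUM_DISTRIBUTION_CHECKS

# Average number of runs between runs whose first attempt fails.
runs_per_logged_failure = 1 / PER_CHECK_FLAKE

# Worst-case request volume for one subcase if every retry is used.
max_requests = NUM_DISTRIBUTION_CHECKS * TOTAL_REQUESTS

print(f"P(all {NUM_DISTRIBUTION_CHECKS} fail) = {p_all_fail:.1e}")
print(f"about one run in {runs_per_logged_failure:.0f} logs a failed first attempt")
print(f"up to {max_requests} requests for the subcase")
```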

Anything else we need to know?:

Ruling out a lossy mirror leg in cloudflare-tunnel-gateway-controller (the mirror copy traverses the Cloudflare edge, so dropped mirrors were a plausible worry): the observed overshoot (116 > 115) cannot come from dropped mirrors — a drop can only ever undercount. The deviations are symmetric (below 85 and above 115), which is the signature of sampling variance, not systematic loss.

Suggested fix — decouple the tolerance from a flat relative percentage. Options, roughly in order of preference:

  • Derive the band from the binomial σ, e.g. μ ± k·√(n·p(1−p)) with k ≈ 3.5 (≈ 4·10⁻⁴ per-check flake across all three subcases).
  • Or raise totalRequests so the 20% subcase clears ~3σ at ±15% (needs roughly n ≥ 1600).
  • Or relax tolerancePercentage for the low-percentage subcase specifically.

Each keeps the test sensitive to a genuinely wrong distribution while removing the sampling flake. The test is already Provisional; I'm happy to send a PR once a direction is agreed.
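
To make the first two options concrete, here is a rough Python sketch of a σ-derived band and of the request count needed for a ±15% band to reach a given number of σ. The actual test is Go, so this only illustrates the arithmetic; `sigma_band` and `min_requests` are hypothetical names, not proposed API.

```python
from fractions import Fraction
from math import ceil, sqrt


def sigma_band(n, p, k=3.5):
    """Band mu ± k*sigma for a Binomial(n, p) count."""
    mu = n * p
    half = k * sqrt(n * p * (1 - p))
    return mu - half, mu + half


def min_requests(p, tol, k):
    """Smallest n with tol*n*p >= k*sqrt(n*p*(1-p)), i.e. n >= k^2 * (1-p) / (tol^2 * p).

    p and tol are Fractions so the result is not affected by float rounding.
    """
    return ceil(Fraction(k) ** 2 * (1 - p) / (tol**2 * p))


for p in (0.20, 0.35, 0.50):
    lo, hi = sigma_band(500, p)
    print(f"p={p:.2f}: band = [{lo:.1f}, {hi:.1f}]")

print("n for 3 sigma at ±15%, p=20%:",
      min_requests(Fraction(1, 5), Fraction(15, 100), 3))
```

With n = 500 and k = 3.5, the 20% band comes out at roughly 69 to 131 mirrored requests, so the trade-off against sensitivity at low percentages is visible when choosing k.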

Constants are identical on main at time of writing (tolerancePercentage = 15.0, totalRequests = 500.0, numDistributionChecks = 5).
