Thanos, Prometheus and Golang version used:
- Thanos: v0.41.0 (docker image
thanosio/thanos:v0.41.0)
- Prometheus (sending via remote-write): v3.9.2 / v3.7.3 (both trigger the bug)
- Go: 1.25.0 (as built in v0.41.0)
- Dependencies:
prometheus/common@v0.67.5, prometheus/prometheus@v0.308.0
Object Storage Provider: N/A (affects receive component only)
What happened:
The Thanos receiver-router panics with Invalid name validation scheme requested: unset when processing incoming remote-write requests with --receive.relabel-config-file configured. The panic causes Go's HTTP server to close the connection, resulting in downstream clients receiving EOF errors.
Panic trace
panic: Invalid name validation scheme requested: unset
goroutine 5312 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1943 +0xd3
panic({0x2c21820?, 0xc00063e050?})
/usr/local/go/src/runtime/panic.go:783 +0x132
github.com/prometheus/common/model.ValidationScheme.IsValidLabelName(0xc0015660f7?, {0xc0014b2010?, 0xc?})
/go/pkg/mod/github.com/prometheus/common@v0.67.5/model/metric.go:203 +0xb4
github.com/prometheus/prometheus/model/relabel.relabel(0xc000390b60, 0xc00150e050)
/go/pkg/mod/github.com/prometheus/prometheus@v0.308.0/model/relabel/relabel.go:340 +0x812
github.com/prometheus/prometheus/model/relabel.ProcessBuilder(...)
/go/pkg/mod/github.com/prometheus/prometheus@v0.308.0/model/relabel/relabel.go:290
github.com/prometheus/prometheus/model/relabel.Process({0xc000591b08?, 0x1a?, 0x23?}, {0xc0006d3780, 0x4, 0x4?})
/go/pkg/mod/github.com/prometheus/prometheus@v0.308.0/model/relabel/relabel.go:280 +0x5f
github.com/thanos-io/thanos/pkg/receive.(*Handler).relabel(0xc000274460, 0xc000755410)
/app/pkg/receive/handler.go:1132 +0x108
github.com/thanos-io/thanos/pkg/receive.(*Handler).receiveHTTP(0xc000274460, {0x42d3580, 0xc001a000e0}, 0xc0002ae500)
/app/pkg/receive/handler.go:620 +0xf8d
What you expected to happen:
Relabeling should work without panics, using UTF-8 validation as the default scheme.
How to reproduce it (as minimally and precisely as possible):
- Deploy Thanos v0.41.0 receiver-router with
--receive.relabel-config-file pointing to a relabel config that includes a replace action with a non-default regex:
- source_labels: [some_label]
regex: "(.+)"
target_label: new_label
- Send remote-write requests from a Prometheus instance that has
some_label populated on its metrics
- Observe panic in receiver-router logs
Full logs to relevant components:
See panic trace above. The panic is recovered by Go's HTTP server, logged at error level, and the connection is closed (causing EOF on the client side).
Anything else we need to know:
Root Cause
In v0.41.0, the relabel.Config struct (from prometheus/prometheus@v0.308.0) has a NameValidationScheme field of type model.ValidationScheme. When relabel configs are parsed from YAML via yaml.Unmarshal in cmd/thanos/receive.go:224, this field is not present in user-supplied YAML files, so it defaults to the zero value UnsetValidation (0).
At runtime, relabel.relabel() at relabel.go:340 calls:
if !cfg.NameValidationScheme.IsValidLabelName(target) {
This panics because IsValidLabelName does not handle UnsetValidation.
The existing unit tests in handler_test.go pass because they manually set NameValidationScheme: model.UTF8Validation on each test config struct.
Why some deployments are unaffected
The panic only triggers when the Replace action's regex matches the source label value:
case Replace:
// Fast path - skips IsValidLabelName entirely
if val == "" && cfg.Regex == DefaultRelabelConfig.Regex && ... {
lb.Set(cfg.TargetLabel, cfg.Replacement)
break
}
indexes := cfg.Regex.FindStringSubmatchIndex(val)
if indexes == nil {
break // regex didn't match - skips IsValidLabelName
}
target := ...
if !cfg.NameValidationScheme.IsValidLabelName(target) { // PANIC when UnsetValidation
If the incoming metrics don't have the source labels populated (e.g., customer_key is empty), then:
val = "" → regex: "(.+)" doesn't match → indexes == nil → break before reaching the panic
This means clusters where the sending Prometheus doesn't emit metrics with the relevant labels will never hit the panic, even on v0.41.0 with the same relabel config.
Suggested fix
After unmarshaling relabel configs, set NameValidationScheme = model.UTF8Validation on any config where it is unset:
for _, rc := range relabelConfig {
if rc.NameValidationScheme == model.UnsetValidation {
rc.NameValidationScheme = model.UTF8Validation
}
}
Environment:
- Kubernetes: GKE 1.33+
- Thanos receiver mode: router
Thanos, Prometheus and Golang version used:
thanosio/thanos:v0.41.0)prometheus/common@v0.67.5,prometheus/prometheus@v0.308.0Object Storage Provider: N/A (affects receive component only)
What happened:
The Thanos receiver-router panics with
Invalid name validation scheme requested: unsetwhen processing incoming remote-write requests with--receive.relabel-config-fileconfigured. The panic causes Go's HTTP server to close the connection, resulting in downstream clients receiving EOF errors.Panic trace
What you expected to happen:
Relabeling should work without panics, using UTF-8 validation as the default scheme.
How to reproduce it (as minimally and precisely as possible):
--receive.relabel-config-filepointing to a relabel config that includes areplaceaction with a non-default regex:some_labelpopulated on its metricsFull logs to relevant components:
See panic trace above. The panic is recovered by Go's HTTP server, logged at error level, and the connection is closed (causing EOF on the client side).
Anything else we need to know:
Root Cause
In v0.41.0, the
relabel.Configstruct (fromprometheus/prometheus@v0.308.0) has aNameValidationSchemefield of typemodel.ValidationScheme. When relabel configs are parsed from YAML viayaml.Unmarshalincmd/thanos/receive.go:224, this field is not present in user-supplied YAML files, so it defaults to the zero valueUnsetValidation(0).At runtime,
relabel.relabel()atrelabel.go:340calls:This panics because
IsValidLabelNamedoes not handleUnsetValidation.The existing unit tests in
handler_test.gopass because they manually setNameValidationScheme: model.UTF8Validationon each test config struct.Why some deployments are unaffected
The panic only triggers when the
Replaceaction's regex matches the source label value:If the incoming metrics don't have the source labels populated (e.g.,
customer_keyis empty), then:val = ""→regex: "(.+)"doesn't match →indexes == nil→breakbefore reaching the panicThis means clusters where the sending Prometheus doesn't emit metrics with the relevant labels will never hit the panic, even on v0.41.0 with the same relabel config.
Suggested fix
After unmarshaling relabel configs, set
NameValidationScheme = model.UTF8Validationon any config where it is unset:Environment: