compact: block checksum mismatch, but the block does not seem to be broken #8846
zayomeng asked this question in Questions & Answers (unanswered, 1 reply)
@zayomeng Can I please work on this issue?
Environment
Thanos version: v0.38.0
Prometheus version: v3.0.1 (using Thanos sidecar mode with kube-prometheus)
Object storage: MinIO (Build RELEASE.2024-04-18)
Deployment: Kubernetes
Compactor start command args:
- compact
- '--wait'
- '--log.level=info'
- '--log.format=logfmt'
- '--http-address=0.0.0.0:10912'
- '--data-dir=/var/thanos/compactor'
- '--debug.accept-malformed-index'
- '--retention.resolution-raw=180d'
- '--retention.resolution-5m=180d'
- '--retention.resolution-1h=180d'
- '--delete-delay=6h'
- '--objstore.config-file=/config/thanos.yaml'
- '--compact.enable-vertical-compaction'
- '--deduplication.replica-label="prometheus_replica"'
Problem description
We are experiencing repeated compactor halts due to block corruption errors:
Because we have no backup object storage for the store, I assumed the blocks named in the compactor's log were broken beyond repair. So I stopped the Thanos compactor, used `thanos tools bucket mark` to mark those blocks for deletion, and ran `thanos tools bucket cleanup` to delete them immediately. After these commands finished, I started the compactor again. It ran for about 10 minutes and then halted again, so I repeated the same procedure. After 10+ rounds of block removal, I began to suspect that the blocks themselves were not the problem. I then ran `thanos bucket verify -i index_known_issues --id <broken-block-id>` (I have to use this flag because without it the command downloads all blocks to a temp directory and does not clean up the files, which filled up my filesystem). It reported that verification was OK, with no error or warning logs.

After many attempts, I finally used a client to download the most recent block reported with a checksum error to my local disk and ran `promtool tsdb analyze` on it to check whether it was broken. The block could be read successfully. This confuses me, and I don't know how to get the compactor to run correctly. I have about 20000 blocks in MinIO.

Steps I have tried that did not work
`thanos tools bucket verify` to check all blocks in MinIO. (No logs indicated any problem.)
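
One way to cross-check a downloaded block independently of the compactor might be to recompute the chunk checksums by hand. The sketch below is only that, a sketch: it assumes the mismatch is reported for a chunk segment file (the compactor log is not shown above, so this is a guess), and it assumes the documented Prometheus TSDB chunk layout (8-byte header, then `len <uvarint> | encoding <1 byte> | data | CRC32 <4 bytes>` per chunk, with a big-endian Castagnoli CRC32 over encoding and data). The names `crc32c`, `read_uvarint`, and `check_segment` are hypothetical helpers, not part of Thanos or Prometheus.

```python
import struct
from typing import List, Tuple

MAGIC_CHUNKS = 0x85BD40DD  # magic number at the start of a chunk segment file
HEADER_SIZE = 8            # magic (4 bytes) + version (1 byte) + padding (3 bytes)


def _make_crc32c_table() -> List[int]:
    """Build the lookup table for CRC32 with the Castagnoli polynomial (reflected)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Pure-Python CRC32C; slow on large files, but has no dependencies."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned varint at pos; return (value, position after it)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def check_segment(buf: bytes) -> List[Tuple[int, int, int]]:
    """Return (chunk offset, stored CRC, computed CRC) for every mismatching chunk."""
    (magic,) = struct.unpack_from(">I", buf, 0)
    if magic != MAGIC_CHUNKS:
        raise ValueError(f"unexpected magic number {magic:#x}")
    mismatches = []
    pos = HEADER_SIZE
    while pos < len(buf):
        chunk_start = pos
        data_len, pos = read_uvarint(buf, pos)
        end = pos + 1 + data_len  # one encoding byte followed by the chunk data
        if end + 4 > len(buf):
            raise ValueError(f"segment truncated inside chunk at offset {chunk_start}")
        (stored,) = struct.unpack_from(">I", buf, end)
        computed = crc32c(buf[pos:end])
        if stored != computed:
            mismatches.append((chunk_start, stored, computed))
        pos = end + 4
    return mismatches
```

To use it on a local copy, read one segment file from the block's `chunks/` directory, for example `check_segment(open(path, "rb").read())` with `path` pointing at a segment. An empty list would mean every chunk checksum in that segment matches. If all segments pass locally while the compactor still reports a mismatch, that would point toward the read path (object storage or network) rather than the stored data, though this sketch cannot confirm that on its own.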