Backport of #104881 to 25.8: Use explicit flag for secondary on cluster queries by ianton-ru · Pull Request #1875 · Altinity/ClickHouse

ianton-ru · 2026-06-05T14:25:36Z

Backport of ClickHouse#104881 by @tavplubix

Changelog category (leave one):

Not for changelog (changelog entry is not required) (Use explicit flag for secondary on cluster queries ClickHouse/ClickHouse#104881 by @tavplubix)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Use an explicit flag in Context for secondary DDL/ON CLUSTER queries instead of SECONDARY_QUERY


github-actions · 2026-06-05T14:26:32Z

Workflow [PR], commit [2fa783d]

ianton-ru · 2026-06-05T14:28:17Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2fa783dc77


chatgpt-codex-connector · 2026-06-05T14:35:01Z

    query_context->makeQueryContext();
    query_context->setCurrentQueryId(""); // generate random query_id
-    query_context->setQueryKind(ClientInfo::QueryKind::SECONDARY_QUERY);
+    query_context->setDDLOrOnClusterInternal(true);


Preserve worker mode for cluster table functions

When a DDL worker executes CREATE ... ON CLUSTER ... AS SELECT ... FROM fileCluster/urlCluster/s3Cluster, this context is no longer marked as SECONDARY_QUERY, while the *Cluster table functions still use query_kind == SECONDARY_QUERY to choose their worker-local storage path (for example TableFunctionFileCluster.cpp:23 and TableFunctionURLCluster.cpp:13). As a result, each DDL worker can fan out to the whole cluster again instead of reading only its assigned local input, multiplying remote reads and potentially duplicating the data inserted by CTAS on every replica; keep the secondary query kind for execution semantics or update those table-function checks to also recognize this new internal DDL flag.
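The concern above can be illustrated with a minimal, self-contained model. This is only a sketch: `MockContext`, `makeWorkerContextBefore`, `makeWorkerContextAfter`, and `chooseStorage` are hypothetical stand-ins, not ClickHouse code, and the `NO_QUERY` default for an unmarked worker context is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the ClickHouse types named in the review.
enum class QueryKind { NO_QUERY, INITIAL_QUERY, SECONDARY_QUERY };

struct MockContext
{
    QueryKind query_kind = QueryKind::NO_QUERY;   // assumed default for a worker thread
    bool ddl_or_on_cluster_internal = false;      // models the new explicit flag
};

// Before the PR: the DDL worker marks its context as a secondary query.
inline MockContext makeWorkerContextBefore()
{
    MockContext ctx;
    ctx.query_kind = QueryKind::SECONDARY_QUERY;
    return ctx;
}

// After the PR: only the explicit flag is set; query_kind is left untouched.
inline MockContext makeWorkerContextAfter()
{
    MockContext ctx;
    ctx.ddl_or_on_cluster_internal = true;
    return ctx;
}

// Models the check the *Cluster table functions are described as using:
// only query_kind is consulted, the new flag is ignored.
inline std::string chooseStorage(const MockContext & ctx)
{
    if (ctx.query_kind == QueryKind::SECONDARY_QUERY)
        return "worker-local";
    return "cluster-initiator";
}

// Demonstrates the behavior change described in the review comment.
inline void checkDispatchChange()
{
    assert(chooseStorage(makeWorkerContextBefore()) == "worker-local");
    assert(chooseStorage(makeWorkerContextAfter()) == "cluster-initiator");
}
```

Calling `checkDispatchChange()` from a `main` exercises both cases: in this model, the worker context created after the change falls through to the initiator branch because nothing reads the new flag.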


Good catch, but the upstream master branch still uses query_kind == SECONDARY_QUERY, so this issue is also present there.

The original PR author stated it was irrelevant to DDL.

ianton-ru · 2026-06-08T10:40:56Z

The failed tests look unstable and unrelated to the PR.

mkmkme

LGTM, pretty much 1-to-1 with the original PR

mkmkme · 2026-06-10T12:30:20Z

    query_context->makeQueryContext();
    query_context->setCurrentQueryId(""); // generate random query_id
-    query_context->setQueryKind(ClientInfo::QueryKind::SECONDARY_QUERY);
+    query_context->setDDLOrOnClusterInternal(true);


The original PR author stated it was irrelevant to DDL.

alsugiliazova · 2026-06-10T13:50:31Z

Audit update for PR #1875

AI audit note: This review comment was generated by AI (claude-opus-4.7).

PR: Altinity/ClickHouse#1875 — Backport of ClickHouse#104881 to 25.8: Use explicit flag for secondary on cluster queries
HEAD: 2fa783dc773f8a971e333aa9de671f3982fdacfb
Base: stable-25.8
Upstream: ClickHouse/ClickHouse#104881 (merged)

Confirmed defects

High — `CREATE TABLE AS s3Cluster()` / `fileCluster()` / `urlCluster()` is broken inside a `Replicated` database

Impact: CREATE TABLE ... AS SELECT FROM s3Cluster(...) (and the analogues for fileCluster, urlCluster, azureBlobStorageCluster, etc.) executed against a Replicated database throws NOT_FOUND_COLUMN_IN_BLOCK on the secondary replica that picks up the DDL task. The PR's own CI reproduces this in Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel) / 03579_create_table_populate_from_s3 (backported by 53b01c8fdfe from upstream ClickHouse/ClickHouse#84753, "LOGICAL_ERROR: Next task callback is not set for query"). The stack trace runs through DatabaseReplicated::tryEnqueueReplicatedDDL → DDLWorker::tryExecuteQuery → InterpreterCreateQuery::fillTableIfNeeded → Planner::buildPlanForQueryNode → ActionsDAG::appendInputsForUnusedColumns.

The previously backported fix ClickHouse/ClickHouse#85904 ("Fix logical error on creating table as s3Cluster() in Replicated database") relies on the DDL worker context having query_kind == SECONDARY_QUERY, so that TableFunctionObjectStorageCluster::executeImpl takes the worker-local StorageObjectStorage branch with distributed_processing = can_use_distributed_iterator (false here, because there is no cluster-function read-task callback).

After this PR, DDLTaskBase::makeQueryContext and DatabaseReplicatedTask::makeQueryContext set only setDDLOrOnClusterInternal(true) and drop the setQueryKind(SECONDARY_QUERY) call, so client_info.query_kind keeps the parent context's value (NO_QUERY for the DDL worker thread). The check at src/TableFunctions/TableFunctionObjectStorageCluster.cpp:37 then falls through to the initiator branch and constructs a StorageObjectStorageCluster that re-dispatches the read across the replicas. The Codex bot raised this exact concern on the PR; it was dismissed.

Other *Cluster table functions in 25.8 use the same discrimination (TableFunctionFileCluster.cpp:23, TableFunctionURLCluster.cpp:13, TableFunctionObjectStorage.cpp:203, TableFunctionURL.cpp:100), so the regression is not S3-only.
Anchor: src/Interpreters/DDLTask.cpp (DDLTaskBase::makeQueryContext, DatabaseReplicatedTask::makeQueryContext); affected callers in src/TableFunctions/TableFunction*Cluster.cpp and src/TableFunctions/TableFunctionObjectStorage.cpp.
Trigger: CREATE [OR REPLACE] TABLE t ... AS SELECT * FROM s3Cluster(cluster, url, format) ... executed against a Replicated database (or any DatabaseReplicated stateless test wrapper). Reproduced by tests/queries/0_stateless/03579_create_table_populate_from_s3.sh in the DatabaseReplicated stateless config on this PR's CI.
Why defect: DDLTask::makeQueryContext no longer marks the worker context as SECONDARY_QUERY, but multiple call sites that gate worker-local vs. initiator behavior still check query_kind == SECONDARY_QUERY and were not updated to also accept isDDLOrOnClusterInternal. The Altinity 25.8 branch carries the ClickHouse/ClickHouse#85904 fix ("Fix logical error on creating table as s3Cluster() in Replicated database") and the ClickHouse/ClickHouse#84753 regression test ("LOGICAL_ERROR: Next task callback is not set for query"), both of which depend on the old semantics. Upstream master happens to be broken in the same way (the PR author acknowledged this), but on 25.8 the regression is observable and fails CI.
Fix direction: Either keep setQueryKind(ClientInfo::QueryKind::SECONDARY_QUERY) in DDLTask::makeQueryContext alongside setDDLOrOnClusterInternal(true), or update TableFunctionObjectStorageCluster.cpp, TableFunctionFileCluster.cpp, TableFunctionURLCluster.cpp, TableFunctionObjectStorage.cpp, TableFunctionURL.cpp (and any other query_kind == SECONDARY_QUERY checks that fire from the DDL flow) to also treat context->isDDLOrOnClusterInternal() as the worker branch.
Regression test direction: 03579_create_table_populate_from_s3 already triggers the failure under the DatabaseReplicated config; make it required in CI for this backport and add explicit fileCluster / urlCluster analogues under a Replicated engine.
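
The two fix directions above can be sketched against a small mock. This is an illustration under stated assumptions, not the actual patch: `MockContext`, `makeWorkerContextWithBoth`, and `takesWorkerBranch` are hypothetical names, and only `isDDLOrOnClusterInternal` and the `SECONDARY_QUERY` kind come from the discussion.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real ClickHouse Context or ClientInfo.
enum class QueryKind { NO_QUERY, INITIAL_QUERY, SECONDARY_QUERY };

struct MockContext
{
    QueryKind query_kind = QueryKind::NO_QUERY;
    bool ddl_or_on_cluster_internal = false;

    bool isDDLOrOnClusterInternal() const { return ddl_or_on_cluster_internal; }
};

// Direction 1: the DDL worker keeps the secondary query kind
// and additionally sets the new explicit flag.
inline MockContext makeWorkerContextWithBoth()
{
    MockContext ctx;
    ctx.query_kind = QueryKind::SECONDARY_QUERY;
    ctx.ddl_or_on_cluster_internal = true;
    return ctx;
}

// Direction 2: call sites treat either signal as "worker side".
inline bool takesWorkerBranch(const MockContext & ctx)
{
    return ctx.query_kind == QueryKind::SECONDARY_QUERY
        || ctx.isDDLOrOnClusterInternal();
}

// Checks both directions against the flag-only context produced by this PR.
inline void checkFixDirections()
{
    MockContext flag_only;
    flag_only.ddl_or_on_cluster_internal = true;

    // Direction 1: the old query_kind check still sees a secondary query.
    assert(makeWorkerContextWithBoth().query_kind == QueryKind::SECONDARY_QUERY);
    // Direction 2: the widened check accepts the flag-only worker context.
    assert(takesWorkerBranch(flag_only));
    // An ordinary initial query is unaffected by either direction.
    MockContext initial;
    initial.query_kind = QueryKind::INITIAL_QUERY;
    assert(!takesWorkerBranch(initial));
}
```

Direction 1 leaves every existing `query_kind` check untouched, while direction 2 requires auditing each call site; which trade-off is preferable is left to the maintainers.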

Coverage summary

Scope reviewed: All 13 files in the backport diff; cross-checked against upstream ClickHouse/ClickHouse#104881, "Use explicit flag for secondary on cluster queries" (15 files). The two missing upstream files (src/Interpreters/InterpreterSystemQuery.cpp::restoreDatabaseFromKeeperPath and src/Storages/ObjectStorage/Utils.cpp::expandPaimonKeeperMacrosIfNeeded) and the missing third hunk in src/Databases/DatabaseReplicated.cpp::registerDatabaseReplicated are correctly omitted because those code paths do not exist in 25.8. All boolean translations (query_kind == SECONDARY_QUERY ↔ isDDLOrOnClusterInternal, query_kind == INITIAL_QUERY ↔ !isDDLOrOnClusterInternal, query_kind != INITIAL_QUERY ↔ isDDLOrOnClusterInternal, query_kind != SECONDARY_QUERY ↔ !isDDLOrOnClusterInternal) match upstream verbatim. Context state propagation (copy constructor, default initializer, getter/setter) is consistent. CI report at https://altinity-build-artifacts.s3.amazonaws.com/PRs/1875/2fa783dc773f8a971e333aa9de671f3982fdacfb/result_pr.json reviewed.
Categories failed: DDL worker context propagation × cluster table function dispatch (root cause: dropped setQueryKind(SECONDARY_QUERY)).
Categories passed: Backport-vs-upstream boolean translation parity; Context member lifecycle/copy; backup internal-flag access checks; Replicated database tryEnqueueReplicatedDDL initial-query check; InterpreterCreateQuery UUID/attach-path checks; InterpreterDropQuery secondary-query detection; Kafka / MergeTree / ObjectStorageQueue / TableZnodeInfo is_replicated_database macro gating.
Not applicable: iterator/reference invalidation, integer overflow/signedness, RAII leaks, multithreaded interleaving (no new shared mutable state introduced), rollback/partial-update (no mutation paths added).
Assumptions/limits: Static reasoning + CI log inspection only; did not run a fresh local DatabaseReplicated build to re-execute 03579_create_table_populate_from_s3. The two remaining CI failures (03707_set_index_bad_get_null_bug plan-text mismatch under ParallelReplicas, and the BackupsWorker::wait fatal in the amd_debug, distributed plan shard) were not analyzed beyond confirming their stack traces are not on the modified code paths.

ianton-ru · 2026-06-10T15:56:32Z

Tables can't be created with cluster table functions

select * from s3('http://minio1:9001/root/test/test1/data/**', 'minio', 'minio123', 'Parquet') limit 1

   ┌──────────────────────_time─┬─value─┐
1. │ 2026-01-15 00:00:00.000000 │     1 │
   └────────────────────────────┴───────┘


CREATE TABLE r.t2 engine=MergeTree() order by _time settings allow_nullable_key=1 AS s3Cluster('ch-cluster', 'http://minio1:9001/root/test/test1/data/**', 'minio', 'minio123', 'Parquet')

Received exception from server (version 26.3.10):
Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Table function 's3Cluster' cannot be used to create a table. (BAD_ARGUMENTS)

A table can be created with AS SELECT * FROM s3Cluster instead:

create table r.t2 engine=MergeTree() order by _time settings allow_nullable_key=1 as select * from s3Cluster('ch-cluster', 'http://minio1:9001/root/test/test1/data/**', 'minio', 'minio123', 'Parquet')

but this is a very strange thing to do.

The same bug is also present in upstream master, at the latest commit b376a13.

This works:

create table test engine=MergeTree() order by _time settings allow_nullable_key=1 as select * from s3Cluster('ch-cluster-12', 'http://minio1:9001/root/test/test1/data/**', 'minio', 'minio123', 'Parquet')

select * from test limit 1

   ┌──────────────────────_time─┬─value─┐
1. │ 1970-01-01 00:00:00.000000 │     1 │
   └────────────────────────────┴───────┘

This doesn't:

create database r on cluster 'ch-cluster-12' engine=Replicated('/clickhouse/path/', '{shard}', '{replica}')

create table r.test engine=MergeTree() order by _time settings allow_nullable_key=1 as select * from s3Cluster('ch-cluster-12', 'http://minio1:9001/root/test/test1/data/**', 'minio', 'minio123', 'Parquet')

Received exception from server (version 26.6.1):
Code: 10. DB::Exception: Received from localhost:9000. DB::Exception: Not found column __table1._time in block . (NOT_FOUND_COLUMN_IN_BLOCK)

ianton-ru · 2026-06-10T17:09:21Z

I think we can live with this as long as upstream can.

ianton-ru · 2026-06-11T12:11:22Z

ClickHouse#107057
Issue about 'create table as select * from s3Cluster'

alsugiliazova · 2026-06-11T12:18:20Z

PR #1875 CI Verification — Backport: Use explicit flag for secondary on cluster queries

Head branch: backports/25.8/104881
Head SHA: 2fa783dc773f8a971e333aa9de671f3982fdacfb
Verification date: 2026-06-10

Verdict

No PR-caused regressions identified. All failing tests are either pre-existing flakes, cascade failures from a flaky test that hung the server, or infrastructure timeouts. The PR can be merged after a green rerun of the two affected jobs.

A separate DCO check is in ACTION_REQUIRED state (sign-off needed) and is unrelated to test correctness.

Job status summary

| Status | Count |
|---|---|
| SUCCESS | 105 |
| FAILURE (jobs, GitHub conclusion) | 2 |
| ERROR (job, GitHub conclusion) | 1 (Build amd_release) |
| ACTION_REQUIRED | 1 (DCO sign-off) |
| SKIPPED | 68 (build matrix exclusions) |

Failing checks (current SHA `2fa783d`)

1. `Build (amd_release)` — infrastructure / timeout

Job link
The job ran for ~2 hours through full compilation and packaging (build artifacts uploaded, INFO: 7 rows inserted into CIDB) before being terminated:

Job got terminated with an error, exit code [-15] (SIGTERM)
No compilation error, no FAILED: from ninja, no error: lines from clang. The build itself succeeded; the runner/orchestrator killed the job after the artifact-publishing step.
Effect: dependent jobs Docker keeper image, Docker server image, Compatibility check (release) were marked skipped.
Verdict: infrastructure / harness timeout, not caused by PR. Rerun should clear it.

2. `Stateless tests (amd_debug, distributed plan, s3 storage, parallel)` — cascade from flaky test

Job link

Reported failures (in order):

| Test | Type |
|---|---|
| `02907_backup_restore_flatten_nested` | actual test |
| `Some queries hung` | meta / cascade |
| `Killed by signal (in clickhouse-server.log…)` | meta / cascade |
| `Fatal messages (in clickhouse-server.log…)` | meta / cascade |

The latter three are server-crash cascade markers triggered by the first test hanging. Root cause is 02907_backup_restore_flatten_nested.

Pre-existing flakiness of 02907_backup_restore_flatten_nested:

Failures recorded on master (pull_request_number = 0) and on many unrelated PRs going back to 2025-06-18.

Verdict: pre-existing flaky test causing cascade. Rerun recommended to confirm; do not block the PR on this run.

Failing checks reported in DB but green on GitHub (rerun cleared)

These show one FAIL row in the CI database from an earlier execution but the GitHub job conclusion is success at the latest run on the same SHA:

| Job | DB row test | GitHub conclusion |
|---|---|---|
| `Stateless tests (amd_binary, ParallelReplicas, s3 storage, parallel)` | `03707_set_index_bad_get_null_bug` | success |
| `Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel)` | `03579_create_table_populate_from_s3` | success |

Flakiness check:

03707_set_index_bad_get_null_bug: 37 failures across 17 unrelated PRs in the last 60 days → known flake.
03579_create_table_populate_from_s3: 1 failure ever (this PR), but the rerun on the same SHA passed and the test passes consistently across many other PRs (1869, 1896, 1867, master) on the same DatabaseReplicated config.

Verdict: both are flaky / one-off; already green at HEAD.

Other unrelated check

DCO (ACTION_REQUIRED): commit needs a Signed-off-by trailer. Author action, not a CI failure.

Recommendations

Author: add Signed-off-by trailer to satisfy DCO.
Rerun Build (amd_release) (timeout) and Stateless tests (amd_debug, distributed plan, s3 storage, parallel) (flaky 02907_backup_restore_flatten_nested).
If 02907_backup_restore_flatten_nested fails again on rerun in the same configuration, investigate further — but historical evidence makes a regression unlikely.

Merge pull request ClickHouse#104881 from ClickHouse/explicit-seconda…

2fa783d

…ry-on-cluster Use explicit flag for secondary on cluster queries

ianton-ru added 25.8 25.8 Altinity Stable backport Backport labels Jun 5, 2026

chatgpt-codex-connector Bot reviewed Jun 5, 2026


mkmkme approved these changes Jun 10, 2026


alsugiliazova added the verified Approved for release label Jun 11, 2026
