Control sharding codec read coalescing with ArrayConfig and runtime config options #3987
aldenks wants to merge 15 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #3987 +/- ##
=======================================
Coverage 93.52% 93.53%
=======================================
Files 90 90
Lines 11926 11938 +12
=======================================
+ Hits 11154 11166 +12
Misses 772 772
disclaimer: I'm not a big fan of our global config object, so I'd like to explore some alternative ways for the sharding reads to access this configuration. A few options:
@d-v-b I also like new fields on ArrayConfig. Thinking that through:
That sound alright?
yeah, that sounds right. the array config object is designed to make it easy to get a cheap copy of an array with a new config, using the
    write_empty_chunks: bool,
    *,
    read_missing_chunks: bool = True,
    sharding_coalesce_max_gap_bytes: int = 1 << 20,  # 1 MiB
flagging this as an old problem we need to deal with, not a blocker: the default values appear twice, once here, and once in the global config object. that's a latent bug. the defaults should be defined in exactly 1 place.
+1 For now I added test_array_config_init_defaults_match_global_config which makes sure they don't drift.
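A minimal sketch of what such a drift check could look like, written against hypothetical stand-ins rather than the real zarr objects: ArrayConfigSketch, GLOBAL_DEFAULTS and find_default_drift are illustrative names, not the actual test or API from this PR. Only the two default values (1 MiB and 16 MiB) and the array.* key names are taken from the PR description.

```python
from dataclasses import dataclass, fields

# Hypothetical stand-in for the defaults held in the global config object.
GLOBAL_DEFAULTS = {
    "array.sharding_coalesce_max_gap_bytes": 1 << 20,  # 1 MiB
    "array.sharding_coalesce_max_bytes": 16 << 20,  # 16 MiB
}


@dataclass(frozen=True)
class ArrayConfigSketch:
    """Hypothetical stand-in for ArrayConfig; only the two new fields are modelled."""

    sharding_coalesce_max_gap_bytes: int = 1 << 20
    sharding_coalesce_max_bytes: int = 16 << 20


def find_default_drift(config_cls, global_defaults, prefix="array."):
    """Return {field: (init_default, global_default)} for every mismatch."""
    drift = {}
    for f in fields(config_cls):
        key = prefix + f.name
        if key in global_defaults and global_defaults[key] != f.default:
            drift[f.name] = (f.default, global_defaults[key])
    return drift


# An empty result means the two sets of defaults still agree.
assert find_default_drift(ArrayConfigSketch, GLOBAL_DEFAULTS) == {}
```

A check like this only detects drift; it does not remove the duplication, which is the latent bug flagged in the comment above.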
The merge of zarr-developers#3968 ("deprecate more enums") retyped ShardsConfigParam.index_location to IndexLocation (Literal["start", "end"]), so the deprecated ShardingCodecIndexLocation.end member (now a plain str) no longer satisfies it. Pass the literal "end" directly, as the deprecation instructs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@d-v-b ready for merge! thank you for the review
Follow up #3004 by adding ArrayConfig and runtime configuration options for the thresholds that control how requests are coalesced when reading in the sharding codec.

Two new fields on ArrayConfig control how the sharding codec coalesces partial-shard reads: sharding_coalesce_max_gap_bytes (default 1 MiB) and sharding_coalesce_max_bytes (default 16 MiB). When reading multiple chunks from the same shard, nearby byte ranges are merged into a single request to the store if separated by no more than sharding_coalesce_max_gap_bytes and the merged read stays within sharding_coalesce_max_bytes. Defaults are seeded from the matching array.sharding_coalesce_max_gap_bytes / array.sharding_coalesce_max_bytes keys in zarr.config at array-creation time, and can be overridden per array by passing config={...} to zarr.create_array.

TODO:
docs/user-guide/*.md
changes/
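
The merge rule in the description (combine neighbouring byte ranges when the gap is at most sharding_coalesce_max_gap_bytes and the merged read stays within sharding_coalesce_max_bytes) can be illustrated with a small standalone sketch. This is not the sharding codec's implementation: ByteRange and coalesce are hypothetical names, and the greedy left-to-right strategy is an assumption made for illustration. Only the two thresholds and their defaults come from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRange:
    """Hypothetical half-open byte range [start, end) within one shard."""

    start: int
    end: int


def coalesce(ranges, max_gap_bytes=1 << 20, max_bytes=16 << 20):
    """Greedily merge sorted ranges that are close together and small enough."""
    merged = []
    for r in sorted(ranges, key=lambda r: r.start):
        if merged:
            last = merged[-1]
            gap = r.start - last.end  # negative when the ranges overlap
            new_end = max(last.end, r.end)
            if gap <= max_gap_bytes and new_end - last.start <= max_bytes:
                # Close enough and still within the size cap: extend the last read.
                merged[-1] = ByteRange(last.start, new_end)
                continue
        merged.append(r)
    return merged


chunks = [ByteRange(0, 100), ByteRange(150, 200), ByteRange(5000, 5100)]
# With a 100-byte gap threshold the first two ranges merge; the third is too far away.
print(coalesce(chunks, max_gap_bytes=100, max_bytes=1000))
```

A larger gap threshold trades extra bytes transferred for fewer store requests, while the size cap bounds how large any single merged request can grow.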