Improve shard collocation while limiting to 3 pipelines per node #5808
Conversation
Force-pushed from 71727e5 to 4f54d65
Resolved review comment on quickwit/quickwit-control-plane/src/indexing_scheduler/scheduling/scheduling_logic.rs (outdated)
Force-pushed from 96ac7ef to abc54c8
Note: when using ingest V2, the effect of this PR should be moderate. Indeed, ingest V2 creates shards on nodes randomly (see quickwit/quickwit-control-plane/src/ingest/ingest_controller.rs, lines 114 to 125, at 52ac2e1). This means that in most cases it's the affinity placement stage that is going to scatter shards across indexers. It's only when nodes don't have enough (artificially scaled) capacity during that stage that the placement change from this PR will kick in. In practice this happens fairly frequently.
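For illustration only, here is a minimal sketch of the kind of random leader selection described above. It is not the actual ingest_controller.rs code; it assumes the rand 0.8 crate, and all names are hypothetical:

```rust
// Hypothetical sketch: pick a random leader per new shard, which is why
// shards of the same source can end up scattered across indexers.
use rand::seq::SliceRandom;

fn pick_shard_leaders(ingester_ids: &[String], num_shards: usize) -> Vec<String> {
    let mut rng = rand::thread_rng();
    (0..num_shards)
        .map(|_| {
            ingester_ids
                .choose(&mut rng)
                .expect("at least one ingester is required")
                .clone()
        })
        .collect()
}

fn main() {
    let ingesters: Vec<String> = ["indexer-1", "indexer-2", "indexer-3"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // With random placement, 4 shards of one source can easily land on
    // 3 different indexers.
    println!("{:?}", pick_shard_leaders(&ingesters, 4));
}
```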
Force-pushed from abc54c8 to acdb18a
This still requires some at-scale tests before merging.
@rdettai I can participate in testing. I have a problem with uneven distribution of partitions between 2 indexers: one indexer always gets the whole set of a topic's partitions, and because I have a few very loaded topics, that causes performance degradation.
Thanks @dpavlov-smartling for testing this.
I'm a bit surprised by the behavior you are describing after running this branch. We are using it and it does improve balancing significantly. Have you tried scaling down the whole cluster (including the control plane)?
Hello @rdettai.
As for now I use
hey @rdettai @rdettai-sk, do you want other people to help test this fix? We have 500 Kafka sources and 4 indexers with this configuration:

indexer:
  replicaCount: 4
  persistentVolume:
    storage: 200Gi
  resources:
    limits:
      cpu: 32000m
      memory: 48Gi
    requests:
      cpu: 4000m
      memory: 8Gi

The ingest rate of these sources is extremely varied, with only a few indexes having very high volume. Usually we have one indexer working at close to 2/3-3/4 capacity, another working at 1/3 capacity, and 2 working at basically 10%.
Hi @daniele-br, you should definitely test this branch; it has helped a lot for our setup, and our load distribution is a bit similar to yours (probably a bit less skewed). There are still distribution issues, especially when adding new nodes, and I'm planning to work on that in the coming weeks.
Force-pushed from acdb18a to 6c7423e
ok @rdettai, I'm confused about how the tagging works and how we know whether the source code should match the behavior of specific tags. If we use this branch, does it just represent a base of an airmail image? Is there something that explains the tagging strategy, so that we know which behavior to expect?
These are not official builds,
Description
Closes #4470
Closes #5747
Closes #4630
Improve shard collocation
When placing new shards for an existing source whose affinity is not defined (e.g. a Kafka source), we currently have no affinity for the nodes that already run indexing pipelines for that source. This can create a fragmented distribution of shards.
This solution is not perfect; ideally we should split the assignment into two steps, just as we do for the shard affinity.
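As a rough illustration of the collocation idea (this is not the PR's actual scheduling code; all names are hypothetical), a sketch that prefers nodes already running pipelines for the source:

```rust
use std::collections::HashMap;

/// Hypothetical helper: when a source has no declared affinity, prefer the
/// candidate node that already runs the most indexing pipelines for that
/// source, so its shards stay collocated instead of being fragmented.
fn pick_node_for_new_shard<'a>(
    candidate_nodes: &'a [String],
    pipelines_per_node: &HashMap<String, usize>,
) -> Option<&'a String> {
    candidate_nodes
        .iter()
        .max_by_key(|node| pipelines_per_node.get(*node).copied().unwrap_or(0))
}

fn main() {
    let nodes = vec!["indexer-1".to_string(), "indexer-2".to_string()];
    let mut pipelines = HashMap::new();
    pipelines.insert("indexer-2".to_string(), 2);
    // "indexer-2" wins because it already runs pipelines for the source.
    assert_eq!(pick_node_for_new_shard(&nodes, &pipelines), Some(&nodes[1]));
}
```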
Limit to 3 pipelines per node
See #5792
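For context, a minimal sketch of how such a cap might be applied when deciding how many pipelines to place on a node; the constant and helper are hypothetical and not taken from the PR:

```rust
// Hypothetical sketch: cap the number of pipelines scheduled on one node.
const MAX_PIPELINES_PER_NODE: usize = 3;

fn pipelines_to_place(requested: usize, already_on_node: usize) -> usize {
    requested.min(MAX_PIPELINES_PER_NODE.saturating_sub(already_on_node))
}

fn main() {
    assert_eq!(pipelines_to_place(5, 0), 3); // capped at the limit
    assert_eq!(pipelines_to_place(5, 2), 1); // only one slot left
    assert_eq!(pipelines_to_place(1, 3), 0); // node is already full
}
```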
How was this PR tested?
Added unit test