High spikes of ingester inflight query requests #7110

sam-mcbr · 2025-11-11T21:09:39Z

sam-mcbr
Nov 11, 2025

Hi,

My team was looking into setting the max_inflight_query_requests limit to help protect our ingesters from OOMKills due to heavy query load. However, while reviewing our data to figure out what a good value would be, we noticed massive spikes in ingester inflight query requests whenever blocks were shipped.

We have a ship interval of 2h, and confirmed that these spikes occur at the same times our ingesters are shipping blocks. We also looked at metrics for the ingress we have in front of our Cortex deployment and do not believe the spikes are related to user activity; they seem to be triggered by some internal Cortex component.

Can you share which component might be causing this high number of requests? We are wondering whether the store-gateways or compactors might be responsible. If an internal component is making a high number of query requests, we are concerned about setting this limit too low and causing issues.

Thanks!

friedrichg · 2025-11-12T18:19:50Z

friedrichg
Nov 12, 2025
Maintainer

The metric cortex_ingester_max_inflight_query_requests comes only from ingesters.
The number of inflight requests increases because reads are all queued up. Two things happen every 2 hours in ingesters: compacting and shipping. Compacting and creating a 2h TSDB block, which happens before shipping, is very disk intensive.

The queuing happens because there is a resource constraint: memory, CPU, or disk, or a combination of them.

Here are a few ideas to reduce this massive latency for reads:

Add more resources to ingesters: not just CPU limits, but CPU requests too. Ensure the machines where ingesters run are not overloaded. These resource overloads are very short-lived, so you might miss them if you are scraping metrics only every minute.
Reduce the shipping concurrency:

  # Maximum number of tenants concurrently shipping blocks to the storage.
  # CLI flag: -blocks-storage.tsdb.ship-concurrency
  [ship_concurrency: <int> | default = 10]

Use the in-memory queue to ease up writes. Set this number to a value greater than zero:

  # The size of the in-memory queue used before flushing chunks to the disk.
  # CLI flag: -blocks-storage.tsdb.head-chunks-write-queue-size
  [head_chunks_write_queue_size: <int> | default = 0]
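
Put together, the two settings above might look like the following in an ingester config file. This is only a sketch: the nesting under blocks_storage.tsdb is inferred from the CLI flag prefixes, and both values are assumed starting points for illustration, not recommendations from this thread.

```yaml
# Illustrative values only; tune them against your own tenant count and disk throughput.
blocks_storage:
  tsdb:
    # Below the default of 10, so fewer tenants ship blocks at the same time.
    ship_concurrency: 5
    # Any value above 0 enables the in-memory write queue; 10000 is an assumed example.
    head_chunks_write_queue_size: 10000
```

After changing either value, it may be worth comparing the inflight query request spikes across a few 2h shipping cycles before settling on a max_inflight_query_requests limit.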

1 reply

sam-mcbr Nov 12, 2025
Author

Thank you so much for all these details!
