The metric cortex_ingester_max_inflight_query_requests comes only from ingesters.
The number of inflight requests increases because reads are queuing up. Two things happen every 2 hours in ingesters: compaction and shipping. Compacting the data into a 2h TSDB block, which happens before shipping, is very disk intensive.
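
One way to check whether your latency spikes line up with this 2-hour cycle is to measure how far each spike is from the most recent 2h boundary. The sketch below is only illustrative: it assumes block ranges are aligned to multiples of 2 hours since the Unix epoch (even hours in UTC), and the spike timestamps and the 15-minute window are made-up values, not taken from this discussion.

```python
from datetime import datetime, timezone

BLOCK_RANGE_SECONDS = 2 * 60 * 60  # the 2h block range mentioned above


def seconds_since_block_boundary(ts: datetime) -> int:
    """Seconds elapsed since the most recent 2h boundary.

    Assumption: boundaries are aligned to the Unix epoch (even hours in UTC).
    """
    return int(ts.timestamp()) % BLOCK_RANGE_SECONDS


def near_block_boundary(ts: datetime, window_seconds: int = 15 * 60) -> bool:
    """True if ts falls within window_seconds after a 2h boundary."""
    return seconds_since_block_boundary(ts) <= window_seconds


# Hypothetical spike timestamps, for illustration only.
spikes = [
    datetime(2025, 11, 12, 14, 3, tzinfo=timezone.utc),
    datetime(2025, 11, 12, 15, 30, tzinfo=timezone.utc),
]

for ts in spikes:
    print(ts.isoformat(), near_block_boundary(ts))
```

If most of your spikes land shortly after a boundary, that points at compaction and shipping rather than at query load.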

The queuing happens because of a resource constraint: memory, CPU, disk, or a combination of them.

Here are a couple of ideas to reduce this massive read latency:

  • Add more resources to ingesters: not just CPU limits, but also CPU requests. Ensure the machines where ingesters run are not overloaded. These resource overloads are very short-lived, so you might miss them if you are scraping m…

Answer selected by sam-mcbr
This discussion was converted from issue #7107 on November 12, 2025 18:19.