This repository was archived by the owner on Apr 28, 2025. It is now read-only.

Commit 1e56681

Added playbook for CortexAllocatingTooMuchMemory
Signed-off-by: Marco Pracucci <marco@pracucci.com>
1 parent 12293f0 commit 1e56681


2 files changed, 22 additions, 4 deletions


cortex-mixin/alerts/alerts.libsonnet

Lines changed: 3 additions & 3 deletions
@@ -479,7 +479,7 @@
   },
   annotations: {
     message: |||
-      High QPS for ingesters, add more ingesters.
+      Ingesters in {{ $labels.namespace }} have a high samples/sec rate.
     |||,
   },
 },
@@ -498,7 +498,7 @@
   },
   annotations: {
     message: |||
-      Too much memory being used by {{ $labels.namespace }}/{{ $labels.pod }} - add more ingesters.
+      Ingester {{ $labels.namespace }}/{{ $labels.pod }} is using too much memory.
     |||,
   },
 },
@@ -517,7 +517,7 @@
   },
   annotations: {
     message: |||
-      Too much memory being used by {{ $labels.namespace }}/{{ $labels.pod }} - add more ingesters.
+      Ingester {{ $labels.namespace }}/{{ $labels.pod }} is using too much memory.
     |||,
   },
 },

cortex-mixin/docs/playbooks.md

Lines changed: 19 additions & 1 deletion
@@ -451,7 +451,25 @@ How to **fix**:
 
 ### CortexAllocatingTooMuchMemory
 
-_TODO: this playbook has not been written yet._
+This alert fires when an ingester's memory utilization is getting close to its limit.
+
+How it **works**:
+- Cortex ingesters are a stateful service
+- Having 2+ ingesters `OOMKilled` may cause a cluster outage
+- Ingester baseline memory usage is primarily influenced by memory allocated by the process (mostly Go heap) and mmap-ed files (used by TSDB)
+- Short spikes in ingester memory usage are primarily influenced by queries
+- A pod gets `OOMKilled` once its working set memory reaches the configured limit, so it's important to prevent the ingesters' memory utilization (working set memory) from getting close to the limit (we need to keep at least 30% headroom for spikes caused by queries)
+
+How to **fix**:
+- Check if the issue occurs only for a few ingesters. If so:
+  - Restart the affected ingesters one by one (proceed with the next one once the previous pod has restarted and is Ready)
+    ```
+    kubectl -n <namespace> delete pod ingester-XXX
+    ```
+  - Restarting an ingester typically reduces the memory allocated by mmap-ed files. That memory may be allocated again, but restarting can buy you time while you work on a longer-term solution
+- Check the `Cortex / Writes Resources` dashboard to see if the number of series per ingester is above the target (1.5M). If so:
+  - Scale up ingesters
+  - Memory is expected to be reclaimed at the next TSDB head compaction (occurring every 2h)
 
 ### CortexGossipMembersMismatch
 
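The playbook's two numeric rules of thumb (keep at least 30% of the memory limit free for query spikes; keep series per ingester at or below the 1.5M target) reduce to simple arithmetic. Below is a minimal POSIX-shell sketch, assuming you read the working set memory, the memory limit and the average series per ingester off your own dashboards (for example `Cortex / Writes Resources`). `memory_headroom_status` and `ingesters_needed` are hypothetical helper names, not part of Cortex or the mixin.

```shell
#!/bin/sh
# Hypothetical helpers for the playbook's thresholds. Inputs are plain
# integers read from your dashboards; nothing here queries Kubernetes.

# Prints "ok" if the ingester keeps at least 30% of its memory limit free,
# "low-headroom" otherwise.
# $1 = working set memory in bytes, $2 = memory limit in bytes.
memory_headroom_status() {
  local working_set=$1 limit=$2
  # Shell arithmetic is integer-only, so compare
  # working_set / limit <= 0.70 as working_set * 100 <= limit * 70.
  if [ $(( working_set * 100 )) -le $(( limit * 70 )) ]; then
    echo "ok"
  else
    echo "low-headroom"
  fi
}

# Prints how many ingesters would keep the average in-memory series per
# ingester at or below the target (1.5M unless overridden).
# ASSUMPTION: series are spread roughly evenly across ingesters.
# $1 = current number of ingesters, $2 = average series per ingester,
# $3 = optional target series per ingester.
ingesters_needed() {
  local target=${3:-1500000}
  local total=$(( $1 * $2 ))
  # Integer ceiling division: ceil(total / target).
  echo $(( (total + target - 1) / target ))
}
```

For example, `memory_headroom_status 8589934592 10737418240` (an 8 GiB working set against a 10 GiB limit) prints `low-headroom`, and `ingesters_needed 10 1800000` prints `12`. Because real series distribution is rarely perfectly even, treat the computed ingester count as a lower bound.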

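The restart step in the playbook ("one by one, proceed once the previous pod is Ready") can be scripted. Below is a sketch, assuming the ingesters run as a StatefulSet, so that a deleted pod is recreated under the same name (e.g. `ingester-3`), and assuming `kubectl` already points at the right cluster. The helper names, the `name=ingester` label selector and the 30-minute timeout are all illustrative assumptions, not anything defined by Cortex.

```shell
#!/bin/sh
# Hypothetical helpers for triaging and restarting ingesters one at a time.

# Shows ingester pods sorted by memory usage (requires metrics-server).
# ASSUMPTION: ingester pods carry the label name=ingester.
# $1 = namespace.
list_ingesters_by_memory() {
  kubectl -n "$1" top pod -l name=ingester --sort-by=memory
}

# Prints "True" once the pod's Ready condition is True; prints nothing
# while the pod does not exist yet (errors are discarded).
# $1 = namespace, $2 = pod name.
pod_ready_status() {
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null
}

# Deletes each pod in turn and waits for its replacement to be Ready
# before moving on. $1 = namespace, remaining args = pod names.
restart_ingesters_one_by_one() {
  local ns=$1 pod attempts
  shift
  for pod in "$@"; do
    echo "Restarting $ns/$pod"
    # kubectl delete waits for the old pod to be gone by default.
    kubectl -n "$ns" delete pod "$pod" || return 1
    attempts=0
    # Poll every 10s, giving up after 180 attempts (~30 minutes, arbitrary).
    until [ "$(pod_ready_status "$ns" "$pod")" = "True" ]; do
      attempts=$(( attempts + 1 ))
      if [ "$attempts" -ge 180 ]; then
        echo "Timed out waiting for $ns/$pod to become Ready" >&2
        return 1
      fi
      sleep 10
    done
    echo "$ns/$pod is Ready"
  done
}
```

Usage might look like `restart_ingesters_one_by_one <namespace> ingester-3 ingester-7`. The sketch polls with `kubectl get` rather than `kubectl wait`, because right after deletion the replacement pod may not exist yet, and waiting on a missing resource may fail instead of blocking.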