Commit e2f8663

codesome and gouthamve authored
Avoid indefinite checkpointing (#2955)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Co-authored-by: Goutham Veeramachaneni <gouthamve@gmail.com>
1 parent badc146 commit e2f8663

2 files changed: 11 additions & 0 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -110,6 +110,7 @@
 * [BUGFIX] Fixed `Missing chunks and index config causing silent failure` Absence of chunks and index from schema config is not validated. #2732
 * [BUGFIX] Fix panic caused by KVs from boltdb being used beyond their life. #2971
 * [BUGFIX] Experimental TSDB: `/api/v1/series`, `/api/v1/labels` and `/api/v1/label/{name}/values` only query the TSDB head regardless of the configured `-experimental.blocks-storage.tsdb.retention-period`. #2974
+* [BUGFIX] Ingester: Avoid indefinite checkpointing in case of surge in number of series. #2955
 
 ## 1.2.0 / 2020-07-01

pkg/ingester/wal.go

Lines changed: 10 additions & 0 deletions
@@ -337,6 +337,7 @@ func (w *walWrapper) performCheckpoint(immediate bool) (err error) {
 	totalSize := 0
 	ticker := time.NewTicker(perSeriesDuration)
 	defer ticker.Stop()
+	start := time.Now()
 	for userID, state := range us {
 		for pair := range state.fpToSeries.iter() {
 			state.fpLocker.Lock(pair.fp)
@@ -361,6 +362,15 @@ func (w *walWrapper) performCheckpoint(immediate bool) (err error) {
 			}
 
 			if !immediate {
+				if time.Since(start) > 2*w.cfg.CheckpointDuration {
+					// This could indicate a surge in number of series and continuing with
+					// the old estimation of ticker can make checkpointing run indefinitely in worst case
+					// and disk running out of space. Re-adjust the ticker might not solve the problem
+					// as there can be another surge again. Hence let's checkpoint this one immediately.
+					immediate = true
+					continue
+				}
+
 				select {
 				case <-ticker.C:
 				case <-w.quit: // When we're trying to shutdown, finish the checkpoint as fast as possible.
