Commit 53655e9

gcs: mitigate transient errors causing corruption
For long backups, there appears to be a relatively high chance of hitting a random transient error such as a 503. When these errors occur, the clickhouse-backup retries do not appear to be sufficient, since we are seeing missing table parts reported. This change is a mitigation: in theory the wrapping retries should handle these errors correctly, but in practice that does not appear to be the case. More investigation is needed to fix that underlying issue.
1 parent 4ba6f1f commit 53655e9

pkg/storage/gcs.go

Lines changed: 4 additions & 0 deletions
@@ -279,6 +279,8 @@ func (gcs *GCS) PutFileAbsolute(ctx context.Context, key string, r io.ReadCloser
 	}
 	pClient := pClientObj.(*clientObject).Client
 	obj := pClient.Bucket(gcs.Config.Bucket).Object(key)
+	// always retry transient errors to mitigate retry logic bugs.
+	obj = obj.Retryer(storage.WithPolicy(storage.RetryAlways))
 	writer := obj.NewWriter(ctx)
 	writer.ChunkSize = gcs.Config.ChunkSize
 	writer.StorageClass = gcs.Config.StorageClass
@@ -365,6 +367,8 @@ func (gcs *GCS) CopyObject(ctx context.Context, srcSize int64, srcBucket, srcKey
 	pClient := pClientObj.(*clientObject).Client
 	src := pClient.Bucket(srcBucket).Object(srcKey)
 	dst := pClient.Bucket(gcs.Config.Bucket).Object(dstKey)
+	// always retry transient errors to mitigate retry logic bugs.
+	dst = dst.Retryer(storage.WithPolicy(storage.RetryAlways))
 	attrs, err := src.Attrs(ctx)
 	if err != nil {
 		if pErr := gcs.clientPool.InvalidateObject(ctx, pClientObj); pErr != nil {
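
For context, the google-cloud-go storage client retries only operations it considers idempotent by default; storage.RetryAlways tells it to retry transient errors unconditionally on that object handle. Below is a minimal standalone sketch of the same pattern, assuming the cloud.google.com/go/storage package; the bucket and object names are placeholders and not part of this repository.

package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()

	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatalf("storage.NewClient: %v", err)
	}
	defer client.Close()

	// Placeholder bucket and object names, for illustration only.
	obj := client.Bucket("example-bucket").Object("backups/example-part")

	// The default policy is RetryIdempotent; RetryAlways retries transient
	// errors (e.g. 503s) unconditionally, which is acceptable here because
	// re-uploading the same key with the same content is idempotent.
	obj = obj.Retryer(storage.WithPolicy(storage.RetryAlways))

	w := obj.NewWriter(ctx)
	if _, err := w.Write([]byte("example payload")); err != nil {
		log.Fatalf("write: %v", err)
	}
	if err := w.Close(); err != nil {
		log.Fatalf("close: %v", err)
	}
}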
