Commit 407771f

roachtest: increase server.sqlliveness.ttl in copyfrom
`copyfrom/atomic` often fails with a `RETRY_COMMIT_DEADLINE_EXCEEDED` error. Based on guidance from KV folks, the most likely explanation (apart from the closed TS system, which has already been adjusted) is that the lease on a descriptor expired while the txn was running. Before 24.1 we used expiry-based leasing; then we introduced session-based leasing, and the migration to use only that was completed at the beginning of the year. If I'm reading the code right, the lease duration depends on the SQL liveness TTL, so this commit bumps the relevant cluster setting from 40s to 5m to give atomic COPY enough time to finish.

I did 50 runs of the test, and all of them passed.

Release note: None
1 parent c619ca9 commit 407771f

pkg/cmd/roachtest/tests/copyfrom.go

Lines changed: 8 additions & 4 deletions
@@ -150,10 +150,14 @@ func runCopyFromCRDB(ctx context.Context, t test.Test, c cluster.Cluster, sf int
 	startOpts.RoachprodOpts.ExtraArgs = append(startOpts.RoachprodOpts.ExtraArgs, "--vmodule=copy_from=2,insert=2")
 	// The roachtest frequently runs on overloaded instances and can timeout as
 	// a result. We've seen cases where the atomic COPY takes about 2 minutes to
-	// complete, so we set the closed TS to 5 minutes (to give enough safety
-	// gap, otherwise the closed TS system might continuously push the COPY txn
-	// not allowing it ever to complete).
-	clusterSettings := install.MakeClusterSettings(install.ClusterSettingsOption{"kv.closed_timestamp.target_duration": "300s"})
+	// complete, so we set the closed TS and SQL liveness TTL to 5 minutes (to
+	// give enough safety gap, otherwise the closed TS system and lease system
+	// expiration might continuously push the COPY txn not allowing it ever to
+	// complete).
+	clusterSettings := install.MakeClusterSettings(install.ClusterSettingsOption{
+		"kv.closed_timestamp.target_duration": "300s",
+		"server.sqlliveness.ttl":              "300s",
+	})
 	c.Start(ctx, t.L(), startOpts, clusterSettings, c.All())
 	initTest(ctx, t, c, sf)
 	db, err := c.ConnE(ctx, t.L(), 1)
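
To try the same two settings on an already-running cluster, outside of roachtest, they could be applied with `SET CLUSTER SETTING` statements. The following is a minimal sketch, not code from this commit: `setClusterSettingStmts` is a hypothetical helper that only renders the statements as strings. Executing them against a cluster, and whether `server.sqlliveness.ttl` can be changed on a given version, is left as an assumption to verify.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// setClusterSettingStmts is a hypothetical helper (not part of roachtest).
// It renders one SET CLUSTER SETTING statement per entry, sorted by setting
// name so that the output is deterministic.
func setClusterSettingStmts(settings map[string]string) []string {
	names := make([]string, 0, len(settings))
	for name := range settings {
		names = append(names, name)
	}
	sort.Strings(names)

	stmts := make([]string, 0, len(names))
	for _, name := range names {
		// Double any single quote so the value stays a valid SQL string literal.
		value := strings.ReplaceAll(settings[name], "'", "''")
		stmts = append(stmts, fmt.Sprintf("SET CLUSTER SETTING %s = '%s'", name, value))
	}
	return stmts
}

func main() {
	// The same names and values that the diff passes to MakeClusterSettings.
	settings := map[string]string{
		"kv.closed_timestamp.target_duration": "300s",
		"server.sqlliveness.ttl":              "300s",
	}
	for _, stmt := range setClusterSettingStmts(settings) {
		fmt.Println(stmt + ";")
	}
}
```

Sorting the names is only there to keep the output stable, since Go map iteration order is randomized.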
