
Commit 5ac50f7

copy: fix vectorized auto commit behavior
When we implemented the vectorized INSERT, which supports COPY in some cases, we missed one condition for auto-committing the txn that is present in the regular `tableWriterBase` path. Namely, we need to check whether the deadline that might be set on the txn is still likely to be sufficient. If it isn't, we shouldn't auto-commit and should instead leave the commit to the connExecutor (which will try to refresh the deadline).

The impact of the bug is that if COPY took longer than 40s (controlled via `server.sqlliveness.ttl`), it would often hit a txn retry error that was then propagated to the client.

Release note (bug fix): Previously, the "atomic" COPY command (controlled via `copy_from_atomic_enabled`, which is `true` by default) could encounter RETRY_COMMIT_DEADLINE_EXCEEDED txn errors if the whole command took 1 minute or more. This happened only when the vectorized engine was used for COPY, and is now fixed.
1 parent: 133c00b


2 files changed (+9, -12 lines)


pkg/cmd/roachtest/tests/copyfrom.go

Lines changed: 1 addition & 10 deletions
@@ -148,16 +148,7 @@ func runCopyFromCRDB(ctx context.Context, t test.Test, c cluster.Cluster, sf int
 	// Enable the verbose logging on relevant files to have better understanding
 	// in case the test fails.
 	startOpts.RoachprodOpts.ExtraArgs = append(startOpts.RoachprodOpts.ExtraArgs, "--vmodule=copy_from=2,insert=2")
-	// The roachtest frequently runs on overloaded instances and can time out as
-	// a result. We've seen cases where the atomic COPY takes about 2 minutes to
-	// complete, so we set the closed TS and SQL liveness TTL to 5 minutes (to
-	// give enough safety gap, otherwise the closed TS system and lease system
-	// expiration might continuously push the COPY txn not allowing it ever to
-	// complete).
-	clusterSettings := install.MakeClusterSettings(install.ClusterSettingsOption{
-		"kv.closed_timestamp.target_duration": "300s",
-		"server.sqlliveness.ttl":              "300s",
-	})
+	clusterSettings := install.MakeClusterSettings()
 	c.Start(ctx, t.L(), startOpts, clusterSettings, c.All())
 	initTest(ctx, t, c, sf)
 	db, err := c.ConnE(ctx, t.L(), 1)

pkg/sql/colexec/insert.go

Lines changed: 8 additions & 2 deletions
@@ -190,9 +190,15 @@ func (v *vectorInserter) Next() coldata.Batch {
 			}
 			colexecerror.ExpectedError(err)
 		}
-		log.VEventf(ctx, 2, "copy running batch, autocommit: %v, final: %v, numrows: %d", v.autoCommit, end == b.Length(), end-start)
+		// Similar to tableWriterBase.finalize, we examine whether it's likely
+		// that we'll be able to auto-commit. If it seems unlikely based on the
+		// deadline, we won't auto-commit, which might allow the connExecutor to
+		// get a fresh deadline before committing.
+		autoCommit := v.autoCommit && end == b.Length() &&
+			v.flowCtx.Txn.DeadlineLikelySufficient()
+		log.VEventf(ctx, 2, "copy running batch, autocommit: %v, numrows: %d", autoCommit, end-start)
 		var err error
-		if v.autoCommit && end == b.Length() {
+		if autoCommit {
 			err = v.flowCtx.Txn.CommitInBatch(ctx, kvba.Batch)
 		} else {
 			err = v.flowCtx.Txn.Run(ctx, kvba.Batch)
