Commit d1faa08

craig[bot] and DarrylWong committed
Merge #155543
155543: server: don't exit auto-upgrade if own node isn't live r=stevendanna,cthumuluru-crdb a=DarrylWong

The auto-upgrade loop previously exited early if all nodes were on the same version. This optimized for the common case, where no upgrade is being attempted and it doesn't matter whether a node is currently down.

However, we don't want to exit the auto-upgrade process if we can't validate our own version. Consider the case where the meta1 leaseholder is the first node to restart to the new version but has transient liveness issues. If we skip the check for that node, we see all other nodes at the same (old) version and exit the auto-upgrade process with UpgradeAlreadyCompleted. Since the meta1 leaseholder is the only node that can perform the upgrade, this blocks the upgrade indefinitely.

Fixes: #155359

Release note: none

Co-authored-by: DarrylWong <darryl@cockroachlabs.com>
2 parents ad9b0b2 + 12d73bb commit d1faa08

1 file changed (+9 -0 lines)

pkg/server/auto_upgrade.go

Lines changed: 9 additions & 0 deletions
@@ -183,6 +183,15 @@ func (s *topLevelServer) upgradeStatus(
 			if notRunningErr == nil {
 				notRunningErr = errors.Errorf("node %d not running (%d), cannot determine version", nodeID, st)
 			}
+			// However, we don't want to exit the auto upgrade process if we can't validate
+			// our own version. Consider the case where the meta1 leaseholder is the first
+			// node to restart to the new version but has transient liveness issues. If we skip
+			// the check here, we will see all other nodes at the same (old) version and exit
+			// the auto upgrade process with UpgradeAlreadyCompleted. Since the meta1 leaseholder
+			// is the only node that can perform the upgrade, this indefinitely blocks the upgrade.
+			if s.node.Descriptor.NodeID == nodeID {
+				return UpgradeBlockedDueToError, errors.Errorf("node %d not running (%d), cannot determine version", nodeID, st)
+			}
 			continue
 		}
 
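The failure mode can be reproduced with a small, self-contained Go model of the check. This is a simplified sketch under stated assumptions, not the pkg/server implementation. Only UpgradeAlreadyCompleted, UpgradeBlockedDueToError, and the rule "skip nodes that are not running, except our own" come from the commit. The following are hypothetical and exist only for illustration:

- nodeInfo, the signatures of upgradeStatus and runAutoUpgrade, and the UpgradeAllowed and UpgradeBlockedMixedVersions values.
- The assumption that UpgradeAlreadyCompleted is returned whenever every readable node already matches the cluster version.
- The assumption that a retry driver stops on that status.

In the sketch, selfID plays the role of s.node.Descriptor.NodeID.

```go
package main

import "fmt"

// UpgradeStatus is a simplified stand-in for the server's status type.
// UpgradeAlreadyCompleted and UpgradeBlockedDueToError are named in the commit;
// UpgradeAllowed and UpgradeBlockedMixedVersions are hypothetical.
type UpgradeStatus int

const (
	UpgradeAllowed UpgradeStatus = iota
	UpgradeAlreadyCompleted
	UpgradeBlockedDueToError
	UpgradeBlockedMixedVersions
)

func (s UpgradeStatus) String() string {
	switch s {
	case UpgradeAllowed:
		return "UpgradeAllowed"
	case UpgradeAlreadyCompleted:
		return "UpgradeAlreadyCompleted"
	case UpgradeBlockedDueToError:
		return "UpgradeBlockedDueToError"
	case UpgradeBlockedMixedVersions:
		return "UpgradeBlockedMixedVersions"
	}
	return fmt.Sprintf("UpgradeStatus(%d)", int(s))
}

// nodeInfo is a hypothetical summary of one node: whether liveness reports it
// as running, and the binary version it reports when it is.
type nodeInfo struct {
	id      int
	live    bool
	version string
}

// upgradeStatus models the check. Nodes that are not running are normally
// skipped; guardOwnNode enables the patch's rule that the local node (selfID)
// is never skipped.
func upgradeStatus(selfID int, nodes []nodeInfo, clusterVersion string, guardOwnNode bool) (UpgradeStatus, error) {
	var notRunningErr error
	newVersion := ""
	for _, n := range nodes {
		if !n.live {
			if notRunningErr == nil {
				notRunningErr = fmt.Errorf("node %d not running, cannot determine version", n.id)
			}
			if guardOwnNode && n.id == selfID {
				return UpgradeBlockedDueToError, fmt.Errorf("node %d not running, cannot determine version", n.id)
			}
			continue
		}
		if newVersion == "" {
			newVersion = n.version
		} else if n.version != newVersion {
			return UpgradeBlockedMixedVersions, fmt.Errorf("node %d at %s, expected %s", n.id, n.version, newVersion)
		}
	}
	// Early exit: every node that could be read is already at the cluster
	// version, so nodes that are down are ignored.
	if newVersion == clusterVersion {
		return UpgradeAlreadyCompleted, nil
	}
	if notRunningErr != nil {
		return UpgradeBlockedDueToError, notRunningErr
	}
	return UpgradeAllowed, nil
}

// runAutoUpgrade is a hypothetical driver: each snapshot is one retry's view of
// the cluster. It stops on a terminal status and retries on a blocked one; if
// no snapshot is terminal, it reports the last blocked status.
func runAutoUpgrade(selfID int, snapshots [][]nodeInfo, clusterVersion string, guardOwnNode bool) UpgradeStatus {
	st := UpgradeBlockedDueToError
	for _, nodes := range snapshots {
		st, _ = upgradeStatus(selfID, nodes, clusterVersion, guardOwnNode)
		if st == UpgradeAlreadyCompleted || st == UpgradeAllowed {
			return st
		}
	}
	return st
}

func main() {
	const self = 1 // node 1 plays the meta1 leaseholder that restarts first
	restarting := []nodeInfo{{1, false, "v2"}, {2, true, "v1"}, {3, true, "v1"}}
	upgraded := []nodeInfo{{1, true, "v2"}, {2, true, "v2"}, {3, true, "v2"}}
	snapshots := [][]nodeInfo{restarting, upgraded}

	fmt.Println("without guard:", runAutoUpgrade(self, snapshots, "v1", false))
	fmt.Println("with guard:", runAutoUpgrade(self, snapshots, "v1", true))
}
```

Without the guard, the first snapshot (node 1 down at v2, nodes 2 and 3 live at v1) already yields UpgradeAlreadyCompleted, so the driver exits and never observes the upgraded cluster. With the guard, that snapshot yields UpgradeBlockedDueToError, the driver retries, and the second snapshot yields UpgradeAllowed. Other down nodes are still skipped, which preserves the original optimization for the common case.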