Commit 9175a36

pavanbalaji authored and meta-codesync[bot] committed
Change timeout thread checks to 60 seconds
Summary:
The timeout thread and the main thread contend for a lock to access the work objects. Having the timeout thread check too frequently can cause lock contention. Reduce check frequency to minimize lock contention.

Reviewed By: tanquer

Differential Revision: D85455461

fbshipit-source-id: 84576b79acc2ccb5fedbfb6eb95a9be89d8bcb5c
1 parent 03da441 commit 9175a36

1 file changed, with 4 additions and 3 deletions

comms/torchcomms/ncclx/TorchCommNCCLXUtils.cpp

Lines changed: 4 additions & 3 deletions
@@ -181,9 +181,10 @@ void TorchCommNCCLX::timeoutWatchdog() noexcept {
     {
       std::unique_lock<std::mutex> lock(timeout_mutex_);
       // Wait for a shorter interval to check work objects periodically
-      // Wake up either after 1 second or immediately if shutdown is requested
-      timeout_cv_.wait_for(
-          lock, std::chrono::seconds(1), [this]() { return shutdown_.load(); });
+      // Wake up either after 60 seconds or immediately if shutdown is requested
+      timeout_cv_.wait_for(lock, std::chrono::seconds(60), [this]() {
+        return shutdown_.load();
+      });
 
       // If we're shutting down, exit the loop
       if (shutdown_) {
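
For readers unfamiliar with the pattern in this hunk, the following is a minimal standalone sketch of a periodic watchdog built on `std::condition_variable::wait_for` with a predicate. It is not the TorchCommNCCLX implementation: the `Watchdog` class, its `stop()` and `checks()` members, and the `checks_` counter are hypothetical names introduced only for illustration. The sketch shows why raising the interval to 60 seconds should not delay shutdown: the predicate is re-evaluated on every notification, so the wait returns as soon as the shutdown flag is set.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a timeout watchdog; names are illustrative only.
class Watchdog {
 public:
  explicit Watchdog(std::chrono::milliseconds interval)
      : interval_(interval), thread_([this] { run(); }) {}

  ~Watchdog() { stop(); }

  // Request shutdown and wait for the watchdog thread to exit.
  void stop() {
    {
      // Set the flag under the mutex so the waiter cannot miss the wakeup.
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_.store(true);
    }
    cv_.notify_all();
    if (thread_.joinable()) {
      thread_.join();
    }
  }

  // Number of periodic checks performed so far.
  int checks() const { return checks_.load(); }

 private:
  void run() {
    while (true) {
      {
        std::unique_lock<std::mutex> lock(mutex_);
        // Returns after `interval_` elapses, or earlier once shutdown_ is true.
        cv_.wait_for(lock, interval_, [this] { return shutdown_.load(); });
        if (shutdown_) {
          return;
        }
      }
      // A real watchdog would take the work-object lock and scan for
      // timed-out work here; this sketch only counts the wakeups.
      ++checks_;
    }
  }

  std::chrono::milliseconds interval_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::atomic<bool> shutdown_{false};
  std::atomic<int> checks_{0};
  std::thread thread_;  // declared last so it starts after the other members
};
```

Constructing `Watchdog w(std::chrono::seconds(60));` and calling `w.stop()` shortly afterwards should return promptly rather than after a minute. The trade-off of a longer interval is that a timed-out work object may go unnoticed for up to one full interval after its deadline.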
