-
Notifications
You must be signed in to change notification settings - Fork 130
fix(guard): handle websocket tasks during shutdown #3448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 11-10-fix_improve_sigterm_handling_for_the_entire_runtime
Are you sure you want to change the base?
fix(guard): handle websocket tasks during shutdown #3448
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
How to use the Graphite Merge QueueAdd the label merge-queue to this PR to add it to the merge queue. You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
PR Review: fix(guard): handle websocket tasks during shutdownSummaryThis PR introduces a Code Quality & Best Practices ✅Positives:
Minor Issues:
Potential Bugs or Issues
|
More templates
@rivetkit/actor
@rivetkit/cloudflare-workers
@rivetkit/core
@rivetkit/db
@rivetkit/framework-base
@rivetkit/next-js
@rivetkit/react
rivetkit
@rivetkit/sql-loader
@rivetkit/engine-runner
@rivetkit/engine-runner-protocol
commit: |
bc86aa3 to
b7ad73c
Compare
PR Review: fix(guard): handle websocket tasks during shutdownSummaryThis PR introduces a Positive Aspects✅ Clean abstraction: The Issues & Recommendations🔴 Critical: Panic Handling (task_group.rs:27)The TODO comment indicates panics aren't handled. If a spawned task panics, the counter will never decrement, causing Recommendation: Wrap the future execution to catch panics: pub fn spawn<F, O>(self: &Arc<Self>, fut: F)
where
F: Future<Output = O> + Send + 'static,
{
self.running_count.fetch_add(1, Ordering::Relaxed);
let self2 = self.clone();
tokio::spawn(
async move {
let _guard = scopeguard::guard((), |_| {
if self2.running_count.fetch_sub(1, Ordering::AcqRel) == 1 {
self2.notify.notify_waiters();
}
});
fut.await;
}
.in_current_span(),
);
}Or use
|
b7ad73c to
a353c1b
Compare
7aff577 to
9f71238
Compare
PR Review: fix(guard): handle websocket tasks during shutdownSummaryThis PR improves the guard service's shutdown handling by properly tracking and waiting for background WebSocket and analytics tasks to complete. The implementation adds a new Code Quality & ArchitectureStrengths:
Issues Found: 1. Race Condition in TaskGroup (Critical)
There's a potential race condition in loop {
self.notify.notified().await;
if self.running_count.load(Ordering::Acquire) == 0 {
break;
}
}Problem: If all tasks complete between the initial fast-path check (line 45) and calling Solution: Subscribe to notifications BEFORE checking the count: pub async fn wait_idle(&self) {
loop {
let notified = self.notify.notified();
if self.running_count.load(Ordering::Acquire) == 0 {
return;
}
notified.await;
}
}This ensures we're listening for notifications before checking the count, preventing the race. 2. Missing Panic Handling (High Priority)
The TODO comment indicates panic handling is missing. If a spawned task panics:
Recommendation: Wrap the future in a panic-catching wrapper or use pub fn spawn<F, O>(self: &Arc<Self>, fut: F)
where
F: Future<Output = O> + Send + 'static,
{
self.running_count.fetch_add(1, Ordering::Relaxed);
let self2 = self.clone();
tokio::spawn(
async move {
let _guard = scopeguard::guard((), |_| {
if self2.running_count.fetch_sub(1, Ordering::AcqRel) == 1 {
self2.notify.notify_waiters();
}
});
fut.await;
}
.in_current_span(),
);
}Or use the 3. Ordering Consistency (Low Priority)
The increment uses 4. Removed defer! Macro Without Cleanup
The PR removes the Good: The change to spawn cleanup after the request completes is correct - it prevents blocking the request response. Action needed: The old macro definition should be removed if it's no longer used anywhere. Performance ConsiderationsPositive:
Concern:
Security ConcernsNo security issues identified. The changes don't introduce new attack surfaces or bypass existing security measures. Test CoverageMissing tests for:
Recommendation: Add integration tests in
Minor Issues
RecommendationsMust Fix:
Should Fix: Optional: VerdictThis is a solid improvement to shutdown handling, but the race condition and missing panic handling are blockers that must be addressed before merging. Once those critical issues are fixed, this will significantly improve the reliability of the guard service's shutdown process. The overall architecture is sound and follows best practices. Great work on identifying and solving the WebSocket task tracking problem! |
PR Review: fix(guard): handle websocket tasks during shutdownSummaryThis PR addresses a critical issue where WebSocket and other spawned tasks were not being properly tracked during guard shutdown, potentially leading to abrupt termination of in-flight operations. The solution introduces a Code Quality & Best Practices ✅Strengths:
Minor improvements:
Potential Issues & Bugs 🔍1. Race condition in TaskGroup::wait_idle() - CRITICALLocation: There's a potential race condition between lines 45-46 (fast path check) and lines 50-54 (notification loop). If a task increments the counter between the fast-path check and the Suggested fix: pub async fn wait_idle(&self) {
loop {
// Subscribe to notifications *before* checking the count
let notified = self.notify.notified();
if self.running_count.load(Ordering::Acquire) == 0 {
return;
}
notified.await;
}
}2. Memory ordering inconsistencyLocation: Using self.running_count.fetch_add(1, Ordering::Release);3. Panic handling TODO - CRITICALLocation: If a spawned future panics, the counter won't be decremented, causing 4. Defer macro removalLocation: Please verify no other files use this macro before removing it. Performance Considerations ⚡Positive:
Potential concerns:
Security Concerns 🔒No critical security issues identified The existing shutdown timeout mechanism properly prevents indefinite hangs. The changes ensure WebSocket connections are properly tracked during shutdown. Test Coverage 🧪Missing: No unit tests for Recommended:
Additional Observations
RecommendationOverall assessment: Well-implemented fix for a real issue. Core logic is sound. Action items before merge:
Great work on identifying and addressing this shutdown issue! 🎉 |

No description provided.