[mgs] change SpPoller MissedTickBehavior to Skip (#9429)

hawkw · web-flow · commit ef9510deb9fd · 2025-11-20T20:21:55.000Z
@jgallagher points out to me that the `SpPoller` in MGS' SP metrics module uses a `tokio::time::interval` with the default `MissedTickBehavior`, which is [`Burst`]. This means that if we miss a one-second polling interval because, say, MGS was busy servicing other requests, or due to the vagaries of host OS scheduling, we'll send a burst of SP metrics requests right after each other. This is probably not the ideal behavior here, since the goal is just to do one poll every second.

Thus, this commit changes the `SpPoller`'s `Interval` to use the [`Skip`] `MissedTickBehavior`. This way, MGS will perform *up to* one poll per second, with potential gaps if MGS was too busy to poll the SP within a given second. This seems less overwhelming for the poor service processor network stack, in the case that MGS was delayed.

Fixes #9428

[`Burst`]: https://docs.rs/tokio/latest/tokio/time/enum.MissedTickBehavior.html#variant.Burst
[`Skip`]: https://docs.rs/tokio/latest/tokio/time/enum.MissedTickBehavior.html#variant.Skip
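To make the difference between the two behaviors concrete, here is a small std-only model of how a `tokio::time::Interval` picks its next deadline under `Burst` and `Skip`. This is a sketch, not tokio's code and not MGS code: the `Behavior` enum and `simulate` function are hypothetical, time is simulated in milliseconds rather than slept, the `Delay` variant is left out, and the 1000 ms period stands in for the one-second `SP_POLL_INTERVAL`. The scenario assumes one poll stalls for 3.5 s.

```rust
/// Simplified model of two `tokio::time::MissedTickBehavior` variants.
/// Hypothetical: this only mimics how each variant chooses the next deadline.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Behavior {
    Burst,
    Skip,
}

/// Hypothetical helper: simulates a loop that awaits `tick()` and then spends
/// `work[i]` ms on the i-th poll. Returns the simulated time (ms) at which
/// each tick completed. Nothing actually sleeps.
fn simulate(behavior: Behavior, period: u64, work: &[u64]) -> Vec<u64> {
    let mut now = 0u64;
    // Like `tokio::time::interval`, the first tick completes immediately.
    let mut deadline = 0u64;
    let mut ticks = Vec::with_capacity(work.len());
    for &w in work {
        // tick(): wait for the deadline if it is still in the future,
        // otherwise complete right away.
        now = now.max(deadline);
        ticks.push(now);
        let fired = deadline;
        deadline = match behavior {
            // Burst: always one period after the deadline that just fired,
            // so a late loop gets back-to-back ticks until it catches up.
            Behavior::Burst => fired + period,
            // Skip: jump to the next multiple of `period` on the original
            // schedule that lies after `now`, dropping the missed ticks.
            Behavior::Skip => now + period - (now - fired) % period,
        };
        // The poll itself takes `w` ms.
        now += w;
    }
    ticks
}

fn main() {
    // One-second period; the second poll stalls for 3.5 s.
    let work: [u64; 5] = [100, 3500, 100, 100, 100];
    println!("{:?}", simulate(Behavior::Burst, 1000, &work)); // [0, 1000, 4500, 4600, 4700]
    println!("{:?}", simulate(Behavior::Skip, 1000, &work)); // [0, 1000, 4500, 5000, 6000]
}
```

Under `Burst`, the three polls after the stall land only 100 ms apart (4500, 4600, 4700 ms) while the interval works through its backlog of missed deadlines. Under `Skip`, the missed ticks are dropped and polling realigns to the one-second grid (5000, 6000 ms), which is the "up to one poll per second" behavior this commit is after.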
diff --git a/gateway/src/metrics.rs b/gateway/src/metrics.rs
@@ -461,6 +461,19 @@ async fn start_pollers(
 impl SpPoller {
     async fn run(mut self, apictx: Arc<ServerContext>) {
         let mut interval = tokio::time::interval(SP_POLL_INTERVAL);
+        // The goal is to poll each SP once per SP_POLL_INTERVAL. If we miss
+        // that interval because, say, MGS was busy doing other things, the
+        // network was congested, or due to vagaries of OS scheduling, we don't
+        // want to issue a whole bunch of polls less than a second apart.
+        // This is what we would get with the default,
+        // `MissedTickBehavior::Burst`. Instead, configure the `Interval` to
+        // skip missed ticks, so that we poll the SP a maximum of once per
+        // second, with potential gaps if MGS was not able to poll the SP within
+        // a given second. This should mean that if MGS is delayed, we don't
+        // overwhelm the poor SP's network stack with a big burst of requests.
+        interval
+            .set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip);
+
         let switch = &apictx.mgmt_switch;
         let sp = match switch.sp(self.spid) {
             Ok(sp) => sp,