Commit ef9510d

[mgs] change SpPoller MissedTickBehavior to Skip (#9429)
@jgallagher points out to me that the `SpPoller` in MGS' SP metrics module uses a `tokio::time::interval` with the default `MissedTickBehavior`, which is [`Burst`]. This means that if we miss a one-second polling interval because, say, MGS was busy servicing other requests, or due to the vagaries of host OS scheduling, we'll send a burst of SP metrics requests right after each other. This is probably not the ideal behavior here, since the goal is just to do one poll every second.

Thus, this commit changes the `SpPoller`'s `Interval` to use the [`Skip`] `MissedTickBehavior`. This way, MGS will perform *up to* one poll per second, with potential gaps if MGS was too busy to poll the SP within a given second. This seems less overwhelming for the poor service processor network stack, in the case that MGS was delayed.

Fixes #9428

[`Burst`]: https://docs.rs/tokio/latest/tokio/time/enum.MissedTickBehavior.html#variant.Burst
[`Skip`]: https://docs.rs/tokio/latest/tokio/time/enum.MissedTickBehavior.html#variant.Skip
1 parent d32777c commit ef9510d

1 file changed: +13 -0 lines changed

gateway/src/metrics.rs

Lines changed: 13 additions & 0 deletions
@@ -461,6 +461,19 @@ async fn start_pollers(
 impl SpPoller {
     async fn run(mut self, apictx: Arc<ServerContext>) {
         let mut interval = tokio::time::interval(SP_POLL_INTERVAL);
+        // The goal is to poll each SP once per SP_POLL_INTERVAL. If we miss
+        // that interval because, say, MGS was busy doing other things, the
+        // network was congested, or due to vagaries of OS scheduling, we don't
+        // want to issue a whole bunch of polls less than a second apart.
+        // This is what we would get with the default,
+        // `MissedTickBehavior::Burst`. Instead, configure the `Interval` to
+        // skip missed ticks, so that we poll the SP a maximum of once per
+        // second, with potential gaps if MGS was not able to poll the SP within
+        // a given second. This should mean that if MGS is delayed, we don't
+        // overwhelm the poor SP's network stack with a big burst of requests.
+        interval
+            .set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip);
+
         let switch = &apictx.mgmt_switch;
         let sp = match switch.sp(self.spid) {
             Ok(sp) => sp,
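
To make the difference between the two behaviors concrete, here is a rough, standard-library-only simulation of a poll loop under each policy. It is a simplified model and not tokio's actual implementation: the names `Policy` and `poll_start_times` are hypothetical, time is plain integer milliseconds, and the small lateness tolerance that tokio applies before treating a tick as missed is ignored. The deadline arithmetic follows the documented semantics, where `Burst` schedules the next tick one period after the missed deadline and `Skip` schedules it at the next multiple of the period from the start.

```rust
/// Standalone model of how an interval reacts to a missed tick. The variant
/// names mirror tokio's `MissedTickBehavior`, but this type is hypothetical.
#[derive(Clone, Copy)]
enum Policy {
    Burst,
    Skip,
}

/// Simulates a loop that waits for a tick and then does `work_ms[i]`
/// milliseconds of work. Returns the time (in ms since start) at which each
/// poll began. Assumes the first tick fires immediately.
fn poll_start_times(policy: Policy, period_ms: u64, work_ms: &[u64]) -> Vec<u64> {
    assert!(period_ms > 0, "period must be non-zero");
    let mut now = 0u64;
    let mut deadline = 0u64;
    let mut starts = Vec::with_capacity(work_ms.len());

    for &work in work_ms {
        // Wait for the tick if we arrived early; otherwise it fires at once.
        if now < deadline {
            now = deadline;
        }
        let late_by = now - deadline;

        deadline = match policy {
            // Keep the original schedule: any backlog of missed ticks fires
            // back-to-back until the loop has caught up.
            Policy::Burst => deadline + period_ms,
            // Drop missed ticks and realign to the next multiple of the
            // period, measured from the start of the interval.
            Policy::Skip => now + period_ms - (late_by % period_ms),
        };

        starts.push(now);
        now += work; // the simulated poll takes this long
    }
    starts
}
```

With a one-second period and a first poll that stalls for three seconds, the `Burst` model starts three polls within about 20 ms of each other before settling back onto the schedule, while the `Skip` model resumes at one poll per second.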
