@Centril
@Centril Centril commented Nov 20, 2025

Description of Changes

Reworks SchedulerActor::handle_queued so that it first determines the parameters of the call, whether that call is to a reducer or to a procedure. This also enables the removal of the special case call_scheduled_reducer.

Fixes #3645.

API and ABI breaking changes

None

Expected complexity level and risk

2

Testing

A test schedule_procedure is added.

@Centril Centril force-pushed the centril/schedule-procedures branch 2 times, most recently from 79f76cb to b31ec3a Compare November 20, 2025 23:18
@Centril Centril marked this pull request as ready for review November 20, 2025 23:18
@Centril Centril requested a review from gefjon November 20, 2025 23:18
@Centril Centril force-pushed the centril/schedule-procedures branch from b31ec3a to 3fa9650 Compare November 21, 2025 10:07
@gefjon gefjon left a comment

This all looks reasonable to me, but I'm not super familiar with our scheduling code. I'd like to get @Shubham8287's review before merging, if possible.

@gefjon gefjon requested a review from Shubham8287 November 21, 2025 16:03
@gefjon
gefjon commented Nov 21, 2025

It occurs to me - for a procedure, you have to delete the schedule row before invoking, not after. If execution aborts (host crash or w/e) midway through a scheduled procedure, it's not correct to retry that procedure on restart the way it is for a scheduled reducer, so you need the row to be gone by the time the procedure performs any observable side effect. I think that this means that we should:

  • When the schedule fires, open a mut tx.
  • Within that TX, determine whether the function is a reducer or a procedure.
  • If it's a reducer, use the mut TX to execute the reducer, as normal.
  • If it's a procedure, delete the row, commit the TX, then invoke the procedure without a TX open.

There's some additional complexity here that we'll get to if/when we want to implement on-abort handling, but we haven't been considering that part of the MVP.
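
The steps above can be sketched roughly as follows. This is a toy model only, not the actual SpacetimeDB scheduler: `ScheduleTable`, `MutTx`, `FunctionKind`, `Event`, and `handle_fired` are all hypothetical names invented for illustration, and the reducer branch assumes the schedule row is deleted inside the same TX that runs the reducer.

```rust
use std::collections::HashMap;

/// Hypothetical kinds of scheduled function (not the real SpacetimeDB types).
#[derive(Clone, Copy, Debug, PartialEq)]
enum FunctionKind {
    Reducer,
    Procedure,
}

/// Toy stand-in for the schedule table: schedule id -> kind of function.
#[derive(Default)]
struct ScheduleTable {
    rows: HashMap<u64, FunctionKind>,
}

/// Toy mutable transaction that buffers row deletions until commit.
struct MutTx<'a> {
    table: &'a mut ScheduleTable,
    pending_deletes: Vec<u64>,
}

impl<'a> MutTx<'a> {
    fn begin(table: &'a mut ScheduleTable) -> Self {
        MutTx { table, pending_deletes: Vec::new() }
    }

    fn kind_of(&self, id: u64) -> Option<FunctionKind> {
        self.table.rows.get(&id).copied()
    }

    fn delete_row(&mut self, id: u64) {
        self.pending_deletes.push(id);
    }

    /// Apply the buffered deletions; dropping the TX instead discards them.
    fn commit(self) {
        let MutTx { table, pending_deletes } = self;
        for id in pending_deletes {
            table.rows.remove(&id);
        }
    }
}

/// Records the order of operations so it can be inspected.
#[derive(Debug, PartialEq)]
enum Event {
    ReducerRanInTx(u64),
    RowDeleted(u64),
    TxCommitted,
    ProcedureRanWithoutTx(u64),
}

/// Called when the schedule for `id` fires.
fn handle_fired(table: &mut ScheduleTable, id: u64) -> Vec<Event> {
    let mut log = Vec::new();
    // Open a mut TX and look up the function kind inside it.
    let mut tx = MutTx::begin(table);
    match tx.kind_of(id) {
        Some(FunctionKind::Reducer) => {
            // Reducer: run inside the TX, then delete the row and commit
            // (assumption: deletion shares the reducer's TX).
            log.push(Event::ReducerRanInTx(id));
            tx.delete_row(id);
            log.push(Event::RowDeleted(id));
            tx.commit();
            log.push(Event::TxCommitted);
        }
        Some(FunctionKind::Procedure) => {
            // Procedure: delete the row and commit first...
            tx.delete_row(id);
            log.push(Event::RowDeleted(id));
            tx.commit();
            log.push(Event::TxCommitted);
            // ...then invoke the procedure with no TX open.
            log.push(Event::ProcedureRanWithoutTx(id));
        }
        None => {
            // Row no longer exists; nothing to run.
        }
    }
    log
}
```

In this sketch the only difference between the two branches is the ordering: the reducer's work precedes the commit, while the procedure's work follows it, so a crash during the procedure cannot leave the row behind to trigger a retry.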

@Shubham8287
Shubham8287 commented Nov 21, 2025

  • If it's a procedure, delete the row, commit the TX, then invoke the procedure without a TX open.

This is an at-most-once guarantee then. We also have to consider the case where the host crashes or shuts down after the row deletion: the procedure will never run.

Edit: nvm, just read, you mentioned it is not required to retry for this case.
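
To make the difference between the two guarantees concrete, here is a minimal simulation. It is a sketch under stated assumptions, not the real host: `Host`, `Crash`, `fire_reducer`, and `fire_procedure` are hypothetical names, a crash is modelled as an early return, and a restart is modelled as simply firing the schedule again.

```rust
/// Whether the simulated host crashes before the call completes.
#[derive(Clone, Copy, PartialEq)]
enum Crash {
    Never,
    BeforeCompletion,
}

/// Toy durable state: the schedule row plus a count of completed runs.
struct Host {
    row_present: bool,
    runs: u32,
}

/// Reducer: the body and the row deletion share one transaction, so a crash
/// rolls back both and the row survives to be retried after restart.
fn fire_reducer(host: &mut Host, crash: Crash) {
    if !host.row_present {
        return;
    }
    if crash == Crash::BeforeCompletion {
        return; // TX never commits: no run recorded, row still present
    }
    host.runs += 1;
    host.row_present = false;
}

/// Procedure: the row deletion is committed before the body is invoked, so a
/// crash after the commit means the body never runs and is never retried.
fn fire_procedure(host: &mut Host, crash: Crash) {
    if !host.row_present {
        return;
    }
    host.row_present = false; // committed before invoking the procedure
    if crash == Crash::BeforeCompletion {
        return; // row already gone: nothing left to retry on restart
    }
    host.runs += 1;
}
```

Firing a reducer once with `Crash::BeforeCompletion` and then again with `Crash::Never` leaves `runs == 1`, because the retry succeeds. Doing the same with a procedure leaves `runs == 0`, which is the at-most-once behaviour described above.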

@bfops bfops added the release-any To be landed in any release window label Nov 24, 2025
@Centril Centril requested a review from gefjon November 24, 2025 23:32
@gefjon gefjon left a comment

:excellent:

@Centril Centril force-pushed the centril/schedule-procedures branch from 84decda to b5f4468 Compare November 24, 2025 23:40
@Centril Centril force-pushed the centril/schedule-procedures branch from b5f4468 to d52fcfb Compare November 25, 2025 10:31
@Centril Centril enabled auto-merge November 25, 2025 10:32
@coolreader18 coolreader18 force-pushed the centril/schedule-procedures branch from 12e5f79 to a722cab Compare November 25, 2025 18:46
@Centril Centril added release-1.10.0 and removed release-any To be landed in any release window labels Nov 25, 2025
@Centril Centril added this pull request to the merge queue Nov 26, 2025
Merged via the queue into master with commit b2e37e8 Nov 26, 2025
33 of 37 checks passed
@Centril Centril deleted the centril/schedule-procedures branch November 26, 2025 06:48
jdetter added a commit that referenced this pull request Nov 26, 2025
gefjon added a commit that referenced this pull request Nov 26, 2025
This reverts commit b2e37e8.

PR #3704 seems to have caused a nondeterministic failure in scheduling,
leading to tests hanging.
We should revert it until we can determine what's wrong,
and include #3768 in the release instead.
github-merge-queue bot pushed a commit that referenced this pull request Nov 26, 2025
This reverts commit b2e37e8.

# Description of Changes

Reverts #3704 which I'm pretty sure contains some sort of bug which is
causing the smoketests to hang.

# API and ABI breaking changes

None

# Expected complexity level and risk

1

# Testing

- [x] CI passing again

Development

Successfully merging this pull request may close these issues.

Procedures: enable executing scheduled procedures

7 participants