swarm - switch to handoff node only after current node stops #1147

pgrayy · 2025-11-07T03:53:50Z

Description

Make the handoff node the current node only after the current node finishes. Currently, the switch happens in the middle of the current node's execution. This needs fixing for two reasons:

We emit AfterNodeCallEvent with the current node's id while state.current_node is already set to the handoff node. This is likely to confuse customers.
If the current node runs a tool that is interrupted concurrently with the handoff tool, the swarm state becomes invalid. The swarm state needs a reference to the actual current node so that users can respond to its interrupts and resume execution.
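
The ordering described above can be illustrated with a toy model. This is a minimal sketch, not the actual swarm.py implementation: ToySwarm, pending_handoff, and request_handoff are hypothetical names. It assumes the handoff tool only records the requested target, and that the loop applies it once the current node has finished and its after-node event has been emitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToySwarm:
    """Toy stand-in for a swarm loop; not the real strands Swarm class."""

    # Each node is a callable that receives the swarm and may request a handoff.
    nodes: dict[str, Callable[["ToySwarm"], None]]
    current_node: str
    pending_handoff: Optional[str] = None  # hypothetical field
    # Each event is (kind, node_id, current_node at emission time).
    events: list[tuple[str, str, str]] = field(default_factory=list)

    def request_handoff(self, target: str) -> None:
        # Called from inside a node (the "handoff tool"). Only record the
        # request; do not touch current_node while the node is still running.
        self.pending_handoff = target

    def run(self) -> None:
        while True:
            node_id = self.current_node
            self.events.append(("before", node_id, self.current_node))
            self.nodes[node_id](self)
            # The node has finished and current_node still refers to it.
            self.events.append(("after", node_id, self.current_node))
            if self.pending_handoff is None:
                break
            # Switch only now, after the after-node event was emitted.
            self.current_node = self.pending_handoff
            self.pending_handoff = None


swarm = ToySwarm(
    nodes={
        "researcher": lambda s: s.request_handoff("writer"),
        "writer": lambda s: None,
    },
    current_node="researcher",
)
swarm.run()

# Every event's node id agrees with current_node at emission time.
assert all(node_id == current for _, node_id, current in swarm.events)
```

If request_handoff assigned current_node directly, the "after" event for "researcher" would be recorded while current_node was already "writer", which is the mismatch this change removes.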

Related Issues

#204

Documentation PR

Implementation detail

Type of Change

Bug fix

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I ran hatch run prepare (relying on existing unit tests)
I ran hatch test tests_integ/test_multiagent_swarm.py

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov · 2025-11-07T03:55:03Z

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/strands/multiagent/swarm.py	90.00%	1 Missing ⚠️


pgrayy · 2025-11-07T03:58:31Z

src/strands/multiagent/swarm.py

                    logger.debug("reason=<%s> | stopping execution", reason)
                    break

-                # Get current node


I removed a few inline comments because I felt the code was already self-explanatory.

pgrayy · 2025-11-07T04:00:19Z

src/strands/multiagent/swarm.py

                    self.state.node_history.append(current_node)
-
-                    #  After self.state add current node, swarm state finish updating, we persist here
                    self.hooks.invoke_callbacks(AfterNodeCallEvent(self, current_node.node_id, invocation_state))


To reiterate, setting self.state.current_node = handoff_node inside the handoff tool means that AfterNodeCallEvent is emitted with a node_id that does not match self.state.current_node.node_id.

Also, to support interrupts, self.state.current_node must not switch to the handoff node while the current node is interrupted.
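
The interrupt rule can be written as a small decision function. This is a sketch under assumptions: NodeOutcome and next_current_node are hypothetical names that do not exist in the codebase, and the sketch does not model what happens to a requested handoff once the interrupted node is resumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeOutcome:
    """Hypothetical summary of one node run."""

    interrupted: bool
    handoff_target: Optional[str] = None


def next_current_node(current: str, outcome: NodeOutcome) -> str:
    # An interrupted node must stay current so the caller can respond to its
    # interrupt and resume it, even if a handoff was requested concurrently.
    if outcome.interrupted:
        return current
    # Otherwise follow the handoff if there is one, else stay on this node.
    return outcome.handoff_target or current
```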

pgrayy · 2025-11-07T04:14:41Z

src/strands/session/session_manager.py


        registry.add_callback(MultiAgentInitializedEvent, lambda event: self.initialize_multi_agent(event.source))
-        registry.add_callback(AfterNodeCallEvent, lambda event: self.sync_multi_agent(event.source))
+        registry.add_callback(BeforeNodeCallEvent, lambda event: self.sync_multi_agent(event.source))


Let's say we have successfully executed one node and are now executing the handoff node. If we crash on the handoff node, the persisted state differs depending on which event we persist on:

AfterNodeCallEvent: The current node in the session is not set to the handoff node, because the handoff node hasn't yet emitted its AfterNodeCallEvent. If we resume after crashing on the handoff node, we start again from the first node.

BeforeNodeCallEvent: The current node in the session is set to the handoff node, because the handoff node has already emitted its BeforeNodeCallEvent. If we resume after crashing on the handoff node, we start again from the handoff node.

In short, persisting on AfterNodeCallEvent only made sense when the handoff tool itself set the current node to the handoff node.
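
The two outcomes can be reproduced with a toy simulation. This is a sketch, not the session manager's code: resume_point and the node names "first" and "handoff" are hypothetical. It assumes the state is saved once at initialization and that the current node switches only after the previous node has finished.

```python
def resume_point(persist_on: str, crash_in: str) -> str:
    """Return the node a resumed run would start from in this toy model.

    persist_on is "before" or "after": the event that triggers a snapshot.
    crash_in is the node during which the process dies.
    """
    order = ["first", "handoff"]  # "first" hands off to "handoff"
    saved_current = order[0]  # snapshot written at initialization
    for node in order:
        current = node  # the switch happens before this node starts
        if persist_on == "before":
            saved_current = current  # snapshot on the before-node event
        if node == crash_in:
            # The process dies here, so the after-node event never fires.
            return saved_current
        if persist_on == "after":
            saved_current = current  # snapshot on the after-node event
    return saved_current


assert resume_point("after", crash_in="handoff") == "first"
assert resume_point("before", crash_in="handoff") == "handoff"
```

With "after" snapshots, the last saved state still names the first node, so a resume would repeat work that had already completed.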

swarm - switch to handoff node only after current node stops

1e58e59

github-actions bot added the size/s label Nov 7, 2025

pgrayy temporarily deployed to auto-approve November 7, 2025 03:54 — with GitHub Actions Inactive

pgrayy marked this pull request as ready for review November 7, 2025 04:15
