@mariusae mariusae commented Nov 13, 2025

Stack from ghstack (oldest at bottom):

Currently, when a proc fails, we report the agent as Stopped. This is accurate but confusing, and the failure could be better attributed.

Here, we synthesize an actor failure by:

  1. attributing the fault to the corresponding actor in the monitored actor mesh;
  2. elevating the proc_status (which contains the mode of failure, exit code, etc.) into the actor failure, making it clear that it is a process failure.
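
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ProcStatus`, `ActorFailure`, and `synthesize_actor_failure` are hypothetical names, and the mesh is simplified to a slice of actor ids indexed by rank.

```rust
/// Hypothetical stand-in for the proc status reported by the agent
/// (mode of failure, exit code, etc.).
#[derive(Debug, Clone, PartialEq)]
enum ProcStatus {
    Running,
    Stopped { exit_code: i32 },
    Killed { signal: i32 },
}

/// Hypothetical actor failure event delivered to whoever monitors the mesh.
#[derive(Debug, Clone, PartialEq)]
struct ActorFailure {
    /// The actor the fault is attributed to.
    actor_id: String,
    /// Human-readable reason that carries the proc status.
    reason: String,
}

/// Synthesize an actor failure for the actor at `rank` in the monitored mesh.
/// Returns `None` if the proc is still running or the rank is out of range.
fn synthesize_actor_failure(
    mesh: &[String],
    rank: usize,
    status: &ProcStatus,
) -> Option<ActorFailure> {
    if matches!(status, ProcStatus::Running) {
        return None;
    }
    // Step 1: attribute the fault to the corresponding actor in the mesh.
    let actor_id = mesh.get(rank)?.clone();
    // Step 2: elevate the proc status into the actor failure, making it
    // clear that this is a process failure.
    Some(ActorFailure {
        actor_id,
        reason: format!("process failure: {:?}", status),
    })
}
```

The point of the sketch is that the observer sees a failure attributed to a specific actor, with the process-level details carried along rather than a bare Stopped status.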

In the future:

  1. We will have a more general "Failure" struct, explicitly capturing host, proc, actor, etc., failures.
  2. We will still report the actor failure (it is the most proximate), but move the proc failure into its "cause" (i.e., the proc failure caused the actor to fail), which is the most correct and clear.
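
One possible shape for that future struct, purely as a sketch: the `Failure` and `FailureKind` definitions and the `root_cause` helper below are assumptions for illustration, not a committed design.

```rust
/// Hypothetical: the layer at which a failure occurred.
#[derive(Debug, Clone, PartialEq)]
enum FailureKind {
    Host,
    Proc,
    Actor,
}

/// Hypothetical general failure with an optional chain of causes.
#[derive(Debug, Clone, PartialEq)]
struct Failure {
    kind: FailureKind,
    /// Identifier of the failed host, proc, or actor.
    subject: String,
    message: String,
    /// The failure that caused this one, if any.
    cause: Option<Box<Failure>>,
}

impl Failure {
    /// Follow the `cause` chain to the innermost failure.
    fn root_cause(&self) -> &Failure {
        let mut current = self;
        while let Some(cause) = current.cause.as_deref() {
            current = cause;
        }
        current
    }
}
```

Under this shape, the reported failure would be the actor failure, with the proc failure reachable through `cause` (or `root_cause`).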

Differential Revision: D86993889

NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on Phabricator!

mariusae added a commit that referenced this pull request Nov 13, 2025
…proc failures

ghstack-source-id: 323095346
Pull Request resolved: #1877
@meta-cla meta-cla bot added the CLA Signed label Nov 13, 2025
mariusae added a commit that referenced this pull request Nov 14, 2025
…proc failures

Pull Request resolved: #1877

ghstack-source-id: 323362899
@exported-using-ghexport

mariusae added a commit that referenced this pull request Nov 14, 2025
…proc failures

Pull Request resolved: #1877

ghstack-source-id: 323429315
@exported-using-ghexport

@meta-codesync meta-codesync bot closed this in 27c98c9 Nov 15, 2025
meta-codesync bot commented Nov 15, 2025

This pull request has been merged in 27c98c9.


Labels

CLA Signed, fb-exported, Merged, meta-exported
