You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
propagate proc status in supervision events for proc failures (#1877)
Summary:
Pull Request resolved: #1877
Currently, we report the agent to be Stopped. This is accurate, but confusing, and could be better attributed.
Here, we synthesize an actor failure by:
1) attributing the fault to the corresponding actor in the monitored actor mesh;
2) elevating the proc_status (which contains mode of failure, exit code, etc) into the actor failure, making it clear it is a process failure
In the future:
1) We will have a more general "Failure" struct, explicitly capturing host, proc, actor, etc., failures.
2) We will attribute the actor failure (it is the most proximate), but move the proc failure to the "cause" (i.e., proc failure caused actor to fail), which is the most correct and clear.
ghstack-source-id: 323429315
exported-using-ghexport
Reviewed By: dulinriley
Differential Revision: D86993889
fbshipit-source-id: 03578a230d155a4a9307e7468b00482ce9f36e98
0 commit comments