Commit 6976258
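Add `return_undeliverable` property to `PortRef` and `MessageEnvelope`, and fix race in tensor engine shutdown (#1777)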
Summary:
Pull Request resolved: #1777
Sometimes when we send a message, we want it to be fully fire-and-forget, even if the destination is unreachable. This is typically only needed in scenarios like:
* When shutting down the system, we first ask a process to shut itself down gracefully before forcibly killing it. If that message is undeliverable, we can just proceed with killing the process (it's probably already dead anyway).
* Replying to a message. If the sender is down, there's nothing the current actor can do about it.
This should be used sparingly, because it can hide real errors, such as messages silently failing to be sent.
This diff adds a `return_undeliverable: bool` property to `MessageEnvelope` and `PortRef`. Every `MessageEnvelope` sent through a `PortRef` inherits that `PortRef`'s `return_undeliverable` value. An envelope with `return_undeliverable == false` is not returned to its sender on delivery failure; envelopes with `return_undeliverable == true` keep the existing behavior of being returned.
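A minimal sketch of these semantics, reading the flag by its name: envelopes whose `return_undeliverable` is `false` are dropped on delivery failure instead of being bounced back. The `PortRef` and `MessageEnvelope` structs here are heavily simplified, hypothetical stand-ins (the real hyperactor types carry actor ids, headers, and serialized payloads), and `with_return_undeliverable`, `envelope`, and `on_delivery_failure` are illustrative names, not the library's API.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified stand-in for hyperactor's `PortRef`.
#[derive(Clone, Debug)]
pub struct PortRef {
    pub dest: String,
    pub return_undeliverable: bool,
}

/// Hypothetical, simplified stand-in for hyperactor's `MessageEnvelope`.
#[derive(Clone, Debug)]
pub struct MessageEnvelope {
    pub sender: String,
    pub dest: String,
    pub payload: String,
    pub return_undeliverable: bool,
}

impl PortRef {
    /// Returns a copy of this port with the given delivery-failure policy.
    pub fn with_return_undeliverable(mut self, value: bool) -> Self {
        self.return_undeliverable = value;
        self
    }

    /// Every envelope sent through this port inherits the port's flag.
    pub fn envelope(&self, sender: &str, payload: &str) -> MessageEnvelope {
        MessageEnvelope {
            sender: sender.to_string(),
            dest: self.dest.clone(),
            payload: payload.to_string(),
            return_undeliverable: self.return_undeliverable,
        }
    }
}

/// What a router might do when `envelope` cannot be delivered.
/// `returned` models the sender's queue of bounced envelopes.
pub fn on_delivery_failure(envelope: MessageEnvelope, returned: &mut VecDeque<MessageEnvelope>) {
    if envelope.return_undeliverable {
        // Existing behavior: bounce the envelope back to `envelope.sender`.
        returned.push_back(envelope);
    } else {
        // Fire-and-forget: drop the envelope, leaving only a log line.
        eprintln!(
            "dropping undeliverable {:?} from {} to {}",
            envelope.payload, envelope.sender, envelope.dest
        );
    }
}

fn main() {
    let mut returned = VecDeque::new();
    let port = PortRef { dest: "proc_agent".to_string(), return_undeliverable: true };

    // A port that returns undeliverable messages: the failure bounces back.
    on_delivery_failure(port.envelope("host_agent", "StopAll"), &mut returned);

    // A fire-and-forget port: the failure is dropped.
    let fire_and_forget = port.clone().with_return_undeliverable(false);
    on_delivery_failure(fire_and_forget.envelope("host_agent", "StopAll"), &mut returned);

    println!("{} envelope(s) returned to the sender", returned.len());
}
```

Putting the flag on the port rather than on each send call means the choice is made once, where the port is created, and every message sent through it follows the same policy.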
This is useful for messages like `GetRankStatus` and `GetState`, where the receiver shouldn't fail if its reply cannot be delivered. It is also useful during proc termination, when the host mesh agent sends `StopAll` to the proc mesh agent: if the proc mesh agent is already dead, the message won't be delivered, but that shouldn't crash the host mesh agent.
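The reply case can be illustrated with an analogy built only on `std::sync::mpsc`, not on hyperactor's API. `ReplyPort`, `fire_and_forget`, and `handle_get_state` are hypothetical, and `fire_and_forget == true` stands in for `return_undeliverable == false`: a reply to a requester that has already exited is swallowed instead of failing the handler.

```rust
use std::sync::mpsc::{self, SendError, Sender};

/// Hypothetical reply port. `fire_and_forget == true` plays the role of
/// `return_undeliverable == false`: a failed reply is not an error.
struct ReplyPort<T> {
    tx: Sender<T>,
    fire_and_forget: bool,
}

impl<T> ReplyPort<T> {
    fn send(&self, value: T) -> Result<(), SendError<T>> {
        match self.tx.send(value) {
            // The requester is gone; there is nothing this actor can do about it.
            Err(_) if self.fire_and_forget => Ok(()),
            other => other,
        }
    }
}

/// Handler for a hypothetical `GetState` request: it only fails if the
/// reply port says undeliverable replies matter.
fn handle_get_state(reply: &ReplyPort<String>) -> Result<(), SendError<String>> {
    reply.send("running".to_string())
}

fn main() {
    let (tx, rx) = mpsc::channel::<String>();
    drop(rx); // the requester has already exited
    let reply = ReplyPort { tx, fire_and_forget: true };
    println!("{:?}", handle_get_state(&reply)); // prints "Ok(())"
}
```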
Separately, this diff fixes a race between host/proc mesh shutdown and tensor engine shutdown. Previously, `DeviceMesh.exit` sent a fire-and-forget `WorkerMessage::Exit` via `Controller.drain_and_stop()`. If the host/proc mesh was being shut down at the same time, the worker exit message could fail to be delivered, crashing the process. With this diff, `Controller.drain_and_stop()` synchronously calls `ActorMesh::stop` on the worker actor mesh, so it cannot race with host/proc mesh shutdown (at least not when both are initiated from the same thread).
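A toy model of the ordering this fix enforces, using std threads and channels rather than monarch's actual `Controller`, `ActorMesh`, or `WorkerMessage` types; `WorkerMesh`, `spawn`, `stop`, and this `drain_and_stop` function are hypothetical. The previous fire-and-forget version returned as soon as `Exit` was enqueued; this sketch blocks until every worker has actually stopped, so no exit message is still in flight when host/proc mesh shutdown begins.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Stand-in for `WorkerMessage::Exit` (hypothetical, simplified).
enum WorkerMessage {
    Exit,
}

/// Toy worker mesh: one thread per worker, each with its own mailbox.
struct WorkerMesh {
    mailboxes: Vec<Sender<WorkerMessage>>,
    handles: Vec<JoinHandle<usize>>,
}

impl WorkerMesh {
    fn spawn(n: usize) -> Self {
        let mut mailboxes = Vec::new();
        let mut handles = Vec::new();
        for rank in 0..n {
            let (tx, rx) = mpsc::channel::<WorkerMessage>();
            mailboxes.push(tx);
            handles.push(thread::spawn(move || {
                // Block until Exit arrives (or the mailbox closes), then report our rank.
                match rx.recv() {
                    Ok(WorkerMessage::Exit) | Err(_) => rank,
                }
            }));
        }
        WorkerMesh { mailboxes, handles }
    }

    /// Synchronous stop: deliver Exit to every worker, then wait until each
    /// one has actually exited. Returns the stopped ranks in spawn order.
    fn stop(self) -> Vec<usize> {
        for tx in &self.mailboxes {
            // Cannot fail: each worker keeps its receiver until it sees Exit.
            tx.send(WorkerMessage::Exit).expect("worker mailbox closed early");
        }
        self.handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    }
}

/// Hypothetical stand-in for `Controller.drain_and_stop()` after the fix:
/// it returns only once the worker mesh has fully stopped.
fn drain_and_stop(workers: WorkerMesh) -> Vec<usize> {
    workers.stop()
}

fn main() {
    let workers = WorkerMesh::spawn(4);
    let stopped = drain_and_stop(workers);
    // Only now would host/proc mesh shutdown begin; no Exit is in flight.
    println!("stopped ranks: {:?}", stopped);
}
```

As the commit notes, this only rules out the race when both shutdowns are driven from the same thread; a concurrent caller could still tear down the host/proc mesh while `stop` is waiting.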
ghstack-source-id: 322545046
exported-using-ghexport
Reviewed By: mariusae, dulinriley, shayne-fletcher
Differential Revision: D86315780
fbshipit-source-id: 06c4aa92331e7f11c64f1ea8b13c52c2e7f0c153
Parent: b72dbe4
14 files changed, +183 / -68 lines, in:
- hyperactor_mesh/src
  - v1
- hyperactor/src
  - mailbox
- monarch_extension/src
- python/monarch
  - _rust_bindings/monarch_extension