Commit 14f7a6e
[SPARK-54456][PYTHON] Import worker module after fork to avoid deadlock
### What changes were proposed in this pull request?
We lazily import the worker module after fork to avoid a potential deadlock caused by importing modules that spawn multiple threads.
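To illustrate the idea, here is a minimal sketch of the lazy-import-after-fork pattern (not the actual PySpark daemon code; `serve_client` and the use of `"json"` as a stand-in worker module are hypothetical): the daemon forks first and only the child process imports the worker module, so any threads spawned by that import (or its dependencies) never exist in the process being forked.

```python
import importlib
import os


def serve_client(worker_module_name: str) -> None:
    # Fork first, import later: the daemon process never imports the
    # worker module, so no import-time threads exist at fork time.
    pid = os.fork()
    if pid == 0:
        # Child process: only now import the worker module. Any threads
        # spawned by it or its dependencies (e.g. pyarrow) are created
        # here, after the fork, where they cannot cause a fork deadlock.
        worker = importlib.import_module(worker_module_name)
        print("child loaded", worker.__name__)
        os._exit(0)
    os.waitpid(pid, 0)


if __name__ == "__main__":
    # "json" stands in for an actual worker module in this sketch.
    serve_client("json")
```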
### Why are the changes needed?
https://discuss.python.org/t/switching-default-multiprocessing-context-to-spawn-on-posix-as-well/21868
It is impossible to fork a multi-threaded process safely in CPython. CPython started issuing warnings about this in 3.12 and switched the default `multiprocessing` start method away from `fork` in 3.14.
It would be a huge effort for us to give up `fork` entirely, but we can do our best not to import arbitrary modules before fork by lazily importing the worker module after fork.
We already have some workers, `plan_data_source_read` for example, that import libraries such as `pyarrow` which can spawn threads at import time.
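The hazard is that a thread spawned by such an import may hold a lock at the moment of fork; the child inherits the locked lock but not the thread that would release it. The following sketch (illustrative only, not Spark code; POSIX-only because of `os.fork()`) shows how this turns into a hang, using a timeout so the example terminates instead of deadlocking.

```python
import os
import threading
import time

lock = threading.Lock()


def background_thread():
    # Stands in for a thread spawned as a side effect of importing a
    # library before fork. It holds the lock while the parent forks.
    with lock:
        time.sleep(2)


threading.Thread(target=background_thread, daemon=True).start()
time.sleep(0.1)  # make sure the lock is held when we fork

pid = os.fork()  # CPython 3.12+ warns here because other threads exist
if pid == 0:
    # The child inherits the locked lock but not the thread that would
    # release it; without the timeout this acquire would hang forever.
    print("child acquired lock:", lock.acquire(timeout=1))  # False
    os._exit(0)
os.waitpid(pid, 0)
```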
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI should pass.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes apache#53166 from gaogaotiantian/move-worker-module-import.
Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 3fbca58, commit 14f7a6e
1 file changed: +13 -8 lines changed