Increase socket buffer size to allow ProcessGroup init up to 12k ranks
Summary:
The c10d socket and the gloo listener both set their buffer size to 2048, which causes connection issues at 4k scale. This diff sets the buffer size to `-1`, which uses `somaxconn` as the actual buffer size, aiming to enable 24k PG init without crashing. Experiments show that 12k ranks can be created successfully without a crash.
Split the original diff for OSS vs. internal.
Caution: we need the change on both gloo and c10d to enable 12k PG init. Updating only one side may not offer the benefit.
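For illustration only, the `-1` setting described above can be sketched with plain POSIX sockets. This is not the c10d or gloo code; `read_somaxconn` and `open_listener` are hypothetical helpers, and the sketch assumes Linux, where a negative value passed to `listen()` is, to the best of my understanding, treated as an oversized unsigned value and capped at `net.core.somaxconn`.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <fstream>

// Hypothetical helper: read the kernel's somaxconn limit (Linux only).
// Returns -1 if the value cannot be read.
int read_somaxconn() {
  std::ifstream f("/proc/sys/net/core/somaxconn");
  int value = -1;
  if (!(f >> value)) {
    return -1;
  }
  return value;
}

// Hypothetical helper: open a loopback TCP listener on an ephemeral port
// using the given value as the second argument to listen().
// Returns the listening fd, or -1 on failure.
int open_listener(int backlog) {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the kernel pick a free port
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      ::listen(fd, backlog) != 0) {
    ::close(fd);
    return -1;
  }
  return fd;
}
```

Calling `open_listener(-1)` asks the kernel for the largest queue it allows instead of a fixed 2048, so the effective limit follows whatever `read_somaxconn()` reports on that host. Raising the limit further would then be a matter of host configuration (the `net.core.somaxconn` sysctl) rather than a code change.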
Reviewed By: wconstab, bmaurer
Differential Revision: D48617912
fbshipit-source-id: 3ba40d1b94c113a268ded0ea8f51a03daa1233d3