optimization: move set_metadata out of main stream (#5082)
Summary:
Pull Request resolved: #5082
X-link: https://github.com/facebookresearch/FBGEMM/pull/2090
With feature score eviction, TBE calls the backend separately during the forward pass to update feature score metadata.
This update is designed to run asynchronously so that it does not block the forward/backward pass. However, the blocking CPU operation was issued on the main stream, so all2all could not start immediately after get_cuda.
A dummy profile shows this trace:
{F1983224804}
The set_metadata operation becomes a blocker on the critical path, taking 217ms.
With this change, the trace becomes:
{F1983224830}
Overall prefetch time drops to less than 70ms, and get_cuda is now followed immediately by all2all, with no additional waiting or stream sync.
https://www.internalfb.com/ai_infra/zoomer/profiling-run/overview?profilingRunID=1913270729575721
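
The effect described above can be illustrated with a CPU-only analogy. This is a minimal sketch, not the FBGEMM implementation: `set_metadata`, `forward_inline`, and `forward_offloaded` are hypothetical stand-ins, the blocking work is simulated with `time.sleep`, and a worker thread stands in for whatever side stream or thread the real change uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def set_metadata(delay_s):
    # Hypothetical stand-in for the blocking metadata update.
    time.sleep(delay_s)
    return "metadata updated"


def forward_inline(delay_s):
    # Before: the blocking call sits on the critical path, so the
    # next step (all2all in the commit) has to wait for it.
    start = time.perf_counter()
    set_metadata(delay_s)
    return time.perf_counter() - start


def forward_offloaded(executor, delay_s):
    # After: the blocking call is handed to a worker, so the
    # critical path only pays the cost of submitting it.
    start = time.perf_counter()
    future = executor.submit(set_metadata, delay_s)
    critical_path_s = time.perf_counter() - start
    return critical_path_s, future


if __name__ == "__main__":
    delay_s = 0.2
    print(f"inline critical path:    {forward_inline(delay_s):.3f}s")
    with ThreadPoolExecutor(max_workers=1) as executor:
        critical_path_s, future = forward_offloaded(executor, delay_s)
        print(f"offloaded critical path: {critical_path_s:.3f}s")
        # The update still completes; it just no longer delays the caller.
        print(future.result())
```

The sketch only shows why taking the blocking call off the critical path shortens it. The actual timings in this commit come from the traces above, not from this example.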
Reviewed By: steven1327, kathyxuyy
Differential Revision: D86013406
fbshipit-source-id: 2fad88bd17d8e83104706540cfcd3311545af613