[webgpu] Fix poor performance in flash attention for Qualcomm devices (microsoft#25730)
When multiple threads in one subgroup access the same shared memory
location, performance appears to be poor on Qualcomm devices (possibly
due to bank conflicts). Limiting the number of threads that access the
same memory location greatly improves performance on these devices.
Phi4 drops from ~13s to ~10s on QC Adreno X1-85 (31.0.112.0).
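
The idea of limiting same-address accesses can be illustrated with a small model. The sketch below is not the shader change from this commit; it is a hypothetical Python simulation that counts how many threads of one subgroup read the same shared-memory index per step under two access patterns. The subgroup size, tile size, and both pattern functions are assumptions made for illustration only.

```python
from collections import Counter

SUBGROUP_SIZE = 64  # assumed subgroup width, for illustration only
TILE_SIZE = 16      # assumed number of shared-memory elements in the tile


def max_sharing(addresses):
    """Return the largest number of threads reading one address in a step."""
    return max(Counter(addresses).values())


def broadcast_pattern(subgroup_size, step):
    """Every thread reads the same element at a given step."""
    return [step for _ in range(subgroup_size)]


def staggered_pattern(subgroup_size, step, tile_size):
    """Hypothetical alternative: each thread starts at its own offset,
    so the reads of one step are spread over the whole tile."""
    return [(step + thread) % tile_size for thread in range(subgroup_size)]


# Worst case over all steps for each pattern.
before = max(
    max_sharing(broadcast_pattern(SUBGROUP_SIZE, s)) for s in range(TILE_SIZE)
)
after = max(
    max_sharing(staggered_pattern(SUBGROUP_SIZE, s, TILE_SIZE))
    for s in range(TILE_SIZE)
)

print(before, after)  # prints "64 4"
```

In this model the staggered pattern caps sharing at SUBGROUP_SIZE / TILE_SIZE threads per address. Whether that translates into a speedup on a given GPU depends on its shared-memory hardware, which the model does not capture.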