The executor_status_looper spends CPU time polling the number of
tokens. Because that function is guarded by an internal mutex, the
polling also interferes with the Executor.
Now that TensorRtLlmBackendImpl is interior mutable, we can mark
it as `Send` and share it across multiple threads. Therefore, the loop
can be split into request and response parts, and we can await tokens
instead of constantly polling.
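
A minimal sketch of the split described above, using only the standard
library. This is not the actual change: `MockBackend`, `submit`,
`wait_token`, and `run_split_loops` are hypothetical names, the "token" is
just the request value doubled, and a `Condvar` plus OS threads stand in
for the async await used in the real code.

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for an interior-mutable backend:
/// every method takes `&self`, so it can sit behind an `Arc`.
struct MockBackend {
    tokens: Mutex<VecDeque<u32>>,
    ready: Condvar,
}

impl MockBackend {
    fn new() -> Self {
        MockBackend {
            tokens: Mutex::new(VecDeque::new()),
            ready: Condvar::new(),
        }
    }

    /// Pretend to run a request: the produced "token" is `request * 2`.
    fn submit(&self, request: u32) {
        let mut queue = self.tokens.lock().unwrap();
        queue.push_back(request * 2);
        // Wake the response side instead of letting it poll.
        self.ready.notify_one();
    }

    /// Block until a token is available; the lock is released while waiting.
    fn wait_token(&self) -> u32 {
        let mut queue = self.tokens.lock().unwrap();
        loop {
            if let Some(token) = queue.pop_front() {
                return token;
            }
            queue = self.ready.wait(queue).unwrap();
        }
    }
}

/// Runs a request loop and a response loop on separate threads
/// and returns the tokens in the order they were produced.
fn run_split_loops(requests: Vec<u32>) -> Vec<u32> {
    let backend = Arc::new(MockBackend::new());
    let expected = requests.len();
    let (req_tx, req_rx) = mpsc::channel::<u32>();
    let (resp_tx, resp_rx) = mpsc::channel::<u32>();

    // Request part: forwards incoming requests to the backend.
    let req_backend = Arc::clone(&backend);
    let request_loop = thread::spawn(move || {
        for request in req_rx {
            req_backend.submit(request);
        }
    });

    // Response part: sleeps until a token arrives, then forwards it.
    let resp_backend = Arc::clone(&backend);
    let response_loop = thread::spawn(move || {
        for _ in 0..expected {
            let token = resp_backend.wait_token();
            resp_tx.send(token).unwrap();
        }
    });

    for request in requests {
        req_tx.send(request).unwrap();
    }
    drop(req_tx); // closing the channel ends the request loop

    request_loop.join().unwrap();
    response_loop.join().unwrap();
    resp_rx.iter().collect()
}
```

Calling `run_split_loops(vec![1, 2, 3])` pushes three requests through the
request thread and collects the tokens from the response thread. The
response thread holds the lock only while popping a token, so it no longer
contends with the submitting side between tokens.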