-
-
Notifications
You must be signed in to change notification settings - Fork 11.6k
[BugFix] Fix PP performance and PP kv connector output regression #28768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces two main changes: a performance improvement by removing granular tracing context managers, and a fix for KV connector output propagation in pipeline parallel setups. The removal of record_function_or_nullcontext should improve performance and code readability. The refactoring of the pipeline parallelism logic in step_with_batch_queue to be more asynchronous is also a good improvement. However, I have a critical concern regarding the new mechanism for propagating kv_connector_output from non-final pipeline parallel ranks. The logic has been moved from execute_model to sample_tokens, but it seems sample_tokens is only executed on the final rank. This could lead to the loss of kv_connector_output from other ranks, breaking the feature. Please see my detailed comment.
bf6ecea to
9cde494
Compare
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
ec1339c to
818983d
Compare
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
|
@weireweire please check if this works. Thanks! |
|
Have you tested lm_eval,I also found PP have accuracy issue, not sure if it's related to #26866 |
|
From my test, this fixed the overlap issue but the accuracy is still bad. My command: This result is same as my draft fix , must be somewhere else that caused the accuracy issue. |
|
accuracy issue tracked #28839 |
|
Thanks @weireweire, I think the accuracy issue is probably separate to this, we'll investigate that too. |
| logger.info("Batch queue is enabled with size %d", self.batch_queue_size) | ||
| self.batch_queue = deque(maxlen=self.batch_queue_size) | ||
|
|
||
| self.ec_producer = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| self.ec_producer = ( | |
| self.is_ec_producer = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll open follow-on PR so that this doesn't hold up the release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WoosukKwon opened follow-on #28884 for this.
…lm-project#28768) Signed-off-by: Nick Hill <nhill@redhat.com>
…8768) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: jiang1.li <jiang1.li@intel.com>
…lm-project#28768) Signed-off-by: Nick Hill <nhill@redhat.com>
…lm-project#28768) Signed-off-by: Nick Hill <nhill@redhat.com>
This fixes two pipeline-parallel related issues introduced by #26866:
The first commit just reverts the granular tracing context managers added in #28329 which make the code very difficult to read/update. Please just look at the changes in the second commit which are smaller/cleaner.
Fixes #28270
Thanks to @weireweire for reporting.