@dkalinowski
Collaborator

🛠 Summary

Previously, the last streaming response failed.
Now streaming completes correctly, but usage is reported as 0 because token counting is not supported in legacy pipelines.
This enables use of the Continue.dev plugin with legacy pipelines.
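
For illustration only, here is a minimal sketch of what reporting usage=0 in the final streamed chunk could look like. This is not the code from this PR: `UsageStats`, `serializeUsage`, and `buildFinalChunk` are hypothetical names, and the chunk layout is an assumption modeled on the OpenAI chat completions `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`).

```cpp
#include <string>

// Hypothetical stand-in for the usage block of an OpenAI-style streaming chunk.
// Both counters default to 0, which is what a legacy pipeline would report.
struct UsageStats {
    unsigned long promptTokens = 0;
    unsigned long completionTokens = 0;
};

// Serializes the usage object; total_tokens is derived from the two counters.
std::string serializeUsage(const UsageStats& usage) {
    return "\"usage\":{\"prompt_tokens\":" + std::to_string(usage.promptTokens) +
           ",\"completion_tokens\":" + std::to_string(usage.completionTokens) +
           ",\"total_tokens\":" +
           std::to_string(usage.promptTokens + usage.completionTokens) + "}";
}

// Builds the last streamed chunk as a server-sent event.
// Assumption: a legacy pipeline cannot count tokens, so the caller omits
// the usage argument and the zeroed defaults are emitted instead of failing.
std::string buildFinalChunk(const std::string& finishReason,
                            const UsageStats& usage = UsageStats{}) {
    return "data: {\"choices\":[{\"index\":0,\"delta\":{},\"finish_reason\":\"" +
           finishReason + "\"}]," + serializeUsage(usage) + "}\n\n";
}
```

With this shape, a client that requests usage statistics during streaming still receives a well-formed final chunk; the zeros should be read as a missing metric, not as a real token count.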

@dkalinowski dkalinowski changed the base branch from main to releases/2025/4 December 8, 2025 08:16
@dkalinowski dkalinowski changed the title Fake usage statistics for legacy text generation pipelines [2025.4.1] Fake usage statistics for legacy text generation pipelines Dec 8, 2025
@dkalinowski dkalinowski changed the title [2025.4.1] Fake usage statistics for legacy text generation pipelines [2025.4.1] Fake usage statistics (0 tokens) for legacy text generation pipelines Dec 8, 2025
}

- // Disabling usage in streaming mode in legacy servable due to the issue with token counting.
+ // Fake usage in streaming mode in legacy servable due to the issue with token counting.
Collaborator

Should this be marked as a TODO or FIXME?
We definitely don't want to leave a comment like this here for long.

Collaborator Author

fixed

ASSERT_TRUE(responses.back().find("\"finish_reason\":\"length\"") != std::string::npos);
}
- // For non-continuous batching servables usage is not supported
+ // For non-continuous batching servables usage is faked, always returns 0
Collaborator

I'm not sure I would use the word "fake"; maybe "fixed" would sound better. I would also keep "not supported", because that is the actual state of the feature.

Collaborator Author

fixed

@dkalinowski dkalinowski changed the title [2025.4.1] Fake usage statistics (0 tokens) for legacy text generation pipelines [2025.4.1] Ignore usage statistics (0 tokens) for legacy text generation pipelines Dec 8, 2025
}

- // Disabling usage in streaming mode in legacy servable due to the issue with token counting.
+ // Fake usage in streaming mode in legacy servable due to the issue with token counting.
Collaborator

Suggested change
- // Fake usage in streaming mode in legacy servable due to the issue with token counting.
+ // Usage in streaming mode in legacy servable is not supported yet hence we report 0 as a missing metric.
