[Frontend] Perform offline path replacement to tokenizer
#29706
Conversation
When `HF_HUB_OFFLINE` is true, vLLM always attempts to rewrite the model ID/path to the corresponding local path. However, the value is not necessarily altered: if `model` already points to a local directory or a GGUF file, it remains unchanged, and there is no need to inform the user via the log. This change updates the offline-conversion logging so that a message is emitted only when the value of `model` actually changes. Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
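The logging tweak can be sketched as follows. The function and logger names here are illustrative placeholders, not vLLM's actual internals, and the resolver simulates a cache lookup:

```python
import logging
import os

logger = logging.getLogger("offline_mode")


def get_offline_path(path_or_id: str) -> str:
    """Hypothetical resolver: map a Hub model ID to a local cache path.

    If `path_or_id` is already a local directory or a GGUF file, it is
    returned unchanged, so no replacement actually happens.
    """
    if os.path.isdir(path_or_id) or path_or_id.endswith(".gguf"):
        return path_or_id
    # Real code would consult the Hugging Face cache; this sketch just
    # fabricates a local-looking path to demonstrate the control flow.
    return f"/fake/hf-cache/{path_or_id.replace('/', '--')}"


def apply_offline_replacement(model: str) -> str:
    replaced = get_offline_path(model)
    # The point of the commit: log only when the value actually changed.
    if replaced != model:
        logger.info("HF_HUB_OFFLINE is set; replaced %r with %r",
                    model, replaced)
    return replaced
```

With this guard, passing a local GGUF file produces no replacement log at all, while a Hub ID still gets the informative message.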
If `tokenizer` is a Hugging Face model, vLLM attempts to access Hugging Face even if the tokenizer is already available offline. This prevents specifying a Hugging Face model as `tokenizer` in offline mode (i.e. when `HF_HUB_OFFLINE` is true). With this commit, vLLM performs the offline path replacement on `tokenizer` as well, not only on `model`. A test case is added because the offline-mode error only occurs when the model and the tokenizer are different. Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. You can ask your reviewers to trigger select CI tests on top of those. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request correctly addresses an issue with using a Hugging Face model as a tokenizer in offline mode. The changes ensure that when HF_HUB_OFFLINE is set, the tokenizer path is also resolved to a local cache path, preventing unnecessary and failing network requests. The logic is sound, and the addition of conditional logging to report path replacements only when they occur is a nice improvement for clarity. The new test case in test_offline_mode.py effectively covers the scenario and validates the fix. Overall, this is a well-executed change that improves the offline capabilities of vLLM.
DarkLight1337 left a comment:
Thanks, LGTM
|
Known failing test on main
Purpose

If `tokenizer` is a Hugging Face model, vLLM attempts to access Hugging Face even if the tokenizer is already available offline. This prevents specifying a Hugging Face model as `tokenizer` in offline mode (i.e. when `HF_HUB_OFFLINE` is true).

This PR also tweaks, in a separate commit, when the offline-mode path replacement log is emitted: only when the model/tokenizer value actually changes. It is not helpful to log the replacement when the value is unchanged (e.g. when the model is a local GGUF file).
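Roughly, the failure mode can be sketched as below. The model ID is an illustrative placeholder, not the exact reproduction attached in the test plan, and the `vllm serve` line is commented out because it needs a downloaded model and an accelerator:

```shell
# While online, pre-populate the Hugging Face cache (hypothetical ID):
#   huggingface-cli download facebook/opt-125m

# Then switch to offline mode:
export HF_HUB_OFFLINE=1

# Before this PR, passing a Hub ID via --tokenizer failed in offline
# mode even though the files were cached, because only `model` was
# rewritten to its local path. After the PR, both resolve locally.
#   vllm serve facebook/opt-125m --tokenizer facebook/opt-125m
```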
Test Plan

`tests/entrypoints/offline_mode/test_offline_mode.py` is modified to test this case. Use this for `pytest`-based tests.

For easy comparison, I'll attach a `vllm serve`-based reproduction here.

`vllm serve` reproduction: Setup

`vllm serve` reproduction: Main

Test Result
Before
After
A warning in the middle seems irrelevant:
`flash_attn` to be installed on ROCm.

Essential Elements of an Effective PR Description Checklist

`supported_models.md` and `examples` for a new model.