[Performance]: Option for disabling model info collection in subprocess #19317

@ibl-g

Proposal to improve performance

By default, vLLM collects model support info in a separate subprocess, one per
model (added in #9233). Specifically, this is the _run_in_subprocess call at
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/registry.py#L336

This adds ~4s to startup when running against a local SSD, and can easily be
double that or more against a network filesystem in some environments.
Collecting the info in-process does not seem to have adverse effects, at least
based on my limited manual testing, but I lack context on why the subprocess
was introduced in the first place.
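To make the overhead comparison concrete, here is a minimal timing sketch. It does not use vLLM: `inspect_in_process` and `inspect_in_subprocess` are hypothetical stand-ins that return the same dummy dict, so it only measures the cost of spawning a fresh interpreter. The real cost also includes re-importing the model modules, which is where filesystem speed comes in.

```python
import json
import subprocess
import sys
import time


def inspect_in_process():
    # Hypothetical stand-in for the model-inspection work.
    return {"architecture": "ExampleModel"}


def inspect_in_subprocess():
    # Same stand-in work, but executed in a fresh Python interpreter
    # and passed back to the parent as JSON on stdout.
    code = "import json; print(json.dumps({'architecture': 'ExampleModel'}))"
    done = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(done.stdout)


def timed(fn):
    # Return (result, elapsed_seconds) for a zero-argument callable.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, in_proc = timed(inspect_in_process)
    _, sub_proc = timed(inspect_in_subprocess)
    print(f"in-process: {in_proc:.4f}s  subprocess: {sub_proc:.4f}s")
```

Running the same comparison with the actual inspection call on a local SSD and on a network filesystem would show how much of the ~4s is interpreter startup and how much is import I/O.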

Can we make this behaviour configurable via a boolean flag or env var, so that
users can opt out? For example, a config field that defaults to the current
behaviour:

collect_model_info_via_subprocess = True

and then something like

if self.model_config.collect_model_info_via_subprocess:
    return _run_in_subprocess(
        lambda: _ModelInfo.from_model_cls(self.load_model_cls()))
return _ModelInfo.from_model_cls(self.load_model_cls())
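For the env var variant, a self-contained sketch of the gating logic might look like the following. Everything here is an assumption for illustration: the variable name `COLLECT_MODEL_INFO_VIA_SUBPROCESS`, the `env_flag` helper, and `collect_info` are hypothetical and not existing vLLM names, and the isolating runner is passed in as a parameter instead of calling the real `_run_in_subprocess`.

```python
import os

# Hypothetical env var name, chosen for this sketch only.
ENV_VAR = "COLLECT_MODEL_INFO_VIA_SUBPROCESS"


def env_flag(name, default=True):
    # Parse a boolean env var; unset means "keep the default behaviour".
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in {"0", "false", "no", "off"}


def collect_info(inspect, run_isolated, use_subprocess):
    # inspect: zero-argument callable that builds the model info.
    # run_isolated: callable that runs `inspect` in a subprocess
    #   (stand-in for the registry's subprocess helper).
    if use_subprocess:
        return run_isolated(inspect)
    return inspect()


if __name__ == "__main__":
    def fake_isolated(fn):
        # Placeholder that runs in-process so the sketch stays runnable.
        return fn()

    info = collect_info(
        inspect=lambda: {"architecture": "ExampleModel"},
        run_isolated=fake_isolated,
        use_subprocess=env_flag(ENV_VAR),
    )
    print(info)
```

Defaulting to `True` keeps today's behaviour, so only users who explicitly set the variable to `0`, `false`, `no`, or `off` would skip the subprocess.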

Image: latency of the "inspect-model" span, from my local WIP OpenTelemetry tracing of startup.

CC @DarkLight1337



Labels: performance (Performance-related issues), stale (Over 90 days of inactivity)
