Conversation

@laithsakka
Contributor

@laithsakka laithsakka commented Oct 23, 2025

evaluate_guards is a debug mode to detect, and fail, if Dynamo ever specializes a dynamic shape by
guarding on it. When True, dynamic shape guards are not dropped from Dynamo,
and a failure is triggered if recompilation ever happens because of one of them.

Enabling this also allows observing the dynamic shape guards in the tlparse artifact.

One exception that evaluate_guards won't verify is 0/1 specializations.
Backed dynamic shapes by default assume sizes >= 2; we will not check, nor fail,
if that guard is violated for input sizes of 0 or 1.
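
For reference, the filtering behavior this option toggles can be sketched in plain Python. `GuardEntry` here is a hypothetical stand-in for the guard entries Dynamo passes to `guard_filter_fn`; the `SHAPE_ENV` check mirrors the filter used in this PR.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the entries Dynamo passes to guard_filter_fn.
@dataclass
class GuardEntry:
    guard_type: str


def make_guard_filter(evaluate_guards: bool):
    """Return a guard_filter_fn: True keeps a guard, False drops it."""
    if evaluate_guards:
        # Keep only dynamic-shape (SHAPE_ENV) guards so that a recompile
        # caused by one of them surfaces instead of being silently dropped.
        return lambda entries: [e.guard_type == "SHAPE_ENV" for e in entries]
    # Default behavior: drop every guard Dynamo produced.
    return lambda entries: [False for _ in entries]
```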

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable evaluate_guards option for debugging dynamic shapes in torch.compile, which will be very helpful for understanding and debugging recompilations. The implementation includes a significant and well-executed refactoring of the compilation wrapper and decorator. The new tests are comprehensive and effectively validate the new functionality. I've identified a critical bug in vllm/compilation/backends.py due to an incorrect comparison, and a minor redundancy in vllm/config/compilation.py. Once these issues are addressed, this PR will be a solid contribution.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


@hmellor
Member

hmellor commented Oct 24, 2025

Note: only the last (top) commit belongs to this PR

Which PR is this built on top of?

@mergify

mergify bot commented Oct 24, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @laithsakka.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 24, 2025
@laithsakka
Contributor Author

Note: only the last (top) commit belongs to this PR

Which PR is this built on top of?

#26199 and #25110

@hmellor
Member

hmellor commented Oct 30, 2025

Ok, should we wait for those to merge before reviewing this one?

        backed/unbacked.
        """

        evaluate_guards: bool = False
Collaborator

I think I want to get into a state where this is always on during vLLM warmup, and then optionally during runtime. This involves making sure the guards are actually correct (maybe this requires unbacked). Not for this PR, but we should do this in a follow-up.

    options["guard_filter_fn"] = lambda x: [False for _ in x]
    if vllm_config.compilation_config.dynamic_shapes_config.evaluate_guards:
        options["guard_filter_fn"] = lambda x: [
            entry.guard_type == "SHAPE_ENV" for entry in x
Collaborator

This seems brittle.
Can we add a test in PyTorch for this or something? Along the lines of: "if you change how the dynamic shape guards look, then you need to change vLLM."

Contributor Author

Yeah, we should if we don't already have one. I will file an issue and assign it to myself.

            f"when shape changes from 10 to 5, but the "
            f"model ran successfully without error."
        )
    except Exception as e:
Collaborator

Is there a more specific exception we can catch?

    else:
        print(f"❌ {config_desc}: Unexpected failure")
        print(f"   Error: {e}")
        raise
Collaborator

Can you add a test that we dropped 0/1 specialization in backed mode?

Comment on lines 217 to 224
    if "guard" in error_msg.lower() or "recompile" in error_msg.lower():
        print(f"✅ {config_desc}: Expected failure due to guard violation")
        print(f"   Error (truncated): {error_msg[:150]}")
    else:
        # Unexpected error type
        print(f"❌ {config_desc}: Unexpected error type")
        print(f"   Error: {e}")
        raise
Collaborator

Tests aren't really supposed to print. Why not pytest.fail for the failure case?
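
A sketch of that suggestion (a hypothetical helper, not the PR's actual test code): route the unexpected-error branch through pytest.fail instead of printing and re-raising.

```python
import pytest


def check_expected_guard_failure(e: Exception, config_desc: str) -> None:
    # Accept errors that look like guard/recompile violations; anything
    # else fails the test explicitly rather than printing and re-raising.
    error_msg = str(e)
    if "guard" in error_msg.lower() or "recompile" in error_msg.lower():
        return
    pytest.fail(f"{config_desc}: unexpected error type: {error_msg}")
```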

Collaborator

@zou3519 zou3519 left a comment

Seems reasonable, just had some testing questions.

@mergify mergify bot removed tpu Related to Google TPUs needs-rebase labels Nov 26, 2025

    model(input1)

    if evaluate_guards and not is_01_specialization:
Collaborator

Can we use with pytest.raises()?
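
A minimal sketch of the pytest.raises pattern; `run_with_new_shape` and `GuardViolation` are hypothetical stand-ins for re-running the compiled model with a changed shape and for whatever exception evaluate_guards actually raises.

```python
import pytest


class GuardViolation(RuntimeError):
    """Hypothetical stand-in for the failure raised under evaluate_guards."""


def run_with_new_shape(evaluate_guards: bool) -> None:
    # Stand-in for re-running the compiled model with a different shape;
    # with evaluate_guards on, a kept shape guard turns into a hard error.
    if evaluate_guards:
        raise GuardViolation("recompile triggered by a shape guard")


def test_evaluate_guards_fails_on_recompile():
    with pytest.raises(GuardViolation, match="recompile"):
        run_with_new_shape(evaluate_guards=True)
```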

    # 2 for any, we assume it's 0.
    for s, r in fake_mode.shape_env.var_to_range.items():
        if r.lower == 2:
            fake_mode.shape_env.var_to_range[s] = ValueRanges(0, r.upper)
Collaborator

Can you import ValueRanges locally, since it's an internal?
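
The widening step quoted above, illustrated with a minimal stand-in (the real `ValueRanges` lives in `torch.utils._sympy.value_ranges`; only the lower-bound adjustment is mimicked here):

```python
import math
from dataclasses import dataclass


# Minimal stand-in for torch.utils._sympy.value_ranges.ValueRanges.
@dataclass(frozen=True)
class ValueRanges:
    lower: float
    upper: float


def widen_01_specializations(var_to_range: dict) -> dict:
    # Backed symbols default to a lower bound of 2; widen those ranges to
    # start at 0 so that 0/1 input sizes are not treated as violations.
    return {
        s: ValueRanges(0, r.upper) if r.lower == 2 else r
        for s, r in var_to_range.items()
    }
```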


    fake_mode = detect_fake_mode()

    if (
Collaborator

Why do we only do this with evaluate_guards on?

Collaborator

Oh is it because otherwise we drop all guards anyway?

@mergify

mergify bot commented Nov 29, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @laithsakka.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 29, 2025
Signed-off-by: Laith Sakka <lsakka@meta.com>

Labels

llama (Related to Llama models), needs-rebase, qwen (Related to Qwen models), v1
