Commit 3461e7e

gmagogsfm and claude authored
[Frontend] Remap -O to -cc commandline flag (#29557)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
1 parent fecae12 commit 3461e7e

9 files changed: +72 -39 lines changed

.buildkite/scripts/hardware_ci/run-xpu-test.sh

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ docker run \
 echo $ZE_AFFINITY_MASK
 pip install tblib==3.1.0
 python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 --enforce-eager
-python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 -O3 -O.cudagraph_mode=NONE
+python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 -O3 -cc.cudagraph_mode=NONE
 python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 --enforce-eager -tp 2 --distributed-executor-backend ray
 python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 --enforce-eager -tp 2 --distributed-executor-backend mp
 VLLM_ATTENTION_BACKEND=TRITON_ATTN python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --block-size 64 --enforce-eager

docs/design/debug_vllm_compile.md

Lines changed: 11 additions & 11 deletions
@@ -8,9 +8,9 @@ TL;DR:
 | Online Flag | Offline Flag | Result |
 |----------|----------|-------------|
 | --enforce-eager | enforce_eager=True | Turn off torch.compile and CUDAGraphs |
-| -O.mode=0 | mode=CompilationMode.NONE | Turn off torch.compile only |
-| -O.cudagraph_mode=NONE | compilation_config=CompilationConfig(cudagraph_mode=CUDAGraphMode.NONE) | Turn off CUDAGraphs only |
-| -O.backend=eager | compilation_config=CompilationConfig(backend='eager') | Turn off TorchInductor |
+| -cc.mode=0 | mode=CompilationMode.NONE | Turn off torch.compile only |
+| -cc.cudagraph_mode=NONE | compilation_config=CompilationConfig(cudagraph_mode=CUDAGraphMode.NONE) | Turn off CUDAGraphs only |
+| -cc.backend=eager | compilation_config=CompilationConfig(backend='eager') | Turn off TorchInductor |

 ## vLLM-torch.compile overview

@@ -86,11 +86,11 @@ LLM(model, enforce_eager=True)
 ```

 To turn off just torch.compile, pass `mode = NONE` to the compilation config.
-(`-O` is short for `--compilation_config`):
+(`-cc` is short for `--compilation_config`; `-O.*` dotted syntax is deprecated):

 ```sh
 # Online
-vllm serve -O.mode=0
+vllm serve -cc.mode=0
 ```

 ```py
@@ -103,7 +103,7 @@ To turn off just CUDAGraphs, pass `cudagraph_mode = NONE`:

 ```sh
 # Online
-vllm serve -O.cudagraph_mode=NONE
+vllm serve -cc.cudagraph_mode=NONE
 ```

 ```py
@@ -183,10 +183,10 @@ help debug the issue:

 ```sh
 # Online - using unbacked mode
-vllm serve meta-llama/Llama-3.2-1B -O.dynamic_shapes_config.type=unbacked
+vllm serve meta-llama/Llama-3.2-1B -cc.dynamic_shapes_config.type=unbacked

 # Online - using backed_size_oblivious mode
-vllm serve meta-llama/Llama-3.2-1B -O.dynamic_shapes_config.type=backed_size_oblivious
+vllm serve meta-llama/Llama-3.2-1B -cc.dynamic_shapes_config.type=backed_size_oblivious
 ```

 ```py
@@ -233,7 +233,7 @@ to the compilation config:

 ```sh
 # online
-vllm serve -O.backend=eager
+vllm serve -cc.backend=eager
 ```

 ```py
@@ -252,7 +252,7 @@ You can also use `TORCH_LOGS=output_code <command>` to print the Inductor output
 ### Editable TorchInductor code

 You can edit the TorchInductor code that gets run by setting `VLLM_COMPILE_CACHE_SAVE_FORMAT=unpacked`
-or passing `-O.compile_cache_save_format=unpacked`. The default is `binary`, which means it is not editable.
+or passing `-cc.compile_cache_save_format=unpacked`. The default is `binary`, which means it is not editable.

 This is a useful technique: you can put breakpoints (e.g. `torch.distributed.breakpoint()`)
 and print statements in the output code.
@@ -299,7 +299,7 @@ To turn off just CUDAGraphs, pass `cudagraph_mode = NONE`:

 ```sh
 # Online
-vllm serve -O.cudagraph_mode=NONE
+vllm serve -cc.cudagraph_mode=NONE
 ```

 ```py

docs/design/torch_compile.md

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ vllm serve meta-llama/Llama-3.2-1B \


 # Alternative: Using dot notation (simpler for single values)
-vllm serve meta-llama/Llama-3.2-1B -O.dynamic_shapes_config.type=unbacked
+vllm serve meta-llama/Llama-3.2-1B -cc.dynamic_shapes_config.type=unbacked
 ```

 #### Choosing the Right Mode

tests/compile/fullgraph/test_basic_correctness.py

Lines changed: 3 additions & 3 deletions
@@ -115,7 +115,7 @@ def test_compile_correctness(
         str(pp_size),
         "-tp",
         str(tp_size),
-        "-O.cudagraph_mode=none",
+        "-cc.cudagraph_mode=none",
     ]

     all_args: list[list[str]] = []
@@ -128,7 +128,7 @@ def test_compile_correctness(
     ]:
         for mode in [CompilationMode.NONE, comp_mode]:
             all_args.append(
-                final_args + [f"-O.mode={mode.name}", "-O.backend=inductor"]
+                final_args + [f"-cc.mode={mode.name}", "-cc.backend=inductor"]
             )

         # inductor will change the output, so we only compare if the output
@@ -148,7 +148,7 @@ def test_compile_correctness(
         CompilationMode.DYNAMO_TRACE_ONCE,
         CompilationMode.VLLM_COMPILE,
     ]:
-        all_args.append(final_args + [f"-O.mode={mode.name}", "-O.backend=eager"])
+        all_args.append(final_args + [f"-cc.mode={mode.name}", "-cc.backend=eager"])
         all_envs.append({})
         all_envs.append({})

tests/engine/test_arg_utils.py

Lines changed: 6 additions & 6 deletions
@@ -248,15 +248,15 @@ def test_optimization_level(args, expected):
 @pytest.mark.parametrize(
     ("args", "expected"),
     [
-        (["-O.mode=0"], 0),
-        (["-O.mode=1"], 1),
-        (["-O.mode=2"], 2),
-        (["-O.mode=3"], 3),
+        (["-cc.mode=0"], 0),
+        (["-cc.mode=1"], 1),
+        (["-cc.mode=2"], 2),
+        (["-cc.mode=3"], 3),
     ],
 )
 def test_mode_parser(args, expected):
     """
-    Test compilation config modes (-O.mode=int) map to compilation_config.
+    Test compilation config modes (-cc.mode=int) map to compilation_config.
     """
     parser = EngineArgs.add_cli_args(FlexibleArgumentParser())
     parsed_args = parser.parse_args(args)
@@ -273,7 +273,7 @@ def test_compilation_config():
     # set to string form of a dict
     args = parser.parse_args(
         [
-            "-O",
+            "-cc",
             '{"mode": 3, "cudagraph_capture_sizes": [1, 2, 4, 8], "backend": "eager"}',
         ]
     )
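
To see the renamed flag in isolation, here is a minimal sketch that uses the standard library `argparse` instead of vLLM's `FlexibleArgumentParser` (an assumption made so the snippet is self-contained). It only shows that a multi-character single-dash alias such as `-cc` can carry a JSON string. The dotted `-cc.key=value` form relies on vLLM's own argument preprocessing and is not reproduced here.

```python
import argparse
import json

# Plain argparse stand-in for the parser under test (assumption: the real
# tests use vLLM's FlexibleArgumentParser, which adds dotted-key support).
parser = argparse.ArgumentParser()
parser.add_argument("-cc", "--compilation-config", type=json.loads)

args = parser.parse_args(
    ["-cc", '{"mode": 3, "cudagraph_capture_sizes": [1, 2, 4, 8], "backend": "eager"}']
)

# The JSON string is decoded into a plain dict on the namespace.
print(args.compilation_config["backend"])
```

The destination name `compilation_config` comes from the long option, so renaming the short alias from `-O` to `-cc` does not change how downstream code reads the parsed value.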

tests/utils_/test_argparse_utils.py

Lines changed: 36 additions & 14 deletions
@@ -27,7 +27,7 @@ def parser():
     parser.add_argument("--batch-size", type=int)
     parser.add_argument("--enable-feature", action="store_true")
     parser.add_argument("--hf-overrides", type=json.loads)
-    parser.add_argument("-O", "--compilation-config", type=json.loads)
+    parser.add_argument("-cc", "--compilation-config", type=json.loads)
     parser.add_argument("--optimization-level", type=int)
     return parser

@@ -167,8 +167,8 @@ def test_dict_args(parser):
         "--hf-overrides.key2.key4",
         "val3",
         # Test compile config and compilation mode
-        "-O.use_inductor_graph_partition=true",
-        "-O.backend",
+        "-cc.use_inductor_graph_partition=true",
+        "-cc.backend",
         "custom",
         "-O1",
         # Test = sign
@@ -191,9 +191,9 @@ def test_dict_args(parser):
         "--hf_overrides.key14.key15",
         "-minus.and.dot",
         # Test array values
-        "-O.custom_ops+",
+        "-cc.custom_ops+",
         "-quant_fp8",
-        "-O.custom_ops+=+silu_mul,-rms_norm",
+        "-cc.custom_ops+=+silu_mul,-rms_norm",
     ]
     parsed_args = parser.parse_args(args)
     assert parsed_args.model_name == "something.something"
@@ -234,7 +234,7 @@ def test_duplicate_dict_args(caplog_vllm, parser):
         "--hf-overrides.key1",
         "val2",
         "-O1",
-        "-O.mode",
+        "-cc.mode",
         "2",
         "-O3",
     ]
@@ -380,29 +380,29 @@ def test_load_config_file(tmp_path):


 def test_compilation_mode_string_values(parser):
-    """Test that -O.mode accepts both integer and string mode values."""
-    args = parser.parse_args(["-O.mode", "0"])
+    """Test that -cc.mode accepts both integer and string mode values."""
+    args = parser.parse_args(["-cc.mode", "0"])
     assert args.compilation_config == {"mode": 0}

     args = parser.parse_args(["-O3"])
     assert args.optimization_level == 3

-    args = parser.parse_args(["-O.mode=NONE"])
+    args = parser.parse_args(["-cc.mode=NONE"])
     assert args.compilation_config == {"mode": "NONE"}

-    args = parser.parse_args(["-O.mode", "STOCK_TORCH_COMPILE"])
+    args = parser.parse_args(["-cc.mode", "STOCK_TORCH_COMPILE"])
     assert args.compilation_config == {"mode": "STOCK_TORCH_COMPILE"}

-    args = parser.parse_args(["-O.mode=DYNAMO_TRACE_ONCE"])
+    args = parser.parse_args(["-cc.mode=DYNAMO_TRACE_ONCE"])
     assert args.compilation_config == {"mode": "DYNAMO_TRACE_ONCE"}

-    args = parser.parse_args(["-O.mode", "VLLM_COMPILE"])
+    args = parser.parse_args(["-cc.mode", "VLLM_COMPILE"])
     assert args.compilation_config == {"mode": "VLLM_COMPILE"}

-    args = parser.parse_args(["-O.mode=none"])
+    args = parser.parse_args(["-cc.mode=none"])
     assert args.compilation_config == {"mode": "none"}

-    args = parser.parse_args(["-O.mode=vllm_compile"])
+    args = parser.parse_args(["-cc.mode=vllm_compile"])
     assert args.compilation_config == {"mode": "vllm_compile"}


@@ -458,3 +458,25 @@ def test_flat_product():
         (3, 4, "a", 5, 6),
         (3, 4, "b", 5, 6),
     ]
+
+
+def test_o_legacy_syntax_deprecation(caplog_vllm):
+    """Test that -O.* dotted syntax emits warnings and converts correctly to -cc syntax."""
+    parser = FlexibleArgumentParser()
+    parser.add_argument("-cc", "--compilation-config", type=json.loads)
+
+    # Test that -O.backend gets converted correctly AND emits warning
+    args = parser.parse_args(["-O.backend=eager"])
+    assert args.compilation_config == {"backend": "eager"}
+
+    # Check that deprecation warning was logged
+    assert len(caplog_vllm.records) >= 1
+    assert (
+        "The -O.* dotted syntax for --compilation-config is deprecated"
+        in caplog_vllm.text
+    )
+
+    # Test that -O.mode gets converted correctly
+    # Note: warning_once won't emit again in same session
+    args = parser.parse_args(["-O.mode=2"])
+    assert args.compilation_config == {"mode": 2}
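
The note in the new test says `warning_once` does not emit a second time in the same session, which is why the test checks the log only after the first parse. The sketch below illustrates one common way to get that behavior, by caching on the message string. It is an assumption for illustration, not vLLM's actual logger implementation, and `ListHandler` is a hypothetical helper used only to capture output.

```python
import functools
import logging

logger = logging.getLogger("warn_once_sketch")


class ListHandler(logging.Handler):
    """Hypothetical helper that stores formatted messages in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


@functools.lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    """Log `message` on the first call; identical later calls hit the cache."""
    logger.warning(message)


handler = ListHandler()
logger.addHandler(handler)

warning_once("deprecated flag")
warning_once("deprecated flag")  # cache hit, nothing is logged

print(len(handler.messages))
```

With a mechanism like this, a test that asserts on the warning has to do so after the first triggering call, since later calls with the same message stay silent.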

vllm/config/vllm.py

Lines changed: 2 additions & 2 deletions
@@ -193,8 +193,8 @@ class VllmConfig:
     compilation_config: CompilationConfig = Field(default_factory=CompilationConfig)
     """`torch.compile` and cudagraph capture configuration for the model.

-    As a shorthand, one can append compilation arguments via
-    -0.parameter=argument such as `-O.mode=3` (same as `-O='{"mode":3}'`).
+    As a shorthand, one can append compilation arguments via
+    -cc.parameter=argument such as `-cc.mode=3` (same as `-cc='{"mode":3}'`).

     You can specify the full compilation config like so:
     `{"mode": 3, "cudagraph_capture_sizes": [1, 2, 4, 8]}`
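
The docstring states that `-cc.mode=3` is equivalent to `-cc='{"mode":3}'`. The following sketch shows one way a dotted flag could expand into the same dictionary that the JSON form decodes to. `dotted_to_dict` is a hypothetical helper written for this illustration and is not part of vLLM. It also assumes that values which are not valid JSON (such as `NONE`) are kept as plain strings, which matches the expectations in the tests above.

```python
import json
from typing import Any


def dotted_to_dict(flag: str, prefix: str = "-cc.") -> dict[str, Any]:
    """Expand '-cc.a.b=value' into {'a': {'b': value}} (hypothetical helper)."""
    if not flag.startswith(prefix):
        raise ValueError(f"expected a flag starting with {prefix!r}: {flag!r}")
    key_path, _, raw_value = flag[len(prefix):].partition("=")
    try:
        value: Any = json.loads(raw_value)
    except json.JSONDecodeError:
        # Assumption: bare tokens such as NONE or unbacked stay as strings.
        value = raw_value
    result: dict[str, Any] = {}
    node = result
    *parents, leaf = key_path.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return result


print(dotted_to_dict("-cc.mode=3") == json.loads('{"mode": 3}'))
print(dotted_to_dict("-cc.dynamic_shapes_config.type=unbacked"))
```

The dotted form is convenient for setting a single nested value, while the JSON form is easier when several keys are set at once.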

vllm/engine/arg_utils.py

Lines changed: 1 addition & 1 deletion
@@ -1107,7 +1107,7 @@ def add_cli_args(parser: FlexibleArgumentParser) -> FlexibleArgumentParser:
             "--ec-transfer-config", **vllm_kwargs["ec_transfer_config"]
         )
         vllm_group.add_argument(
-            "--compilation-config", "-O", **vllm_kwargs["compilation_config"]
+            "--compilation-config", "-cc", **vllm_kwargs["compilation_config"]
         )
         vllm_group.add_argument(
             "--additional-config", **vllm_kwargs["additional_config"]

vllm/utils/argparse_utils.py

Lines changed: 11 additions & 0 deletions
@@ -257,6 +257,17 @@ def repl(match: re.Match) -> str:
             ):
                 # Convert -O <n> to --optimization-level <n>
                 processed_args.append("--optimization-level")
+            elif arg.startswith("-O."):
+                # Handle -O.* dotted syntax - ALL dotted syntax is deprecated
+                logger.warning_once(
+                    "The -O.* dotted syntax for --compilation-config is "
+                    "deprecated and will be removed in v0.13.0 or v1.0.0"
+                    ", whichever is earlier. Please use -cc.* instead. "
+                    "Example: -cc.backend=eager instead of "
+                    "-O.backend=eager."
+                )
+                converted_arg = arg.replace("-O", "-cc", 1)
+                processed_args.append(converted_arg)
             else:
                 processed_args.append(arg)

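
The new branch can be exercised outside the parser. Below is a minimal standalone sketch of the same remapping idea, under the assumption that it runs as a plain function over an argument list. `remap_legacy_o_flags` is a hypothetical name, and the standard `logging` module stands in for vLLM's `logger.warning_once`.

```python
import logging

logger = logging.getLogger("remap_sketch")


def remap_legacy_o_flags(args: list[str]) -> list[str]:
    """Rewrite deprecated '-O.<key>' arguments to '-cc.<key>' (hypothetical helper)."""
    processed: list[str] = []
    warned = False
    for arg in args:
        if arg.startswith("-O."):
            if not warned:
                # Stand-in for a warn-once logger: warn a single time per call.
                logger.warning("The -O.* dotted syntax is deprecated; use -cc.* instead.")
                warned = True
            # count=1 replaces only the leading "-O", so a value that happens
            # to contain "-O" is left untouched.
            processed.append(arg.replace("-O", "-cc", 1))
        else:
            # "-O3" and every other argument pass through unchanged.
            processed.append(arg)
    return processed


print(remap_legacy_o_flags(["-O3", "-O.backend=eager", "-O.mode", "2"]))
```

Because the check is `startswith("-O.")`, optimization-level flags such as `-O3` are not affected, which matches the tests in this commit that keep `-O1` and `-O3` as they are.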
