Conversation

Contributor

@vbaddi vbaddi commented Nov 9, 2025

This PR introduces support for exporting ONNX modules as functions, which deduplicates repeated decoder-layer subgraphs and enables more efficient model compilation and execution on hardware.

Changes

  • Added new environment variable QEFF_USE_ONNX_FUNCTIONS to control ONNX function export behavior
  • Integrated ONNX function export capability into the inference pipeline

Enable ONNX Functions Export

Set the environment variable before running inference:

export QEFF_USE_ONNX_FUNCTIONS=true

Export and Execute with ONNX Functions

python -m QEfficient.cloud.infer \
  --model-name gpt2 \
  --num-cores 16 \
  --device-group "[0]" \
  --prompt "My name is" \
  --num-layers 2

Backward Compatibility

This feature is opt-in and requires the environment variable to be set explicitly. Existing workflows remain unaffected when the flag is disabled.
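
A minimal sketch of how the opt-in gate might be read; the helper name and accepted values are assumptions, not the PR's actual code:

import os

def use_onnx_functions() -> bool:
    # Hypothetical helper: opt-in only, so anything other than an explicit
    # "true"/"1" leaves the existing export path untouched.
    return os.getenv("QEFF_USE_ONNX_FUNCTIONS", "false").lower() in ("1", "true")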

- Auto-detect decoder layers for export_modules_as_functions based on model type (see the sketch after this list)
- Add CustomOpTransform to dynamically register and include custom ops (CustomRMSNorm, CtxGather, CtxScatter)
- Fix invalid INT32_MAX indices in ONNX runtime by replacing with 0
- Support ONNX functions export via QEFF_USE_ONNX_FUNCTIONS env var
- Handle rope_scaling None values gracefully for Gemma3
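
A minimal sketch of the decoder-layer auto-detection idea from the first bullet; the mapping and helper name are hypothetical, not the PR's implementation:

# Hypothetical mapping from a Hugging Face model_type to its decoder-layer class name.
DECODER_LAYER_BY_MODEL_TYPE = {
    "gpt2": "GPT2Block",
    "llama": "LlamaDecoderLayer",
}

def get_decoder_layer_classes(model):
    # Collect the layer classes so torch.onnx.export can emit each repeated
    # decoder layer as one reusable ONNX function instead of inlining it N times.
    target = DECODER_LAYER_BY_MODEL_TYPE.get(model.config.model_type)
    return {type(m) for m in model.modules() if type(m).__name__ == target}

The resulting set is what torch.onnx.export expects for its export_modules_as_functions argument.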

Signed-off-by: vbaddi <quic_vbaddi@quicinc.com>
Signed-off-by: Vinayak Baddi <quic_vbaddi@quicinc.com>
@vbaddi vbaddi marked this pull request as draft November 9, 2025 11:45
@vbaddi vbaddi changed the title from "Feat: Add ONNX Sub Functions Export Feature" to "WIP: Feat: Add ONNX Sub Functions Export Feature" Nov 9, 2025
"""
transformed = False
onnx_slim_transform = True # kwargs.get("enable_onnx_slim_transform", False)
temp_onnx_path = kwargs.get("temp_onnx_path", None)
Contributor

Can we make this a mandatory argument? Also, onnx_base_dir is unused here.
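
A minimal sketch of the suggested signature change; the parameter name comes from the diff context, the rest is assumed:

class OnnxSlimTransform:
    @classmethod
    def apply(cls, model, *, temp_onnx_path: str, **kwargs):
        # temp_onnx_path is now keyword-only and required; omitting it raises
        # a TypeError instead of silently defaulting to None.
        ...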

:param temp_onnx_path: Path to save the slimmed ONNX model.
"""
transformed = False
onnx_slim_transform = True # kwargs.get("enable_onnx_slim_transform", False)
Contributor

If OnnxSlimTransform is called, do we need a separate onnx_slim_transform = True flag that is then checked again on line 130? The expectation should be to apply the OnnxSlimTransform directly, right?
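
A minimal sketch of the reviewer's suggestion, assuming onnxslim is the underlying library and the class shape is as above; invoking the transform applies slimming unconditionally:

import onnxslim

class OnnxSlimTransform:
    @classmethod
    def apply(cls, model, **kwargs):
        # No separate onnx_slim_transform flag to re-check: being called
        # at all means the slimming pass should run.
        return onnxslim.slim(model), True  # (model, transformed)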

inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64).float().to(device) / self.dim))

- if hasattr(config, "rope_scaling") and "factor" in config.rope_scaling:
+ if hasattr(config, "rope_scaling") and config.rope_scaling is not None and "factor" in config.rope_scaling:
Contributor

Is this change part of the ONNX Sub Functions feature?

example_inputs["past_key_values"][i].append(torch.zeros(pkv_cache[0][0].shape, dtype=torch.float32))
dynamic_axes[f"past_{kv}.{i}"] = pkv_dynamic_axes
output_names.append(f"past_{kv}.{i}_RetainedState")
output_names.append(f"past_{kv}.{i}_InternalRetainedState")
Contributor

Why are we renaming it? If we rename _RetainedState to _InternalRetainedState, wouldn't the changes also need to be added in text_generation_inference and the other places where we skip these buffers? Even if the subfunction feature is not enabled, this would impact regular execution.
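
A minimal sketch of why the rename is risky; the suffix filter below is illustrative, not the actual text_generation_inference code:

# If downstream code skips retained-state buffers by exact suffix,
# "past_key.0_InternalRetainedState" no longer matches and leaks through.
skipped = [name for name in output_names if name.endswith("_RetainedState")]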

ONNX_EXPORT_EXAMPLE_FBS = 4
ONNX_EXPORT_EXAMPLE_NLK = 2 # Number of Logits to Keep
- ONNX_EXPORT_OPSET = 13
+ ONNX_EXPORT_OPSET = 17
Contributor

Some tests on opset 17 are still ongoing. @quic-hemagnih, are we good to merge the opset 17 changes?


# Apply patches
# TODO: Find a better way to do this, this is temp. fix.
apply_torch_patches()
Contributor

If we are not enabling subfunctions, do we still need the monkey patching?
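
A minimal sketch of gating the patches on the same flag so the default export path stays untouched; the env check mirrors the hypothetical helper above:

import os

# Only monkey-patch torch when subfunction export is actually requested.
if os.getenv("QEFF_USE_ONNX_FUNCTIONS", "false").lower() in ("1", "true"):
    apply_torch_patches()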

dynamic_axes=dynamic_axes,
opset_version=constants.ONNX_EXPORT_OPSET,
export_modules_as_functions=decoder_layer_classes,
do_constant_folding=True,
Contributor

Do we need do_constant_folding=True and export_modules_as_functions by default, given that we are enabling this via the environment variable?
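
A minimal sketch of conditioning the export on the flag instead of changing behavior unconditionally; the wrapper and its arguments are assumed plumbing, not the PR's code:

import torch

def export_decoder(model, args, onnx_path, dynamic_axes, decoder_layer_classes, opset):
    use_functions = use_onnx_functions()  # hypothetical helper from the description
    torch.onnx.export(
        model,
        args,
        onnx_path,
        dynamic_axes=dynamic_axes,
        opset_version=opset,
        do_constant_folding=True,
        # Fall back to plain (inlined) export when the env flag is off.
        export_modules_as_functions=decoder_layer_classes if use_functions else False,
    )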

- _onnx_transforms = [FP16ClipTransform, SplitTensorsTransform]
+ _onnx_transforms = [
+     FP16ClipTransform,
+     CustomOpTransform,
Contributor

Do we need to apply the CustomOpTransform again after export?
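
A minimal sketch of making custom-op registration idempotent so running CustomOpTransform both before and after export is harmless; the registry and helper name are hypothetical:

import torch

_REGISTERED_OPS = set()

def register_custom_op(symbolic_name, symbolic_fn, opset):
    # Skip ops that an earlier pass (e.g. during export) already registered.
    if symbolic_name in _REGISTERED_OPS:
        return
    torch.onnx.register_custom_op_symbolic(symbolic_name, symbolic_fn, opset)
    _REGISTERED_OPS.add(symbolic_name)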

@vbaddi vbaddi added the enhancement label Nov 10, 2025