
Commit bc3dc45

ankitm3k, daijh, qjia7, adrianlizarraga, and snnn authored
Sync ORT main 16 07 25 (#744)
* [webgpu] Update wgsl_templates README.md (microsoft#25336): Fix a broken URL and the numbering of the ordered list in README.md.
* [webgpu] Move the early return after copying for ScatterND (microsoft#25345): For ScatterND, if the indices are empty (nothing to update), the op becomes a copy operation, so the early return should happen after the copy.
* [EP ABI] Utility to serialize OrtGraph to GraphProto (microsoft#25292):
  - Provides utility functions that serialize an `OrtGraph` to a `GraphProto` or `ModelProto`.
  - Header-only file that can be copied to a project that builds with ORT and ONNX; available in [include/onnxruntime/core/providers/utils/ort_graph_to_proto.h](https://github.com/microsoft/onnxruntime/blob/adrianl/ep-abi-ort-graph-to-onnx-protobuf/include/onnxruntime/core/providers/utils/ort_graph_to_proto.h).
  - Updates the `Node_GetSubgraphs` API function to also return the attribute names associated with each subgraph, which is required to determine which subgraph corresponds to a given attribute.
  - Adds `Graph_GetNumOperatorSets` and `Graph_GetOperatorSets` API functions to get the opset version for each domain.

  Motivation: provide a utility that facilitates porting existing execution providers to the new EP ABI. These utilities convert an `OrtGraph` into an ONNX protobuf representation, which some existing EPs currently convert to their internal representation. Ideally there would be a more direct conversion from an `OrtGraph` to the EP's internal representation, but that is a large effort; these utilities enable an incremental transition.
* Update vcpkg.json: remove optional-lite (microsoft#25339): The library is not used; C++ itself already has `std::optional`.
* Move buffer release or cache from OnRefresh to ReleaseBuffer in BucketCacheManager (microsoft#25276): OnRefresh only runs after a batch of 16 EP runs, so within the batch a released buffer cannot actually be reused, which wastes GPU buffer resources. This PR makes the straightforward optimization of releasing or caching the buffer early, in ReleaseBuffer instead of OnRefresh, which improves buffer cache/release efficiency and therefore peak and average GPU memory usage. Experimental results show a reasonable memory improvement without perf regressions.

  **Phi3**

  | Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
  | -- | -- | -- | -- | -- |
  | Default Bucket | 3603.83 | 3127.05 | 7.17 | 139.50 |
  | Default Bucket with Early Release Optimization | 3534.77 (+1.92%) | 3073.97 (+1.70%) | 7.14 (+0.36%) | 140.01 (+0.36%) |

  **Deepseek-R1**

  | Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
  | -- | -- | -- | -- | -- |
  | Default Bucket | 2089.03 | 1716.15 | 6.07 | 164.67 |
  | Default Bucket with Early Release Optimization | 2034.00 (+2.63%) | 1674.49 (+2.43%) | 6.09 (-0.20%) | 164.34 (-0.20%) |

  **LLama3.2-1B**

  | Optimization Strategy | Peak Memory (MB) | Avg Memory (MB) | Token Gen Latency (ms) | Tokens/sec |
  | -- | -- | -- | -- | -- |
  | Default Bucket | 1736.03 | 1424.64 | 3.37 | 296.53 |
  | Default Bucket with Early Release Optimization | 1659.78 (+4.39%) | 1366.78 (+4.06%) | 3.41 (-1.09%) | 293.34 (-1.08%) |
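  The gist of the change, as a hedged Python sketch (the real code is the C++ BucketCacheManager in the WebGPU EP; the class and method names below are illustrative only, not the actual ORT interfaces):

  ```python
  class BucketCacheSketch:
      """Toy model of a bucketed GPU-buffer cache, to illustrate when a buffer becomes reusable."""

      def __init__(self, max_per_bucket=4):
          self.buckets = {}      # buffer size -> list of reusable buffers
          self.pending = []      # old behavior: released buffers parked here until OnRefresh
          self.max_per_bucket = max_per_bucket

      def release_buffer_before(self, buf):
          # Before: only queue the buffer; it becomes reusable in on_refresh(),
          # which runs after a batch of ~16 EP runs.
          self.pending.append(buf)

      def on_refresh(self):
          for buf in self.pending:
              self._cache_or_destroy(buf)
          self.pending.clear()

      def release_buffer_after(self, buf):
          # After this PR: cache (or destroy) immediately, so the next run in the
          # same batch can reuse the buffer, lowering peak/average GPU memory.
          self._cache_or_destroy(buf)

      def _cache_or_destroy(self, buf):
          bucket = self.buckets.setdefault(buf["size"], [])
          if len(bucket) < self.max_per_bucket:
              bucket.append(buf)   # keep it around for reuse
          else:
              buf.clear()          # stand-in for destroying the GPU buffer
  ```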
* [web] Fix "npm run pull:wasm" script (microsoft#25330): Follow-up for microsoft#25267.
* [MLAS] DequantizeLinear int8/uint8 (microsoft#24818): Adds multithreaded, vectorized implementations of DequantizeLinear for int8 and uint8 inputs (Intel SSE 2 and ARM NEON); all other architectures fall back to a multithreaded scalar reference implementation (the previous one was not multithreaded). Note: only enabled if ORT is built for client/on-device workloads (`ORT_CLIENT_PACKAGE_BUILD` is defined). Motivation: improves the latency of quantized QDQ models with large DQs that dominate the inference latency.

  INT8 DequantizeLinear latency on Intel Core i9-10920X with 4 intra-op threads (SSE 2 implementation):

  | Number of elements | Baseline latency (us) | Multithreaded+SIMD latency (us) | Speedup |
  | --- | --- | --- | --- |
  | 10 K | 1 | 1 | 1 |
  | 20 K | 2 | 2 | 1 |
  | 40 K | 5 | 5 | 1 |
  | 80 K | 11 | 4 | 2.75 |
  | 100 K | 14 | 5 | 2.80 |
  | 150 K | 21 | 7 | 3.00 |
  | 200 K | 28 | 8 | 3.50 |
  | 400 K | 68 | 15 | 4.53 |
  | 600 K | 107 | 21 | 5.10 |
  | 800 K | 142 | 28 | 5.07 |
  | 1 M | 187 | 42 | 4.45 |
  | 2 M | 376 | 102 | 3.69 |
  | 4 M | 880 | 236 | 3.73 |
  | 6 M | 1547 | 557 | 2.78 |
  | 8 M | 2438 | 1097 | 2.22 |
  | 10 M | 3192 | 1464 | 2.18 |
  | 100 M | 38718 | 17733 | 2.18 |

  INT8 DequantizeLinear latency on Snapdragon 8cx Gen 3 @ 3.4 GHz with 4 intra-op threads (NEON implementation):

  | Number of elements | Baseline latency (us) | Multithreaded+SIMD latency (us) | Speedup |
  | --- | --- | --- | --- |
  | 10 K | 1 | 1 | 1 |
  | 20 K | 1 | 1 | 1 |
  | 40 K | 3 | 3 | 1 |
  | 80 K | 7 | 4 | 1.75 |
  | 100 K | 9 | 3 | 3.00 |
  | 150 K | 14 | 5 | 2.80 |
  | 200 K | 18 | 6 | 3.00 |
  | 400 K | 38 | 10 | 3.80 |
  | 600 K | 61 | 15 | 4.07 |
  | 800 K | 76 | 19 | 4.00 |
  | 1 M | 98 | 24 | 4.08 |
  | 2 M | 204 | 48 | 4.25 |
  | 4 M | 424 | 112 | 3.79 |
  | 6 M | 677 | 384 | 1.76 |
  | 8 M | 919 | 621 | 1.48 |
  | 10 M | 1132 | 776 | 1.46 |
  | 100 M | 11842 | 10566 | 1.12 |
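  For reference, the operation itself is just `y = (x - zero_point) * scale` applied element-wise; the sketch below (NumPy, illustrative only and not the MLAS kernel) shows the math plus the kind of chunking a multithreaded implementation uses to split work across intra-op threads:

  ```python
  import numpy as np
  from concurrent.futures import ThreadPoolExecutor

  def dequantize_linear(x_q, scale, zero_point, num_threads=4):
      """DequantizeLinear for int8/uint8 input: y = (x - zero_point) * scale."""
      y = np.empty(x_q.shape, dtype=np.float32)
      flat_x, flat_y = x_q.reshape(-1), y.reshape(-1)
      chunks = np.array_split(np.arange(flat_x.size), num_threads)  # one slice of work per thread

      def work(idx):
          flat_y[idx] = (flat_x[idx].astype(np.int32) - zero_point) * scale

      with ThreadPoolExecutor(max_workers=num_threads) as pool:
          list(pool.map(work, chunks))
      return y

  x = np.random.randint(-128, 128, size=1_000_000, dtype=np.int8)
  y = dequantize_linear(x, scale=0.02, zero_point=3)
  ```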
* [CPU] GQA supports head_sink input for smooth softmax (microsoft#25269): An extension of the [Smooth Softmax](microsoft#21867) feature. The difference is that each head now has a learnable smooth factor that is added to the denominator of the softmax; the smooth factor acts like an extra element that joins the softmax:

  ```math
  softmax_{i} = \frac{exp(x_{i})}{exp(s) + \sum_{j} exp(x_{j})}
  ```

  `head_sink` is a float tensor whose length equals the number of attention heads. For the h-th head, `head_sink[h]` is used as the smooth factor s; when head_sink is not provided, the constant 0 is used. Changes:
  - [x] Update the operator spec to add an optional new input `head_sink`
  - [x] Implement the CPU (MLAS) kernel
  - [x] Update test_gqa_cpu.py to test it

  The CUDA kernel will be updated later in a separate PR.
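  A hedged NumPy sketch of the per-head smooth softmax described above (illustrative only; the real kernel is implemented in MLAS, and the function and variable names here are made up for the example):

  ```python
  import numpy as np

  def smooth_softmax(scores, head_sink=None):
      """scores: (num_heads, seq_len) attention logits; head_sink: optional per-head smooth factor s."""
      num_heads = scores.shape[0]
      s = head_sink if head_sink is not None else np.zeros(num_heads, dtype=scores.dtype)
      m = np.maximum(scores.max(axis=-1), s)[:, None]           # max over {x_j, s} for numerical stability
      exp_scores = np.exp(scores - m)
      denom = np.exp(s[:, None] - m) + exp_scores.sum(axis=-1, keepdims=True)
      return exp_scores / denom                                  # exp(x_i) / (exp(s) + sum_j exp(x_j))

  probs = smooth_softmax(np.random.randn(8, 128).astype(np.float32),
                         head_sink=np.full(8, 0.5, dtype=np.float32))
  assert np.all(probs.sum(axis=-1) < 1.0)  # the sink absorbs part of the probability mass
  ```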
* Add PackageVersion parameter to NuGet packaging stage (microsoft#25315): Fix: the `Microsoft.ML.OnnxRuntime.Managed.nupkg` artifact from the GPU pipeline does not have a package version. ![image](https://github.com/user-attachments/assets/4a6135ab-4774-4aa6-aeb1-d5b06948ba8f)
* [QNN EP] Fix pool with reshape name conflicts (microsoft#25332): Naming conflicts occurred when the expand-pool2d-squeeze (implemented as reshape) logic was invoked during ONNX to QNN op lowering; models with multiple pool 1D ops would hit this issue.
* Added creation of QDQ for TopK node (microsoft#25309): Adds TopK to registry.py so the ORT static quantization tool creates QDQ nodes for the op, ensures that the input and output quantization params are equal, and adds a unit test verifying the created QDQ nodes. There was already support for forming a node unit for TopK when QDQ nodes are present with equal input and output quantization params, but no support for creating those QDQ nodes in the ORT static quantization tool.
* [WebNN] Refactor webnn op input rank check and add validation for ops (microsoft#25185): Refactor the WebNN op input rank range check, add validation for various ops, and use the `gemm` op as an example of performing input rank checks for decomposed ops. @Honry @fdwr PTAL
* Make TRT plugins optional (microsoft#25261): The parser no longer links against the plugin library but loads it dynamically, so the library should also be optional in ORT. @chilo-ms
* [EP ABI] Add Graph_GetGraphView API to get an OrtGraph from a subset of nodes (microsoft#25191): Adds an API that creates a sub-graph from a set of nodes in an OrtGraph. This is needed in the GetCapability EP ABI porting when an EP wants to check whether a 'sub-graph' of the graph is supported by the hardware backend.
* [webgpu] a few optimizations to WGSL template (microsoft#25333): Follow-up to microsoft#25130:
  - consume duktape from vcpkg if `--use_vcpkg` is specified
  - ~~add a Windows CI pipeline for dynamic WGSL template~~ (will be done in a separate PR)
  - upgrade the wgsl-template package from 0.1.10 to 0.1.13
  - support adding the contrib ops folder as input
* add --client_package_build option (microsoft#25351): Adds a build option that enables defaults more appropriate for client/on-device workloads. The initial use case is the thread pool allow_spinning policy, which should default to 0/false for builds targeting client/on-device workloads. (Co-authored-by: github-actions[bot])
* [WebNN] Fix bug in Float16Array availability check (microsoft#25354): `from` is not an own property of `Float16Array` but an inherited function, so `Float16Array['from']` is used to check whether it is available.
* [EP ABI] Add Node_GetEpType API (microsoft#25350): Adds an API to get the EP that a node is assigned to run on. This is needed when porting the plugin TRT EP's GetCapability, where the EP must know whether the subgraph(s) of a control flow node are assigned to it before adding that control flow op to the support list.
* QNN-EP: DSPQueue Polling (microsoft#25361): Enable DSP queue polling when the performance profile is burst.
* [QNN_EP] Implement Efficient Mode API (microsoft#25146): Set the context priority to low when the workload type is Efficient, to the command-line configured value when Default, and error out otherwise (invalid argument).
* Add Compile API to set the location for the context binary file (microsoft#25356): Adds ModelCompilationOptions_SetEpContextBinaryInformation to set the folder path and model name so the EP knows the right place to dump the [model_name]_[ep].bin file.
* add build matrix for wgsl template (microsoft#25352): Windows WebGPU CI: add a build matrix for the WGSL template.
* [JSEP] Fix inputShape index OOB in slice.ts (microsoft#25364): Use `inputShape.length - 1` instead of `inputShape.length` to avoid out-of-bounds access.
* [webgpu] extend cast version to 23 (microsoft#25235)
* Fix a security warning (microsoft#18979): Reference: GHSA-5crp-9r3c-p9vr. Newtonsoft.Json prior to version 13.0.1 is vulnerable to insecure defaults due to improper handling of expressions with a high nesting level, which leads to a StackOverflow exception or high CPU and RAM usage; exploiting this results in denial of service (DoS). The mitigation is to update Newtonsoft.Json to 13.0.1 or to set the MaxDepth parameter in JsonSerializerSettings:

  ```
  JsonConvert.DefaultSettings = () => new JsonSerializerSettings { MaxDepth = 128 };
  ```

  This file is the only place using `JsonConvert`, so the fix is applied there in the hope that the warning disappears.
* Fix AutoEpSelection and OrtEpLibrary tests when using AuthenticAMD (microsoft#24754)
* Missing datatype in assertion (microsoft#23578)
* [EP ABI] Update to use Node_GetEpName (microsoft#25363): Rename the API to `Node_GetEpName` to avoid confusion. For plugin EPs, the factory can register any name with ORT, so the API name is aligned with `OrtEpFactory.GetName`.
* Bump clang-format from 20.1.7 to 20.1.8 (microsoft#25381)
* Fix number of layers in Whisper export (microsoft#25375): Always use the decoder's number of hidden layers during Whisper export. Most Whisper models have the same number of hidden layers in the encoder and decoder, but Whisper large v3 turbo has 32 encoder layers and only 4 decoder layers. This also fixes microsoft/onnxruntime-genai#1611.
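  A hedged illustration of why the decoder layer count is the one to use, reading the Hugging Face config (the checkpoint name is just an example; the actual export script lives under onnxruntime/python/tools/transformers):

  ```python
  from transformers import WhisperConfig

  config = WhisperConfig.from_pretrained("openai/whisper-large-v3-turbo")
  # Encoder and decoder depths can differ; the decoder KV-cache/past inputs
  # must be sized from decoder_layers, not encoder_layers.
  print(config.encoder_layers, config.decoder_layers)  # 32 vs. 4 for large v3 turbo
  num_hidden_layers = config.decoder_layers
  ```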
* Bump transformers from 4.48.0 to 4.52.1 in /onnxruntime/python/tools/transformers/models/llama (microsoft#25328): Dependabot update. Notable upstream changes between 4.48.0 and 4.52.1 include the Llama 4, Phi4-Multimodal, DeepSeek-v3, Qwen3, and GLM-4 model additions plus a series of patch fixes; see the transformers release notes for the full list.
* Bump ruff from 0.12.2 to 0.12.3 (microsoft#25382): Dependabot update. 0.12.3 adds preview autofixes for several flake8-use-pathlib rules, fixes a few false positives and fix-introduced syntax errors, and updates documentation; see the ruff changelog for details.
* [QNN EP] Upgrade QNN to 2.36.1 (microsoft#25388): Update the default QNN version to 2.36.1.250708.
* Add vendor id to OrtEpFactory and default ORT logger to CreateEpFactories (microsoft#25365): Adds a vendor id to OrtEpFactory, since it is easier to get the vendor id than the name on other platforms, and updates the selection policy to prefer a match on vendor id with fallback to vendor name. Adds the default ORT logger to CreateEpFactories: the OrtEpFactory currently has no way to log informational messages or issues (CreateEp already receives the session logger for use by the OrtEp instance). Miscellaneous cleanups: make the usage of ORT_API2_STATUS and ORT_API_T consistent in onnxruntime_ep_c_api.h, and fix ort_version_supported in some EP factories where it was missed. Motivation: the vendor id is easier to match against OrtHardwareDevice when doing auto EP selection, OrtEpFactory should have a logger, and this is the last chance to clean up the APIs before the 1.23 release.
* Bump lintrunner-adapters from 0.12.4 to 0.12.5 (microsoft#25380)
* [WebNN] Add rank range validation for rest ops (microsoft#25383): Add common rank range validation to base_op_builder.cc, handle op-specific rank range validation for the remaining ops, remove duplicated input_shape validation, and fix some typos along the way.
* Fix some test issues when WebGPU and DML are enabled in the same build (microsoft#25401): Fix test setups that did not expect both EPs to be present in the same build.
* Fix SigLIP causal mask bug (microsoft#25360): The SigLIP architecture inside the vision encoder should not use a causal mask on the attention; this fixes the Phi 4 MM accuracy issues that have been observed. (Co-authored-by: github-actions[bot])
* [CPU] GQA supports attention scores output (microsoft#25319):
  1. Add an optional output to the CPU implementation of the GQA op for storing attention scores (QK). The buffer has shape (B, N, S, T) and can be either fp16 or fp32, depending on the type of the other inputs.
  2. Add a `qk_output` attribute to GQA, which controls whether attention scores are saved before or after softmax is applied.
  3. Add unit tests to cover this use case.
  4. Add asserts on other EPs if this feature is used.
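  A hedged NumPy sketch of what the optional output holds (shapes follow the description above: B=batch, N=heads, S=query length, T=total sequence length; the string values for `qk_output` are illustrative, not the actual attribute encoding):

  ```python
  import numpy as np

  def attention_with_scores(q, k, v, qk_output="none"):
      """q: (B, N, S, d); k, v: (B, N, T, d). Returns (output, optional QK scores of shape (B, N, S, T))."""
      scale = 1.0 / np.sqrt(q.shape[-1])
      qk = np.einsum("bnsd,bntd->bnst", q, k) * scale           # raw attention scores, (B, N, S, T)
      probs = np.exp(qk - qk.max(axis=-1, keepdims=True))
      probs /= probs.sum(axis=-1, keepdims=True)                # softmax over T
      out = np.einsum("bnst,bntd->bnsd", probs, v)
      if qk_output == "before_softmax":
          return out, qk
      if qk_output == "after_softmax":
          return out, probs
      return out, None
  ```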
* [QNN-EP] Support GridSample of linear mode for ONNX opset 20+ (microsoft#25408)
* [QNN-EP] Update ScatterND op to reject only QNN-CPU (microsoft#25403): The current limitation is broader than necessary; reject only when targeting the QNN CPU backend.
* Fix 2 device discovery issues (microsoft#25397): Fix the vendor and device id conversion from SetupApi info, and detect and skip the Remote Display Adapter, which otherwise appears as a bogus device when connected to a machine over remote desktop.
* [webgpu] fix Slice implementation (microsoft#25415): Bugfix: crash when dim_value is 0. Thanks to @skottmckay who found the bug.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jianhui Dai <jianhui.j.dai@intel.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Fei Chen <feich@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: vraspar <vrajang@outlook.com>
Co-authored-by: qti-yuduo <yuduow@qti.qualcomm.com>
Co-authored-by: Akupadhye <aupadhye@qti.qualcomm.com>
Co-authored-by: Wang Ning <ning4.wang@intel.com>
Co-authored-by: Maximilian Müller <44298237+gedoensmax@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Wanming Lin <wanming.lin@intel.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: Jie Chen <jie.a.chen@intel.com>
Co-authored-by: xhcao <xinghua.cao@intel.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: quic-hungjuiw <quic_hungjuiw@quicinc.com>
Co-authored-by: Ian Hunter <ianfhunter@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Jeff Kilpatrick <jkilpatrick@qti.qualcomm.com>
Co-authored-by: Jeff Kilpatrick <jkilpat@qti.qualcomm.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Nenad Banfic <46795300+nenad1002@users.noreply.github.com>
Co-authored-by: derdeljan-msft <derdeljan@microsoft.com>
1 parent 776bedf commit bc3dc45

File tree: 174 files changed, +3985 / -877 lines


.github/workflows/windows_webgpu.yml

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,7 @@ jobs:
   strategy:
     matrix:
       vcpkg_option: [novcpkg, vcpkg]
+      wgsl_template: [static, dynamic]
   env:
     OrtPackageId: Microsoft.ML.OnnxRuntime
     OnnxRuntimeBuildDirectory: ${{ github.workspace }}
@@ -123,6 +124,7 @@ jobs:
         --build_nodejs `
         --build_java `
         --use_webgpu `
+        --wgsl_template ${{ matrix.wgsl_template }} `
         ${{ matrix.vcpkg_option == 'vcpkg' && '--use_vcpkg' || '' }} `
         --cmake_extra_defines `
         onnxruntime_BUILD_UNIT_TESTS=ON `

cmake/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -151,6 +151,7 @@ option(onnxruntime_DISABLE_SPARSE_TENSORS "Disable sparse tensors data types" OFF)
 option(onnxruntime_DISABLE_OPTIONAL_TYPE "Disable optional type" OFF)
 option(onnxruntime_DISABLE_FLOAT8_TYPES "Disable float 8 types" OFF)
 option(onnxruntime_MINIMAL_BUILD "Exclude as much as possible from the build. Support ORT format models. No support for ONNX format models." OFF)
+option(onnxruntime_CLIENT_PACKAGE_BUILD "Enables default settings that are more appropriate for client/on-device workloads." OFF)
 cmake_dependent_option(onnxruntime_DISABLE_RTTI "Disable RTTI" ON "NOT onnxruntime_ENABLE_PYTHON;NOT onnxruntime_USE_CUDA" OFF)
 # For now onnxruntime_DISABLE_EXCEPTIONS will only work with onnxruntime_MINIMAL_BUILD, more changes (ONNX, non-CPU EP, ...) are required to run this standalone
 cmake_dependent_option(onnxruntime_DISABLE_EXCEPTIONS "Disable exception handling. Requires onnxruntime_MINIMAL_BUILD currently." ON "onnxruntime_MINIMAL_BUILD;NOT onnxruntime_ENABLE_PYTHON" OFF)

cmake/adjust_global_compile_flags.cmake

Lines changed: 5 additions & 0 deletions
@@ -95,6 +95,11 @@ if (onnxruntime_MINIMAL_BUILD)
   endif()
 endif()

+# ORT build with default settings more appropriate for client/on-device workloads.
+if (onnxruntime_CLIENT_PACKAGE_BUILD)
+  add_compile_definitions(ORT_CLIENT_PACKAGE_BUILD)
+endif()
+
 if (onnxruntime_ENABLE_LTO)
   include(CheckIPOSupported)
   check_ipo_supported(RESULT ipo_enabled OUTPUT ipo_output)

cmake/external/onnxruntime_external_deps.cmake

Lines changed: 18 additions & 7 deletions
@@ -774,13 +774,24 @@ if (onnxruntime_USE_WEBGPU)
   endif()

   if (NOT CMAKE_SYSTEM_NAME STREQUAL "Emscripten" AND onnxruntime_WGSL_TEMPLATE STREQUAL "dynamic")
-    onnxruntime_fetchcontent_declare(
-      duktape
-      URL ${DEP_URL_duktape}
-      URL_HASH SHA1=${DEP_SHA1_duktape}
-      EXCLUDE_FROM_ALL
-    )
-    onnxruntime_fetchcontent_makeavailable(duktape)
+    if(onnxruntime_USE_VCPKG)
+      find_package(unofficial-duktape CONFIG REQUIRED)
+      add_library(duktape_static ALIAS unofficial::duktape::duktape)
+    else()
+      onnxruntime_fetchcontent_declare(
+        duktape
+        URL ${DEP_URL_duktape}
+        URL_HASH SHA1=${DEP_SHA1_duktape}
+        EXCLUDE_FROM_ALL
+      )
+      onnxruntime_fetchcontent_makeavailable(duktape)
+
+      if(NOT TARGET duktape_static)
+        add_library(duktape_static STATIC "${duktape_SOURCE_DIR}/src/duktape.c")
+        target_compile_features(duktape_static PRIVATE c_std_99)
+        target_include_directories(duktape_static INTERFACE $<BUILD_INTERFACE:${duktape_SOURCE_DIR}/src>)
+      endif()
+    endif()
   endif()
 endif()

cmake/onnxruntime_mlas.cmake

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ onnxruntime_add_static_library(onnxruntime_mlas
   ${MLAS_SRC_DIR}/eltwise.cpp
   ${MLAS_SRC_DIR}/erf.cpp
   ${MLAS_SRC_DIR}/compute.cpp
+  ${MLAS_SRC_DIR}/dequantize.cpp
   ${MLAS_SRC_DIR}/quantize.cpp
   ${MLAS_SRC_DIR}/qgemm_kernel_default.cpp
   ${MLAS_SRC_DIR}/qladd.cpp

cmake/onnxruntime_providers_tensorrt.cmake

Lines changed: 5 additions & 18 deletions
@@ -72,26 +72,21 @@
 endif()

 # TensorRT 10 GA onwards, the TensorRT libraries will have major version appended to the end on Windows,
-# for example, nvinfer_10.dll, nvinfer_plugin_10.dll, nvonnxparser_10.dll ...
+# for example, nvinfer_10.dll, nvonnxparser_10.dll ...
 if (WIN32 AND TRT_GREATER_OR_EQUAL_TRT_10_GA)
   set(NVINFER_LIB "nvinfer_${NV_TENSORRT_MAJOR}")
-  set(NVINFER_PLUGIN_LIB "nvinfer_plugin_${NV_TENSORRT_MAJOR}")
   set(PARSER_LIB "nvonnxparser_${NV_TENSORRT_MAJOR}")
 endif()

 if (NOT NVINFER_LIB)
   set(NVINFER_LIB "nvinfer")
 endif()

-if (NOT NVINFER_PLUGIN_LIB)
-  set(NVINFER_PLUGIN_LIB "nvinfer_plugin")
-endif()
-
 if (NOT PARSER_LIB)
   set(PARSER_LIB "nvonnxparser")
 endif()

-MESSAGE(STATUS "Looking for ${NVINFER_LIB} and ${NVINFER_PLUGIN_LIB}")
+MESSAGE(STATUS "Looking for ${NVINFER_LIB}")

 find_library(TENSORRT_LIBRARY_INFER ${NVINFER_LIB}
   HINTS ${TENSORRT_ROOT}
@@ -101,14 +96,6 @@
   MESSAGE(STATUS "Can't find ${NVINFER_LIB}")
 endif()

-find_library(TENSORRT_LIBRARY_INFER_PLUGIN ${NVINFER_PLUGIN_LIB}
-  HINTS ${TENSORRT_ROOT}
-  PATH_SUFFIXES lib lib64 lib/x64)
-
-if (NOT TENSORRT_LIBRARY_INFER_PLUGIN)
-  MESSAGE(STATUS "Can't find ${NVINFER_PLUGIN_LIB}")
-endif()
-
 if (onnxruntime_USE_TENSORRT_BUILTIN_PARSER)
   MESSAGE(STATUS "Looking for ${PARSER_LIB}")

@@ -120,7 +107,7 @@
   MESSAGE(STATUS "Can't find ${PARSER_LIB}")
   endif()

-  set(TENSORRT_LIBRARY ${TENSORRT_LIBRARY_INFER} ${TENSORRT_LIBRARY_INFER_PLUGIN} ${TENSORRT_LIBRARY_NVONNXPARSER})
+  set(TENSORRT_LIBRARY ${TENSORRT_LIBRARY_INFER} ${TENSORRT_LIBRARY_NVONNXPARSER})
   MESSAGE(STATUS "Find TensorRT libs at ${TENSORRT_LIBRARY}")
 else()
   if (TRT_GREATER_OR_EQUAL_TRT_10_GA)
@@ -153,15 +140,15 @@
   endif()
   # Static libraries are just nvonnxparser_static on all platforms
   set(onnxparser_link_libs nvonnxparser_static)
-  set(TENSORRT_LIBRARY ${TENSORRT_LIBRARY_INFER} ${TENSORRT_LIBRARY_INFER_PLUGIN})
+  set(TENSORRT_LIBRARY ${TENSORRT_LIBRARY_INFER})
   MESSAGE(STATUS "Find TensorRT libs at ${TENSORRT_LIBRARY}")
 endif()

 # ${TENSORRT_LIBRARY} is empty if we link nvonnxparser_static.
 # nvonnxparser_static is linked against tensorrt libraries in onnx-tensorrt
 # See https://github.com/onnx/onnx-tensorrt/blob/8af13d1b106f58df1e98945a5e7c851ddb5f0791/CMakeLists.txt#L121
 # However, starting from TRT 10 GA, nvonnxparser_static doesn't link against tensorrt libraries.
-# Therefore, the above code finds ${TENSORRT_LIBRARY_INFER} and ${TENSORRT_LIBRARY_INFER_PLUGIN}.
+# Therefore, the above code finds ${TENSORRT_LIBRARY_INFER}.
 if(onnxruntime_CUDA_MINIMAL)
   set(trt_link_libs ${CMAKE_DL_LIBS} ${TENSORRT_LIBRARY})
 else()

cmake/onnxruntime_providers_webgpu.cmake

Lines changed: 6 additions & 5 deletions
@@ -172,10 +172,12 @@
   file(MAKE_DIRECTORY ${WGSL_GENERATED_DIR})

   # Find all WGSL template input files
-  file(GLOB_RECURSE WGSL_TEMPLATE_FILES "${ONNXRUNTIME_ROOT}/core/providers/webgpu/*.wgsl.template")
+  file(GLOB_RECURSE WGSL_TEMPLATE_FILES
+    "${ONNXRUNTIME_ROOT}/core/providers/webgpu/*.wgsl.template"
+    "${ONNXRUNTIME_ROOT}/contrib_ops/webgpu/*.wgsl.template")

   # Set wgsl-gen command line options as a list
-  set(WGSL_GEN_OPTIONS "-i" "../" "--output" "${WGSL_GENERATED_DIR}" "-I" "wgsl_template_gen/" "--preserve-code-ref" "--verbose")
+  set(WGSL_GEN_OPTIONS "-i" "${ONNXRUNTIME_ROOT}/core/providers/webgpu/" "-i" "${ONNXRUNTIME_ROOT}/contrib_ops/webgpu/" "--output" "${WGSL_GENERATED_DIR}" "-I" "wgsl_template_gen/" "--preserve-code-ref" "--verbose")
   if (onnxruntime_WGSL_TEMPLATE STREQUAL "static")
     if (CMAKE_BUILD_TYPE STREQUAL "Debug")
       list(APPEND WGSL_GEN_OPTIONS "--generator" "static-cpp-literal")
@@ -207,10 +209,9 @@
     # Add the generated directory to include paths
     target_include_directories(onnxruntime_providers_webgpu PRIVATE ${WGSL_GENERATED_ROOT})
   elseif(onnxruntime_WGSL_TEMPLATE STREQUAL "dynamic")
-    add_library(duktape_static STATIC "${duktape_SOURCE_DIR}/src/duktape.c")
-    target_compile_features(duktape_static PRIVATE c_std_99)
     target_link_libraries(onnxruntime_providers_webgpu duktape_static)
-    target_include_directories(onnxruntime_providers_webgpu PRIVATE ${duktape_SOURCE_DIR}/src)
+    onnxruntime_add_include_to_target(onnxruntime_providers_webgpu duktape_static)
+
     # Define the path to the generated templates.js file
     target_compile_definitions(onnxruntime_providers_webgpu PRIVATE
       "ORT_WGSL_TEMPLATES_JS_PATH=\"${WGSL_GENERATED_TEMPLATES_JS}\"")

cmake/vcpkg.json

Lines changed: 8 additions & 1 deletion
@@ -43,7 +43,6 @@
     "ms-gsl",
     "nlohmann-json",
     "onnx",
-    "optional-lite",
     {
       "name": "protobuf",
       "version>=": "3.21.12"
@@ -94,6 +93,10 @@
     "webgpu-ep": {
       "description": "Build with WebGPU EP",
       "dependencies": []
+    },
+    "webgpu-ep-wgsl-template-dynamic": {
+      "description": "Build with WebGPU EP with dynamic WGSL template code generator",
+      "dependencies": ["duktape"]
     }
   },
   "overrides": [
@@ -104,6 +107,10 @@
     {
       "name": "flatbuffers",
       "version": "23.5.26"
+    },
+    {
+      "name": "duktape",
+      "version": "2.7.0#2"
     }
   ]
 }

csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Mobile/EndToEndTests.Mobile.Automation/Tests.cs

Lines changed: 3 additions & 1 deletion
@@ -40,10 +40,12 @@ public void RunPlatformUnitTest()
             var serializedResultSummary = _app.Invoke(_getResultsBackdoorMethodName)?.ToString();
             Assert.IsNotEmpty(serializedResultSummary, "Test results were not returned");

+            // Fix security issue (overflow with too much nesting): GHSA-5crp-9r3c-p9vr
+            JsonConvert.DefaultSettings = () => new JsonSerializerSettings { MaxDepth = 128 };
             var testSummary = JsonConvert.DeserializeObject<TestResultSummary>(serializedResultSummary);
             Assert.AreEqual(testSummary.Failed, 0, $"{testSummary.Failed} tests failed");

             _app.Screenshot("Post-testing");
         }
     }
-}
+}

csharp/test/Microsoft.ML.OnnxRuntime.Tests.Devices/TestResultProcessor.cs

Lines changed: 2 additions & 1 deletion
@@ -45,8 +45,9 @@ public TestResultSummary GetResults()
         public string GetSerializedResults()
         {
             var resultSummary = GetResults();
+            JsonConvert.DefaultSettings = () => new JsonSerializerSettings { MaxDepth = 128 };
             var serializedResultSummary = JsonConvert.SerializeObject(resultSummary, Formatting.Indented);
             return serializedResultSummary;
         }
     }
-}
+}
