
Commit aafdb3a

snnnyuslepukhin authored, with co-author Dmitri Smirnov
Fix shape inference failure with in-memory external data (microsoft#26263)
## Description

Fixes microsoft#26261. This PR resolves a regression introduced in v1.23.0 where models with Constant nodes containing tensors larger than 127 bytes fail to load with a shape inference error.

### Root Cause

Commit 3b97d79 (PR microsoft#25320) introduced an optimization that converts large Constant node tensors (> 127 bytes) into OrtValues with in-memory external data references for better memory management. However, ONNX shape inference cannot distinguish between in-memory and file-based external data, and rejects any TensorProto with `data_location = EXTERNAL`.

### The Fix

Modified `InferenceContextImpl::getInputData()` to:

1. Detect tensors with in-memory external data using `utils::HasExternalDataInMemory()`
2. Retrieve the corresponding OrtValue
3. Create a temporary TensorProto with embedded data (not an external reference)
4. Provide this temporary proto to ONNX shape inference

This allows ONNX shape inference to access the actual tensor data without rejecting it as external.

### Memory Impact

This fix introduces a minor, temporary increase in memory usage during the model loading phase.

- **When:** The additional memory is allocated only when the shape inference engine needs the data of a constant tensor larger than 127 bytes. This is a one-time event during the initial analysis of the model.
- **What:** The fix creates a temporary in-memory copy of the tensor data.
- **Duration:** The temporary copy is released as soon as shape inference is complete.

The impact on the application's overall peak memory usage is expected to be negligible, and memory usage during inference is not affected. While the temporary tensor could in theory be large if a multi-gigabyte constant tensor were used for shape inference, that is highly unlikely in practice for well-designed models.

### Testing

- Tested with the problematic model from issue microsoft#26261
- All optimization levels now work correctly (DISABLE_ALL, BASIC, EXTENDED, ALL)
- Unit tests to be added

### Changes

- **onnxruntime/core/graph/graph.cc**:
  - Modified the `getInputData()` method in the `InferenceContextImpl` class
  - Added a `temp_tensor_protos_` member to store temporary TensorProtos during shape inference

## TODO

- [ ] Add unit tests
- [ ] Run full test suite

---------

Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
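For illustration, here is a minimal sketch (not part of the commit) of what an in-memory-externalized initializer looks like at the protobuf level. The field names come from the ONNX `TensorProto` schema; the `location` value ORT writes for in-memory data is an internal tag and is shown here only as a placeholder.

```cpp
// Sketch: a TensorProto flagged as external. ONNX shape inference checks only
// data_location and rejects the tensor, whether the bytes live in a file or in
// an OrtValue in memory. The in-memory "location" tag is an ORT-internal
// detail and is elided below.
#include "onnx/onnx_pb.h"

ONNX_NAMESPACE::TensorProto MakeExternalizedProto() {
  ONNX_NAMESPACE::TensorProto proto;
  proto.set_name("const_output");
  proto.set_data_type(ONNX_NAMESPACE::TensorProto_DataType_INT64);
  proto.add_dims(16);  // 16 * 8 bytes = 128 bytes, over the 127-byte threshold
  proto.set_data_location(ONNX_NAMESPACE::TensorProto_DataLocation_EXTERNAL);
  auto* location = proto.add_external_data();
  location->set_key("location");
  location->set_value("<in-memory tag>");  // placeholder, not a real file path
  // No raw_data / int64_data here: the bytes live in an OrtValue, which is why
  // getInputData() must copy them back into a temporary proto for inference.
  return proto;
}
```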
1 parent 04ed484 commit aafdb3a

File tree

2 files changed: +284 −0 lines changed

onnxruntime/core/graph/graph.cc

Lines changed: 26 additions & 0 deletions
@@ -2678,6 +2678,27 @@ class InferenceContextImpl : public ONNX_NAMESPACE::InferenceContext {
     // only return data if it's for a constant initializer. checks for outer scope initializers
     // if this is a subgraph and the name isn't found locally.
     const TensorProto* initializer = graph_.GetConstantInitializer(def->Name(), true);
+    if (initializer != nullptr) {
+      // Check if this is in-memory external data (data stored in an OrtValue).
+      // ONNX shape inference cannot handle external data, so we need to materialize it.
+      if (utils::HasExternalDataInMemory(*initializer)) {
+        // Try to get the OrtValue for this initializer.
+        OrtValue ort_value;
+        if (graph_.GetOrtValueInitializer(def->Name(), ort_value, true)) {
+          // Create a temporary TensorProto with the actual data from the OrtValue.
+          // This allows ONNX shape inference to access the data.
+          const Tensor& tensor = ort_value.Get<Tensor>();
+          auto temp_tensor_proto = utils::TensorToTensorProto(tensor, initializer->name(), /*use_tensor_buffer=*/false);
+          // Store the temporary proto so it outlives this call; unique_ptr keeps its address stable.
+          temp_tensor_protos_.push_back(std::make_unique<ONNX_NAMESPACE::TensorProto>(std::move(temp_tensor_proto)));
+          return temp_tensor_protos_.back().get();
+        } else {
+          // If we can't get the OrtValue, it is a bug.
+          ORT_THROW("Initializer ", def->Name(),
+                    " has in-memory external data but cannot get OrtValue during shape inference");
+        }
+      }
+    }
     return initializer;
   }

@@ -2717,6 +2738,11 @@ class InferenceContextImpl : public ONNX_NAMESPACE::InferenceContext {
   std::vector<std::unique_ptr<GraphInferencerImpl>> graph_inferencers_;
   const Graph& graph_;
   const Graph::ResolveOptions& options_;
+  // Temporary TensorProtos created for in-memory external data during shape inference.
+  // These need to outlive the shape inference call, so we store them here.
+  // Inference is per node and the instance of this context is on the stack,
+  // so this is safe.
+  mutable InlinedVector<std::unique_ptr<ONNX_NAMESPACE::TensorProto>> temp_tensor_protos_;
 };

 Status Graph::InferAndVerifySubgraphTypes(const Node& node, Graph& subgraph,
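As a side note on the `temp_tensor_protos_` member added above: it holds `std::unique_ptr`s so that the raw pointer returned from `getInputData()` remains valid even if the vector reallocates while growing. A self-contained sketch of that pointer-stability property, using only standard-library types rather than ORT code:

```cpp
// Illustration of the pointer-stability argument behind temp_tensor_protos_:
// growing a vector may move its elements, but an element held through a
// unique_ptr keeps the pointee at a fixed address.
#include <cassert>
#include <memory>
#include <vector>

struct Proto {
  int payload;
};

int main() {
  std::vector<std::unique_ptr<Proto>> protos;
  protos.push_back(std::make_unique<Proto>(Proto{42}));
  const Proto* handed_out = protos.back().get();  // analogous to getInputData()'s return value

  // Force reallocations; the unique_ptrs move, but each pointee stays put.
  for (int i = 0; i < 1000; ++i) {
    protos.push_back(std::make_unique<Proto>(Proto{i}));
  }

  assert(handed_out->payload == 42);  // the pointer handed out earlier is still valid
  return 0;
}
```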

onnxruntime/test/ir/graph_test.cc

Lines changed: 258 additions & 0 deletions
@@ -2,13 +2,17 @@
 // Licensed under the MIT License.

 #include <iostream>
+#include <fstream>
 #include "core/common/inlined_containers.h"
 #include "core/common/span_utils.h"
 #include "core/framework/tensorprotoutils.h"
 #include "core/graph/graph_viewer.h"
 #include "core/graph/model.h"
 #include "core/graph/op.h"
+#include "core/session/inference_session.h"
+#include "core/session/environment.h"
 #include "test/providers/provider_test_utils.h"
+#include "test/test_environment.h"
 #include "gtest/gtest.h"
 #include "gmock/gmock.h"
 #include "onnx/defs/function.h"

@@ -2573,5 +2577,259 @@ TEST_F(GraphTest, GraphConstruction_MemoryEfficientTopologicalSort_SubgraphGener

 #endif

+// Test for shape inference with in-memory external data (issue #26261).
+// This tests the fix for a regression where Constant nodes with large tensors (> 127 bytes)
+// stored as in-memory external data would cause shape inference to fail.
+TEST_F(GraphTest, ShapeInferenceWithInMemoryExternalData) {
+  // Create a model with a Constant node that produces a tensor larger than
+  // kSmallTensorExternalDataThreshold (127 bytes). This will trigger the
+  // in-memory externalization path.
+  ModelProto model_proto;
+  model_proto.set_ir_version(ONNX_NAMESPACE::Version::IR_VERSION);
+  auto* opset = model_proto.add_opset_import();
+  opset->set_version(17);
+
+  auto* graph_proto = model_proto.mutable_graph();
+  graph_proto->set_name("test_graph");
+
+  // Create a Constant node with a tensor of 16 INT64 values (128 bytes, just over the 127-byte threshold).
+  auto* constant_node = graph_proto->add_node();
+  constant_node->set_op_type("Constant");
+  constant_node->set_name("const_node");
+  constant_node->add_output("const_output");
+
+  // Add the value attribute with a tensor.
+  auto* attr = constant_node->add_attribute();
+  attr->set_name("value");
+  attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_TENSOR);
+  auto* tensor = attr->mutable_t();
+  tensor->set_data_type(ONNX_NAMESPACE::TensorProto_DataType_INT64);
+  tensor->add_dims(16);  // 16 elements * 8 bytes = 128 bytes
+  // Each split will be size 1, totaling 16.
+  for (int64_t i = 0; i < 16; ++i) {
+    tensor->add_int64_data(1);
+  }
+
+  // Create a Split node that uses the constant as input.
+  // Split requires a constant input for the 'split' parameter, which triggers shape inference.
+  auto* split_node = graph_proto->add_node();
+  split_node->set_op_type("Split");
+  split_node->set_name("split_node");
+  split_node->add_input("input_data");
+  split_node->add_input("const_output");  // Use constant as split sizes
+  for (int i = 0; i < 16; ++i) {
+    split_node->add_output("split_output_" + std::to_string(i));
+  }
+
+  // Add axis attribute.
+  auto* axis_attr = split_node->add_attribute();
+  axis_attr->set_name("axis");
+  axis_attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_INT);
+  axis_attr->set_i(0);
+
+  // Add graph input.
+  auto* input = graph_proto->add_input();
+  input->set_name("input_data");
+  auto* input_type = input->mutable_type()->mutable_tensor_type();
+  input_type->set_elem_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT);
+  input_type->mutable_shape()->add_dim()->set_dim_value(16);
+  input_type->mutable_shape()->add_dim()->set_dim_value(10);
+
+  // Add graph outputs.
+  for (int i = 0; i < 16; ++i) {
+    auto* output = graph_proto->add_output();
+    output->set_name("split_output_" + std::to_string(i));
+  }
+
+  // Load the model - this should succeed with the fix. Before the fix, it failed with:
+  // "Cannot parse data from external tensors. Please load external data into raw data for tensor"
+  std::shared_ptr<Model> model;
+  ASSERT_STATUS_OK(Model::Load(std::move(model_proto), model, nullptr, *logger_));
+
+  // Verify the graph was properly constructed.
+  Graph& graph = model->MainGraph();
+  ASSERT_STATUS_OK(graph.Resolve());
+
+  // Verify the constant node was converted to an initializer.
+  const ONNX_NAMESPACE::TensorProto* initializer = nullptr;
+  ASSERT_TRUE(graph.GetInitializedTensor("const_output", initializer));
+  ASSERT_NE(initializer, nullptr);
+
+  // Verify the Split node can access the constant data during shape inference.
+  const Node* split_node_ptr = nullptr;
+  for (const auto& node : graph.Nodes()) {
+    if (node.Name() == "split_node") {
+      split_node_ptr = &node;
+      break;
+    }
+  }
+  ASSERT_NE(split_node_ptr, nullptr);
+
+  // Verify outputs are properly shaped.
+  ASSERT_EQ(split_node_ptr->OutputDefs().size(), 16u);
+}
+
+// Test for shape inference with in-memory external data using InferenceSession.
+// This test reproduces the issue more faithfully by going through full session
+// initialization, whose graph optimizations trigger the in-memory externalization.
+TEST_F(GraphTest, ShapeInferenceWithInMemoryExternalDataViaSession) {
+  // Create the same model as above.
+  ModelProto model_proto;
+  model_proto.set_ir_version(ONNX_NAMESPACE::Version::IR_VERSION);
+  auto* opset = model_proto.add_opset_import();
+  opset->set_version(17);
+
+  auto* graph_proto = model_proto.mutable_graph();
+  graph_proto->set_name("test_graph");
+
+  // Create a Constant node with a tensor of 16 INT64 values (128 bytes).
+  auto* constant_node = graph_proto->add_node();
+  constant_node->set_op_type("Constant");
+  constant_node->set_name("const_node");
+  constant_node->add_output("const_output");
+
+  auto* attr = constant_node->add_attribute();
+  attr->set_name("value");
+  attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_TENSOR);
+  auto* tensor = attr->mutable_t();
+  tensor->set_data_type(ONNX_NAMESPACE::TensorProto_DataType_INT64);
+  tensor->add_dims(16);
+  for (int64_t i = 0; i < 16; ++i) {
+    tensor->add_int64_data(1);
+  }
+
+  // Create a Split node.
+  auto* split_node = graph_proto->add_node();
+  split_node->set_op_type("Split");
+  split_node->set_name("split_node");
+  split_node->add_input("input_data");
+  split_node->add_input("const_output");
+  for (int i = 0; i < 16; ++i) {
+    split_node->add_output("split_output_" + std::to_string(i));
+  }
+
+  auto* axis_attr = split_node->add_attribute();
+  axis_attr->set_name("axis");
+  axis_attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_INT);
+  axis_attr->set_i(0);
+
+  // Add graph input.
+  auto* input = graph_proto->add_input();
+  input->set_name("input_data");
+  auto* input_type = input->mutable_type()->mutable_tensor_type();
+  input_type->set_elem_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT);
+  input_type->mutable_shape()->add_dim()->set_dim_value(16);
+  input_type->mutable_shape()->add_dim()->set_dim_value(10);
+
+  // Add graph outputs.
+  for (int i = 0; i < 16; ++i) {
+    auto* output = graph_proto->add_output();
+    output->set_name("split_output_" + std::to_string(i));
+  }
+
+  // Save to a temporary file.
+  const std::string model_path = "test_in_memory_external_data.onnx";
+  {
+    std::ofstream file(model_path, std::ios::binary);
+    ASSERT_TRUE(file.is_open());
+    ASSERT_TRUE(model_proto.SerializeToOstream(&file));
+  }
+
+  // Test with ORT_DISABLE_ALL optimization, which should trigger the bug without the fix.
+  SessionOptions so;
+  so.graph_optimization_level = TransformerLevel::Default;  // This triggers the issue
+  so.session_logid = "GraphTest.ShapeInferenceWithInMemoryExternalDataViaSession";
+
+  InferenceSession session_object{so, GetEnvironment()};
+
+  // This should succeed with the fix and fail without it.
+  ASSERT_STATUS_OK(session_object.Load(model_path));
+  ASSERT_STATUS_OK(session_object.Initialize());
+
+  // Clean up.
+  std::remove(model_path.c_str());
+}
+
+// Test that explicitly triggers the in-memory externalization and then shape inference.
+// This test directly reproduces the bug scenario.
+TEST_F(GraphTest, ShapeInferenceAfterInitializerExternalization) {
+  // Create a model with a Split node that depends on a constant initializer.
+  ModelProto model_proto;
+  model_proto.set_ir_version(ONNX_NAMESPACE::Version::IR_VERSION);
+  auto* opset = model_proto.add_opset_import();
+  opset->set_version(17);
+
+  auto* graph_proto = model_proto.mutable_graph();
+  graph_proto->set_name("test_graph");
+
+  // Create the initializer directly (not as a Constant node) with 128 bytes.
+  auto* initializer = graph_proto->add_initializer();
+  initializer->set_name("split_sizes");
+  initializer->set_data_type(ONNX_NAMESPACE::TensorProto_DataType_INT64);
+  initializer->add_dims(16);  // 16 * 8 = 128 bytes
+  for (int64_t i = 0; i < 16; ++i) {
+    initializer->add_int64_data(1);
+  }
+
+  // Create a Split node that uses this initializer.
+  auto* split_node = graph_proto->add_node();
+  split_node->set_op_type("Split");
+  split_node->set_name("split_node");
+  split_node->add_input("input_data");
+  split_node->add_input("split_sizes");  // Uses the large initializer
+  for (int i = 0; i < 16; ++i) {
+    split_node->add_output("split_output_" + std::to_string(i));
+  }
+
+  auto* axis_attr = split_node->add_attribute();
+  axis_attr->set_name("axis");
+  axis_attr->set_type(ONNX_NAMESPACE::AttributeProto_AttributeType_INT);
+  axis_attr->set_i(0);
+
+  // Add graph input.
+  auto* input = graph_proto->add_input();
+  input->set_name("input_data");
+  auto* input_type = input->mutable_type()->mutable_tensor_type();
+  input_type->set_elem_type(ONNX_NAMESPACE::TensorProto_DataType_FLOAT);
+  input_type->mutable_shape()->add_dim()->set_dim_value(16);
+  input_type->mutable_shape()->add_dim()->set_dim_value(10);
+
+  // Add graph outputs.
+  for (int i = 0; i < 16; ++i) {
+    auto* output = graph_proto->add_output();
+    output->set_name("split_output_" + std::to_string(i));
+  }
+
+  // Load the model.
+  std::shared_ptr<Model> model;
+  ASSERT_STATUS_OK(Model::Load(std::move(model_proto), model, nullptr, *logger_));
+
+  Graph& graph = model->MainGraph();
+  // The first resolve should succeed.
+  ASSERT_STATUS_OK(graph.Resolve());
+
+  // Now trigger the in-memory externalization. This converts initializers
+  // larger than 127 bytes to OrtValues with external data references.
+  Status convert_status = graph.ConvertInitializersIntoOrtValues();
+  ASSERT_TRUE(convert_status.IsOK()) << "ConvertInitializersIntoOrtValues failed: " << convert_status.ErrorMessage();
+
+  // Check that the initializer was actually externalized.
+  const ONNX_NAMESPACE::TensorProto* initializer_after = nullptr;
+  ASSERT_TRUE(graph.GetInitializedTensor("split_sizes", initializer_after));
+  ASSERT_NE(initializer_after, nullptr);
+  ASSERT_TRUE(utils::HasExternalDataInMemory(*initializer_after))
+      << "Initializer was not externalized to in-memory external data";
+
+  // Mark the graph as needing resolve to force shape inference to run again.
+  graph.SetGraphResolveNeeded();
+
+  // Resolve again - this should trigger shape inference with the externalized initializer.
+  // Without the fix, this fails with "Cannot parse data from external tensors".
+  // With the fix, getInputData() materializes the external data for shape inference.
+  Status second_resolve = graph.Resolve();
+  ASSERT_TRUE(second_resolve.IsOK()) << "Second resolve failed: " << second_resolve.ErrorMessage();
+}
+
 } // namespace test
 } // namespace onnxruntime
