     tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
     l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1, which means no limit).
     use_distributed_mode_trace (bool): Use aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in the distributed model.
+    dynamically_allocate_resources (bool): Dynamically allocate resources during engine execution.
     **kwargs: Any,
 Returns:
     torch.fx.GraphModule: Compiled FX Module; when run it will execute via TensorRT
 """Compile an ExportedProgram module for NVIDIA GPUs using TensorRT
@@ -511,6 +515,7 @@ def compile(
     l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1, which means no limit).
     offload_module_to_cpu (bool): Offload the module to CPU. This is useful when we need to minimize GPU memory usage.
     use_distributed_mode_trace (bool): Use aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in the distributed model.
+    dynamically_allocate_resources (bool): Dynamically allocate resources during engine execution.
     **kwargs: Any,
 Returns:
     torch.fx.GraphModule: Compiled FX Module; when run it will execute via TensorRT
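The options documented in the hunk above are keyword arguments to the Dynamo compile path. A minimal sketch of how they might be passed, with the compile call itself left commented out since it needs a GPU, TensorRT, and a traced model (the model and inputs are placeholders, not from this diff):

```python
# Sketch only: option names come from the docstrings in this diff; the
# values chosen here are illustrative assumptions, not recommendations.
settings = {
    "tiling_optimization_level": "moderate",  # one of "none", "fast", "moderate", "full"
    "l2_limit_for_tiling": -1,                # -1 means no L2 cache usage limit
    "use_distributed_mode_trace": False,      # enable when the model uses DTensors
    "dynamically_allocate_resources": True,   # the option added by this PR
}

# On a machine with TensorRT available, these would be forwarded as kwargs:
# trt_module = torch_tensorrt.dynamo.compile(exported_program, inputs, **settings)
```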
py/torch_tensorrt/dynamo/_settings.py: 4 additions & 0 deletions
@@ -11,6 +11,7 @@
     DLA_GLOBAL_DRAM_SIZE,
     DLA_LOCAL_DRAM_SIZE,
     DLA_SRAM_SIZE,
+    DYNAMICALLY_ALLOCATE_RESOURCES,
     DRYRUN,
     ENABLE_CROSS_COMPILE_FOR_WINDOWS,
     ENABLE_EXPERIMENTAL_DECOMPOSITIONS,
@@ -97,6 +98,8 @@ class CompilationSettings:
     tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. We currently support ["none", "fast", "moderate", "full"].
     l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default is -1, which means no limit).
     use_distributed_mode_trace (bool): Use aot_autograd to trace the graph. This is enabled when DTensors or distributed tensors are present in the distributed model.
+    offload_module_to_cpu (bool): Offload the model to CPU to reduce memory footprint during compilation.
+    dynamically_allocate_resources (bool): Dynamically allocate resources for TensorRT engines.
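The fields above live on the `CompilationSettings` dataclass. The following stand-in sketches just the fields touched by this diff; the real class in py/torch_tensorrt/dynamo/_settings.py has many more fields, and every default here except `l2_limit_for_tiling = -1` (stated in the docstring) is an assumption:

```python
from dataclasses import dataclass

# Illustrative stand-in, NOT the real torch_tensorrt CompilationSettings.
@dataclass
class TilingSettingsSketch:
    tiling_optimization_level: str = "none"    # assumed default
    l2_limit_for_tiling: int = -1              # -1 = no limit, per the docstring
    use_distributed_mode_trace: bool = False   # assumed default
    offload_module_to_cpu: bool = False        # assumed default
    dynamically_allocate_resources: bool = False  # assumed default

# Only the flags that differ from the defaults need to be named.
s = TilingSettingsSketch(dynamically_allocate_resources=True)
```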