TensorRT OSS release corresponding to TensorRT 8.6.1.6 GA release.
- Updates since [TensorRT 8.6.0 EA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-0-EA).
- Please refer to the [TensorRT 8.6.1.6 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-1) for more information.
Key Features and Updates:
- Added a new flag `--use-cuda-graph` to demoDiffusion to improve performance (see the usage sketch after this list).
- Optimized GPT2 and T5 HuggingFace demos to use fp16 I/O tensors for fp16 networks.
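A minimal usage sketch of the new flag, assuming the demo's text-to-image entry point is `demo_txt2img.py` (the prompt is illustrative; `--use-cuda-graph` is the flag introduced above):

```bash
# Run the Stable Diffusion demo with CUDA graphs enabled via the new flag.
# Script name and prompt are assumptions for illustration.
python3 demo_txt2img.py "a photograph of an astronaut riding a horse" --use-cuda-graph
```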
TensorRT OSS release corresponding to TensorRT 8.6.0.12 EA release.
- Updates since [TensorRT 8.5.3 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-3).
- Please refer to the [TensorRT 8.6.0.12 EA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-0-EA) for more information.
Key Features and Updates:
- demoDiffusion acceleration is now supported out of the box in TensorRT without requiring plugins.
- The following plugins have been removed accordingly: GroupNorm, LayerNorm, MultiHeadCrossAttention, MultiHeadFlashAttention, SeqLen2Spatial, and SplitGeLU.
TensorRT OSS release corresponding to TensorRT 8.5.3.1 GA release.
- Updates since [TensorRT 8.5.2 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-2).
- Please refer to the [TensorRT 8.5.3 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-3) for more information.
Key Features and Updates:
- Added the following HuggingFace demos: GPT-J-6B, GPT2-XL, and GPT2-Medium.
TensorRT OSS release corresponding to TensorRT 8.5.2.2 GA release.
- Updates since [TensorRT 8.5.1 GA release](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-1).
- Please refer to the [TensorRT 8.5.2 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-2) for more information.
TensorRT OSS release corresponding to TensorRT 8.5.1.7 GA release.
- Updates since [TensorRT 8.4.1 GA release](https://github.com/NVIDIA/TensorRT/releases/tag/8.4.1).
- Please refer to the [TensorRT 8.5.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-5-1) for more information.
Updated TensorRT version to 8.4.2 - see the [TensorRT 8.4.2 release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-4-2) for more information.
### Changed
- Updated default protobuf version to 3.20.x
TensorRT OSS release corresponding to TensorRT 8.4.1.5 GA release.
- Updates since [TensorRT 8.2.1 GA release](https://github.com/NVIDIA/TensorRT/releases/tag/8.2.1).
- Please refer to the [TensorRT 8.4.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-4-1) for more information.
TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
- Updates since [TensorRT 8.2.0 EA release](https://github.com/NVIDIA/TensorRT/releases/tag/8.2.0-EA).
- Please refer to the [TensorRT 8.2.1 GA release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-2-1) for more information.
- Added support for the following ONNX operators: `Celu`, `CumSum`, `EyeLike`, `GatherElements`, `GlobalLpPool`, `GreaterOrEqual`, `LessOrEqual`, `LpNormalization`, `LpPool`, `ReverseSequence`, and `SoftmaxCrossEntropyLoss` (see the `trtexec` sketch after this list).
- Overhauled the `Resize` ONNX operator, which now fully supports its coordinate-transformation, interpolation, and nearest-rounding modes.
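As a quick compatibility check, an ONNX model exercising these operators can be parsed and built with `trtexec`; a minimal sketch, with `model.onnx` as a placeholder path:

```bash
# Parse the ONNX model and build a serialized engine; model.onnx stands in
# for a model that uses the newly supported operators.
trtexec --onnx=model.onnx --saveEngine=model.engine
```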
To build the TensorRT-OSS components, you will first need the following software:
* (Cross compilation for Jetson platform) [NVIDIA JetPack](https://developer.nvidia.com/embedded/jetpack) >= 5.0 (current support only for TensorRT 8.4.0 and TensorRT 8.5.2)
* (Cross compilation for QNX platform) [QNX Toolchain](https://blackberry.qnx.com/en)
**Example: Linux (aarch64) build with default cuda-12.0**
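A minimal sketch of this build, assuming `$TRT_OSSPATH` points at the repository checkout and the aarch64 toolchain file ships at the path shown:

```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
# Toolchain file path is assumed from the repository's cmake/toolchains layout.
cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64-native.toolchain
make -j$(nproc)
```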
For Linux platforms, we recommend that you generate a docker container for building TensorRT OSS.
> NOTE: The latest JetPack SDK v5.1 only supports TensorRT 8.5.2.
> NOTE:
> 1. The default CUDA version used by CMake is 12.0.1. To override this, for example to 11.8, append `-DCUDA_VERSION=11.8` to the cmake command.
> 2. If samples fail to link on CentOS7, create this symbolic link: `ln -s $TRT_OUT_DIR/libnvinfer_plugin.so $TRT_OUT_DIR/libnvinfer_plugin.so.8`
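For example, overriding the CUDA version per note 1 might look like this (the library path variable is illustrative):

```bash
# Override the default CUDA 12.0.1 with 11.8; $TRT_LIBPATH is assumed to
# point at the TensorRT libraries.
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCUDA_VERSION=11.8
```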
* Required CMake build arguments are:
- `TRT_LIB_DIR`: Path to the TensorRT installation directory containing libraries.
- `TRT_OUT_DIR`: Output directory where generated build artifacts will be copied.
* Optional CMake build arguments:
- `CMAKE_BUILD_TYPE`: Specify whether the generated binaries are for release or debug (containing debug symbols). Values consist of [`Release`] | `Debug`
- `CUDA_VERSION`: The version of CUDA to target, for example [`11.7.1`].
- `CUDNN_VERSION`: The version of cuDNN to target, for example [`8.6`].
- `PROTOBUF_VERSION`: The version of Protobuf to use, for example [`3.0.0`]. Note: Changing this will not configure CMake to use a system version of Protobuf; it will configure CMake to download and try building that version.
- `CMAKE_TOOLCHAIN_FILE`: The path to a toolchain file for cross compilation.
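Putting the required and optional arguments together, a representative debug configuration might look like this (all paths are illustrative):

```bash
# Required: TRT_LIB_DIR, TRT_OUT_DIR. Optional: build type and target
# CUDA/cuDNN versions, matching the bracketed defaults above.
cmake .. \
  -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu \
  -DTRT_OUT_DIR=`pwd`/out \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCUDA_VERSION=11.7.1 \
  -DCUDNN_VERSION=8.6
```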
Since the tokenizer and projection of the final predictions are not nearly as compute-heavy as the model itself, we run them outside of TensorRT.
The tokenizer splits the input text into tokens that can be consumed by the model. For details on this process, see [this tutorial](https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/).
To run the BERT model in TensorRT, we construct the model using TensorRT APIs and import the weights from a pre-trained TensorFlow checkpoint from [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_tf_ckpt_large_qa_squad2_amp_128). Finally, a TensorRT engine is generated and serialized to the disk. The various inference scripts then load this engine for inference.
Lastly, the tokens predicted by the model are projected back to the original text to get a final result.
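The whole pipeline maps onto roughly this sequence; a sketch with hypothetical script names (`build_engine.py` and `run_inference.py` and their flags are illustrative, not the demo's actual CLI):

```bash
# 1. Construct the network with TensorRT APIs, import weights from the TF
#    checkpoint, and serialize the engine to disk (hypothetical script/flags).
python3 build_engine.py --checkpoint bert_tf_ckpt_large_qa_squad2_amp_128 --output bert_large.engine

# 2. Load the engine, tokenize the input, run inference, and project the
#    predicted tokens back onto the original text (hypothetical script/flags).
python3 run_inference.py --engine bert_large.engine \
    --passage "TensorRT is an SDK for high-performance deep learning inference." \
    --question "What is TensorRT?"
```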
Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` on an NVIDIA Ampere GPU.