@@ -2,8 +2,8 @@ Deploying a Torch-TensorRT model (to Triton)
 ============================================

 Optimization and deployment go hand in hand in a discussion about Machine
-Learning infrastructure. For a Torch-TensorRT user, network level optimzation
-to get the maximum performance would already be an area of expertize.
+Learning infrastructure. Once network level optimizations are done
+to get the maximum performance, the next step is to deploy the model.

 However, serving this optimized model comes with its own set of considerations
 and challenges like: building an infrastructure to support concurrent model
@@ -18,7 +18,7 @@ Step 1: Optimize your model with Torch-TensorRT
 -----------------------------------------------

 Most Torch-TensorRT users will be familiar with this step. For the purpose of
-this demoonstration, we will be using a ResNet50 model from Torchhub.
+this demonstration, we will be using a ResNet50 model from Torchhub.

 Let’s first pull the NGC PyTorch Docker container. You may need to create
 an account and get the API key from `here <https://ngc.nvidia.com/setup/>`__.
@@ -30,7 +30,7 @@ Sign up and login with your key (follow the instructions
    # <xx.xx> is the yy.mm for the publishing tag for NVIDIA's Pytorch
    # container; e.g. 22.04

-   docker run -it --gpus all -v /path/to/folder:/resnet50_eg nvcr.io/nvidia/pytorch:<xx.xx>-py3
+   docker run -it --gpus all -v /path/to/local/folder/to/copy/model:/resnet50_eg nvcr.io/nvidia/pytorch:<xx.xx>-py3

 Once inside the container, we can proceed to download a ResNet model from
 Torchhub and optimize it with Torch-TensorRT.
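
The download-and-compile code itself sits outside the lines changed in this
diff. As a rough sketch of what that step looks like (the Torch Hub source,
input shape, and fp32 precision below are illustrative assumptions, not
necessarily the tutorial's exact settings):

::

   import torch
   import torch_tensorrt

   # Download a pretrained ResNet50 from Torch Hub, in eval mode on the GPU
   model = torch.hub.load("pytorch/vision", "resnet50", pretrained=True).eval().to("cuda")

   # Trace to TorchScript, then compile with Torch-TensorRT for a 1x3x224x224 input
   traced_model = torch.jit.trace(model, [torch.randn((1, 3, 224, 224)).to("cuda")])
   trt_model = torch_tensorrt.compile(
       traced_model,
       inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
       enabled_precisions={torch.float32},
   )

   # Save as a TorchScript module so Triton's PyTorch backend can serve it
   torch.jit.save(trt_model, "model.pt")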
@@ -180,25 +180,25 @@ with the Triton Inference Server.
 ::

    # Setting up client
-   triton_client = httpclient.InferenceServerClient(url="localhost:8000")
+   client = httpclient.InferenceServerClient(url="localhost:8000")

 Secondly, we specify the names of the input and output layer(s) of our model.

 ::

-   test_input = httpclient.InferInput("input__0", transformed_img.shape, datatype="FP32")
-   test_input.set_data_from_numpy(transformed_img, binary_data=True)
+   inputs = httpclient.InferInput("input__0", transformed_img.shape, datatype="FP32")
+   inputs.set_data_from_numpy(transformed_img, binary_data=True)

-   test_output = httpclient.InferRequestedOutput("output__0", binary_data=True, class_count=1000)
+   outputs = httpclient.InferRequestedOutput("output__0", binary_data=True, class_count=1000)

 Lastly, we send an inference request to the Triton Inference Server.

 ::

    # Querying the server
-   results = triton_client.infer(model_name="resnet50", inputs=[test_input], outputs=[test_output])
-   test_output_fin = results.as_numpy('output__0')
-   print(test_output_fin[:5])
+   results = client.infer(model_name="resnet50", inputs=[inputs], outputs=[outputs])
+   inference_output = results.as_numpy('output__0')
+   print(inference_output[:5])

 The output should look like the following:

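Note that ``transformed_img`` used above is prepared earlier in the tutorial,
outside the lines shown in this diff. A minimal sketch of that preprocessing,
assuming standard ImageNet normalization and a hypothetical sample image at
``./img1.jpg`` (both assumptions, not the tutorial's exact code), might look
like this:

::

   import numpy as np
   from PIL import Image
   from torchvision import transforms

   # Standard ImageNet preprocessing: resize, center-crop, and normalize
   preprocess = transforms.Compose([
       transforms.Resize(256),
       transforms.CenterCrop(224),
       transforms.ToTensor(),
       transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
   ])

   img = Image.open("./img1.jpg")  # hypothetical sample image path
   # Add a batch dimension and convert to the float32 NumPy array the client sends
   transformed_img = preprocess(img).unsqueeze(0).numpy().astype(np.float32)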