15 | 15 | Introduction |
16 | 16 | ------------ |
17 | 17 | PyTorch 1.8 includes an updated profiler API capable of |
18 | | -recording the CPU side operations as well as the CUDA kernel launches on the GPU side. |
| 18 | +recording CPU-side operations as well as device-side kernel launches (for example CUDA or XPU), |
| 19 | +when supported by the platform and underlying tracing integrations. |
| 20 | +
19 | 21 | The profiler can visualize this information |
20 | 22 | in TensorBoard Plugin and provide analysis of the performance bottlenecks. |
21 | 23 |
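Review note: the reworded introduction says device-side tracing is recorded only "when supported by the platform". A minimal sketch of what that could look like in user code is below. It assumes `torch.xpu` and `ProfilerActivity.XPU` exist only in recent PyTorch builds, so both are probed with `hasattr`. The matrix multiply is a stand-in workload and is not part of the tutorial.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# CPU-side operators are always requested.
activities = [ProfilerActivity.CPU]

# Request device-side tracing only when a matching backend is present.
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)
elif (
    hasattr(torch, "xpu")
    and torch.xpu.is_available()
    and hasattr(ProfilerActivity, "XPU")  # assumption: absent in older releases
):
    activities.append(ProfilerActivity.XPU)

with profile(activities=activities) as prof:
    x = torch.randn(64, 64)  # stand-in workload
    y = x @ x

# Summarize the recorded operators, most expensive first.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5))
```

On a CPU-only machine this should record CPU operators only, which matches the hedged wording in the new text.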
|
|
76 | 78 | # Next, create Resnet model, loss function, and optimizer objects. |
77 | 79 | # To run on GPU, move model and loss to GPU device. |
78 | 80 |
79 | | -device = torch.device("cuda:0") |
80 | | -model = torchvision.models.resnet18(weights='IMAGENET1K_V1').cuda(device) |
81 | | -criterion = torch.nn.CrossEntropyLoss().cuda(device) |
| 81 | +acc = torch.accelerator.current_accelerator() |
| 82 | +device = torch.device(f'{acc}:0') |
| 83 | +model = torchvision.models.resnet18(weights='IMAGENET1K_V1').to(device) |
| 84 | +criterion = torch.nn.CrossEntropyLoss().to(device) |
82 | 85 | optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9) |
83 | 86 | model.train() |
84 | 87 |
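Review note: as far as I understand the API, `torch.accelerator.current_accelerator()` returns `None` when no accelerator is available, in which case `f'{acc}:0'` would produce the invalid device string `None:0`. A defensive variant might look like the sketch below. `pick_device` is a hypothetical helper name, not something in the tutorial, and the `getattr` probe assumes `torch.accelerator` may be missing on older releases.

```python
import torch


def pick_device() -> torch.device:
    """Return accelerator device 0 when one is available, otherwise the CPU."""
    # Assumption: torch.accelerator is absent on older PyTorch releases.
    accel = getattr(torch, "accelerator", None)
    if accel is not None and accel.is_available():
        acc = accel.current_accelerator()  # a torch.device such as device(type='cuda')
        return torch.device(f"{acc.type}:0")
    return torch.device("cpu")


device = pick_device()
model = torch.nn.Linear(4, 2).to(device)  # small stand-in for the tutorial's ResNet
print(device)
```

Falling back to the CPU keeps the tutorial runnable everywhere, at the cost of an empty device-side trace. Whether that trade-off fits the tutorial is a judgment call for the author.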
|
@@ -346,7 +349,7 @@ def train(data): |
346 | 349 | # For example, "GPU0" means the following table only shows each operator's memory usage on GPU 0, not including CPU or other GPUs. |
347 | 350 | # |
348 | 351 | # The memory curve shows the trends of memory consumption. The "Allocated" curve shows the total memory that is actually |
349 | | -# in use, e.g., tensors. In PyTorch, caching mechanism is employed in CUDA allocator and some other allocators. The |
| 352 | +# in use, e.g., tensors. In PyTorch, a caching mechanism is employed in the device allocator and some other allocators. The
350 | 353 | # "Reserved" curve shows the total memory that is reserved by the allocator. You can left click and drag on the graph |
351 | 354 | # to select events in the desired range: |
352 | 355 | # |
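Review note: the "Allocated" versus "Reserved" distinction described in this comment can be observed directly. The sketch below uses the CUDA-specific calls `torch.cuda.memory_allocated` and `torch.cuda.memory_reserved`; it does not assume other backends expose the same counters, and it reports zeros when CUDA is not available.

```python
import torch

if torch.cuda.is_available():
    dev = torch.device("cuda:0")
    x = torch.empty(1024, 1024, device=dev)  # about 4 MiB of float32
    allocated = torch.cuda.memory_allocated(dev)  # bytes held by live tensors
    reserved = torch.cuda.memory_reserved(dev)    # bytes held by the caching allocator
    del x
    allocated_after = torch.cuda.memory_allocated(dev)
    reserved_after = torch.cuda.memory_reserved(dev)
else:
    # No CUDA device: nothing is allocated or reserved on the GPU.
    allocated = reserved = allocated_after = reserved_after = 0

print(f"allocated={allocated} reserved={reserved}")
print(f"after del: allocated={allocated_after} reserved={reserved_after}")
```

After the tensor is deleted, the allocated figure drops while the reserved figure usually stays put, because the caching allocator keeps the block for reuse. That gap is what separates the two curves in the memory view.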