(beta) Compiling the optimizer with torch.compile
==========================================================================================

**Author:** `Michael Lazos <https://github.com/mlazos>`_
**Translator:** `7SH7 <https://github.com/7SH7>`_

The optimizer is a key algorithm for training any deep learning model.
Since it is responsible for updating every model parameter, it can often
become the bottleneck in training performance for large models. In this recipe,
we will apply ``torch.compile`` to the optimizer to observe the GPU performance
improvement.

.. note::

    This tutorial requires PyTorch 2.2.0 or later.

Model Setup
~~~~~~~~~~~~~~~~~~~~~
For this example, we'll use a simple sequence of linear layers.
Since we are only benchmarking the optimizer, the choice of model doesn't matter
because optimizer performance is a function of the number of parameters.

Depending on what machine you are using, your exact results may vary.

.. code-block:: python

    output = model(input)
    output.sum().backward()

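The excerpt above omits the definitions of ``model`` and ``input``. A minimal
self-contained sketch of the setup this section describes might look like the
following, where the layer width, depth, and input size are illustrative
assumptions rather than the recipe's exact values:

.. code-block:: python

    import torch

    # A plain stack of linear layers on the GPU; the sizes here are arbitrary.
    model = torch.nn.Sequential(
        *[torch.nn.Linear(1024, 1024, bias=False, device="cuda") for _ in range(10)]
    )
    input = torch.rand(1024, device="cuda")

    # One forward/backward pass so every parameter has a gradient for the optimizer.
    output = model(input)
    output.sum().backward()
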
Setting up and running the optimizer benchmark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this example, we'll use the Adam optimizer
and create a helper function to wrap the step()
in ``torch.compile()``.

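The code that creates the optimizer and wraps its ``step()`` is elided from the
excerpt below; a rough sketch of that pattern (the learning rate is an arbitrary
choice, and ``model`` is the one from the setup sketch above) could be:

.. code-block:: python

    # Construct the Adam optimizer over the model's parameters.
    opt = torch.optim.Adam(model.parameters(), lr=0.01)

    # Compile the optimizer step itself; fn() now runs the compiled update.
    @torch.compile
    def fn():
        opt.step()
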
.. note::

    ``torch.compile`` is only supported on CUDA devices with compute capability >= 7.0

.. code-block:: python

    # exit cleanly if we are on a device that doesn't support torch.compile
    if torch.cuda.get_device_capability() < (7, 0):
        print("Exiting because torch.compile is not supported on this device.")
        import sys
        # ...

        opt.step()


    # Let's define a helpful benchmarking function:
    import torch.utils.benchmark as benchmark

    # ...
        return t0.blocked_autorange().mean * 1e6


    # Warmup runs to compile the function
    for _ in range(5):
        fn()

    # ...
    print(f"eager runtime: {eager_runtime}us")
    print(f"compiled runtime: {compiled_runtime}us")

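Putting the excerpted pieces together, a self-contained version of the benchmark
might look roughly like the following. The helper's name and signature and the
stand-in model setup are assumptions filled in around the lines shown above:

.. code-block:: python

    import torch
    import torch.utils.benchmark as benchmark

    # Stand-in model, optimizer, and compiled step so the sketch runs on its own.
    model = torch.nn.Linear(1024, 1024, device="cuda")
    model(torch.rand(1024, device="cuda")).sum().backward()
    opt = torch.optim.Adam(model.parameters(), lr=0.01)

    @torch.compile
    def fn():
        opt.step()

    def benchmark_torch_function_in_microseconds(f, *args, **kwargs):
        # Timer measures the statement; blocked_autorange() picks the number of
        # iterations automatically, and .mean reports seconds, converted to us here.
        t0 = benchmark.Timer(
            stmt="f(*args, **kwargs)",
            globals={"f": f, "args": args, "kwargs": kwargs},
        )
        return t0.blocked_autorange().mean * 1e6

    # Warmup runs so compilation happens outside the measured region.
    for _ in range(5):
        fn()

    eager_runtime = benchmark_torch_function_in_microseconds(opt.step)
    compiled_runtime = benchmark_torch_function_in_microseconds(fn)

    print(f"eager runtime: {eager_runtime}us")
    print(f"compiled runtime: {compiled_runtime}us")

Note that the warmup loop matters: the first few calls to ``fn()`` trigger
compilation, which should not be included in the timed region.
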
Sample Results:

* Eager runtime: 747.2437149845064us
* Compiled runtime: 392.07384741178us

See Also
~~~~~~~~~

* For an in-depth technical overview, see
  `Compiling the optimizer with PT2 <https://dev-discuss.pytorch.org/t/compiling-the-optimizer-with-pt2/1669>`__