Compile Time Caching in ``torch.compile``
=========================================================

**Author:** `Oguz Ulgen <https://github.com/oulgen>`_
**Translator:** `YoungyoungJ <https://github.com/YoungyoungJ>`_

Introduction
------------------

PyTorch Compiler provides several caching offerings to reduce compilation latency.
This recipe will explain these offerings in detail to help users pick the best option for their use case.

Check out `Compile Time Caching Configurations <https://tutorials.pytorch.kr/recipes/torch_compile_caching_configuration_tutorial.html>`__ for how to configure these caches.

Also check out our caching benchmark at `PT CacheBench Benchmarks <https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch&benchmarkName=TorchCache+Benchmark>`__.

Prerequisites
-------------------

Before starting this recipe, make sure that you have the following:

* Basic understanding of ``torch.compile``. See:

  * `torch.compiler API documentation <https://pytorch.org/docs/stable/torch.compiler.html#torch-compiler>`__
  * `Introduction to torch.compile <https://tutorials.pytorch.kr/intermediate/torch_compile_tutorial.html>`__
  * `Triton language documentation <https://triton-lang.org/main/index.html>`__

* PyTorch 2.4 or later

Caching Offerings
---------------------

``torch.compile`` provides the following caching offerings:

* End-to-end caching (also known as ``Mega-Cache``)
* Modular caching of ``TorchDynamo``, ``TorchInductor``, and ``Triton``

It is important to note that caching validates that the cache artifacts are used with the same PyTorch and Triton versions, as well as the same GPU when the device is set to CUDA.

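Because of this validation, externally stored artifacts are only useful on a matching setup. One way to respect this when storing artifacts in a database is to key them by version and device. A minimal sketch; the key scheme is illustrative, and the Triton version could be folded in the same way:

.. code-block:: python

    import torch

    # Key externally stored artifacts by the environment they were built in,
    # since they only validate against the same versions and GPU model.
    key_parts = [f"torch-{torch.__version__}"]
    if torch.cuda.is_available():
        key_parts.append(torch.cuda.get_device_name())
    cache_key = "/".join(key_parts)
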
``torch.compile`` end-to-end caching (``Mega-Cache``)
------------------------------------------------------------

End-to-end caching, from here onwards referred to as ``Mega-Cache``, is the ideal solution for users looking for a portable caching solution that can be stored in a database and later fetched, possibly on a separate machine.

``Mega-Cache`` provides two compiler APIs:

* ``torch.compiler.save_cache_artifacts()``
* ``torch.compiler.load_cache_artifacts()``

The intended use case is as follows: after compiling and executing a model, the user calls ``torch.compiler.save_cache_artifacts()``, which returns the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call ``torch.compiler.load_cache_artifacts()`` with these artifacts to pre-populate the ``torch.compile`` caches in order to jump-start their cache.

Consider the following example. First, compile and save the cache artifacts.

.. code-block:: python

    import torch

    @torch.compile
    def fn(x, y):
        return x.sin() @ y

    a = torch.rand(100, 100, dtype=torch.float32, device="cuda")
    b = torch.rand(100, 100, dtype=torch.float32, device="cuda")

    # Any compiled workload works here; this small function is illustrative.
    # Running it populates the caches that will be saved below.
    result = fn(a, b)

    artifacts = torch.compiler.save_cache_artifacts()

    assert artifacts is not None
    artifact_bytes, cache_info = artifacts

    # Now, potentially store artifact_bytes in a database
    # You can use cache_info for logging

Later, you can jump-start the cache by the following:

.. code-block:: python

    # Potentially download/fetch the artifacts from the database
    torch.compiler.load_cache_artifacts(artifact_bytes)

This operation populates all the modular caches that will be discussed in the next section, including ``PGO``, ``AOTAutograd``, ``Inductor``, ``Triton``, and ``Autotuning``.

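For instance, a local file can stand in for the database mentioned above. The sketch below persists the artifacts so a later process can warm-start from them; the helper names and the artifact path are illustrative, not part of the API:

.. code-block:: python

    from pathlib import Path

    import torch

    # Hypothetical location; any durable storage (file, blob store, database) works
    ARTIFACT_PATH = Path("/tmp/compile_artifacts.bin")

    def save_artifacts_to_disk() -> None:
        # save_cache_artifacts() returns None if there is nothing to save yet
        artifacts = torch.compiler.save_cache_artifacts()
        if artifacts is not None:
            artifact_bytes, _cache_info = artifacts
            ARTIFACT_PATH.write_bytes(artifact_bytes)

    def load_artifacts_from_disk() -> None:
        # Pre-populate the caches before the first torch.compile call
        if ARTIFACT_PATH.exists():
            torch.compiler.load_cache_artifacts(ARTIFACT_PATH.read_bytes())
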
Modular caching of ``TorchDynamo``, ``TorchInductor``, and ``Triton``
----------------------------------------------------------------------

The aforementioned ``Mega-Cache`` is composed of individual components that can be used without any user intervention. By default, PyTorch Compiler comes with local on-disk caches for ``TorchDynamo``, ``TorchInductor``, and ``Triton``. These caches include:

* ``FXGraphCache``: A cache of graph-based IR components used in compilation.
* ``TritonCache``: A cache of Triton-compilation results, including ``cubin`` files generated by ``Triton`` and other caching artifacts.
* ``InductorCache``: A bundle of ``FXGraphCache`` and ``Triton`` cache.
* ``AOTAutogradCache``: A cache of joint graph artifacts.
* ``PGO-cache``: A cache of dynamic shape decisions to reduce the number of recompilations.
* `AutotuningCache <https://github.com/pytorch/pytorch/blob/795a6a0affd349adfb4e3df298b604b74f27b44e/torch/_inductor/runtime/autotune_cache.py#L116>`__:

  * ``Inductor`` generates ``Triton`` kernels and benchmarks them to select the fastest kernels (see the sketch after this list).
  * ``torch.compile``'s built-in ``AutotuningCache`` caches these results.

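Autotuning is most visible with the ``max-autotune`` compile mode, which benchmarks candidate kernels aggressively. A minimal sketch, assuming a CUDA device is available; the function and shapes are illustrative:

.. code-block:: python

    import torch

    @torch.compile(mode="max-autotune")
    def matmul(a, b):
        return a @ b

    x = torch.randn(1024, 1024, device="cuda")
    y = torch.randn(1024, 1024, device="cuda")

    # The first call benchmarks candidate kernels and caches the winning
    # configurations, so later runs reuse them instead of re-tuning.
    matmul(x, y)
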
All these cache artifacts are written to ``TORCHINDUCTOR_CACHE_DIR``, which by default will look like ``/tmp/torchinductor_myusername``.

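The same environment variable can redirect the caches to a custom location. A minimal sketch; the path is illustrative, and the variable must be set before the first compilation in the process (setting it in the shell before launching Python is the safest route):

.. code-block:: python

    import os

    # Redirect all on-disk compiler caches to a custom location
    os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/path/to/my/cache"

    import torch

    @torch.compile
    def fn(x):
        return x.relu()

    # Cache artifacts for this compilation now land under /path/to/my/cache
    fn(torch.randn(8))
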
Remote Caching
----------------

We also provide a remote caching option for users who would like to take advantage of a Redis-based cache. Check out `Compile Time Caching Configurations <https://tutorials.pytorch.kr/recipes/torch_compile_caching_configuration_tutorial.html>`__ to learn more about how to enable the Redis-based caching.

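A minimal sketch of what enabling the Redis-backed caches can look like; the environment variable names below are assumptions to verify against the linked configuration recipe for your PyTorch version, and the host and port are illustrative:

.. code-block:: python

    import os

    # Opt individual caches into the Redis backend (assumed variable names;
    # see the configuration recipe for the authoritative list)
    os.environ["TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE"] = "1"
    os.environ["TORCHINDUCTOR_AUTOTUNE_REMOTE_CACHE"] = "1"

    # Point the compiler at the Redis server (illustrative host/port)
    os.environ["TORCHINDUCTOR_REDIS_HOST"] = "my.redis.host"
    os.environ["TORCHINDUCTOR_REDIS_PORT"] = "6379"
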
Conclusion
-------------
In this recipe, we have learned that PyTorch Inductor's caching mechanisms significantly reduce compilation latency by utilizing both local and remote caches, which operate seamlessly in the background without requiring user intervention.