Compile Time Caching in ``torch.compile``
=========================================================

**Author:** `Oguz Ulgen <https://github.com/oulgen>`_

**Translator:** `κΉ€μ˜μ€€ <https://github.com/YoungyoungJ>`_
Introduction
------------------
PyTorch Compiler provides several caching offerings to reduce compilation latency.
This recipe will explain these offerings in detail to help users pick the best option for their use case.

Check out `Compile Time Caching Configurations <https://tutorials.pytorch.kr/recipes/torch_compile_caching_configuration_tutorial.html>`__ for how to configure these caches.
Also check out our caching benchmark at `PT CacheBench Benchmarks <https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch&benchmarkName=TorchCache+Benchmark>`__.
Prerequisites
-------------------
Before starting this recipe, make sure that you have the following:
* Basic understanding of ``torch.compile``. See:

  * `torch.compiler API documentation <https://pytorch.org/docs/stable/torch.compiler.html#torch-compiler>`__
  * `Introduction to torch.compile <https://tutorials.pytorch.kr/intermediate/torch_compile_tutorial.html>`__
  * `Triton language documentation <https://triton-lang.org/main/index.html>`__

* PyTorch 2.4 or later

Caching Offerings
---------------------

``torch.compile`` provides the following caching offerings:

* End-to-end caching (also known as ``Mega-Cache``)
* Modular caching of ``TorchDynamo``, ``TorchInductor``, and ``Triton``

It is important to note that caching validates that the cache artifacts are used with the same PyTorch and Triton versions, as well as the same GPU when the device is set to CUDA.

``torch.compile`` end-to-end caching (``Mega-Cache``)
------------------------------------------------------------

End-to-end caching, from here onwards referred to as ``Mega-Cache``, is the ideal solution for users looking for a portable caching solution that can be stored in a database and later fetched, possibly on a separate machine.

``Mega-Cache`` provides two compiler APIs:

* ``torch.compiler.save_cache_artifacts()``
* ``torch.compiler.load_cache_artifacts()``

The intended use case is as follows: after compiling and executing a model, the user calls ``torch.compiler.save_cache_artifacts()``, which returns the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call ``torch.compiler.load_cache_artifacts()`` with these artifacts to pre-populate the ``torch.compile`` caches in order to jump-start their cache.

Consider the following example. First, compile and save the cache artifacts.

.. code-block:: python

    import torch

    # A small workload so the compiler produces cache artifacts;
    # any torch.compile model works the same way.
    @torch.compile
    def fn(x, y):
        return x.sin() + y

    fn(torch.randn(4), torch.randn(4))

    artifacts = torch.compiler.save_cache_artifacts()

    assert artifacts is not None
    artifact_bytes, cache_info = artifacts

    # Now, potentially store artifact_bytes in a database
    # You can use cache_info for logging
Later, you can jump-start the cache as follows:

.. code-block:: python

    # Potentially download/fetch the artifacts from the database
    torch.compiler.load_cache_artifacts(artifact_bytes)
This operation populates all the modular caches that will be discussed in the next section, including ``PGO``, ``AOTAutograd``, ``Inductor``, ``Triton``, and ``Autotuning``.


Modular caching of ``TorchDynamo``, ``TorchInductor``, and ``Triton``
----------------------------------------------------------------------

The aforementioned ``Mega-Cache`` is composed of individual components that can be used without any user intervention. By default, PyTorch Compiler comes with local on-disk caches for ``TorchDynamo``, ``TorchInductor``, and ``Triton``. These caches include:

* ``FXGraphCache``: A cache of graph-based IR components used in compilation.
* ``TritonCache``: A cache of Triton compilation results, including ``cubin`` files generated by ``Triton`` and other caching artifacts.
* ``InductorCache``: A bundle of the ``FXGraphCache`` and ``Triton`` caches.
* ``AOTAutogradCache``: A cache of joint graph artifacts.
* ``PGO-cache``: A cache of dynamic shape decisions to reduce the number of recompilations.
* `AutotuningCache <https://github.com/pytorch/pytorch/blob/795a6a0affd349adfb4e3df298b604b74f27b44e/torch/_inductor/runtime/autotune_cache.py#L116>`__:

  * ``Inductor`` generates ``Triton`` kernels and benchmarks them to select the fastest kernels.
  * ``torch.compile``'s built-in ``AutotuningCache`` caches these results.

All these cache artifacts are written to ``TORCHINDUCTOR_CACHE_DIR``, which by default looks like ``/tmp/torchinductor_myusername``.
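
If the default location is unsuitable (for example, when ``/tmp`` is wiped between runs), the cache directory can be redirected through that same environment variable. A minimal sketch, where the path is just an illustrative choice:

```python
import os

# Redirect the on-disk compiler caches. This must be set before the process
# compiles anything; the path below is a hypothetical example, not a
# required location.
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/var/cache/torchinductor_shared"
```

Pointing several runs (or several users) at a shared, persistent directory lets them reuse each other's compilation results.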


Remote Caching
----------------

We also provide a remote caching option for users who would like to take advantage of a Redis-based cache. Check out `Compile Time Caching Configurations <https://tutorials.pytorch.kr/recipes/torch_compile_caching_configuration_tutorial.html>`__ to learn more about how to enable Redis-based caching.
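
The configuration recipe linked above documents the exact knobs; as a rough sketch only (the environment variable names below are assumptions based on recent PyTorch releases, and the host/port values are placeholders; verify both against that recipe), enabling the Redis-backed caches looks like:

```python
import os

# Where the Redis instance lives ("redis.example.com" / 6379 are placeholders).
os.environ["TORCHINDUCTOR_REDIS_HOST"] = "redis.example.com"
os.environ["TORCHINDUCTOR_REDIS_PORT"] = "6379"

# Opt individual caches into the remote backend before compiling.
os.environ["TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE"] = "1"
os.environ["TORCHINDUCTOR_AUTOTUNE_REMOTE_CACHE"] = "1"
```

A remote cache is most useful when a fleet of machines with identical PyTorch, Triton, and GPU configurations can share one warm cache.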


Conclusion
-------------
In this recipe, we have learned that PyTorch Inductor's caching mechanisms significantly reduce compilation latency by utilizing both local and remote caches, which operate seamlessly in the background without requiring user intervention.
