1- Getting Started with ``CommDebugMode``
1+ Getting Started with ``CommDebugMode`` in PyTorch
22=====================================================
33
4- **Author**: `Anshul Sinha <https://github.com/sinhaanshul>`__
4+ **Author**: `Anshul Sinha <https://github.com/sinhaanshul>`__
5+ **Translator**: `yujinpink1023 <https://github.com/yujinpink1023>`_
56
7+ In this tutorial, we will explore how to use ``CommDebugMode`` with PyTorch's DistributedTensor (DTensor).
8+ It lets you debug by tracking the collective operations performed in distributed training environments.
69
7- In this tutorial, we will explore how to use ``CommDebugMode`` with PyTorch's
8- DistributedTensor (DTensor) for debugging by tracking collective operations in distributed training environments.
9-
10- Prerequisites
10+ Prerequisites
1111---------------------
1212
1313* Python 3.8 - 3.11
14- * PyTorch 2.2 or later
14+ * PyTorch 2.2 or later
1515
1616
17- What is ``CommDebugMode`` and why is it useful
17+ What is ``CommDebugMode`` and why is it useful
1818----------------------------------------------------
19- As the size of models continues to increase, users are seeking to leverage various combinations
20- of parallel strategies to scale up distributed training. However, the lack of interoperability
21- between existing solutions poses a significant challenge, primarily due to the absence of a
22- unified abstraction that can bridge these different parallelism strategies. To address this
23- issue, PyTorch has proposed `DistributedTensor(DTensor)
24- <https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/examples/comm_mode_features_example.py>`_
25- which abstracts away the complexities of tensor communication in distributed training,
26- providing a seamless user experience. However, when dealing with existing parallelism solutions and
27- developing parallelism solutions using the unified abstraction like DTensor, the lack of transparency
28- about what and when the collective communications happens under the hood could make it challenging
29- for advanced users to identify and resolve issues. To address this challenge, ``CommDebugMode``, a
30- Python context manager will serve as one of the primary debugging tools for DTensors, enabling
31- users to view when and why collective operations are happening when using DTensors, effectively
32- addressing this issue.
33-
34-
35- Using ``CommDebugMode``
19+ As the size of models continues to grow, users are seeking to combine various parallelism strategies to scale up distributed training.
20+ However, the lack of interoperability between existing solutions remains a significant challenge.
21+ This is because there is no unified abstraction that can bridge these different parallelism strategies.
22+
23+ To address this issue, PyTorch introduced `DistributedTensor(DTensor)
24+ <https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/examples/comm_mode_features_example.py>`_.
25+ DTensor abstracts away the complexities of tensor communication in distributed training, providing users with a consistent and seamless experience.
26+
27+ However, when working with such a unified abstraction, it is hard to see exactly when collective communications happen under the hood,
28+ which makes it difficult for advanced users to identify and debug problems.
29+
30+ To address this, ``CommDebugMode``, a Python context manager,
31+ is one of the primary debugging tools for DTensor, letting you trace when and why collective operations occur while using DTensors.
32+ With it, users can clearly understand when and why collective operations are executed.
33+
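To make this concrete, here is a minimal, hedged sketch (not part of the original recipe) in which a plain DTensor matmul produces only partial results on each rank, so materializing the full output triggers an all-reduce that ``CommDebugMode`` can surface. The two-process launch and the import paths are assumptions; on older releases these modules live under ``torch.distributed._tensor`` instead.

.. code-block:: python

    # Hedged sketch: trace the collective triggered by a DTensor matmul.
    # Assumed launch: torchrun --nproc-per-node=2 this_script.py
    import torch
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import distribute_tensor, Shard   # torch.distributed._tensor on older versions
    from torch.distributed.tensor.debug import CommDebugMode        # torch.distributed._tensor.debug on older versions

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))

    # A is sharded along columns and B along rows, so A @ B is only a
    # partial result on each rank.
    A = distribute_tensor(torch.randn(8, 8), mesh, [Shard(1)])
    B = distribute_tensor(torch.randn(8, 8), mesh, [Shard(0)])

    comm_mode = CommDebugMode()
    with comm_mode:
        # Materializing the replicated result forces an all_reduce of the partials.
        C = (A @ B).full_tensor()

    # Expected to report a single c10d_functional.all_reduce.
    print(comm_mode.generate_comm_debug_tracing_table(noise_level=0))
    dist.destroy_process_group()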
34+
35+ Using ``CommDebugMode``
3636------------------------
3737
38- Here is how you can use ``CommDebugMode``:
38+ Here is an example of how to use ``CommDebugMode``:
3939
4040.. code-block:: python
4141
42- # The model used in this example is a MLPModule applying Tensor Parallel
42+ # The model used in this example is an MLPModule applying tensor parallelism.
4343 comm_mode = CommDebugMode()
44- with comm_mode:
45- output = model(inp)
44+ with comm_mode:
45+ output = model(inp)
4646
47- # print the operation level collective tracing information
47+ # print the operation-level collective tracing information
4848 print(comm_mode.generate_comm_debug_tracing_table(noise_level=0))
4949
50- # log the operation level collective tracing information to a file
50+ # log the operation-level collective tracing information to a file
5151 comm_mode.log_comm_debug_tracing_table_to_file(
5252     noise_level=1, file_name="transformer_operation_log.txt"
5353 )
5454
55- # dump the operation level collective tracing information to json file,
56- # used in the visual browser below
55+ # dump the operation-level collective tracing information to a JSON file,
56+ # which can be used in the visual browser below
5757 comm_mode.generate_json_dump(noise_level=2)
5858
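The snippet above assumes that ``model``, ``inp``, and ``CommDebugMode`` are already defined. For reference, below is a minimal, hedged sketch of one way such a tensor-parallel MLPModule might be set up; the class definition, layer sizes, device, and import paths are illustrative assumptions rather than the recipe's exact code.

.. code-block:: python

    # Hedged sketch: a two-layer MLP parallelized with Tensor Parallel,
    # assumed to be launched with torchrun on two or more processes.
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor.debug import CommDebugMode   # torch.distributed._tensor.debug on older versions
    from torch.distributed.tensor.parallel import (
        ColwiseParallel,
        RowwiseParallel,
        parallelize_module,
    )

    class MLPModule(nn.Module):
        """A small two-layer MLP, similar in shape to the module used in this recipe."""
        def __init__(self):
            super().__init__()
            self.net1 = nn.Linear(10, 16)
            self.relu = nn.ReLU()
            self.net2 = nn.Linear(16, 10)

        def forward(self, x):
            return self.net2(self.relu(self.net1(x)))

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))

    # Shard net1 column-wise and net2 row-wise; combining the row-wise
    # layer's partial outputs requires an all_reduce in the forward pass.
    model = parallelize_module(
        MLPModule(),
        mesh,
        {"net1": ColwiseParallel(), "net2": RowwiseParallel()},
    )
    inp = torch.rand(20, 10)

    comm_mode = CommDebugMode()
    with comm_mode:
        output = model(inp)
    print(comm_mode.generate_comm_debug_tracing_table(noise_level=0))
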
59- This is what the output looks like for a MLPModule at noise level 0:
59+
60+ This is what the output looks like for an MLPModule at noise level 0:
6061
6162.. code-block:: python
6263
@@ -73,21 +74,24 @@ This is what the output looks like for a MLPModule at noise level 0:
7374 FORWARD PASS
7475 * c10d_functional.all_reduce: 1
7576
76- To use ``CommDebugMode``, you must wrap the code running the model in ``CommDebugMode`` and call the API that
77- you want to use to display the data. You can also use a ``noise_level`` argument to control the verbosity
78- level of displayed information. Here is what each noise level displays:
7977
80- | 0. Prints module-level collective counts
81- | 1. Prints DTensor operations (not including trivial operations), module sharding information
82- | 2. Prints tensor operations (not including trivial operations)
83- | 3. Prints all operations
78+ To use ``CommDebugMode``, wrap the code that runs the model in a ``CommDebugMode`` block
79+ and call the API that displays the information you want.
80+
81+ You can also use the ``noise_level`` argument to control the verbosity level of the displayed information.
82+ Each noise level provides the following information:
83+
84+ | 0. Prints module-level collective counts
85+ | 1. Prints DTensor operations (excluding trivial operations) and module sharding information
86+ | 2. Prints tensor-level operations (excluding trivial operations)
87+ | 3. Prints all operations
8488
85- In the example above, you can see that the collective operation, all_reduce, occurs once in the forward pass
86- of the ``MLPModule``. Furthermore, you can use ``CommDebugMode`` to pinpoint that the all-reduce operation happens
87- in the second linear layer of the ``MLPModule``.
89+ As shown in the example above, the collective operation all_reduce occurs once in the forward pass of the ``MLPModule``.
90+ Furthermore, you can use ``CommDebugMode`` to pinpoint that this all-reduce operation happens
91+ in the second linear layer of the ``MLPModule``.
8892
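If you prefer to check these counts programmatically, for example inside a unit test, the context manager also exposes counters. The short sketch below is a hedged illustration: the ``get_total_counts`` and ``get_comm_counts`` helpers are assumed from recent PyTorch releases, and ``comm_mode`` is the instance captured in the example above.

.. code-block:: python

    import torch

    # Functional collective ops namespace used as keys in the counts dict (assumed).
    c10d_functional = torch.ops.c10d_functional

    # For the MLPModule example above we expect exactly one all_reduce
    # in the forward pass.
    assert comm_mode.get_total_counts() == 1
    assert comm_mode.get_comm_counts()[c10d_functional.all_reduce] == 1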
8993
90- Below is the interactive module tree visualization that you can use to upload your own JSON dump:
94+ Below is the interactive module tree visualization, where you can upload your own JSON dump and explore it visually:
9195
9296.. raw:: html
9397
@@ -198,13 +202,15 @@ Below is the interactive module tree visualization that you can use to upload yo
198202 </body >
199203 </html >
200204
201- Conclusion
205+
206+
207+ Conclusion
202208------------------------------------------
203209
204- In this recipe, we have learned how to use ``CommDebugMode`` to debug Distributed Tensors and
205- parallelism solutions that uses communication collectives with PyTorch. You can use your own
206- JSON outputs in the embedded visual browser.
210+ In this recipe, we have learned how to use ``CommDebugMode`` in PyTorch to debug
211+ DistributedTensor and parallelism solutions that use collective communications.
212+ You can also load your own JSON outputs into the embedded visual browser.
207213
208- For more detailed information about ``CommDebugMode``, see
209- `comm_mode_features_example.py
210- <https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/examples/comm_mode_features_example.py> `_
214+ For more detailed information about ``CommDebugMode``, see
215+ `comm_mode_features_example.py
216+ <https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/examples/comm_mode_features_example.py>`_.