
Commit a776f7c

recipes: re-commit translation for distributed_comm_debug_mode (#1051)
* recipes: re-commit translation for distributed_comm_debug_mode (refs #1035)
* additional review modification
* fix: apply review comments (additional fixes)
1 parent 3b500e2 commit a776f7c

File tree

1 file changed: +58 -52 lines changed


recipes_source/distributed_comm_debug_mode.rst

Lines changed: 58 additions & 52 deletions
@@ -1,62 +1,63 @@
-Getting Started with ``CommDebugMode``
+Getting Started with ``CommDebugMode`` in PyTorch
 =====================================================
 
-**Author**: `Anshul Sinha <https://github.com/sinhaanshul>`__
+**Author**: `Anshul Sinha <https://github.com/sinhaanshul>`__
+**Translator:** `κΉ€μœ μ§„ <https://github.com/yujinpink1023>`_
 
+In this tutorial, we explore how to use ``CommDebugMode`` with PyTorch's DistributedTensor (DTensor).
+It lets you debug distributed training by tracking the collective operations that are performed.
 
-In this tutorial, we will explore how to use ``CommDebugMode`` with PyTorch's
-DistributedTensor (DTensor) for debugging by tracking collective operations in distributed training environments.
-
-Prerequisites
+Prerequisites
 ---------------------
 
 * Python 3.8 - 3.11
-* PyTorch 2.2 or later
+* PyTorch 2.2 or later
 
 
-What is ``CommDebugMode`` and why is it useful
+What ``CommDebugMode`` is and why it is useful
 ----------------------------------------------------
-As the size of models continues to increase, users are seeking to leverage various combinations
-of parallel strategies to scale up distributed training. However, the lack of interoperability
-between existing solutions poses a significant challenge, primarily due to the absence of a
-unified abstraction that can bridge these different parallelism strategies. To address this
-issue, PyTorch has proposed `DistributedTensor(DTensor)
-<https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/examples/comm_mode_features_example.py>`_
-which abstracts away the complexities of tensor communication in distributed training,
-providing a seamless user experience. However, when dealing with existing parallelism solutions and
-developing parallelism solutions using the unified abstraction like DTensor, the lack of transparency
-about what and when the collective communications happens under the hood could make it challenging
-for advanced users to identify and resolve issues. To address this challenge, ``CommDebugMode``, a
-Python context manager will serve as one of the primary debugging tools for DTensors, enabling
-users to view when and why collective operations are happening when using DTensors, effectively
-addressing this issue.
-
-
-Using ``CommDebugMode``
+As models continue to grow in size, users seek to combine various parallelism strategies to scale up distributed training.
+However, the lack of interoperability between existing solutions remains a significant challenge,
+because there is no unified abstraction that can bridge these different parallelism strategies.
+
+To address this issue, PyTorch introduced `DistributedTensor(DTensor)
+<https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/examples/comm_mode_features_example.py>`_.
+DTensor abstracts away the complexity of tensor communication in distributed training, giving users a consistent and concise experience.
+
+However, when using such a unified abstraction, it is hard to see exactly when collective communication is performed under the hood,
+which makes it difficult for advanced users to debug and identify issues.
+
+Here ``CommDebugMode``, a Python context manager, serves as one of the primary debugging tools for DTensor,
+letting you trace when and why collective operations occur while DTensor is in use.
+This allows users to understand exactly when, and for what reason, collective operations are executed.
+
+
+Using ``CommDebugMode``
 ------------------------
 
-Here is how you can use ``CommDebugMode``:
+Here is an example of how to use ``CommDebugMode``:
 
 .. code-block:: python
 
-    # The model used in this example is a MLPModule applying Tensor Parallel
+    # The model used in this example is an MLPModule with tensor parallelism applied.
     comm_mode = CommDebugMode()
-    with comm_mode:
-        output = model(inp)
+    with comm_mode:
+        output = model(inp)
 
-    # print the operation level collective tracing information
+    # print the operation-level collective tracing information
     print(comm_mode.generate_comm_debug_tracing_table(noise_level=0))
 
-    # log the operation level collective tracing information to a file
+    # log the operation-level collective tracing information to a file
     comm_mode.log_comm_debug_tracing_table_to_file(
         noise_level=1, file_name="transformer_operation_log.txt"
     )
 
-    # dump the operation level collective tracing information to json file,
-    # used in the visual browser below
+    # dump the operation-level collective tracing information to a JSON file,
+    # which can be used in the visual browser below
     comm_mode.generate_json_dump(noise_level=2)
 
-This is what the output looks like for a MLPModule at noise level 0:
+
+Here is what the output looks like for an MLPModule at noise level 0:
 
 .. code-block:: python
 
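For context, here is a minimal, self-contained sketch of the setup the example above assumes:
an ``MLPModule`` parallelized with PyTorch tensor parallelism and traced by ``CommDebugMode``.
The module definition and dimensions are illustrative assumptions rather than the recipe's exact
code, and on older PyTorch versions ``CommDebugMode`` lives under ``torch.distributed._tensor.debug``
instead of ``torch.distributed.tensor.debug``.

.. code-block:: python

    # Launch with e.g. ``torchrun --nproc_per_node=2 sketch.py``.
    import torch
    import torch.nn as nn
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor.debug import CommDebugMode
    from torch.distributed.tensor.parallel import (
        ColwiseParallel,
        RowwiseParallel,
        parallelize_module,
    )

    class MLPModule(nn.Module):
        def __init__(self, dim: int = 16):
            super().__init__()
            self.net1 = nn.Linear(dim, dim * 4)
            self.relu = nn.ReLU()
            self.net2 = nn.Linear(dim * 4, dim)

        def forward(self, x):
            return self.net2(self.relu(self.net1(x)))

    # Shard net1 column-wise and net2 row-wise; combining the row-wise
    # layer's partial results is what produces the single all_reduce
    # reported in the trace.
    mesh = init_device_mesh("cuda", (2,))  # use "cpu" to run on the gloo backend
    model = parallelize_module(
        MLPModule().to("cuda"),
        mesh,
        {"net1": ColwiseParallel(), "net2": RowwiseParallel()},
    )

    inp = torch.rand(8, 16, device="cuda")
    comm_mode = CommDebugMode()
    with comm_mode:
        output = model(inp)

    # Module-level collective counts (noise_level=0).
    print(comm_mode.generate_comm_debug_tracing_table(noise_level=0))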
@@ -73,21 +74,24 @@ This is what the output looks like for a MLPModule at noise level 0:
 	FORWARD PASS
 	*c10d_functional.all_reduce: 1
 
-To use ``CommDebugMode``, you must wrap the code running the model in ``CommDebugMode`` and call the API that
-you want to use to display the data. You can also use a ``noise_level`` argument to control the verbosity
-level of displayed information. Here is what each noise level displays:
-
-| 0. Prints module-level collective counts
-| 1. Prints DTensor operations (not including trivial operations), module sharding information
-| 2. Prints tensor operations (not including trivial operations)
-| 3. Prints all operations
+To use ``CommDebugMode``, wrap the code that runs the model in a ``CommDebugMode`` block
+and call the API that displays the information you want.
+
+You can also use the ``noise_level`` argument to control the verbosity of the displayed information.
+Each noise level provides the following:
+
+| 0. Prints module-level collective counts
+| 1. Prints DTensor operations (excluding trivial operations) and module sharding information
+| 2. Prints tensor-level operations (excluding trivial operations)
+| 3. Prints all operations
 
-In the example above, you can see that the collective operation, all_reduce, occurs once in the forward pass
-of the ``MLPModule``. Furthermore, you can use ``CommDebugMode`` to pinpoint that the all-reduce operation happens
-in the second linear layer of the ``MLPModule``.
+As the example above shows, the collective operation all_reduce occurs once in the forward pass of the ``MLPModule``.
+You can also use ``CommDebugMode`` to pinpoint that this all-reduce operation happens
+in the second linear layer of the ``MLPModule``.
 
 
-Below is the interactive module tree visualization that you can use to upload your own JSON dump:
+Below is an interactive module tree visualization where you can upload the generated JSON file and explore it visually:
 
 .. raw:: html
 
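For reference, here is a short sketch of how the noise levels map onto the three output APIs
from the example above, reusing its ``comm_mode`` object; ``mlp_operation_log.txt`` is an
illustrative file name:

.. code-block:: python

    # noise_level=0: module-level collective counts only.
    print(comm_mode.generate_comm_debug_tracing_table(noise_level=0))

    # noise_level=1: adds DTensor operations (excluding trivial ones) and
    # module sharding information; written to a file instead of stdout.
    comm_mode.log_comm_debug_tracing_table_to_file(
        noise_level=1, file_name="mlp_operation_log.txt"
    )

    # noise_level=2: adds tensor-level operations; the resulting JSON dump
    # feeds the interactive module tree browser embedded below.
    comm_mode.generate_json_dump(noise_level=2)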
@@ -198,13 +202,15 @@ Below is the interactive module tree visualization that you can use to upload your own JSON dump:
 </body>
 </html>
 
-Conclusion
+
+
+Conclusion
 ------------------------------------------
 
-In this recipe, we have learned how to use ``CommDebugMode`` to debug Distributed Tensors and
-parallelism solutions that uses communication collectives with PyTorch. You can use your own
-JSON outputs in the embedded visual browser.
+In this recipe, we learned how to use PyTorch's ``CommDebugMode`` to debug
+DistributedTensor and parallelism solutions that use collective communication.
+You can also load the generated JSON output directly into the embedded visual browser.
 
-For more detailed information about ``CommDebugMode``, see
-`comm_mode_features_example.py
-<https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/examples/comm_mode_features_example.py>`_
+For more detailed information about ``CommDebugMode``, see
+`comm_mode_features_example.py
+<https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/examples/comm_mode_features_example.py>`_.
