Commit 37cb31f

fix typo in advanced_source/rpc_ddp_tutorial.rst
1 parent e9ab351 commit 37cb31f

1 file changed: +25 -25 lines changed

advanced_source/rpc_ddp_tutorial.rst

Lines changed: 25 additions & 25 deletions
@@ -11,23 +11,23 @@
 with distributed model parallelism to train a simple model. It explains how to combine
 `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__ (DDP) with the
 `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__ to do so.
-See `here <https://github.com/pytorch/examples/tree/master/distributed/rpc/ddp_rpc>`__for the example's source code.
+See `here <https://github.com/pytorch/examples/tree/master/distributed/rpc/ddp_rpc>`__ for the example's source code.
 
 The previous tutorials,
-`Getting Started with Distributed Data Parallel <https://tutorials.pytorch.kr/intermediate/ddp_tutorial.html>`__and
-`Getting Started with Distributed RPC Framework <https://tutorials.pytorch.kr/intermediate/rpc_tutorial.html>`__respectively
+`Getting Started with Distributed Data Parallel <https://tutorials.pytorch.kr/intermediate/ddp_tutorial.html>`__ and
+`Getting Started with Distributed RPC Framework <https://tutorials.pytorch.kr/intermediate/rpc_tutorial.html>`__ respectively
 explain how to perform distributed data-parallel and distributed model-parallel training.
 However, there are several training paradigms in which these two techniques can be combined. For example:
 
 1) If a model has a sparse part (a large embedding table) and a dense part (FC layers),
-you might put the embedding table on a parameter server and use `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__to
+you might put the embedding table on a parameter server and use `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__ to
 replicate the FC layers across multiple trainers.
-In that case, the `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__can
+In that case, the `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__ can
 be used to perform embedding lookups on the parameter server.
 2) Enabling hybrid parallelism as described in the `PipeDream <https://arxiv.org/abs/1806.03377>`__ paper.
 The `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__ can
 pipeline the stages of the model across multiple workers, and
-(if needed) `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__can
+(if needed) `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__ can
 replicate each stage.
 
 |
@@ -38,23 +38,23 @@
 1) 1 master, which is responsible for creating an embedding table (nn.EmbeddingBag) on the parameter server.
 The master also drives the training loop on the two trainers.
 2) 1 parameter server, which essentially holds the embedding table in memory and responds to RPCs from the master and the trainers.
-3) 2 trainers, which use `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__to
+3) 2 trainers, which use `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__ to
 store an FC layer (nn.Linear) that is replicated among themselves.
 The trainers are also responsible for running the forward pass, the backward pass, and the optimizer step.
 
 |
 The entire training process runs as follows:
 
 1) To hold an embedding table on the parameter server, the master creates a
-`RemoteModule <https://pytorch.org/docs/master/rpc.html#remotemodule>`__instance.
+`RemoteModule <https://pytorch.org/docs/master/rpc.html#remotemodule>`__ instance.
 2) The master then starts the training loop on the trainers and passes the remote module to them.
 3) Using the remote module provided by the master, the trainers create a
-``HybridModel``that first performs an embedding lookup and then runs the FC layer wrapped inside DDP.
+``HybridModel`` that first performs an embedding lookup and then runs the FC layer wrapped inside DDP.
 4) The trainers run the model's forward pass and use the loss with `Distributed Autograd <https://pytorch.org/docs/master/rpc.html#distributed-autograd-framework>`__
 to run the backward pass.
 5) As part of the backward pass, the gradients for the FC layer are computed first and synchronized across all trainers via allreduce in DDP.
 6) Next, Distributed Autograd propagates the gradients to the parameter server, where the gradients for the embedding table are updated.
-7) Finally, the `DistributedOptimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__is used to update all the parameters.
+7) Finally, the `DistributedOptimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__ is used to update all the parameters.
 
 .. warning::
 
@@ -68,18 +68,18 @@
 
 The RPC framework is initialized on all 4 workers using the TCP init_method.
 Once RPC initialization is done, the master places an `EmbeddingBag <https://pytorch.org/docs/master/generated/torch.nn.EmbeddingBag.html>`__ layer in a remote module, using
-`RemoteModule <https://pytorch.org/docs/master/rpc.html#remotemodule>`__to
+`RemoteModule <https://pytorch.org/docs/master/rpc.html#remotemodule>`__ to
 create that module on the parameter server.
 The master then loops over the trainers and uses `rpc_async <https://pytorch.org/docs/master/rpc.html#torch.distributed.rpc.rpc_async>`__
-to start the training loop by having each trainer run ``_run_trainer``remotely.
+to start the training loop by having each trainer run ``_run_trainer`` remotely.
 Finally, the master waits for all training to finish before exiting.
 
-The trainers first call `init_process_group <https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group>`__to
-initialize a ``ProcessGroup``for DDP with world_size=2 (for the two trainers).
+The trainers first call `init_process_group <https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group>`__ to
+initialize a ``ProcessGroup`` for DDP with world_size=2 (for the two trainers).
 Next, they initialize the RPC framework using the TCP init_method.
 Note that RPC initialization and ProcessGroup initialization use different ports.
 This avoids port conflicts between the initialization of the two frameworks.
-Once initialization is done, the trainers just wait for the ``_run_trainer` RPC from the master.
+Once initialization is done, the trainers just wait for the ``_run_trainer`` RPC from the master.
 
 The parameter server initializes the RPC framework and waits for RPCs from the trainers and the master.
 
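The port separation described in the hunk above can be illustrated with a small sketch. It only builds the two TCP `init_method` URLs and does not call PyTorch. The host name, both port numbers, and the helper names `tcp_init_method` and `init_methods_for` are assumptions made for illustration and are not taken from the tutorial's `main.py`.

```python
# Hypothetical values: the host and both ports are assumptions for illustration.
HOST = "localhost"
PROCESS_GROUP_PORT = 29500  # used only by the two trainers, for DDP
RPC_PORT = 29501            # used by all four workers, for RPC


def tcp_init_method(host: str, port: int) -> str:
    """Build a TCP init_method URL of the form tcp://host:port."""
    return f"tcp://{host}:{port}"


def init_methods_for(role: str) -> dict:
    """Return the init_method URLs that a worker with the given role needs."""
    methods = {"rpc": tcp_init_method(HOST, RPC_PORT)}
    if role == "trainer":
        # Trainers additionally join a ProcessGroup for DDP, on a different port.
        methods["process_group"] = tcp_init_method(HOST, PROCESS_GROUP_PORT)
    return methods


if __name__ == "__main__":
    for role in ("trainer", "master", "parameter_server"):
        print(role, init_methods_for(role))
```

In a real setup the `process_group` URL would typically be passed as the `init_method` argument of `init_process_group`, and the `rpc` URL as the `init_method` of the RPC backend options. Keeping the two ports distinct is what prevents the conflict mentioned above.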
@@ -89,14 +89,14 @@ Once RPC initialization is done, the master places an `EmbeddingBag <https://pytorch.org/docs
 :start-after: BEGIN run_worker
 :end-before: END run_worker
 
-Before discussing the trainers in detail, let us introduce the ``HybridModel``that the trainers use.
-As described below, the ``HybridModel``is initialized with a remote module that holds the embedding table on the parameter server (``remote_emb_module``) and the ``device``to use for DDP.
+Before discussing the trainers in detail, let us introduce the ``HybridModel`` that the trainers use.
+As described below, the ``HybridModel`` is initialized with a remote module that holds the embedding table on the parameter server (``remote_emb_module``) and the ``device`` to use for DDP.
 Model initialization wraps an `nn.Linear <https://pytorch.org/docs/master/generated/torch.nn.Linear.html>`__ layer
 inside DDP to replicate and synchronize this layer across all trainers.
 
 
 The model's forward method is fairly straightforward.
-It calls RemoteModule's ``forward``to perform an embedding lookup on the parameter server and passes the output to the FC layer.
+It calls RemoteModule's ``forward`` to perform an embedding lookup on the parameter server and passes the output to the FC layer.
 
 
 .. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
@@ -106,18 +106,18 @@ It calls RemoteModule's ``forward``to perform an embedding
 
 Next, let's look at the trainer setup.
 The trainer first uses a remote module that holds the embedding table on the parameter server, together with its own rank, to create the
-``HybridModel``described above.
+``HybridModel`` described above.
 
-Now, for the `DistributedOptimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__to work,
+Now, for the `DistributedOptimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__ to work,
 we need to retrieve a list of RRefs to all the parameters we want to optimize.
 To retrieve the embedding table's parameters from the parameter server,
 we can call RemoteModule's `remote_parameters <https://pytorch.org/docs/master/rpc.html#torch.distributed.nn.api.remote_module.RemoteModule.remote_parameters>`__ method.
 This essentially walks through all the parameters of the embedding table and returns a list of RRefs.
 The trainer calls this method on the parameter server via RPC to receive a list of RRefs to the desired parameters.
 Because the DistributedOptimizer always takes a list of RRefs to the parameters to be optimized, we need to create RRefs for the local parameters of the FC layer as well.
-This is done by walking ``model.fc.parameters()``to create an RRef for each parameter and
-appending it to the list that ``remote_parameters()``returned.
-Note that ``model.parameters()``cannot be used, because it recursively calls ``model.remote_emb_module.parameters()``, which ``RemoteModule``does not support.
+This is done by walking ``model.fc.parameters()`` to create an RRef for each parameter and
+appending it to the list that ``remote_parameters()`` returned.
+Note that ``model.parameters()`` cannot be used, because it recursively calls ``model.remote_emb_module.parameters()``, which ``RemoteModule`` does not support.
 
 Finally, we create the DistributedOptimizer using all the RRefs and define a CrossEntropyLoss function.
 
@@ -127,7 +127,7 @@ Because the DistributedOptimizer always takes a list of RRe
 :end-before: END setup_trainer
 
 Now we are ready to introduce the main training loop that runs on each trainer.
-``get_next_batch``is just a helper function that generates random inputs and targets for training.
+``get_next_batch`` is just a helper function that generates random inputs and targets for training.
 We run the training loop for multiple epochs and for each batch:
 
 1) First, for Distributed Autograd,
@@ -143,4 +143,4 @@ Because the DistributedOptimizer always takes a list of RRe
 :end-before: END run_trainer
 .. code:: python
 
-See `here <https://github.com/pytorch/examples/tree/master/distributed/rpc/ddp_rpc>`__for the source code of the entire example.
+See `here <https://github.com/pytorch/examples/tree/master/distributed/rpc/ddp_rpc>`__ for the source code of the entire example.
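Nearly every change in this commit inserts a space after a closing `` `__ `` or double backtick. reStructuredText only recognizes the end of inline markup when it is followed by whitespace or punctuation, so a word attached directly to it breaks the link or literal. Below is a rough sketch of a checker for that pattern. It is a simplified regex heuristic rather than docutils' actual parsing rules, and the names `LINK_END`, `LITERAL_END`, and `find_attached_markup` are hypothetical.

```python
import re

# A hyperlink reference (`text <url>`_ or `text <url>`__) followed directly by
# a letter or digit. The underscore is excluded from the lookahead so that the
# second "_" of a correctly spaced "`__ " is not mistaken for attached text.
LINK_END = re.compile(r"`(?![`\s])[^`\n]+`__?(?=[^\W_])")

# An inline literal (``text``) followed directly by a letter or digit.
LITERAL_END = re.compile(r"``(?!\s)[^`\n]+?(?<!\s)``(?=[^\W_])")


def find_attached_markup(line: str) -> list:
    """Return the inline-markup spans in `line` that have text glued to their end."""
    return [
        match.group(0)
        for pattern in (LINK_END, LITERAL_END)
        for match in pattern.finditer(line)
    ]


if __name__ == "__main__":
    before = "See `here <https://example.com>`__for the ``HybridModel``source."
    after = "See `here <https://example.com>`__ for the ``HybridModel`` source."
    print(find_attached_markup(before))
    print(find_attached_markup(after))
```

Because `\w` in Python 3 covers Unicode letters, the same check also flags a Korean particle attached to the markup, which is the case this commit fixes. It does not catch the unbalanced closing backtick that the commit also corrects at line 82.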
