
Policy copy decision tree in Collectors.

Weight Synchronization in Distributed Environments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In distributed and multiprocessed environments, keeping every instance of a policy synchronized with the latest
trained weights is crucial for consistent performance. The collector API provides a flexible and extensible
mechanism for updating policy weights across different devices and processes, accommodating various deployment
scenarios.

Local and Remote Weight Updaters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The weight synchronization process is facilitated by two main components: :class:`~torchrl.collectors.LocalWeightUpdaterBase`
and :class:`~torchrl.collectors.RemoteWeightUpdaterBase`. These base classes provide a structured interface for
implementing custom weight update logic, allowing users to tailor the synchronization process to their specific needs.

- :class:`~torchrl.collectors.LocalWeightUpdaterBase`: This component is responsible for updating the policy weights on
  the local inference worker. It is particularly useful when training and inference occur on the same machine but on
  different devices. Users can extend this class to define how weights are fetched from a server and applied locally.
  It is also the extension point for collectors where the workers need to ask for weight updates (in contrast with
  situations where the server decides when to update the worker policies). A sketch of such a worker-driven updater
  follows this list.
- :class:`~torchrl.collectors.RemoteWeightUpdaterBase`: This component handles the distribution of policy weights to
  remote inference workers. It is essential in distributed systems where multiple workers need to be kept in sync with
  the central policy. Users can extend this class to implement custom logic for synchronizing weights across a network
  of devices or processes.

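As an illustration, below is a minimal sketch of such a worker-driven (pull-based) local updater. The ``server``
handle and its ``get_weights()`` method are hypothetical stand-ins for whatever parameter store your infrastructure
exposes, and the overridden hooks should be checked against the abstract methods declared by
:class:`~torchrl.collectors.LocalWeightUpdaterBase` in your version of the library:

.. code-block:: python

    from tensordict import TensorDict, TensorDictBase
    from torchrl.collectors import LocalWeightUpdaterBase


    class PullingLocalWeightUpdater(LocalWeightUpdaterBase):
        """Pulls fresh weights from a (hypothetical) parameter server and
        writes them into the local inference policy."""

        def __init__(self, server, policy_weights: TensorDictBase):
            # `server` is a hypothetical handle exposing get_weights() -> dict[str, Tensor]
            self.server = server
            self.policy_weights = policy_weights

        def _get_server_weights(self) -> TensorDictBase:
            # Fetch the latest trained weights from the training process.
            return TensorDict(self.server.get_weights(), [])

        def _get_local_weights(self) -> TensorDictBase:
            # The weights currently held by the inference policy.
            return self.policy_weights

        def _maybe_map_weights(self, server_weights, local_weights) -> TensorDictBase:
            # Hook for renaming keys or casting dtypes/devices; this sketch
            # assumes the server layout already matches the local one.
            return server_weights

        def _update_local_weights(self, local_weights, mapped_weights) -> None:
            # In-place copy onto the inference policy's parameters.
            local_weights.update_(mapped_weights)
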
Extending the Updater Classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To accommodate diverse use cases, the API allows users to extend the updater classes with custom implementations.
This flexibility is particularly beneficial in scenarios involving complex network architectures or specialized
hardware setups. By implementing the abstract methods in these base classes, users can define how weights are
retrieved, transformed, and applied, ensuring seamless integration with their existing infrastructure.

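For instance, a push-based updater that broadcasts weights to a pool of workers reachable over inter-process pipes
could be sketched as follows. The ``pipes`` mapping and the message format are hypothetical transport details; the
hook names should likewise be checked against the abstract methods of
:class:`~torchrl.collectors.RemoteWeightUpdaterBase`:

.. code-block:: python

    from tensordict import TensorDictBase
    from torchrl.collectors import RemoteWeightUpdaterBase


    class PipeRemoteWeightUpdater(RemoteWeightUpdaterBase):
        """Broadcasts the server's weights to remote workers over
        (hypothetical) inter-process pipes."""

        def __init__(self, policy_weights: TensorDictBase, pipes: dict):
            # `pipes` is a hypothetical mapping {worker_id: multiprocessing.Connection}
            self.policy_weights = policy_weights
            self.pipes = pipes

        def all_worker_ids(self):
            # Every worker this updater is responsible for.
            return list(self.pipes)

        def _get_server_weights(self) -> TensorDictBase:
            # The authoritative (trained) copy of the weights.
            return self.policy_weights

        def _maybe_map_weights(self, server_weights) -> TensorDictBase:
            # Hook for moving weights to CPU/shared memory before crossing
            # process boundaries; a no-op in this sketch.
            return server_weights

        def _sync_weights_with_worker(self, worker_id, server_weights) -> None:
            # Ship the weights to one worker; the worker applies them on receipt.
            self.pipes[worker_id].send(("update_weights", server_weights))

Splitting the broadcast into per-worker ``_sync_weights_with_worker`` calls makes it straightforward to refresh only
a subset of workers when a full broadcast is unnecessary.
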
Default Implementations
~~~~~~~~~~~~~~~~~~~~~~~

For common scenarios, the API provides default implementations of these updaters, such as
:class:`~torchrl.collectors.VanillaLocalWeightUpdater`, :class:`~torchrl.collectors.MultiProcessedRemoteWeightUpdate`,
:class:`~torchrl.collectors.RayRemoteWeightUpdater`, :class:`~torchrl.collectors.RPCRemoteWeightUpdater`, and
:class:`~torchrl.collectors.DistributedRemoteWeightUpdater`.
These implementations cover a range of typical deployment configurations, from single-device setups to large-scale
distributed systems.

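In the simplest case these defaults require no configuration: a single-process
:class:`~torchrl.collectors.SyncDataCollector` falls back to a vanilla local updater, and calling
:meth:`~torchrl.collectors.SyncDataCollector.update_policy_weights_` triggers the synchronization. A minimal sketch,
assuming a Gym backend is installed (environment and shapes are illustrative):

.. code-block:: python

    import torch.nn as nn
    from tensordict.nn import TensorDictModule
    from torchrl.collectors import SyncDataCollector
    from torchrl.envs import GymEnv

    # Pendulum-v1 has a 3-dim observation and a 1-dim action.
    policy = TensorDictModule(nn.Linear(3, 1), in_keys=["observation"], out_keys=["action"])
    collector = SyncDataCollector(
        lambda: GymEnv("Pendulum-v1"),
        policy,
        frames_per_batch=64,
        total_frames=256,
    )
    for batch in collector:
        ...  # optimizer step updating the training copy of the policy
        collector.update_policy_weights_()  # delegates to the configured updater(s)
    collector.shutdown()

With the multiprocessed, Ray, RPC, or distributed collectors, the matching remote updater from the list above serves
the same call and propagates the weights to every worker; the updaters can also be passed to the collector
explicitly (check the collector constructors for the exact argument names).
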
Practical Considerations
~~~~~~~~~~~~~~~~~~~~~~~~

When designing a system that leverages this API, consider the following:

- Network latency: in distributed environments, network latency can impact the speed of weight updates. Ensure that
  your implementation accounts for potential delays and optimizes data transfer where possible.
- Consistency: ensure that all workers receive the updated weights in a timely manner to maintain consistency across
  the system. This is particularly important in reinforcement learning scenarios, where stale weights can lead to
  suboptimal policy performance.
- Scalability: as your system grows, the weight synchronization mechanism should scale efficiently. Consider the
  overhead of broadcasting weights to a large number of workers and optimize the process to minimize bottlenecks.

By leveraging the API, users can achieve robust and efficient weight synchronization across a variety of deployment
scenarios, ensuring that their policies remain up-to-date and performant.

.. currentmodule:: torchrl.collectors

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    LocalWeightUpdaterBase
    RemoteWeightUpdaterBase
    VanillaLocalWeightUpdater
    MultiProcessedRemoteWeightUpdate
    RayRemoteWeightUpdater
    DistributedRemoteWeightUpdater
    RPCRemoteWeightUpdater

Collectors and replay buffers interoperability
----------------------------------------------