1- .. currentmodule :: torchrl.trainers
1+ .. currentmodule :: torchrl
22
33LLM interface
44=============
@@ -7,13 +7,125 @@ LLM interface
77
88TorchRL offers a set of tools for LLM post-training, as well as some examples for training or setup.
99
10+ Collectors
11+ ----------
12+
13+ TorchRL offers a specialized collector class (:class: `~torchrl.collectors.llm.LLMCollector `) that is tailored for LLM
14+ use cases. We also provide dedicated updaters for some inference engines.
15+
16+ .. currentmodule :: torchrl.collectors.llm
17+
18+ .. autosummary ::
19+ :toctree: generated/
20+ :template: rl_template.rst
21+
22+ vLLMUpdater
23+ LLMCollector
24+
25+
1026Data structures
1127---------------
1228
29+ To handle text-based data structures (such as conversations etc.), we offer a few data structures dedicated to carrying
30+ data for LLM post-training.
31+
1332.. currentmodule :: torchrl.data.llm
1433
1534.. autosummary ::
1635 :toctree: generated/
1736 :template: rl_template.rst
1837
1938 History
39+ LLMData
40+
41+ Environments
42+ ------------
43+
44+ When fine-tuning an LLM using TorchRL, the environment is a crucial component of the inference pipeline, alongside the
45+ policy and collector. Environments manage operations that are not handled by the LLM itself, such as interacting with
46+ tools, loading prompts from datasets, computing rewards (when necessary), and formatting data.
47+
48+ The design of environments in TorchRL allows for flexibility and modularity. By framing tasks as environments, users can
49+ easily extend or modify existing environments using transforms. This approach enables the isolation of individual
50+ components within specific :class: `~torchrl.envs.EnvBase ` or :class: `~torchrl.envs.Transform ` subclasses, making it
51+ simpler to augment or alter the environment logic.
52+
53+ Available Environment Classes and Utilities
54+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55+
56+ TorchRL provides various environment classes and utilities for working with LLMs, including:
57+
58+ - Various environment classes (:class: `~torchrl.envs.llm.ChatEnv `, :class: `~torchrl.envs.llm.DatasetChatEnv `,
59+ :class: `~torchrl.envs.llm.GSM8KEnv `, etc.)
60+ - Utility functions (:class: `~torchrl.envs.make_gsm8k_env `, :class: `~torchrl.envs.make_mlgym `, etc.)
61+ - Transforms and other supporting classes (:class: `~torchrl.envs.KLRewardTransform `,
62+ :class: `~torchrl.envs.TemplateTransform `, :class: `~torchrl.envs.Tokenizer `, etc.)
63+
64+ These components can be used to create customized environments tailored to specific use cases and requirements.
65+
66+ .. currentmodule :: torchrl.envs.llm
67+
68+ .. autosummary ::
69+ :toctree: generated/
70+ :template: rl_template.rst
71+
72+ ChatEnv
73+ DatasetChatEnv
74+ GSM8KEnv
75+ make_gsm8k_env
76+ GSM8KPrepareQuestion
77+ GSM8KEnv
78+ IFEvalEnv
79+ IfEvalScorer
80+ IFEvalScoreData
81+ LLMEnv
82+ LLMHashingEnv
83+ make_mlgym
84+ MLGymWrapper
85+ GSM8KRewardParser
86+ IfEvalScorer
87+ as_nested_tensor
88+ as_padded_tensor
89+ DataLoadingPrimer
90+ KLRewardTransform
91+ TemplateTransform
92+ Tokenizer
93+
94+ Modules
95+ -------
96+
97+ The :ref: `~torchrl.modules.llm ` section provides a set of wrappers and utility functions for popular training and
98+ inference backends. The main goal of these primitives is to:
99+
100+ - Unify the input / output data format across training and inference pipelines;
101+ - Unify the input / output data format across backends (to be able to use different backends across losses and
102+ collectors, for instance)
103+ - Give appropriate tooling to construct these objects in typical RL settings (resource allocation, async execution,
104+ weight update, etc.)
105+
106+ Wrappers
107+ ~~~~~~~~
108+
109+ .. currentmodule :: torchrl.modules.llm
110+
111+ .. autosummary ::
112+ :toctree: generated/
113+ :template: rl_template.rst
114+
115+ TransformersWrapper
116+ vLLMWrapper
117+
118+ Utils
119+ ~~~~~
120+
121+ .. currentmodule :: torchrl.modules.llm
122+
123+ .. autosummary ::
124+ :toctree: generated/
125+ :template: rl_template.rst
126+
127+ CategoricalSequential
128+ LLMOnDevice
129+ make_vllm_worker
130+ stateless_init_process_group
131+ vLLMWorker
0 commit comments