
Conversation

@Xiuyu-Li (Contributor)

This PR adds temperature as a configurable argument in Config, allowing users to adjust the sampling temperature used for RL training. It also updates the math_rl example (tinker_cookbook/recipes/math_rl/train.py) to demonstrate how to apply this setting in a training script.

Note: This change only affects the TinkerTokenCompleter class, which is the default completer used by most RL recipe scripts. The TinkerMessageCompleter class (in tinker_cookbook/completers.py) remains unchanged: it still uses a hard-coded temperature=1.0 in its SamplingParams, left as-is for now to avoid altering behavior that may be design-specific. As a result, this update does not impact scripts that rely on TinkerMessageCompleter, such as text_arena, twenty_questions, and similar examples.
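For readers unfamiliar with the codebase, here is a minimal sketch of the shape of the change, using simplified stand-in classes; the names and fields below are illustrative only and do not reflect the actual Config, TinkerTokenCompleter, or SamplingParams definitions in tinker_cookbook.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical, simplified stand-in for a recipe Config.

    The PR adds a `temperature` field along these lines so training scripts
    can override the sampling temperature instead of relying on a fixed 1.0.
    """

    model_name: str = "my-base-model"
    temperature: float = 1.0


@dataclass
class SamplingParams:
    """Stand-in for the sampling parameters handed to the sampling engine."""

    max_tokens: int = 512
    temperature: float = 1.0


class TokenCompleter:
    """Illustrative analogue of TinkerTokenCompleter.

    The configured temperature is threaded into SamplingParams rather than
    being hard-coded to 1.0.
    """

    def __init__(self, config: Config):
        self.sampling_params = SamplingParams(temperature=config.temperature)

    def complete(self, prompt_tokens: list[int]) -> list[int]:
        # A real completer would call the sampling backend here with
        # self.sampling_params; this sketch only shows where temperature flows.
        raise NotImplementedError


# Usage in a training script, in the spirit of the updated math_rl example:
config = Config(temperature=0.7)
completer = TokenCompleter(config)
```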

@Tiiiger (Collaborator) left a comment

LGTM! I'll click on merge once I can confirm the backend actually supports this correctly.

@joschu (Collaborator) left a comment

I'm not sure logprobs are calculated the right way when temperature != 1 -- for this to make sense, we'd need the sampling engine to compute the logprob of the temperature-scaled model, not the original model.
Have you done experiments @Xiuyu-Li?
Marking as "Requesting Changes" until we've validated that this makes sense.
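To make the concern concrete, here is a small sketch (assuming PyTorch-style logits; not part of the PR) contrasting the log-probability of the temperature-scaled policy that actually generated the sample with the log-probability of the unscaled model, which is what the RL update would see if the sampling engine computes logprobs from the raw logits:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5])  # toy next-token logits
temperature = 0.7
token = 0  # hypothetical sampled token id

# Log-prob under the temperature-scaled policy that produced the sample.
logp_scaled = torch.log_softmax(logits / temperature, dim=-1)[token]

# Log-prob under the original (temperature = 1) model, which a sampling engine
# would report if it ignores temperature when computing logprobs.
logp_original = torch.log_softmax(logits, dim=-1)[token]

# The two differ whenever temperature != 1, so policy-gradient terms built from
# the unscaled logprobs would not match the distribution the samples came from.
print(logp_scaled.item(), logp_original.item())
```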
