
Conversation

@Xiuyu-Li (Contributor)

This PR adds temperature as a configurable argument in Config, allowing users to adjust the sampling temperature used for RL training. It also updates the math_rl example (tinker_cookbook/recipes/math_rl/train.py) to demonstrate how to apply this setting in a training script.

Note: This change only affects the TinkerTokenCompleter class, which is the default completer used by most RL recipe scripts. The TinkerMessageCompleter class (in tinker_cookbook/completers.py) remains unchanged: it still uses a hard-coded temperature=1.0 in its SamplingParams, left as-is for now to avoid altering behavior that may be design-specific. As a result, this update does not impact scripts that rely on TinkerMessageCompleter, such as text_arena, twenty_questions, and similar examples.
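For readers unfamiliar with the codebase, here is a minimal sketch of the shape of the change, using simplified stand-in classes; the names and fields below are illustrative only and do not reflect the actual Config, TinkerTokenCompleter, or SamplingParams definitions in tinker_cookbook.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical, simplified stand-in for a recipe Config.

    The PR adds a `temperature` field along these lines so training scripts
    can override the sampling temperature instead of relying on a fixed 1.0.
    """

    model_name: str = "my-base-model"
    temperature: float = 1.0


@dataclass
class SamplingParams:
    """Stand-in for the sampling parameters handed to the sampling engine."""

    max_tokens: int = 512
    temperature: float = 1.0


class TokenCompleter:
    """Illustrative analogue of TinkerTokenCompleter.

    The configured temperature is threaded into SamplingParams rather than
    being hard-coded to 1.0.
    """

    def __init__(self, config: Config):
        self.sampling_params = SamplingParams(temperature=config.temperature)

    def complete(self, prompt_tokens: list[int]) -> list[int]:
        # A real completer would call the sampling backend here with
        # self.sampling_params; this sketch only shows where temperature flows.
        raise NotImplementedError


# Usage in a training script, in the spirit of the updated math_rl example:
config = Config(temperature=0.7)
completer = TokenCompleter(config)
```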

@Tiiiger (Collaborator) left a comment

LGTM! I'll click on merge once I can confirm the backend actually supports this correctly.

@joschu (Collaborator) left a comment

I'm not sure logprobs are calculated the right way when temperature != 1 -- for this to make sense, we'd need the sampling engine to compute the logprob of the temperature-scaled model, not the original model.
Have you done experiments @Xiuyu-Li?
Marking as "Requesting Changes" until we've validated that this makes sense.
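To make the concern concrete, here is a small sketch (assuming PyTorch-style logits; not part of the PR) contrasting the log-probability of the temperature-scaled policy that actually generated the sample with the log-probability of the unscaled model, which is what the RL update would see if the sampling engine computes logprobs from the raw logits:

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5])  # toy next-token logits
temperature = 0.7
token = 0  # hypothetical sampled token id

# Log-prob under the temperature-scaled policy that produced the sample.
logp_scaled = torch.log_softmax(logits / temperature, dim=-1)[token]

# Log-prob under the original (temperature = 1) model, which a sampling engine
# would report if it ignores temperature when computing logprobs.
logp_original = torch.log_softmax(logits, dim=-1)[token]

# The two differ whenever temperature != 1, so policy-gradient terms built from
# the unscaled logprobs would not match the distribution the samples came from.
print(logp_scaled.item(), logp_original.item())
```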
