[Draft] Fix free cache engine weight sync #298

listar2000 · 2025-11-09T00:54:14Z

This is a tentative fix for issue #291: when the config sets actor_rollout_ref.rollout.free_cache_engine=False, the weight sync between the rollout engine and the trainer engine is skipped entirely (an obvious issue that verl also hasn't fixed yet).

Fortunately, this seems fixable without patching verl, since in rLLM we implement our own rollout trajectory generation logic. In fact, the rollout_engine.wake_up()/sleep() methods should be run regardless of the free_cache_engine flag, because the flag is checked again inside the wake_up/sleep calls. For instance, the release of the kv-cache in the inference engine does its own extra check of the flag.
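To illustrate the idea, here is a minimal, self-contained sketch of the intended control flow. It is not the verl or rLLM code: `SketchRolloutEngine`, `generate_rollouts`, `weights_version`, and the `events` log are all hypothetical names, and the sketch assumes that the weight sync happens inside `wake_up()` while only the kv-cache handling is gated by the flag.

```python
class SketchRolloutEngine:
    """Hypothetical stand-in for a rollout engine; not the verl or rLLM API."""

    def __init__(self, free_cache_engine: bool):
        self.free_cache_engine = free_cache_engine
        self.weights_version = 0        # version of the weights the engine holds
        self.kv_cache_allocated = True  # whether the kv-cache is currently resident
        self.events = []                # log of what each call actually did

    def wake_up(self, trainer_weights_version: int) -> None:
        # Assumption: weight sync runs on every wake-up, independent of the flag.
        self.weights_version = trainer_weights_version
        self.events.append("sync_weights")
        # The flag is checked here, inside the call, not by the caller.
        if self.free_cache_engine:
            self.kv_cache_allocated = True
            self.events.append("restore_kv_cache")

    def sleep(self) -> None:
        # Same pattern: the engine decides whether to release the kv-cache.
        if self.free_cache_engine:
            self.kv_cache_allocated = False
            self.events.append("release_kv_cache")


def generate_rollouts(engine: SketchRolloutEngine, trainer_weights_version: int) -> str:
    # The caller no longer guards these calls with `if free_cache_engine:`.
    engine.wake_up(trainer_weights_version)
    try:
        return f"rollout generated with weights v{engine.weights_version}"
    finally:
        engine.sleep()
```

With `free_cache_engine=False`, this sketch still syncs the weights but leaves the kv-cache untouched; with `free_cache_engine=True`, it behaves as before. Whether the real engines split the work this way is exactly what the ongoing experiments need to confirm.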

This PR should not affect the usual case of free_cache_engine=True at all. Experiments are ongoing to check for side effects when free_cache_engine=False, so this PR is still a draft.

listar2000 and others added 6 commits October 24, 2025 16:05

fix agent workflow trainer and engine

fc52f8b

Merge branch 'rllm-org:nightly' into nightly

eec8446

add backward compatibility

3194183

Merge branch 'nightly' of github.com:listar2000/rllm into nightly

c95e7cd

Merge branch 'nightly' of https://github.com/rllm-org/rllm into nightly

5b75fc8

propose fix

84507ca
