Guided Flow Policy (GFP) is an offline reinforcement learning (RL) method based on flow matching. It couples two policies through a bidirectional guidance mechanism: a multi-step flow-matching policy trained with value-aware behavior cloning, and a distilled one-step actor. With this design, GFP achieves state-of-the-art performance across 144 state- and pixel-based tasks from the OGBench, Minari, and D4RL benchmarks, with substantial gains on suboptimal datasets and challenging tasks.
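The official code has not been released yet, so the following is not GFP's implementation. It is a minimal NumPy sketch of two generic ingredients named above: a plain flow-matching behavior-cloning loss for a multi-step policy (straight-line noise-to-action paths, sampled with Euler steps), and distillation of that multi-step policy into a one-step actor. It deliberately omits the value-aware weighting and the bidirectional guidance, whose exact form is not described here. The function names and the callable-based interface (`v(states, x_t, t)`, `actor(states, x0)`) are hypothetical.

```python
import numpy as np
from typing import Callable

# Hypothetical interface: v(states, x_t, t) -> velocity, same shape as x_t (batch, action_dim).
VelocityFn = Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]
# Hypothetical interface: actor(states, x0) -> action, same shape as x0.
ActorFn = Callable[[np.ndarray, np.ndarray], np.ndarray]


def flow_matching_loss(v: VelocityFn, states: np.ndarray, actions: np.ndarray,
                       rng: np.random.Generator) -> float:
    """Plain conditional flow-matching BC loss (no value weighting, unlike GFP)."""
    batch = actions.shape[0]
    x0 = rng.standard_normal(actions.shape)        # noise sample at t = 0
    t = rng.uniform(0.0, 1.0, size=(batch, 1))     # interpolation time in [0, 1)
    x_t = (1.0 - t) * x0 + t * actions             # point on the straight noise-to-action path
    target = actions - x0                          # constant velocity of that path
    err = v(states, x_t, t) - target
    return float(np.mean(np.sum(err ** 2, axis=-1)))


def euler_sample(v: VelocityFn, states: np.ndarray, x0: np.ndarray,
                 n_steps: int = 10) -> np.ndarray:
    """Multi-step policy: integrate dx/dt = v(states, x, t) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((x.shape[0], 1), k * dt)       # left endpoint of each Euler step
        x = x + dt * v(states, x, t)
    return x


def distillation_loss(actor: ActorFn, v: VelocityFn, states: np.ndarray,
                      action_dim: int, rng: np.random.Generator,
                      n_steps: int = 10) -> float:
    """Regress a one-step actor onto the multi-step policy's output for the same noise."""
    x0 = rng.standard_normal((states.shape[0], action_dim))
    teacher = euler_sample(v, states, x0, n_steps)  # treated as a fixed target
    err = actor(states, x0) - teacher
    return float(np.mean(np.sum(err ** 2, axis=-1)))
```

As a sanity check, when the dataset holds a single action `a`, the exact flow-matching velocity is `(a - x_t) / (1 - t)`: it drives the loss to zero, Euler sampling lands exactly on `a`, and a one-step actor that always outputs `a` has zero distillation loss.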
- 🟢 2025-12-03 - Release of the paper on arXiv
- 🔴 Code, coming soon
- 🔴 Detailed blog post, coming soon
