# Multi-Agent Evolve: LLM Self-Improve through Co-evolution

[Teaser figure]

## ⚙️ Algorithm Flow


Our approach builds a self-evolving system that enhances LLMs' general reasoning capabilities through three collaborative roles:

  1. **Proposer**: generates new reasoning questions wrapped in `<question>...</question>`. Each question is evaluated for quality, difficulty, and format; only high-quality, learnable questions are kept for training.

  2. **Solver**: answers the valid questions within `<answer>...</answer>`. Its performance helps measure task difficulty and provides feedback for both question generation and model improvement.

  3. **Judge**: evaluates questions and answers, reasoning in `<think>...</think>` and producing numeric scores in `<score>...</score>`. These scores serve as rewards for the Proposer and Solver, enabling stable reinforcement learning.

All three roles share one underlying model and are updated together using Task-Relative REINFORCE++. The system forms a continuous self-improving loop that strengthens reasoning without external supervision.
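For concreteness, here is a minimal sketch of one iteration of this loop. The model interface (`model.generate`), the helper names, and the quality threshold are hypothetical stand-ins for illustration, not the repository's actual API:

```python
import re

def extract_tag(text: str, tag: str) -> str:
    """Pull the content of <tag>...</tag> out of a model response."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else ""

def self_evolve_step(model, proposer_prompt: str, judge_prompt: str):
    # 1. Proposer: generate a candidate question.
    proposal = model.generate(proposer_prompt)
    question = extract_tag(proposal, "question")

    # 2. Judge: score the question; discard low-quality or unlearnable ones.
    q_review = model.generate(judge_prompt + question)
    q_score = float(extract_tag(q_review, "score"))
    if q_score < 0.5:  # hypothetical quality threshold
        return None

    # 3. Solver: answer the accepted question.
    solution = model.generate(question)
    answer = extract_tag(solution, "answer")

    # 4. Judge: reason in <think>...</think>, then score the answer.
    a_review = model.generate(judge_prompt + question + answer)
    a_score = float(extract_tag(a_review, "score"))

    # The scores become rewards for the shared model's three roles; in the
    # real system all roles are updated jointly with Task-Relative REINFORCE++.
    return {"proposer_reward": q_score, "solver_reward": a_score}
```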

[Workflow figure]

## 📊 Results


### Main Results

| Model | ID Avg | OOD Avg | Total Avg |
| --- | --- | --- | --- |
| *w/o reference questions* | | | |
| Qwen2.5-3B-Instruct | 63.34 | 41.32 | 55.33 |
| AZR | 67.09 | 41.33 | 57.72 |
| MAE (zero) | 68.37 | 42.48 | 58.51 |
| *w/ reference questions* | | | |
| SFT | 63.28 | 37.41 | 53.87 |
| MAE (with reference) | 65.07 | 43.18 | 57.11 |
| MAE (no reference) | 67.51 | 41.86 | 58.18 |
| MAE (half reference) | 68.95 | 43.96 | 59.87 |

## ✨ Getting Started


### 🎄 Environment Setup

```bash
conda create -n mae python=3.10
conda activate mae
pip install -r requirements.txt
pip install -r flashattn_requirements.txt
python scripts/prepare_test_datasets.py
python -m absolute_zero_reasoner.data_construction.process_code_reasoning_data
```

### 🔗 Prepare API Key(s)

If you plan to use NVIDIA's integrated LLM service (NIM) for evaluation, you can obtain free API key(s) by registering an account at https://build.nvidia.com/nim.

Steps to register and save your API key(s):

  1. Go to https://build.nvidia.com/nim and create an account (or sign in with your existing NVIDIA account).
  2. After signing in, navigate to the API_KEYS section and create a new API key. You may create multiple keys (possibly through multiple accounts) if you want to distribute load.
  3. Copy the generated API key(s).
  4. Create a file named api.json at the repository root (the same directory as README.md) and store your keys in the following format:
```json
{
  "api_keys": [
    "sk-xxxxxxx-your-first-key-xxxx",
    "sk-yyyyyyy-your-second-key-yyyy"
  ]
}
```
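For reference, here is a minimal sketch of how these keys could be consumed, assuming `api.json` sits at the repository root; the repository's actual loading logic may differ:

```python
import itertools
import json

def load_api_keys(path: str = "api.json") -> list[str]:
    """Read the NIM API keys stored in api.json at the repository root."""
    with open(path) as f:
        return json.load(f)["api_keys"]

# Cycle through the keys to spread evaluation load across accounts.
key_cycle = itertools.cycle(load_api_keys())
current_key = next(key_cycle)
```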

### 🛠️ Prompts

Specializing the prompts can steer the model toward producing questions in a certain domain or assigning scores according to desired rules. Make sure custom prompts follow the same format as the default prompts we provide, and place them under `absolute_zero_reasoner/data_construction/initial_prompt_templates`.
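As a hypothetical illustration (the actual template-handling code in the repository may differ), a custom template is just a text file read from that directory:

```python
from pathlib import Path

# Directory holding the default and custom prompt templates.
TEMPLATE_DIR = Path("absolute_zero_reasoner/data_construction/initial_prompt_templates")

def load_template(name: str) -> str:
    """Read a prompt template; custom templates should mirror the default format."""
    return (TEMPLATE_DIR / name).read_text()
```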

πŸ‹οΈ Training


### 🌚 Resuming Runs

Three resume modes are supported: `disable`, `auto`, and `resume_path`. `disable` trains from scratch; `auto` resumes from the latest checkpoint inside `resume_dir`; `resume_path` resumes from any specific checkpoint you choose.

```bash
    # resume_dir must be set whenever resume_mode is not `disable`;
    # resume_from_path can point at any specific checkpoint to resume training from.
    trainer.resume_mode=auto \
    trainer.resume_dir=<path_to_your_run_directory> \
    trainer.resume_from_path=<path_to_your_checkpoint> \
```

When resuming runs, you can also pass the original run's wandb ID to the script, i.e., `trainer.wandb_run_id=<run_id>`.

### ♟️ Multi-Agent Evolve Training

We use 8x80GB GPUs for 3B models; the scripts can be modified to achieve the same overall accumulated batch size for reproduction.

```bash
bash scripts/selfplay/mae.sh
# To explore different settings on reference questions, set `include_references`
# to 0 (no reference) or 1 (with reference).
```

Other models are also supported by the Multi-Agent Evolve framework; you can start training your own model by modifying `actor_rollout_ref.model.path` in the scripts.

### 🤗 Converting veRL checkpoints to HF format

```bash
python -m absolute_zero_reasoner.utils.convert2hf \
  <veRL_ckpt_path>/actor \
  <veRL_ckpt_path>/actor/huggingface/ \
  <hf_ckpt_path>
```
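Once converted, the checkpoint loads like any Hugging Face model. As a quick sanity check (assuming `transformers` is installed; `<hf_ckpt_path>` is the output directory from the command above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the converted checkpoint to verify that the export succeeded.
model = AutoModelForCausalLM.from_pretrained("<hf_ckpt_path>")
tokenizer = AutoTokenizer.from_pretrained("<hf_ckpt_path>")
print(model.config.model_type)
```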

## 📃 Evaluation


### General Benchmarks

The general benchmarks are evaluated during the training process. For a complete evaluation on general benchmarks, run the following scripts after setting the resume checkpoint:

```bash
bash scripts/evaluation/eval_ID.sh
bash scripts/evaluation/eval_OOD.sh
# To evaluate the base model, set resume_mode to `disable` in these scripts.
```

### Code Benchmarks

We use evalplus for code evaluation. A new conda env is needed for evalplus.

```bash
conda create -n evalplus python=3.11
conda activate evalplus
pip install --upgrade "evalplus[vllm] @ git+https://github.com/evalplus/evalplus@d362e933265c3e7e3df8101c930a89c3c470cd9f"
```

Evaluation:

```bash
conda activate evalplus
bash evaluation/code_eval/scripts/run_evalplus.sh 0 <humaneval|mbpp> <hf_ckpt_path>
```

## 🎈 Citation


If you find Multi-Agent Evolve helpful, please cite us.

```bibtex
@misc{chen2025multiagentevolvellmselfimprove,
      title={Multi-Agent Evolve: LLM Self-Improve through Co-evolution},
      author={Yixing Chen and Yiding Wang and Siqi Zhu and Haofei Yu and Tao Feng and Muhan Zhan and Mostofa Patwary and Jiaxuan You},
      year={2025},
      eprint={2510.23595},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2510.23595},
}
```

## 🌻 Acknowledgement


This project is inspired by and partially adapted from the Absolute Zero Reasoner (AZR) project. We thank the AZR authors for their open-source contributions and ideas.

## 📧 Contact


Feel free to contact Yixing Chen and Yiding Wang at polaris_dane@sjtu.edu.cn and yidingw@stu.pku.edu.cn.

## 📈 Star History


[Star History Chart]
