Conversation

@santhnm2
Collaborator

No description provided.

@santhnm2 santhnm2 requested a review from okhat January 24, 2023 00:46
@okhat okhat merged commit 84665ff into main Jan 24, 2023
arnavsinghvi11 pushed a commit that referenced this pull request Mar 26, 2024
arnavsinghvi11 pushed a commit that referenced this pull request May 31, 2024
GFarnon added a commit to GFarnon/dspy that referenced this pull request Jul 13, 2024
chenmoneygithub added a commit that referenced this pull request May 14, 2025
* D1 for GRPO

* Improve type for arbor

* Add temp test script for grpo

* Add note about assumption of same inputs to all predictors

* Disable LM cache in GRPO

* Add support for valset

* Add configurable variable module invocation handling strategy

* Noahs dspy.LM changes and dspy.ArborProvider implementation

* Add latest arbor changes

* First working grpo version

* Add modules

* Add training args in initialize

* Fix grpo

* Add batches

* Update finetuning infra

* Revise server interface

* Update example script

* Move temporary interface to a separate file

* Add LM level reinforce interface

* Update testing script

* Update api_base access for finetune

* Style check

* Style check all

* Add Test script with MATH dataset

* Ensure grpo trainer does not crash due to format issues (temporary fix)

* Add error log

* Fix termination

* Delete temp files

* Add diff

* Add model update endpoint support

* Remove experimental flag

* Remove extra files

* Add GRPO error resiliency so that parsing failures do not lead to crashes

* Param Passthrough and Consistent Tutorial Script (#3)

* Add param passthrough and default banking77 tutorial

* Add more threads

* Update banking tutorial

---------

Co-authored-by: Noah Ziems <nziems2@nziems2@nd.edu>

* Lower beta param for banking tutorial

* Add warning on no training data

* Add train logging to GRPO

* Add max_prompt_length and max_completion_length support

* fix litellm retries

* no jsonadapter

* fix errors

* fix tests

* fix tests

* add the retry strategy back

* Add working implementation of format errors and negative rewards

* Fix bugs in validation

* Add validation logic to grpo

* Add more supported args

* Support max grad norm

* Add Train Shuffling logic

* Add lora support

* Add soft format rewards

* Disable provide_traceback in all grpo invoked evaluates

* Remove temporary tutorial script

* Revert classification finetuning tutorial

* Comment out json adapter test

* Fix ruff errors

* Add teacher (#8)

* Modify teacher preparation logic

* Re-add teachers to GRPO

* Style fix

* Update tutorial script

* Housekeeping

* Revert number of train steps

* Address PR comments

* Add wandb support for GRPO training runs

* Add completion logging

* Add logging steps support

* update report_to to be default none

* Add max_context_length

* Fix num_samples_per_input computation

* Checkpointing Endpoints (#10)

* Fix typo

* Fix checkpoint url

* fix merge conflict leftover

* shorten the warning message in json adapter

* fix the error piping

---------

Co-authored-by: Lakshya A Agrawal <lakshyaaagrawal@berkeley.edu>
Co-authored-by: Dilara Soylu <21346670+dilarasoylu@users.noreply.github.com>
Co-authored-by: Noah Ziems <nziems2@nziems2@nd.edu>
Co-authored-by: chenmoneygithub <chen.qian@databricks.com>
LakshyAAAgrawal pushed a commit to gepa-ai/dspy that referenced this pull request Aug 13, 2025
* Add param passthrough and default banking77 tutorial

* Add more threads

* Update banking tutorial

---------

Co-authored-by: Noah Ziems <nziems2@nziems2@nd.edu>