Feature/find ppo error #185

Open
lwaekfjlk wants to merge 31 commits into main from feature/find-ppo-error

@lwaekfjlk

Member

Closes #

📑 Description

✅ Checks

  • My pull request adheres to the code style of this project
  • My code requires changes to the documentation
  • I have updated the documentation as required
  • All the tests have passed
  • Branch name follows type/descript (e.g. feature/add-llm-agents)
  • Ready for code review

ℹ Additional Information

@lwaekfjlk

Member Author

WE USE AN OVER-TRAINED SFT MODEL!!!

@lwaekfjlk

Member Author

With a pretrained value model and a small value coefficient at the second stage, plus gamma=1, the base model as the reference model, and ppo epoch = 2, most settings work.
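
For anyone reading along, here is a minimal sketch of how the settings from this comment could be written down. It is not the code in this branch: `PPOSettings`, its field names, `discounted_returns`, and `total_loss` are hypothetical names made up for illustration, and the `value_coef` default of `0.1` is only a placeholder, since the comment says "small" without giving a number.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PPOSettings:
    """Hypothetical container; field names are illustrative, not this repo's."""
    gamma: float = 1.0                    # from the comment: gamma=1 (no discounting)
    ppo_epochs: int = 2                   # from the comment: ppo epoch = 2
    value_coef: float = 0.1               # placeholder: the comment only says "small"
    reference_model: str = "base"         # from the comment: base model as reference
    value_model_init: str = "pretrained"  # from the comment: pretrained value model


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return-to-go for each step: G_t = r_t + gamma * G_{t+1}."""
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def total_loss(policy_loss: float, value_loss: float, settings: PPOSettings) -> float:
    """Combined objective: policy loss plus the down-weighted value loss."""
    return policy_loss + settings.value_coef * value_loss


settings = PPOSettings()
# With gamma=1, a reward given only at the last step reaches every
# earlier step undiminished.
print(discounted_returns([0.0, 0.0, 1.0], settings.gamma))
```

The `discounted_returns` helper is only there to show what gamma=1 means for a sequence-level reward; an actual trainer would typically compute advantages with its own estimator.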

2 participants