
Derivations of Reinforcement Learning Algorithms for Large Models

  • PPO
  • GRPO
  • DAPO
  • GSPO
  • SAPO
  • GDPO
This post is licensed under CC BY 4.0 by the author.