RL 14
- Derivations of Reinforcement Learning Algorithms for Large Models
- sim2real
- Offline Reinforcement Learning
- GEN-0 / Embodied Foundation Models That Scale with Physical Interaction
- Generalized Advantage Estimation: High-dimensional Continuous Control Using Generalized Advantage Estimation
- Multi-Agent Reinforcement Learning
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- π*₀.₆: a VLA That Learns From Experience
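- Isaac Lab Installation and Usage
- DeepMind Reinforcement Learning Survey
- Maximum Entropy Reinforcement Learning: SAC and Soft Q-learning
- Reinforcement Learning: Policy Gradient Algorithms
- PPO Algorithm: Proximal Policy Optimization
- Reinforcement Learning: Monte Carlo Methods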