π*₀.₆: a VLA That Learns From Experience
It is amazing what you can learn if you're not afraid to try.
PPO GRPO DAPO GSPO SAPO GDPO
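SimpleVLA-RL 1. Research background SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning: https://github.com/PRIME-RL/SimpleVLA-RL Problem: VLA models in robotics are mainly trained with large-scale pretraining followed by supervised fine-tuning, an approach that has two problems...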
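Outline: thermodynamic diffusion models, DDPM, DDIM, Flow Matching. Bilibili: Diffusion Model Basics 1. Diffusion model basics Diffusion model: a generative neural network for images, built by simulating the thermodynamic diffusion process. Representing images in high-dimensional space: each image corresponds to a single point in a high-dimensional vector space. Langevin dynamics equation (Langevin Equation) The Langevin dynamics equation (Langev...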
Domain randomization for transferring deep neural networks from simulation to the real world https://readpaper.com/pdf-annotate/note?pdfId=4557557081012445185&noteId=3173811748193745664
References https://www.doubao.com/thread/w1b0ce4572028377f BCQ IQL TD3+BC 1. Offline reinforcement learning 2. CQL: Conservative Q-Learning for Offline Reinforcement Learning https://readpaper.com/pdf-annotat...
Reference #GEN-0 / Embodied Foundation Models That Scale with Physical Interaction
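To address the high variance and high bias of advantage estimates in policy gradient methods, GAE (Generalized Advantage Estimation) is proposed as a unified framework that smoothly trades off variance against bias.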
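The trade-off can be made concrete with a minimal sketch of the standard GAE recursion: the TD error is delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), and the advantage is A_t = delta_t + gamma * lambda * A_{t+1}. The function name `compute_gae`, its argument names, and the default values of `gamma` and `lam` are illustrative assumptions, not taken from the post. The sketch handles a single trajectory with no episode boundary in the middle.

```python
def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory (illustrative sketch).

    rewards[t] and values[t] cover t = 0..T-1.
    last_value is the bootstrap estimate V(s_T); pass 0.0 if s_T is terminal.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0  # holds A_{t+1} while iterating backwards
    for t in reversed(range(len(rewards))):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]

# lam = 0: one-step TD errors (lower variance, more bias from the value function)
td_only = compute_gae(rewards, values, last_value=0.0, gamma=1.0, lam=0.0)

# lam = 1: Monte Carlo return minus the value baseline (less bias, higher variance)
monte_carlo = compute_gae(rewards, values, last_value=0.0, gamma=1.0, lam=1.0)
```

Setting `lam` between 0 and 1 interpolates between these two extremes, which is the smooth variance/bias balance described above.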
Reference # Transformer-based Multi-Agent Reinforcement Learning for Generalization of Heterogeneous Multi-Robot Cooperation
A framework that abstracts RL as a sequence modeling problem.