DeepMind Reinforcement Learning Survey
1. Introduction
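Discount factor
"The reinforcement learning interaction carries a risk of termination":
How to interpret the discount factor $\gamma$:
At its core, it quantifies a belief about whether the interaction can continue: the interaction survives to the next step with probability $\gamma$. "Discounting future rewards" is only how this appears on the surface.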
In general, the discount factor reflects the assumption that there is a probability of 1 − γ that the interaction will end at the next step.
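The termination reading can be checked numerically. If the interaction ends after each step with probability $1 - \gamma$, the reward at step $t$ is collected only when the interaction has survived $t$ steps, which happens with probability $\gamma^t$. The expected undiscounted return therefore equals the discounted return $\sum_t \gamma^t r_t$. The following is a minimal sketch under assumptions of my own, not code from the survey: the function names, the constant reward of 1 per step, and $\gamma = 0.9$ are all illustrative choices.

```python
import random


def discounted_return(rewards, gamma):
    """Discounted sum of a fixed reward sequence: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def expected_return_with_termination(rewards, gamma, n_episodes=100_000, seed=0):
    """Monte Carlo estimate of the *undiscounted* return when the interaction
    ends after each step with probability 1 - gamma (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        for r in rewards:
            total += r                 # the current step's reward is always collected
            if rng.random() >= gamma:  # interaction ends with probability 1 - gamma
                break
    return total / n_episodes


# Illustrative setup: reward 1 at every step, gamma = 0.9.
rewards = [1.0] * 200
gamma = 0.9

exact = discounted_return(rewards, gamma)
estimate = expected_return_with_termination(rewards, gamma)
print(f"discounted return:             {exact:.3f}")
print(f"mean return under termination: {estimate:.3f}")
```

With a constant reward of 1, the discounted return is close to $1 / (1 - \gamma) = 10$, and the simulated average should land near the same value, up to sampling noise.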
States and state values
This post is licensed under CC BY 4.0 by the author.