DeepMind Reinforcement Learning Survey

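1. Introduction. The discount factor (the interaction in reinforcement learning carries a risk of termination): The discount factor $\gamma$ is best read as quantifying a belief about whether the interaction will continue: the interaction survives to the next step with probability $\gamma$. It is not merely a "discount on future rewards", which is only its surface appearance. In general, the discount factor reflects the assumption that there i...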

Nov 20, 2025 AI

Maximum Entropy Reinforcement Learning: SAC and Soft Q-learning

An introduction to the basic theory of maximum entropy reinforcement learning and its two main algorithms, soft actor-critic and soft Q-learning.

Sep 27, 2025 AI

Reinforcement Learning: Policy Gradient Algorithms

Policy Gradient Method

Sep 7, 2025 AI

The PPO Algorithm: Proximal Policy Optimization

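1. Introduction. Features of PPO: PPO uses a surrogate objective function, so each sample can go through multiple epochs of minibatch updates, whereas a conventional policy gradient algorithm performs only one gradient update per data sample. It has the advantages of TRPO while being simpler and having better sample complexity. Shortcomings of other algorithms: Q-learning: performs poorly on many simple problems, and its...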

Sep 1, 2025 AI

Reinforcement Learning: Monte Carlo Methods

Get started with the Monte Carlo method in reinforcement learning. You will learn the basics.

Jun 9, 2025 AI

Customize the Favicon

The favicons of Chirpy are placed in the directory assets/img/favicons/. You may want to replace them with your own. The following sections will guide you to create and replace the default favicons...

Aug 10, 2019 Blogging

Getting Started

Get started with Chirpy basics in this comprehensive overview. You will learn how to install, configure, and use your first Chirpy-based website, as well as deploy it to a web server.

Aug 9, 2019 Blogging

Writing a New Post

This tutorial will guide you through writing a post in the Chirpy template, and it’s worth reading even if you’ve used Jekyll before, as many features require specific variables to be set. Naming and...

Aug 8, 2019 Blogging

Text and Typography

Examples of text, typography, math equations, diagrams, flowcharts, pictures, videos, and more.