Ensemble Reinforcement Learning
https://arxiv.org/pdf/1704.00756.pdf Multi-Advisor Reinforcement Learning
https://arxiv.org/pdf/1709.00503v1.pdf Mean Actor Critic

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action, continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. This significantly reduces variance in the gradient updates and removes the need for a variance-reduction baseline. We show empirical results on two control domains where MAC performs as well as or better than other policy gradient approaches, and on five Atari games, where MAC is competitive with state-of-the-art policy search algorithms.
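
To make the all-actions gradient concrete, below is a minimal PyTorch sketch of a MAC-style policy loss. The names (PolicyNet, mac_policy_loss), the network architecture, and the random stand-in Q-values are illustrative assumptions, not the paper's code; only the estimator itself, the gradient of sum_a pi(a|s) Q(s,a) over all actions, follows the description in the abstract.

  import torch
  import torch.nn as nn

  class PolicyNet(nn.Module):
      """Hypothetical softmax policy over a discrete action set."""
      def __init__(self, state_dim, num_actions):
          super().__init__()
          self.net = nn.Sequential(
              nn.Linear(state_dim, 64), nn.ReLU(),
              nn.Linear(64, num_actions),
          )

      def forward(self, states):
          # pi(a|s) for every action, shape (batch, num_actions)
          return torch.softmax(self.net(states), dim=-1)

  def mac_policy_loss(policy, states, q_values):
      """MAC-style policy objective: weight the critic's Q(s, a) by pi(a|s)
      and sum over ALL actions, instead of using only the sampled action.
      q_values: (batch, num_actions) tensor from a learned critic."""
      probs = policy(states)
      # Gradient of this expectation is sum_a grad pi(a|s) * Q(s, a).
      expected_q = (probs * q_values.detach()).sum(dim=-1)
      return -expected_q.mean()  # minimize the negative expected return

  # Usage with placeholder data (a real agent would use critic outputs):
  policy = PolicyNet(state_dim=4, num_actions=2)
  states = torch.randn(32, 4)
  q_values = torch.randn(32, 2)
  loss = mac_policy_loss(policy, states, q_values)
  loss.backward()

This also shows why no baseline is needed: since the action probabilities sum to one, sum_a grad pi(a|s) = 0, so subtracting any state-dependent baseline b(s) from the Q-values leaves the all-actions gradient unchanged. Contrast this with a sampled-action estimator like REINFORCE, where a baseline is typically subtracted to control variance.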