Robot'2017 - David Simões


Date 2017/11/23
Title Mixed-Policy Asynchronous Deep Q-Learning
Speaker David Simões
Event Robot'2017
Location Sevilla
Country Spain

Abstract: There are many open issues and challenges in the reinforcement learning field, such as handling high-dimensional environments. Function approximators, such as deep neural networks, have been successfully used in both single- and multi-agent environments with high-dimensional state spaces. The multi-agent learning paradigm faces even more problems, due to the effect of several agents learning simultaneously in the environment. One of its main concerns is how to learn mixed policies that prevent opponents from exploiting them in competitive environments, achieving a Nash equilibrium. We propose an extension of several algorithms able to achieve Nash equilibria in single-state games to the deep-learning paradigm. We compare their deep-learning and table-based implementations, and demonstrate how WPL is able to achieve an equilibrium strategy in a complex environment, where agents must find each other in an infinite-state game and play a modified version of the Rock Paper Scissors game.
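To illustrate why mixed policies matter in competitive games, here is a minimal sketch using the standard Rock Paper Scissors payoff matrix. It is not the modified version used in the talk and not the WPL algorithm itself. The names `PAYOFF` and `best_response_value` are hypothetical and chosen only for this example. The function computes how much a best-responding opponent can win against a given policy; a value of zero means the policy cannot be exploited, which is the case for the uniform mixed policy at the Nash equilibrium.

```python
# Payoff to the responding player in standard Rock Paper Scissors.
# Rows: responder's action; columns: opponent's action (Rock, Paper, Scissors).
# This matrix is an assumption for illustration, not the talk's modified game.
PAYOFF = [
    [0, -1, 1],   # Rock: ties Rock, loses to Paper, beats Scissors
    [1, 0, -1],   # Paper: beats Rock, ties Paper, loses to Scissors
    [-1, 1, 0],   # Scissors: loses to Rock, beats Paper, ties Scissors
]


def best_response_value(opponent_policy):
    """Return the highest expected payoff any pure action earns
    against the given mixed policy (a list of three probabilities)."""
    return max(
        sum(PAYOFF[action][b] * p for b, p in enumerate(opponent_policy))
        for action in range(3)
    )


uniform = [1 / 3, 1 / 3, 1 / 3]   # mixed equilibrium policy
always_rock = [1.0, 0.0, 0.0]     # deterministic policy
biased = [0.5, 0.25, 0.25]        # mixed, but skewed towards Rock

for name, policy in [("uniform", uniform),
                     ("always_rock", always_rock),
                     ("biased", biased)]:
    print(name, best_response_value(policy))
```

Under these assumptions, the deterministic policy is fully exploitable (the opponent always plays Paper), the skewed policy is partially exploitable, and only the uniform mixed policy leaves a best-responding opponent with no expected gain.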