When: Wednesday, March 2, 2022, 4pm (CET).

Where: Zoom online, check the address in the Google Calendar Event.

Topic: Overcoming Exploration: Deep Reinforcement Learning in Complex Environments from Temporal Logic Specifications.

Speaker: Dr. Alessandro Ronca, PostDoc at DIAG, Sapienza University of Rome.

Abstract

The talk presents the paper “Overcoming Exploration: Deep Reinforcement Learning in Complex Environments from Temporal Logic Specifications” by Mingyu Cai, Erfan Aasi, Calin Belta, and Cristian-Ioan Vasile (https://arxiv.org/abs/2201.12231). The paper's abstract is reproduced below.

“We present a Deep Reinforcement Learning (DRL) algorithm for a task-guided robot with unknown continuous-time dynamics deployed in a large-scale complex environment. Linear Temporal Logic (LTL) is applied to express a rich robotic specification. To overcome the environmental challenge, we propose a novel path planning-guided reward scheme that is dense over the state space, and crucially, robust to infeasibility of computed geometric paths due to the unknown robot dynamics. To facilitate LTL satisfaction, our approach decomposes the LTL mission into sub-tasks that are solved using distributed DRL, where the sub-tasks are trained in parallel, using Deep Policy Gradient algorithms. Our framework is shown to significantly improve performance (effectiveness, efficiency) and exploration of robots tasked with complex missions in large-scale complex environments.”
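To give a flavour of the path planning-guided reward idea, here is a minimal sketch in Python. It is an illustration only: the function names, the waypoint-progress shaping, and the bonus value are assumptions for exposition, not the paper's actual reward scheme. The key property it mimics is that the reward is dense (the agent gets a signal for progress along a precomputed geometric path) while not requiring exact path tracking, so an infeasible path still provides useful guidance.

```python
import math

def nearest_waypoint_index(pos, waypoints):
    """Index of the waypoint closest to the robot's current position."""
    return min(range(len(waypoints)),
               key=lambda i: math.dist(pos, waypoints[i]))

def dense_reward(prev_pos, pos, waypoints, goal_radius=0.5):
    """Dense, path-guided reward (hypothetical sketch).

    Rewards the number of waypoints advanced along the planned geometric
    path since the last step, plus a bonus for reaching the goal region
    of the current sub-task. Because it rewards *progress* rather than
    exact tracking, it degrades gracefully when the planned path is
    dynamically infeasible for the (unknown) robot dynamics.
    """
    i_prev = nearest_waypoint_index(prev_pos, waypoints)
    i_now = nearest_waypoint_index(pos, waypoints)
    progress = i_now - i_prev  # waypoints advanced this step
    reached = math.dist(pos, waypoints[-1]) < goal_radius
    return float(progress) + (10.0 if reached else 0.0)
```

In a full pipeline, one such reward would be attached to each LTL sub-task obtained from the automaton decomposition, and the sub-task policies would then be trained in parallel with a deep policy-gradient method.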

Short Bio

Alessandro Ronca holds a BSc and an MSc in Engineering in Computer Science from Sapienza University of Rome, and a DPhil in Computer Science from the University of Oxford. He is currently a PostDoc at Sapienza. His research interests span logic, reasoning, and learning, with a focus on temporal logic and automata theory, using complexity analysis as a guiding principle. His contributions in the field of Knowledge Representation and Reasoning include a study of Temporal Datalog as a language for reasoning over streams of data. In Reinforcement Learning, he has contributed polynomial bounds on the complexity of PAC-learning policies in Regular Decision Processes, a generalisation of MDPs that lifts the Markov assumption.