Overcoming Exploration: Deep Reinforcement Learning in Complex Environments from Temporal Logic Specifications
02 Mar 2022
When: Wednesday, March 2, 2022, 4pm (CET).
Where: Zoom online, check the address in the Google Calendar Event.
Topic: Overcoming Exploration: Deep Reinforcement Learning in Complex Environments from Temporal Logic Specifications.
Speaker: Dr. Alessandro Ronca, PostDoc at DIAG, Sapienza University of Rome.
Abstract
The talk is a presentation of the paper “Overcoming Exploration: Deep Reinforcement Learning in Complex Environments from Temporal Logic Specifications” by Mingyu Cai, Erfan Aasi, Calin Belta, Cristian-Ioan Vasile (https://arxiv.org/abs/2201.12231). The abstract of the paper is reported below.
“We present a Deep Reinforcement Learning (DRL) algorithm for a task-guided robot with unknown continuous-time dynamics deployed in a large-scale complex environment. Linear Temporal Logic (LTL) is applied to express a rich robotic specification. To overcome the environmental challenge, we propose a novel path planning-guided reward scheme that is dense over the state space, and crucially, robust to infeasibility of computed geometric paths due to the unknown robot dynamics. To facilitate LTL satisfaction, our approach decomposes the LTL mission into sub-tasks that are solved using distributed DRL, where the sub-tasks are trained in parallel, using Deep Policy Gradient algorithms. Our framework is shown to significantly improve performance (effectiveness, efficiency) and exploration of robots tasked with complex missions in large-scale complex environments.”
Short Bio
Alessandro Ronca holds a BSc and an MSc in Engineering in Computer Science from Sapienza University of Rome, and a DPhil in Computer Science from University of Oxford. He is currently a PostDoc at Sapienza. His research interests span logic, reasoning, and learning, with a focus on temporal logic and automata theory, using complexity analysis as a guiding principle. His contributions in the field of Knowledge Representation and Reasoning include a study of Temporal Datalog as a language for reasoning over streams of data. In Reinforcement Learning, he has contributed with polynomial bounds for the complexity of PAC-learning policies in Regular Decision Processes, a generalisation of MDPs that lifts the Markov assumption.
Towards Capturing PTIME with no Counting Construct (but with a version of Hilbert's Choice operator)
18 Nov 2021
When: Thursday, November 18, 2021, 3pm (CEST).
Where: Zoom online & Aula Magna at DIAG, check the address in the Google Calendar Event.
Topic: Towards Capturing PTIME with no Counting Construct (but with a version of Hilbert’s Choice operator).
Speaker: Prof. Eugenia Ternovska, Associate Professor at Simon Fraser University, Vancouver, Canada.
Abstract
The central open question in Descriptive Complexity is whether there is a logic that characterizes deterministic polynomial-time computations (PTIME) on relational structures. In this talk, I introduce my work on this question. I define an algebra of binary relations that is obtained from first-order logic with fixed points, FO(FP), by a series of transformations that include restricting logical connectives and adding a dynamic version of Hilbert’s Choice operator Epsilon. The formalism can also be viewed as a linear-time modal Dynamic logic, where algebraic expressions describing “proofs” or
“programs” appear inside the modalities. Many typical polynomial-time properties such as cardinality, reachability and those requiring “mixed” propagations (that include linear equations modulo two) can be specified in the logic, and an arbitrary PTIME Turing machine can be encoded. For each fixed Choice function, the data complexity of model checking is in PTIME. However, there can be exponentially many such functions. “Naive evaluations” refer to a version of this model checking procedure where the Choice function variable Epsilon is simply treated as a constant. A crucial question is under what syntactic conditions on the algebraic terms such a naive evaluation works, that is, provides a certain answer to the original model checking problem. These conditions, applied to the formalism, would give us a logic for PTIME. The two views of the formalism, as an algebra and as a Dynamic logic, support application of both automata-theoretic and algebraic techniques to the study of this question.
Short Bio
Eugenia Ternovska is an associate professor at Simon Fraser University, Vancouver, Canada. She obtained her PhD in 2002 in Artificial Intelligence under the supervision of Ray Reiter (1939-2002) at the University of Toronto, Canada. Her research is mainly about Logic in Computer Science, Descriptive Complexity, Theoretical Computer Science, Knowledge Representation, Efficient Reasoning, Reasoning about Actions, Modal Logics, Fixpoint logics.
Material
Examining Games and a Way to Repair Them
04 Nov 2021
When: Thursday, November 4, 2021, 3pm (CEST).
Where: Zoom online, check the address in the Google Calendar Event.
Topic: Examining Games and a Way to Repair Them.
Speaker: Muhammad Najib, postdoctoral researcher at TU Kaiserslautern, Germany.
Abstract
Classical notion of correctness in formal verification is not appropriate for multi-agent systems–it does not capture the strategic behaviour of rational agents. As such, a different concept of correctness was proposed in rational verification. In proving such a correctness, we turn multi-agent systems into multi-player games and use game-theoretic techniques to analyse the games. In our works, we study various kinds of games with various goals and settings, and provide an algorithmic techniques to solve problems related to rational verification. We also propose a method to “repair” games should we find them “broken”.
This talk is a summary of a series of works with Julian Gutierrez, Lewis Hammond, Anthony W. Lin, Giuseppe Perelli, and Mike Wooldridge.
Short Bio
Muhammad Najib is a postdoctoral researcher at TU Kaiserslautern, Germany, in Automated Reasoning Group with Anthony W. Lin. He did his DPhil/PhD in computer science at the University of Oxford supervised by Mike Wooldridge and Julian Gutierrez. His research is mainly about the use of formal techniques for verifying and reasoning about systems composed of multiple self-interested intelligent agents.
Material
Formal Synthesis of Supervisory Controllers under Linear Temporal Logic Constraints
21 Oct 2021
When: Thursday, October 21, 2021, 3pm (CEST).
Where: Zoom online, check the address in the Google Calendar Event.
Topic: Formal Synthesis of Supervisory Controllers under Linear Temporal Logic Constraints.
Speaker: Ami Sakakibara, Postdoctoral researcher at the Division of Decision and Control Systems, KTH Royal Institute of Technology, Sweden.
Abstract
We consider a supervisory control problem of a discrete event system (DES), which is discrete-state, event-driven dynamical system model, in the presence of uncontrollable events. We aim to design an on-line supervisory controller that enforces the DES to satisfy a complex mission described by an syntactically co-safe linear temporal logic (scLTL) formula. In the precomputation process, we first compute a ranking function on the product automaton of the DES and an acceptor for the specification. The ranking function returns the minimum number of steps required to reach an accepting state from each state and embeds the influence of logical uncertainties modeled by uncontrollable events. We also show how to extend the ranking-function-based approach for different settings such as LTL over finite traces and multi-agent systems. The effectiveness of the proposed method is demonstrated in an illustrative example of a surveillance problem.
Short Bio
Ami Sakakibara is a postdoctoral researcher at the Division of Decision and Control Systems, KTH Royal Institute of Technology, Sweden, collaborating with Prof. Dimos Dimarogonas. She completed her PhD at the Graduate School of Engineering Science, Osaka University, Japan, under the supervision of Prof. Toshimitsu Ushio. Her research interests lie in formal synthesis, supervisory control, game theory, and reinforcement learning.
Material
Automated Creation and Decomposition of Reward Machines for Multiagent Reinforcement Learning
24 Sep 2021
When: Friday, September 24, 2021, 3pm (CEST).
Where: Zoom online, check the address in the Google Calendar Event.
Topic: Automated Creation and Decomposition of Reward Machines for Multiagent Reinforcement Learning.
Speaker: Giovanni Varricchione, PhD Student at Utrecht University.
Abstract
Recent years saw the introduction of reward machines (RM) in reinforcement learning (RL). An RM is a Mealy machine used
to define the reward function of learning agents depending on its current state. Intuitively, a state of an RM is a
high-level representation of either the agent’s (or team’s if in a multi-agent setting) behavior so far or of the stage
of completion of a task. RMs can then be used to easily define rewards. In this talk, I will describe our novel approach
that uses RMs and planning to train teams of agents.
I will first present (Neary, Wu, Xu, Topcu in AAMAS 2021), in which a multi-agent setting is considered. To overcome
various issues of the setting, like non-stationarity, a team RM is decomposed into multiple ones, one per agent, that
are then used to train their corresponding agents. We improve on this work in two ways: (i) by automatically generating
the team and individual RMs, and (ii) by training agents using an option-based framework in conjunction with a strategy.
We generate high-level uniform strategies using MCMAS, a model checker for multi-agent systems. Given that the strategy
obtained is uniform, we can easily generate the agents’ RMs from it, and then use them to train their agents. Finally,
as strategies are specified in terms of high-level actions, we can train agents to execute them using options, a known
concept in the RL literature. Preliminary results show that the automatically generated RMs lead to similar performance
when compared to the hand-crafted ones of Neary et al., and that the main approach of Neary et al. is outperformed by
our option-based one.
This talk is based on a joint work with Natasha Alechina, Mehdi Dastani and Brian Logan.
Material
Goal Recognition and Deception of Rational and Irrational Agents
15 Jul 2021
When: Thursday, July 15, 2021, 10am (CEST).
Where: Zoom online, check the address in the Google Calendar Event.
Topic: Goal Recognition and Deception of Rational and Irrational Agents.
Speaker: Sebastian Sardinã, Professor at RMIT University.
Abstract
The plan/activity/intention/goal recognition problem is the task of identifying an agent’s intent by observing its
behaviour. Traditionally, the problem has involved matching a sequence of observations to a plan in a pre-defined plan
library;the winning plan being the one that “best” matches the observations. Recent developments dispense with the
overhead of a plan library and instead—based on the assumption that the observed agent is behaving rationally—take a
cost-based approach and use classical planning technology to generate candidate plans as needed over a domain model. In
this talk, we will review this cost-based approach to goal recognition and some recent results, both for general task
planning and path planning. We will present a preliminary framework of deception, and techniques to handle irrational
and deceptive agents when performing goal recognition.
This is a summary of a stream of work done together with Dr. Peta Masters (now at Melbourne University) and has appeared
at AAMAS’17, IJCAI’17, JAIR’18, AAMAS’19 and AIJ’21.
Material