24 – 28 July 2023
Please refer to the ESSAI Summer school website here.
This course introduces AI planning and program synthesis for tasks (goals) expressed over finite traces instead of states. Specifically, borrowing from Formal Methods, we will consider tasks and environment specifications expressed in LTL and its finite trace variant LTLf. We will review the main results and algorithmic techniques to handle planning in nondeterministic domains. Then, we will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The key point is that working with these logics can be reduced to devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: Planning in Nondeterministic Domains, Temporal Logics, LTL, LTLf, Game-theoretic Techniques, Safety Games, Reachability Games, Games for LTL/LTLf objectives, and Reactive Synthesis. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.
April 13 – May 8 2023
All students interested in attending the course are invited to register at the link here.
Lectures 1–4: Video
Lectures 5–6: Video
Lectures 7–10: Video
Lectures 11–14: Video
Lectures 15–18: Video
Lectures 19–20: Video
This course introduces AI planning and program synthesis for tasks (goals) expressed over finite traces instead of states. Specifically, borrowing from Formal Methods, we will consider tasks and environment specifications expressed in LTL and in particular its finite trace variant LTLf. We will review the main results and algorithmic techniques to handle planning in nondeterministic domains. Then, we will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The key point is that working with these logics can be reduced to devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: Planning in Nondeterministic Domains, Temporal Logics, LTL, LTLf, Game-theoretic Techniques, Safety Games, Reachability Games, Games for LTL/LTLf objectives, and Reactive Synthesis. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.
Introduction. Motivations and aims of the course
Reasoning about Actions. SitCalc, Reiter’s Basic Action Theories, Reasoning by regression in SitCalc, Finite Transition Systems induced by Propositional Basic Action Theories.
Planning. Planning in deterministic domains. Transition System induced by a deterministic planning domain. Planning as search. Planning as fixpoint computation over a transition system. Planning in nondeterministic domains. Transition System, or game arena, induced by a nondeterministic planning domain. Planning as fixpoint computation over a game arena.
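The following is a minimal sketch, not course material, of the idea behind planning as fixpoint computation over a game arena: the toy nondeterministic domain and its state names are hypothetical, and a state is won once some action is guaranteed to lead to already-won states under every environment-chosen outcome.

```python
# Strong planning in a nondeterministic domain as a least-fixpoint
# computation: grow the set of winning states backwards from the goals.
# domain[state][action] = set of possible successor states
# (the environment picks which one actually occurs).
domain = {
    "s0": {"go": {"s1", "s2"}},
    "s1": {"go": {"goal"}},
    "s2": {"retry": {"s1"}},
    "goal": {},
}

def strong_plan_states(domain, goals):
    """Least fixpoint W = goals ∪ pre(W), where s ∈ pre(W) iff some
    action from s leads into W under *every* possible outcome."""
    win = set(goals)
    while True:
        new = {s for s, acts in domain.items()
               if s not in win
               and any(succs <= win for succs in acts.values())}
        if not new:
            return win
        win |= new

print(strong_plan_states(domain, {"goal"}))  # all four states: a strong plan exists from s0
```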
Transition systems and fixpoints. Reactive/interactive programs (finite state), transition systems, reachability, bisimulation. Fixpoints, Knaster-Tarski theorem on least and greatest fixpoints, approximants for least and greatest fixpoints.
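As a worked equation, in standard notation rather than the course's own slides, reachability and safety can be characterized as fixpoints of the monotone predecessor operator Pre:

```latex
\[
\mathit{Reach}(G) \;=\; \mu Z.\; G \cup \mathit{Pre}(Z)
\qquad\qquad
\mathit{Safe}(S) \;=\; \nu Z.\; S \cap \mathit{Pre}(Z)
\]
% By Knaster-Tarski, both fixpoints of these monotone operators on the
% powerset lattice of states exist; on a finite transition system they
% are reached by the approximants
\[
Z_0 = \emptyset,\;\; Z_{i+1} = G \cup \mathit{Pre}(Z_i)
\qquad\text{and}\qquad
Z_0 = \mathit{States},\;\; Z_{i+1} = S \cap \mathit{Pre}(Z_i),
\]
% which stabilize after at most |States| iterations.
```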
Games on Graphs. Transition systems as game arenas. Reachability, Safety, Büchi Objectives.
Linear Temporal Logics. Linear Temporal Logic on infinite traces (LTL) from Formal Methods. Linear Temporal Logic on finite traces (LTLf). LTLf to Automata. Reasoning in LTLf, verification in LTLf.
Program synthesis. Automated program synthesis, synthesis from specifications in Linear Temporal Logic on finite traces, game-theoretic view, arena and objective, adversarial reachability. Synthesis under environment specifications. Planning for LTLf goals in nondeterministic domains as synthesis under environment specifications.
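As a hedged sketch of the game-theoretic view mentioned above (conventions vary, e.g., on which player moves first): given an LTLf specification φ over environment variables X and agent variables Y, one first builds a DFA accepting exactly the finite traces that satisfy φ, and then solves an adversarial reachability game on it, here with the environment moving first:

```latex
\[
W \;=\; \mu Z.\; F \,\cup\, \bigl\{\, q \in Q \;\bigm|\;
\forall X' \subseteq X \;\exists Y' \subseteq Y .\;
\delta(q, X' \cup Y') \in Z \,\bigr\}
\]
% Here Q, \delta, F are the states, transition function, and accepting
% states of the DFA for \varphi. The specification is realizable iff the
% initial state is in W, and a winning strategy extracted from the
% fixpoint computation is the synthesized program (plan).
```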
Reasoning and learning. Merging Reasoning about Actions and Reinforcement Learning. RL for non-Markovian rewards expressed in LTLf. Restraining Bolts and Shields for RL.
Jan 15 2023 – March 11 2023
This course studies the foundations of how to build autonomous agents that have the ability to program their own behavior to accomplish desired tasks in their environment. This is related to Reasoning about Actions and Planning in AI and to Program Synthesis in Formal Methods, but focuses on tasks (goals) expressed over finite traces. Specifically, borrowing from Formal Methods, the course will consider tasks and environment specifications expressed in LTL, as well as its finite trace variant LTLf. It will review the main results and algorithmic techniques to handle planning in nondeterministic domains. Then, it will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The key point is that working with these logics can be reduced to devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: planning in nondeterministic domains, temporal logics, LTL, LTLf, game-theoretic techniques, safety games, reachability games, games for LTLf objectives, LTLf reactive synthesis. Also, connections with decision processes and reinforcement learning will be considered.
Part 1: Exploiting models of the environment to reason on consequences of actions (3 lectures) In this part of the course, we study classical Reasoning about Actions as developed in Knowledge Representation: the notion of acting in an environment, how the environment is classically specified, how actions and their effects are modeled, and how to reason about whether a certain course of action can indeed be executed and what effects it will bring about.
Part 2: Planning courses of action in the environment (3 lectures + 1 class) In this part of the course, we study advanced forms of planning in AI, in particular, planning in so-called “nondeterministic domains”. We show how languages for expressing planning domains are based on the insight gained by studying Reasoning about Actions. We consider what kind of solution concepts are needed when planning in nondeterministic domains, and how these are related to strategic reasoning in 2-player games.
Part 3: LTL and LTLf as a specification language (2 lectures) In this part of the course, we review Linear Time Logic on infinite and finite traces used in Formal Methods for specifying dynamic systems. We intend to use it as a specification language for specifying complex tasks as well as environment properties. We review Safety Properties, Guarantee Properties, and forms of Fairness and Stability, including GR(1). We also review connections to automata and their use in model checking.
Part 4: LTL and LTLf Reactive Synthesis (4 lectures + 1 class) In this part of the course, we introduce synthesis from LTL and LTLf specifications. The idea is that, given a logical specification, we can automatically synthesize a program that is guaranteed to fulfill the specification. We focus specifically on “reactive synthesis”, i.e., synthesizing programs for interactive/reactive ongoing computations (protocols, operating systems, controllers, robots, etc.). This kind of synthesis is deeply related to planning in nondeterministic environments. We consider mainly synthesis for LTLf specifications, but we will also study synthesis for some restricted forms of LTL specifications, such as safety properties, and limited forms of guarantee, reactivity, fairness, including GR(1), and stability.
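To make the idea concrete, here is a toy illustration of ours (not from the course): for a specification in the style of G(request -> X grant), with input request and output grant, one correct synthesized program is the tiny hand-written Mealy-style controller below; an actual synthesis tool would derive such a machine automatically from the specification.

```python
class Controller:
    """Hand-written controller realizing G(request -> X grant):
    if 'request' holds at step i, 'grant' holds at step i+1."""

    def __init__(self):
        self.pending = False  # was there a request at the previous step?

    def step(self, request: bool) -> bool:
        grant = self.pending    # fulfill the previous request, if any
        self.pending = request  # remember the current request
        return grant

ctrl = Controller()
print([ctrl.step(r) for r in [True, False, True, True, False]])
# -> [False, True, False, True, True]
```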
Part 5: LTL and LTLf Reactive Synthesis under Environment Specifications (4 lectures + 1 class) In this part of the course, we introduce Reactive Synthesis under Environment Specifications. That is, we take advantage of the knowledge about the environment’s behavior in achieving the agent’s tasks. We consider how to obtain the game arena out of the LTL and LTLf specifications of both the agent task and the environment behaviors, and how to solve safety games, reachability games, games for LTLf objectives, and games for objectives expressed in fragments of LTL. We also discuss how winning regions of such arenas are related to the notion of “being able” to achieve desired properties (without necessarily committing to a specific strategy to do so). We focus on agent tasks that eventually terminate and hence are specified in LTLf, while for the environment we focus on safety specifications and limited forms of guarantee, reactivity, fairness, and stability.
Part 6: LTL and LTLf Reactive Synthesis Techniques (2 lectures + 1 class) In this part of the course, we further examine synthesis algorithms based on 2-player game-theoretic techniques. Specifically, we consider both symbolic algorithms and algorithms that explore the search space on-the-fly while synthesizing the program.
Part 7: Learning environment behaviors and performing tasks (2 lectures + 1 class) In this last part of the course, we make connections with Decision Processes and Reinforcement Learning. The point is that in some cases the agent has a simulator of the environment instead of a formal specification, so it needs to learn its strategies to achieve its task in the environment. Sometimes even the task is only implicitly specified through rewards. The key issue is that the kind of properties we are often interested in (the ones that we specify in LTL or LTLf ) are non-Markovian, and hence we need to introduce non-Markovian characteristics in decision processes and reinforcement learning.
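Below is a minimal sketch of the restraining-bolt idea discussed above, under toy assumptions of ours (the line-world environment, the DFA, and the reward values are all made up): the non-Markovian task “observe a, then observe b” is compiled into a DFA, and ordinary Q-learning is run on the product of the environment state and the DFA state, which restores the Markov property.

```python
import random

# Toy environment: positions 0..3 on a line; entering position 1 shows
# symbol 'a', entering position 3 shows symbol 'b'.
ACTIONS = (-1, +1)
OBS = {1: "a", 3: "b"}

def dfa_step(q, sym):
    """DFA for 'a then b': 0 --a--> 1 --b--> 2 (accepting); else stay."""
    if q == 0 and sym == "a":
        return 1
    if q == 1 and sym == "b":
        return 2
    return q

Q = {}  # Q[((env_state, dfa_state), action)] -> value

def q_learn(episodes=2000, alpha=0.1, gamma=0.95, eps=0.2):
    for _ in range(episodes):
        s, q = 0, 0  # product state: (environment position, DFA state)
        for _ in range(20):
            a = (random.choice(ACTIONS) if random.random() < eps else
                 max(ACTIONS, key=lambda a: Q.get(((s, q), a), 0.0)))
            s2 = min(max(s + a, 0), 3)
            q2 = dfa_step(q, OBS.get(s2))
            r = 1.0 if (q2 == 2 and q != 2) else 0.0  # bolt reward on acceptance
            best = max(Q.get(((s2, q2), b), 0.0) for b in ACTIONS)
            old = Q.get(((s, q), a), 0.0)
            Q[((s, q), a)] = old + alpha * (r + gamma * best - old)
            s, q = s2, q2

q_learn()
print(max(ACTIONS, key=lambda a: Q.get(((0, 0), a), 0.0)))  # typically +1: head towards 'a' first
```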
7-21 November 2022
All students interested in attending the course are invited to register at the following link:
https://forms.gle/Nua6Sef2xgmtsbGs8
To attend the course online, please visit the following zoom link: https://uniroma1.zoom.us/j/95223920362?pwd=b3N3ZkE4Qmxjd0J5UDRoNW1OQU1SZz09
We present non-Markov decision processes, where rewards and dynamics can depend on the history of events. This is in contrast with Markov Decision Processes, where the dependency is limited to the last state and action. We study how to specify non-Markov reward functions and dynamics functions using Linear Temporal Logic on finite traces. The resulting decision processes are called Regular Decision Processes, and we show how to solve them by extending solution techniques for Markov Decision Processes. Then, we turn to Reinforcement Learning. First, we study the Restraining Bolt, a device that enables an agent to learn a specified non-Markov behavior while relying on the Markov property. Second, we study how an agent can achieve an optimal behavior in a non-Markov domain, by learning a finite-state automaton that describes rewards and dynamics. Specifically, we will cover the following topics: MDPs with Non-Markov Rewards, Non-Markov Dynamics, Regular Decision Processes, Restraining Bolts, Linear Time Logic on finite traces as a reward/dynamics specification language, Reinforcement Learning, Deep Reinforcement Learning, Automata Learning. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.
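As a reference point for the Markov case that the course then generalizes, here is a compact value-iteration sketch; the two-state MDP is an illustrative assumption of ours, not an example from the lectures.

```python
# Value iteration for a toy MDP.
# P[s][a] = list of (probability, next_state, reward) triples.
P = {
    "s": {"stay": [(1.0, "s", 0.0)],
          "try":  [(0.5, "s", 0.0), (0.5, "t", 1.0)]},
    "t": {"stay": [(1.0, "t", 1.0)]},
}
gamma = 0.9  # discount factor

V = {s: 0.0 for s in P}
for _ in range(200):  # enough sweeps for convergence at this discount
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values())
         for s in P}

print({s: round(v, 2) for s, v in V.items()})  # -> {'s': 9.09, 't': 10.0}
```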
Date | Slot 1 (11:00 to 13:00 CET) | Slot 2 (14:00 to 16:00 CET)
---|---|---
Monday 7th November | Markov Decision Processes, discounting, absorbing states, value iteration, policy iteration (Patrizi). Material: Video, Slides | Markov Reinforcement Learning, exploration/exploitation, Q-learning, SARSA (Iocchi). Material: Video, Slides
Friday 11th November | Linear Temporal Logic on Finite Traces (LTLf) and automata (De Giacomo). Material: Video, Slides | Non-Markov Reward Decision Processes (NMRDPs), restraining bolts, reward machines (Patrizi). Material: Video, Slides (1), Slides (2)
Monday 14th November | Non-Markov Decision Processes (NMDPs), Regular Decision Processes (RDPs) and their relationship with Partially-Observable Markov Decision Processes (POMDPs) (Ronca and Cipollone). Material: Video, Slides (1), Slides (2) | Automata learning and model learning (Paludo). Material: Video, Slides
Friday 18th November | Regular Decision Processes (RDPs), reinforcement learning in RDPs (Ronca). Material: Video, Slides | Markov Deep Reinforcement Learning (Umili). Material: Video, Slides
Monday 21st November | Performance guarantees in non-Markov reinforcement learning (Ronca). Material: Video, Slides | Non-Markov Deep Reinforcement Learning (Cipollone). Material: Video, Slides
4-8 July 2022
All students interested in attending the course are invited to register at the following link:
https://forms.gle/G6jj71kE92ZDqWU29
To attend the course online, please visit the following zoom link: https://uniroma1.zoom.us/j/81644497919?pwd=I2zu7FU2uauar9RYRgK1AJKr3Gydso.1
Lecture 1 - De Giacomo: Introduction to planning and synthesis Video - Slides, fixpoints-mucalc
Lecture 3 - De Giacomo: Planning on deterministic and nondeterministic domains Video - Slides
Lecture 5 - De Giacomo: Planning with LTLf goals Video - Slides
Lecture 7 - Di Stasio: Synthesis for LTLf goals under LTL specifications Video - Slides
Lecture 8 - Zhu: Solution for notable cases of LTLf goals under LTL specifications Video - Slides
This course introduces AI planning and program synthesis for tasks (goals) expressed over finite traces instead of states. Specifically, borrowing from Formal Methods, we will consider tasks and environment specifications expressed in LTL and its finite trace variant LTLf. We will review the main results and algorithmic techniques to handle planning in nondeterministic domains. Then, we will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The key point is that working with these logics can be reduced to devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: Planning in Nondeterministic Domains, Temporal Logics, LTL, LTLf, Game-theoretic Techniques, Safety Games, Reachability Games, Games for LTL/LTLf objectives, and Reactive Synthesis. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.
Planning: Introduction to planning and synthesis. Fixpoint calculations. Deterministic and nondeterministic domains. Planning with temporally extended goals. LTLf: linear-time temporal logic over finite traces. From LTLf to finite automata. Deterministic and nondeterministic domains with LTLf goals.
Linear Temporal Logic: Logic-based language for the representation of programs. Model checking with LTL. The synthesis problem with LTL goals. Automata-theoretic approach to Model Checking and Synthesis.
Games: Introduction to 2-player (turn-based) games on graphs and their relationship with Planning and Synthesis. Solutions to simple objective games: Reachability, Safety, Safe-Reach, Büchi Games. Parity games: a more sophisticated game objective. Solutions to parity games and their relationship with LTL synthesis.
Synthesis under environment specifications: The environment represented as an LTL specification. The synthesis under environment specifications solved as synthesis of an implication formula. Direct and more effective solutions to the synthesis under environment specifications for notable cases of LTLf formulas: Safety, Safety & Fairness, Safety & Stability, Safety & GR(1). Symbolic representation and techniques for Planning and Synthesis.
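Schematically, and glossing over details such as which player moves first, the problem named above can be stated as follows: given an environment specification Env and an agent task φ, find an agent strategy σ such that

```latex
\[
\forall\,\text{environment strategies}\ \gamma:\quad
\mathit{trace}(\sigma,\gamma) \models \mathit{Env}
\;\Longrightarrow\;
\mathit{trace}(\sigma,\gamma) \models \varphi .
\]
% For LTL specifications this can be reduced to ordinary reactive
% synthesis of the implication Env -> \varphi; the notable fragments
% listed above (Safety, Safety & Fairness, Safety & Stability,
% Safety & GR(1)) admit direct, more effective game constructions.
```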
Date | Slot 1 | Slot 2 |
---|---|---|
Monday 4th July | Introduction to planning and synthesis, transition systems, and fixpoints (De Giacomo) | Linear Temporal Logic -- LTL (Perelli) |
Tuesday 5th July | Planning on deterministic and nondeterministic domains (De Giacomo) | LTL Model Checking and Synthesis (Perelli) |
Wednesday 6th July | Planning with LTLf goals (De Giacomo) | Games on Graphs (Perelli) |
Thursday 7th July | Synthesis for LTLf goals under LTL specifications (Di Stasio) | Solutions for notable cases of LTLf goals under LTL specifications (Zhu) |
Friday 8th July | Symbolic techniques (Zhu) | Parity Games: definitions and solutions (Di Stasio) |