Game-theoretic Approach to Planning and Synthesis (ESSAI Summer School)

24 – 28 July 2023


Registration

Please refer to the ESSAI Summer School website here.

Abstract

This course introduces AI planning and program synthesis for tasks (goals) expressed over finite traces instead of states. Specifically, borrowing from Formal Methods, we will consider tasks and environment specifications expressed in LTL and its finite trace variant LTLf. We will review the main results and algorithmic techniques to handle planning in nondeterministic domains. Then, we will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The main catch is that working with these logics can be based on devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: Planning in Nondeterministic Domains, Temporal Logics, LTL, LTLf, Game-theoretic Techniques, Safety Games, Reachability Games, Games for LTL/LTLf objectives, and Reactive Synthesis. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.




Game-theoretic Approach to Planning and Synthesis 2023

April 13 – May 8 2023

  • Instructors: Giuseppe De Giacomo
  • Host Institutions: Sapienza University & ICT-48 TAILOR
  • Level: Graduate & Postgraduate.
  • Course duration: 20 hours.
  • Course Type: Lecture series.
  • Registration: Free but required (Please register here)
  • Class delivery modality: Blended (Online and In Presence).
  • In presence location: Rooms A2 and A4, Department of Computer, Automation, and Management Engineering, Sapienza University of Rome
  • Schedule: Mondays from 13:00 to 15:00 CEST (Room A4) and Thursdays from 13:00 to 17:00 CEST (Room A2)
  • Language: English
  • Website: https://sites.google.com/diag.uniroma1.it/game-based-planning-synthesis

Registration

All students interested in attending the course are invited to register here.


Lecture recordings and slides


Abstract

This course introduces AI planning and program synthesis for tasks (goals) expressed over finite traces instead of states. Specifically, borrowing from Formal Methods, we will consider tasks and environment specifications expressed in LTL and in particular its finite trace variant LTLf. We will review the main results and algorithmic techniques to handle planning in nondeterministic domains. Then, we will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The main catch is that working with these logics can be based on devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: Planning in Nondeterministic Domains, Temporal Logics, LTL, LTLf, Game-theoretic Techniques, Safety Games, Reachability Games, Games for LTL/LTLf objectives, and Reactive Synthesis. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.

Course content

Introduction. Motivations and aims of the course

Reasoning about Actions. SitCalc, Reiter’s Basic Action Theories, Reasoning by regression in SitCalc, Finite Transition Systems induced by Propositional Basic Action Theories.

Planning. Planning in deterministic domains. Transition System induced by a deterministic planning domain. Planning as search. Planning as fixpoint computation over a transition system. Planning in nondeterministic domains. Transition System, or game arena, induced by a nondeterministic planning domain. Planning as fixpoint computation over a game arena.
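As a concrete illustration of planning as fixpoint computation (a minimal sketch, not official course material; the toy domain below is hypothetical), the states from which a strong plan exists can be computed as a least fixpoint over the induced game arena in a few lines of Python:

    # Sketch: strong-plan existence in a nondeterministic domain as a
    # least fixpoint over the induced game arena (toy domain, hypothetical).
    # transitions[state][action] = possible successors; the environment
    # adversarially picks which successor actually occurs.
    transitions = {
        "s0": {"a": {"s1", "s2"}, "b": {"s0"}},
        "s1": {"a": {"goal"}},
        "s2": {"a": {"goal"}},
        "goal": {},
    }

    def strong_preimage(target):
        """States where some action is guaranteed to land inside `target`."""
        return {s for s, acts in transitions.items()
                if any(succs <= target for succs in acts.values())}

    def strong_plan_states(goals):
        """Least fixpoint of W = goals ∪ strong_preimage(W)."""
        win = set(goals)
        while True:
            new = win | strong_preimage(win)
            if new == win:
                return win
            win = new

    print(strong_plan_states({"goal"}))   # {'s0', 's1', 's2', 'goal'}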

Transition systems and fixpoints. Reactive/interactive programs (finite state), transition systems, reachability, bisimulation. Fixpoints, the Knaster-Tarski theorem on least and greatest fixpoints, approximants of least and greatest fixpoints.
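A minimal Python sketch of the approximants (all names hypothetical): by the Knaster-Tarski theorem, iterating a monotone operator from the bottom (resp. top) of the finite powerset lattice converges to its least (resp. greatest) fixpoint, here instantiated for reachability and invariance:

    # Sketch: Kleene approximants on the powerset lattice of a finite
    # state set (toy transition relation, hypothetical). By Knaster-Tarski,
    # a monotone F has least/greatest fixpoints; iteration computes them.
    STATES = {0, 1, 2, 3}
    EDGES = {0: {1}, 1: {2}, 2: {2}, 3: {3}}

    def lfp(F):
        """Least fixpoint: iterate F from the bottom element (empty set)."""
        X = set()
        while F(X) != X:
            X = F(X)
        return X

    def gfp(F):
        """Greatest fixpoint: iterate F from the top element (all states)."""
        X = set(STATES)
        while F(X) != X:
            X = F(X)
        return X

    # Reachability from state 0 is a least fixpoint:
    reach = lfp(lambda X: {0} | {t for s in X for t in EDGES[s]})
    # Invariance ("can stay in SAFE forever") is a greatest fixpoint:
    SAFE = {0, 1, 2}
    invariant = gfp(lambda X: {s for s in X & SAFE if EDGES[s] & X})

    print(reach, invariant)   # {0, 1, 2} {0, 1, 2}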

Games on Graphs. Transition systems as game arenas. Reachability, Safety, and Büchi Objectives.
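Reachability games are solved by iterating a controlled-predecessor step until the attractor of the target stabilizes; safety games are the dual. A minimal sketch on a hypothetical arena (all names illustrative):

    # Sketch: solving a turn-based reachability game by computing the
    # player-0 attractor of the target set (toy arena, hypothetical).
    # Player 0 tries to reach TARGET; player 1 tries to avoid it forever.
    # The safety game is the dual: player 1 wins exactly the complement.
    SUCC = {"u": {"v", "w"}, "v": {"t"}, "w": {"w"}, "x": {"u", "v"}, "t": {"t"}}
    OWNER = {"u": 1, "v": 1, "w": 1, "x": 0, "t": 0}   # who moves at each node
    TARGET = {"t"}

    def attractor(target, player):
        attr = set(target)
        changed = True
        while changed:
            changed = False
            for node, succs in SUCC.items():
                if node in attr:
                    continue
                if OWNER[node] == player:
                    ok = bool(succs & attr)    # can choose a move into attr
                else:
                    ok = succs <= attr         # cannot avoid moving into attr
                if ok:
                    attr.add(node)
                    changed = True
        return attr

    win0 = attractor(TARGET, 0)
    print(win0)               # {'x', 'v', 't'}: from x go to v, v is forced to t
    print(set(SUCC) - win0)   # {'u', 'w'}: player 1 escapes forever via 'w'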

Linear Temporal Logics. Linear Temporal Logic on infinite traces (LTL) from Formal Methods. Linear Temporal Logic on finite traces (LTLf). LTLf to Automata. Reasoning in LTLf, verification in LTLf.
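Before the automata translation, the finite-trace semantics can be evaluated directly by recursion on the formula. A minimal sketch (the tuple encoding of formulas and the trace are hypothetical illustrations; the automata route covered in the course is the scalable approach):

    # Sketch: direct evaluation of an LTLf formula on a finite trace.
    # Formulas are nested tuples; a trace is a list of sets of atoms.

    def holds(phi, trace, i=0):
        """Does formula `phi` hold at position i of the finite trace?"""
        op = phi[0]
        if op == "true":
            return True
        if op == "atom":
            return phi[1] in trace[i]
        if op == "not":
            return not holds(phi[1], trace, i)
        if op == "and":
            return holds(phi[1], trace, i) and holds(phi[2], trace, i)
        if op == "X":   # strong next: a next instant must exist
            return i + 1 < len(trace) and holds(phi[1], trace, i + 1)
        if op == "U":   # phi[2] at some k >= i, with phi[1] at all j < k
            return any(holds(phi[2], trace, k) and
                       all(holds(phi[1], trace, j) for j in range(i, k))
                       for k in range(i, len(trace)))
        raise ValueError(f"unknown operator: {op}")

    F = lambda f: ("U", ("true",), f)        # eventually
    G = lambda f: ("not", F(("not", f)))     # always

    trace = [{"req"}, set(), {"grant"}]
    print(holds(F(("atom", "grant")), trace))   # True
    print(holds(G(("atom", "req")), trace))     # False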

Program synthesis. Automated program synthesis, synthesis from specifications in Linear Temporal Logic on finite traces, game-theoretic view, arena and objective, adversarial reachability. Synthesis under environment specifications. Planning for LTLf goals in nondeterministic domains as synthesis under environment specifications.
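The adversarial-reachability view can be sketched on the DFA of an LTLf task: a DFA state is winning if the agent can choose its move so that every environment reply leads to an accepting or already-winning state. A minimal illustration on a hypothetical two-state DFA (all names illustrative):

    # Sketch: LTLf synthesis as an adversarial reachability game on the
    # DFA of the task (toy DFA, hypothetical). Accepting states seed the
    # least fixpoint of winning states.
    AGENT_MOVES = {"a0", "a1"}
    ENV_MOVES = {"e0", "e1"}
    STATES = {"q0", "q1"}
    ACCEPTING = {"q1"}
    DELTA = {   # DELTA[state, agent_move, env_move] = successor state
        ("q0", "a0", "e0"): "q1", ("q0", "a0", "e1"): "q0",
        ("q0", "a1", "e0"): "q1", ("q0", "a1", "e1"): "q1",
        ("q1", "a0", "e0"): "q1", ("q1", "a0", "e1"): "q1",
        ("q1", "a1", "e0"): "q0", ("q1", "a1", "e1"): "q1",
    }

    def winning_states():
        win = set(ACCEPTING)
        while True:
            forced = {q for q in STATES
                      if any(all(DELTA[q, a, e] in win for e in ENV_MOVES)
                             for a in AGENT_MOVES)}
            if forced | win == win:
                return win
            win |= forced

    print(winning_states())   # {'q0', 'q1'}: play a1 in q0, a0 in q1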

Reasoning and learning. Merging Reasoning about Actions and Reinforcement Learning. RL for non-Markovian rewards expressed in LTLf. Restraining Bolts and Shields for RL.




Foundations of Self-Programming Agents: 2022-2023

Jan 15 2023 – March 11 2023

  • Instructors: Giuseppe De Giacomo, Antonio Di Stasio
  • Host Institutions: Oxford University, UK
  • Level: Graduate.
  • Course duration: 20 lectures.
  • Course Type: Lecture series.
  • Class delivery modality: In Presence.
  • In presence location: Department of Computer Science, Oxford University, UK
  • Language: English

Abstract

This course studies the foundations of how to build autonomous agents that have the ability to self-program their behavior to accomplish desired tasks in their environment. This is related to Reasoning about Actions and Planning in AI and to Program Synthesis in Formal Methods but focuses on tasks (goals) expressed over finite traces. Specifically, borrowing from Formal Methods the course will consider tasks and environment specifications expressed in LTL, as well as its finite trace variant LTLf. It will review the main results and algorithmic techniques to handle planning in nondeterministic domains. Then, it will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The main catch is that working with these logics can be based on devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: planning in nondeterministic domains, temporal logics, LTL, LTLf, game-theoretic techniques, safety games, reachability games, games for LTLf objectives, LTLf reactive synthesis. Also, connections with decision processes and reinforcement learning will be considered.

Course content

Part 1: Exploiting models of the environment to reason on consequences of actions (3 lectures) In this part of the course, we study classical Reasoning about Actions developed in Knowledge Representation. The notion of acting in an environment, how classically the environment is specified, how actions and their effects are modeled, how to reason to understand if a certain course of action can indeed be executed and what effects it will bring about.

Part 2: Planning courses of action in the environment (3 lectures + 1 class) In this part of the course, we study advanced forms of planning in AI, in particular, planning in so-called “nondeterministic domains”. We show how languages for expressing planning domains are based on the insight gained by studying Reasoning about Actions. We consider what kind of solution concepts are needed when planning in nondeterministic domains, and how these are related to strategic reasoning in 2-player games.

Part 3: LTL and LTLf as a specification language (2 lectures) In this part of the course, we review Linear Time Logic on infinite and finite traces used in Formal Methods for specifying dynamic systems. We intend to use it as a specification language for specifying complex tasks as well as environment properties. We review Safety Properties, Guarantee Properties, and forms of Fairness and Stability, including GR(1). We also review connections to automata and their use in model checking.

Part 4: LTL and LTLf Reactive Synthesis (4 lectures + 1 class) In this part of the course, we introduce synthesis from LTL and LTLf specifications. The idea is that, given a logical specification, we can automatically synthesize a program that is guaranteed to fulfill the specification. We focus specifically on “reactive synthesis”, i.e., synthesize programs for interactive/reactive ongoing computations (protocols, operating systems, controllers, robots, etc.). This kind of synthesis is deeply related to planning in nondeterministic environments. We consider mainly synthesis for LTLf specifications, but we will also study synthesis for some restricted forms of LTL specifications, such as safety properties, and limited forms of guarantee, reactivity, fairness, including GR(1), and stability.

Part 5: LTL and LTLf Reactive Synthesis under Environment Specifications (4 lectures + 1 class) In this part of the course, we introduce Reactive Synthesis under Environment Specifications. That is, we take advantage of the knowledge about the environment’s behavior in achieving the agent tasks. We consider how to obtain the game arena out of the LTL and LTLf specifications of both the agent task and the environment behaviors, and how to solve safety games, reachability games, games for LTLf objectives, and for objectives expressed in fragments of LTL. We also discuss how winning regions of such arenas are related to the notion of “being able” to achieve desired properties (without necessarily committing to a specific strategy to do so). We focus on agent tasks that eventually terminate and hence are specified in LTLf. While for the environment we focus on Safety specifications and limited forms of guarantee, reactivity, fairness, and stability.

Part 6: LTL and LTLf Reactive Synthesis Techniques (2 lectures + 1 class) In this part of the course, we further examine synthesis algorithms based on 2-player game-theoretic techniques. Specifically, we consider both symbolic algorithms and algorithms that explore the search space on-the-fly while synthesizing the program.

Part 7: Learning environment behaviors and performing tasks (2 lectures + 1 class) In this last part of the course, we make connections with Decision Processes and Reinforcement Learning. The point is that in some cases the agent has a simulator of the environment instead of a formal specification, so it needs to learn its strategies to achieve its task in the environment. Sometimes even the task is only implicitly specified through rewards. The key issue is that the kind of properties we are often interested in (the ones that we specify in LTL or LTLf ) are non-Markovian, and hence we need to introduce non-Markovian characteristics in decision processes and reinforcement learning.




Non-Markov Decision Processes and Reinforcement Learning

7-21 November 2022


Registration

All students interested in attending the course are invited to register at the following link:

https://forms.gle/Nua6Sef2xgmtsbGs8


Online Attendance

To attend the course online, please visit the following zoom link: https://uniroma1.zoom.us/j/95223920362?pwd=b3N3ZkE4Qmxjd0J5UDRoNW1OQU1SZz09


Abstract

We present non-Markov decision processes, where rewards and dynamics can depend on the history of events. This is in contrast with Markov Decision Processes, where the dependency is limited to the last state and action. We study how to specify non-Markov reward functions and dynamics functions using Linear Temporal Logic on finite traces. The resulting decision processes are called Regular Decision Processes, and we show how to solve them by extending solution techniques for Markov Decision Processes. Then, we turn to Reinforcement Learning. First, we study the Restraining Bolt, a device that enables an agent to learn a specified non-Markov behaviour while relying on the Markov property. Second, we study how an agent can achieve an optimal behaviour in a non-Markov domain, by learning a finite-state automaton that describes rewards and dynamics. Specifically, we will cover the following topics: MDPs with Non-Markov Rewards, Non-Markov Dynamics, Regular Decision Processes, Restraining Bolts, Linear Time Logic on finite traces as a reward/dynamics specification language, Reinforcement Learning, Deep Reinforcement Learning, and Automata Learning. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.
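The construction behind the Restraining Bolt can be sketched compactly: a DFA over fluents tracks the non-Markov reward, and learning over the product of environment state and DFA state restores the Markov property, so ordinary Q-learning applies. Everything below (the toy environment, the monitor, the parameters) is a hypothetical illustration, not the published implementation:

    # Sketch of the restraining-bolt construction: a DFA monitors the
    # fluents emitted by a toy "pick, then deliver" environment and pays
    # reward in its final state; Q-learning runs on the product
    # (env_state, dfa_state), which is Markovian again.
    import random
    from collections import defaultdict

    DFA_DELTA = {("u0", "picked"): "u1", ("u1", "delivered"): "u2"}
    REWARDING = {"u2"}

    def dfa_step(u, label):
        return DFA_DELTA.get((u, label), u)   # stay put on other labels

    class ToyEnv:
        """Hypothetical 3-cell line world: pick at cell 0, deliver at cell 2."""
        actions = ("left", "right")
        def reset(self):
            self.pos, self.t = 1, 0
            return self.pos
        def done(self):
            return self.t >= 20
        def step(self, action):
            self.t += 1
            self.pos = max(0, min(2, self.pos + (1 if action == "right" else -1)))
            label = {0: "picked", 2: "delivered"}.get(self.pos, "")
            return self.pos, label

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
        Q = defaultdict(float)
        for _ in range(episodes):
            s, u = env.reset(), "u0"
            while not env.done():
                z = (s, u)                                  # product state
                a = (random.choice(env.actions) if random.random() < eps
                     else max(env.actions, key=lambda b: Q[z, b]))
                s2, label = env.step(a)
                u2 = dfa_step(u, label)
                r = 1.0 if u2 in REWARDING else 0.0         # bolt reward
                best = max(Q[(s2, u2), b] for b in env.actions)
                Q[z, a] += alpha * (r + gamma * best - Q[z, a])
                s, u = s2, u2
        return Q

    Q = q_learning(ToyEnv())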


Calendar and course content

All lectures run in two slots: Slot 1 from 11:00 to 13:00 CET and Slot 2 from 14:00 to 16:00 CET.

Monday 7th November
  • Slot 1: Markov Decision Processes, discounting, absorbing states, value iteration, policy iteration (Patrizi). Material: Video, Slides
  • Slot 2: Markov Reinforcement Learning, exploration/exploitation, Q-learning, SARSA (Iocchi). Material: Video, Slides

Friday 11th November
  • Slot 1: Linear Temporal Logic on Finite Traces (LTLf) and automata (De Giacomo). Material: Video, Slides
  • Slot 2: Non-Markov Reward Decision Processes (NMRDPs), restraining bolts, reward machines (Patrizi). Material: Video, Slides (1), Slides (2)

Monday 14th November
  • Slot 1: Non-Markov Decision Processes (NMDPs), Regular Decision Processes (RDPs) and their relationship with Partially-Observable Markov Decision Processes (POMDPs) (Ronca and Cipollone). Material: Video, Slides (1), Slides (2)
  • Slot 2: Automata learning and model learning (Paludo). Material: Video, Slides

Friday 18th November
  • Slot 1: Regular Decision Processes (RDPs), reinforcement learning in RDPs (Ronca). Material: Video, Slides
  • Slot 2: Markov Deep Reinforcement Learning (Umili). Material: Video, Slides

Monday 21st November
  • Slot 1: Performance guarantees in non-Markov reinforcement learning (Ronca). Material: Video, Slides
  • Slot 2: Non-Markov Deep Reinforcement Learning (Cipollone). Material: Video, Slides




Game-Theoretic Approach to Planning and Synthesis 2022

4-8 July 2022

  • Instructors: Giuseppe De Giacomo, Antonio Di Stasio, Giuseppe Perelli, Shufang Zhu (Sapienza University of Rome)
  • Host Institutions: Sapienza University & ICT-48 TAILOR
  • Level: Postgraduate.
  • Course duration: 20 hours.
  • Course Type: Lecture series.
  • Registration: Free but required (Please register here: https://forms.gle/G6jj71kE92ZDqWU29)
  • Class delivery modality: Blended (Online and In Presence).
  • In presence location: Room A5, Department of Computer, Automation, and Management Engineering, Sapienza University of Rome
  • Online Location: Zoom (link: https://uniroma1.zoom.us/j/81644497919?pwd=I2zu7FU2uauar9RYRgK1AJKr3Gydso.1)
  • Schedule: From Monday 4th July to Friday 8th July from 14:00 to 18:00 CEST (Central European Summer Time)
  • Language: English
  • Notes: We, as instructors, will give exams only for Sapienza’s PhD curricula, not for other PhD programs; however, we are happy to informally discuss the topics of the course with anyone interested.

Registration

All students interested in attending the course are invited to register at the following link:

https://forms.gle/G6jj71kE92ZDqWU29


Online Attendance

To attend the course online, please visit the following zoom link: https://uniroma1.zoom.us/j/81644497919?pwd=I2zu7FU2uauar9RYRgK1AJKr3Gydso.1


Lecture recordings and slides

  • Lecture 1 - De Giacomo: Introduction to planning and synthesis Video - Slides, fixpoints-mucalc

  • Lecture 2 - Perelli: Linear Temporal Logic Video - Slides

  • Lecture 3 - De Giacomo: Planning on deterministic and nondeterministic domains Video - Slides

  • Lecture 4 - Perelli: LTL Model Checking Video - Slides

  • Lecture 5 - De Giacomo: Planning with LTLf goals Video - Slides

  • Lecture 6 - Perelli: Games on graphs Video - Slides

  • Lecture 7 - Di Stasio: Synthesis for LTLf goals under LTL specifications Video - Slides

  • Lecture 8 - Zhu: Solution for notable cases of LTLf goals under LTL specifications Video - Slides

  • Lecture 9 - Zhu: Symbolic techniques Video - Slides

  • Lecture 10 - Di Stasio: Parity Games Video - Slides


Abstract

This course introduces AI planning and program synthesis for tasks (goals) expressed over finite traces instead of states. Specifically, borrowing from Formal Methods, we will consider tasks and environment specifications expressed in LTL and its finite trace variant LTLf. We will review the main results and algorithmic techniques to handle planning in nondeterministic domains. Then, we will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The main catch is that working with these logics can be based on devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: Planning in Nondeterministic Domains, Temporal Logics, LTL, LTLf, Game-theoretic Techniques, Safety Games, Reachability Games, Games for LTL/LTLf objectives, and Reactive Synthesis. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.

Course content

Planning: Introduction to planning and synthesis. Fixpoint calculations. Deterministic and nondeterministic domains. Planning with temporally extended goals. LTLf: linear-time temporal logic over finite traces. From LTLf to finite automata. Deterministic and nondeterministic domains with LTLf goals.

Linear Temporal Logic: Logic-based language for the representation of programs. Model checking with LTL. The synthesis problem with LTL goals. Automata-theoretic approach to Model Checking and Synthesis.

Games: Introduction to 2-player (turn-based) games on graphs and their relationship with Planning and Synthesis. Solutions to simple objective games: Reachability, Safety, Safe-Reach, Büchi Games. Parity games: a more sophisticated game objective. Solutions to parity games and their relationship with LTL synthesis.
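As a concrete preview of the last topic, here is a compact Python sketch of Zielonka's classical recursive algorithm for parity games. It assumes a total arena (every node keeps a successor in each subgame considered), and the toy game at the end is a hypothetical illustration:

    # Sketch: Zielonka's recursive algorithm for parity games. Player i
    # wins a play iff the highest priority occurring infinitely often
    # has parity i.

    def attractor(nodes, succ, owner, target, player):
        """Nodes from which `player` can force the play into `target`."""
        attr = set(target)
        changed = True
        while changed:
            changed = False
            for v in nodes - attr:
                succs = succ[v] & nodes
                if ((owner[v] == player and succs & attr) or
                        (owner[v] != player and succs and succs <= attr)):
                    attr.add(v)
                    changed = True
        return attr

    def zielonka(nodes, succ, owner, priority):
        """Return the winning regions (W0, W1) on the subgame `nodes`."""
        if not nodes:
            return set(), set()
        p = max(priority[v] for v in nodes)
        i = p % 2                                # player favored by top priority
        top = {v for v in nodes if priority[v] == p}
        A = attractor(nodes, succ, owner, top, i)
        W = list(zielonka(nodes - A, succ, owner, priority))
        if not W[1 - i]:
            W[i] = set(nodes)                    # player i wins everywhere
            return W[0], W[1]
        B = attractor(nodes, succ, owner, W[1 - i], 1 - i)
        W = list(zielonka(nodes - B, succ, owner, priority))
        W[1 - i] |= B
        return W[0], W[1]

    SUCC = {"a": {"b"}, "b": {"a", "c"}, "c": {"c"}}
    OWNER = {"a": 0, "b": 1, "c": 0}
    PRIORITY = {"a": 2, "b": 1, "c": 0}
    print(zielonka(set(SUCC), SUCC, OWNER, PRIORITY))
    # ({'a', 'b', 'c'}, set()): every cycle has an even top priority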

Synthesis under environment specifications: The environment represented as an LTL specification. The synthesis under environment specifications solved as synthesis of an implication formula. Direct and more effective solutions to the synthesis under environment specifications for notable cases of LTLf formulas: Safety, Safety & Fairness, Safety & Stability, Safety & GR(1). Symbolic representation and techniques for Planning and Synthesis.


Calendar

Monday 4th July
  • Slot 1: Introduction to planning and synthesis, transition systems, and fixpoints (De Giacomo)
  • Slot 2: Linear Temporal Logic (LTL) (Perelli)

Tuesday 5th July
  • Slot 1: Planning on deterministic and nondeterministic domains (De Giacomo)
  • Slot 2: LTL Model Checking and Synthesis (Perelli)

Wednesday 6th July
  • Slot 1: Planning with LTLf goals (De Giacomo)
  • Slot 2: Games on Graphs (Perelli)

Thursday 7th July
  • Slot 1: Synthesis for LTLf goals under LTL specifications (Di Stasio)
  • Slot 2: Solutions for notable cases of LTLf goals under LTL specifications (Zhu)

Friday 8th July
  • Slot 1: Symbolic techniques (Zhu)
  • Slot 2: Parity Games: definitions and solutions (Di Stasio)



