Non-Markov Decision Processes and Reinforcement Learning

7-21 November 2022


Registration

All students interested in attending the course are invited to register at the following link:

https://forms.gle/Nua6Sef2xgmtsbGs8


Online Attendance

To attend the course online, please join via the following Zoom link: https://uniroma1.zoom.us/j/95223920362?pwd=b3N3ZkE4Qmxjd0J5UDRoNW1OQU1SZz09


Abstract

We present non-Markov decision processes, where rewards and dynamics can depend on the history of events. This is in contrast with Markov Decision Processes, where the dependency is limited to the last state and action. We study how to specify non-Markov reward functions and dynamics functions using Linear Temporal Logic on finite traces. The resulting decision processes are called Regular Decision Processes, and we show how to solve them by extending solution techniques for Markov Decision Processes. Then, we turn to Reinforcement Learning. First, we study the Restraining Bolt, a device that enables an agent to learn a specified non-Markov behaviour while relying on the Markov property. Second, we study how an agent can achieve optimal behaviour in a non-Markov domain by learning a finite-state automaton that describes rewards and dynamics. Specifically, we will cover the following topics: MDPs with Non-Markov Rewards, Non-Markov Dynamics, Regular Decision Processes, Restraining Bolts, Linear Temporal Logic on finite traces as a reward/dynamics specification language, Reinforcement Learning, Deep Reinforcement Learning, Automata Learning. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.
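
To make the Restraining Bolt idea concrete, the following is a minimal sketch of tabular Q-learning run on the product of the environment state and the state of an automaton that tracks the history. The toy domain, the DFA, and all names are our own illustrative assumptions, not code from the course.

    # Sketch: Q-learning on the product of an environment state and a DFA
    # state, in the spirit of the Restraining Bolt. The task "reach cell 4
    # after having visited cell 0" is non-Markov on cells alone, but Markov
    # on the product (cell, dfa_state).
    import random
    from collections import defaultdict

    ACTIONS = (-1, +1)  # move left / right on a line of cells 0..4

    def dfa_step(q, s):
        # 3-state DFA: q0 --(visit cell 0)--> q1 --(visit cell 4)--> q2 (accepting)
        if q == 0 and s == 0:
            return 1
        if q == 1 and s == 4:
            return 2
        return q

    Q = defaultdict(float)  # keyed by (cell, dfa_state, action)
    alpha, gamma, eps = 0.1, 0.99, 0.2

    for episode in range(5000):
        s, q = 2, 0  # start in the middle cell, DFA in its initial state
        for _ in range(50):
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda b: Q[(s, q, b)]))
            s2 = min(4, max(0, s + a))                # deterministic dynamics
            q2 = dfa_step(q, s2)
            r = 1.0 if (q2 == 2 and q != 2) else 0.0  # reward on acceptance
            best = max(Q[(s2, q2, b)] for b in ACTIONS)
            Q[(s, q, a)] += alpha * (r + gamma * best - Q[(s, q, a)])
            s, q = s2, q2

Because the reward depends only on the product state (cell, DFA state), ordinary Markov Q-learning applies unchanged; the automaton is precisely what turns the non-Markov task into a Markov one.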


Calendar and course content

Slot 1 runs from 11:00 to 13:00 CET; Slot 2 from 14:00 to 16:00 CET.

Monday 7th November
  • Slot 1: Markov Decision Processes, discounting, absorbing states, value iteration, policy iteration (Patrizi). Material: Video - Slides
  • Slot 2: Markov Reinforcement Learning, exploration/exploitation, Q-learning, SARSA (Iocchi). Material: Video - Slides

Friday 11th November
  • Slot 1: Linear Temporal Logic on Finite Traces (LTLf) and automata (De Giacomo). Material: Video - Slides
  • Slot 2: Non-Markov Reward Decision Processes (NMRDPs), restraining bolts, reward machines (Patrizi). Material: Video - Slides (1) - Slides (2)

Monday 14th November
  • Slot 1: Non-Markov Decision Processes (NMDPs), Regular Decision Processes (RDPs) and their relationship with Partially Observable Markov Decision Processes (POMDPs) (Ronca and Cipollone). Material: Video - Slides (1) - Slides (2)
  • Slot 2: Automata learning and model learning (Paludo). Material: Video - Slides

Friday 18th November
  • Slot 1: Regular Decision Processes (RDPs), reinforcement learning in RDPs (Ronca). Material: Video - Slides
  • Slot 2: Markov Deep Reinforcement Learning (Umili). Material: Video - Slides

Monday 21st November
  • Slot 1: Performance guarantees in non-Markov reinforcement learning (Ronca). Material: Video - Slides
  • Slot 2: Non-Markov Deep Reinforcement Learning (Cipollone). Material: Video - Slides




Game-Theoretic Approach to Planning and Synthesis

4-8 July 2022

  • Instructors: Giuseppe De Giacomo, Antonio Di Stasio, Giuseppe Perelli, Shufang Zhu (Sapienza University of Rome)
  • Host Institutions: Sapienza University & ICT-48 TAILOR
  • Level: Postgraduate.
  • Course duration: 20 hours.
  • Course Type: Lecture series.
  • Registration: Free but required (Please register here: https://forms.gle/G6jj71kE92ZDqWU29)
  • Class delivery modality: Blended (online and in person).
  • In-person location: Room A5, Department of Computer, Control, and Management Engineering, Sapienza University of Rome
  • Online Location: Zoom (link: https://uniroma1.zoom.us/j/81644497919?pwd=I2zu7FU2uauar9RYRgK1AJKr3Gydso.1)
  • Schedule: Monday 4th July to Friday 8th July, from 14:00 to 18:00 CEST (Central European Summer Time)
  • Language: English
  • Notes: We, as instructors, will give exams only for Sapienza's PhD curricula; however, we are happy to discuss the course topics informally with anyone interested in them.

Registration

All students interested in attending the course are invited to register at the following link:

https://forms.gle/G6jj71kE92ZDqWU29


Online Attendance

To attend the course online, please join via the following Zoom link: https://uniroma1.zoom.us/j/81644497919?pwd=I2zu7FU2uauar9RYRgK1AJKr3Gydso.1


Lecture recordings and slides

  • Lecture 1 - De Giacomo: Introduction to planning and synthesis Video - Slides, fixpoints-mucalc

  • Lecture 2 - Perelli: Linear Temporal Logic Video - Slides

  • Lecture 3 - De Giacomo: Planning on deterministic and nondeterministic domains Video - Slides

  • Lecture 4 - Perelli: LTL Model Checking Video - Slides

  • Lecture 5 - De Giacomo: Planning with LTLf goals Video - Slides

  • Lecture 6 - Perelli: Games on graphs Video - Slides

  • Lecture 7 - Di Stasio: Synthesis for LTLf goals under LTL specifications Video - Slides

  • Lecture 8 - Zhu: Solution for notable cases of LTLf goals under LTL specifications Video - Slides

  • Lecture 9 - Zhu: Symbolic techniques Video - Slides

  • Lecture 10 - Di Stasio: Parity Games Video - Slides


Abstract

This course introduces AI planning and program synthesis for tasks (goals) expressed over finite traces rather than final states. Specifically, borrowing from Formal Methods, we will consider tasks and environment specifications expressed in LTL and its finite-trace variant LTLf. We will review the main results and algorithmic techniques for planning in nondeterministic domains. Then, we will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The key insight is that working with these logics reduces to devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: Planning in nondeterministic domains, Temporal Logics, LTL, LTLf, Game-theoretic Techniques, Safety Games, Reachability Games, Games for LTL/LTLf objectives, and Reactive Synthesis. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.

Course content

Planning: Introduction to planning and synthesis. Fixpoint calculations. Deterministic and nondeterministic domains. Planning with temporally extended goals. LTLf: linear-time temporal logic over finite traces. From LTLf to finite automata. Deterministic and nondeterministic domains with LTLf goals.
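
As a one-formula illustration of the fixpoint calculations above (standard notation, not taken from the course slides): the set of states from which a goal can be reached is the least fixpoint $\mathit{Reach}(\mathit{Goal}) = \mu Z.\, \mathit{Goal} \cup \mathit{Pre}(Z)$, where $\mathit{Pre}(Z)$ denotes the states with some action leading into $Z$. It is computed by iterating $Z_0 = \emptyset$ and $Z_{i+1} = \mathit{Goal} \cup \mathit{Pre}(Z_i)$ until the sequence stabilizes, which happens after at most as many iterations as there are states.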

Linear Temporal Logic: Logic-based language for the representation of programs. Model checking with LTL. The synthesis problem with LTL goals. Automata-theoretic approach to Model Checking and Synthesis.
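
For instance, the classic response property $\varphi = \Box(\mathit{request} \rightarrow \Diamond\,\mathit{grant})$ ("every request is eventually granted") is an LTL formula; model checking asks whether every infinite trace of a transition system $M$ satisfies $\varphi$, written $M \models \varphi$, and the automata-theoretic approach reduces this to an emptiness check on the product of $M$ with an automaton for $\neg\varphi$. (The formula is a standard textbook example, not one taken from the slides.)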

Games: Introduction to 2-player (turn-based) games on graphs and their relationship with Planning and Synthesis. Solutions to simple objective games: Reachability, Safety, Safe-Reach, Büchi Games. Parity games: a more sophisticated game objective. Solutions to parity games and their relationship with LTL synthesis.
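
To illustrate, the sketch below computes the winning region of player 0 in a turn-based reachability game via the classical backward attractor fixpoint; the code and the graph encoding (owner[v] in {0, 1}, succ[v] listing the successors of node v) are our own assumptions, not material from the course.

    # Winning region of player 0 in a reachability game on a graph.
    def reach_attractor(nodes, owner, succ, target):
        win = set(target)              # player 0 wins trivially on the target
        changed = True
        while changed:                 # least-fixpoint iteration
            changed = False
            for v in nodes:
                if v in win:
                    continue
                if owner[v] == 0 and any(u in win for u in succ[v]):
                    win.add(v); changed = True   # player 0 can move into win
                elif owner[v] == 1 and succ[v] and all(u in win for u in succ[v]):
                    win.add(v); changed = True   # player 1 cannot avoid win
        return win

    # Tiny example: node 3 is the target.
    owner = {0: 0, 1: 1, 2: 0, 3: 0}
    succ = {0: [1, 2], 1: [0, 3], 2: [3], 3: [3]}
    print(reach_attractor([0, 1, 2, 3], owner, succ, {3}))  # -> {0, 1, 2, 3}

Safety games are the dual: the region where the protagonist can stay safe forever is the complement of the opponent's attractor to the unsafe states.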

Synthesis under environment specifications: The environment represented as an LTL specification. The synthesis under environment specifications solved as synthesis of an implication formula. Direct and more effective solutions to the synthesis under environment specifications for notable cases of LTLf formulas: Safety, Safety & Fairness, Safety & Stability, Safety & GR(1). Symbolic representation and techniques for Planning and Synthesis.
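
Concretely, given an environment specification $\mathit{Env}$ (an LTL formula) and an agent task $\varphi$, the implication-formula reduction mentioned above synthesizes a strategy realizing $\mathit{Env} \rightarrow \varphi$: the agent must achieve $\varphi$ on every play in which the environment actually satisfies $\mathit{Env}$, and is unconstrained on the others. (The symbols here are illustrative placeholders, not notation from the slides.)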


Calendar

Each day runs from 14:00 to 18:00 CEST, split into two slots.

Monday 4th July
  • Slot 1: Introduction to planning and synthesis, transition systems, and fixpoints (De Giacomo)
  • Slot 2: Linear Temporal Logic (LTL) (Perelli)

Tuesday 5th July
  • Slot 1: Planning on deterministic and nondeterministic domains (De Giacomo)
  • Slot 2: LTL Model Checking and Synthesis (Perelli)

Wednesday 6th July
  • Slot 1: Planning with LTLf goals (De Giacomo)
  • Slot 2: Games on Graphs (Perelli)

Thursday 7th July
  • Slot 1: Synthesis for LTLf goals under LTL specifications (Di Stasio)
  • Slot 2: Solutions for notable cases of LTLf goals under LTL specifications (Zhu)

Friday 8th July
  • Slot 1: Symbolic techniques (Zhu)
  • Slot 2: Parity Games: definitions and solutions (Di Stasio)



