7-21 November 2022
All the students interested in attending the course are invited to register at the following link:
https://forms.gle/Nua6Sef2xgmtsbGs8
To attend the course online, please visit the following zoom link: https://uniroma1.zoom.us/j/95223920362?pwd=b3N3ZkE4Qmxjd0J5UDRoNW1OQU1SZz09
We present non-Markov decision processes, where rewards and dynamics can depend on the history of events. This is in contrast with Markov Decision Processes, where the dependency is limited to the last state and action. We study how to specify non-Markov reward functions and dynamics functions using Linear Temporal Logic on finite traces. The resulting decision processes are called Regular Decision Processes, and we show how to solve them by extending solution techniques for Markov Decision Processes. Then, we turn to Reinforcement Learning. First, we study the Restraining Bolt, a device that enables an agent to learn a specified non-Markov behaviour while relying on the Markov property. Second, we study how an agent can achieve an optimal behaviour in a non-Markov domain by learning a finite-state automaton that describes rewards and dynamics. Specifically, we will cover the following topics: MDPs with Non-Markov Rewards, Non-Markov Dynamics, Regular Decision Processes, Restraining Bolts, Linear Temporal Logic on finite traces as a reward/dynamics specification language, Reinforcement Learning, Deep Reinforcement Learning, and Automata Learning. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.
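The Markov Decision Process solution techniques that the course builds on can be illustrated with value iteration. Below is a minimal sketch on a hypothetical 2-state MDP (the example MDP is invented for illustration, not taken from the course material): each state's value is updated with the Bellman optimality backup V(s) <- max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s')) until the largest change falls below a tolerance.

```python
# P[state][action] = list of (next_state, probability, reward) triples.
# Hypothetical MDP: s1 is absorbing with zero reward.
P = {
    "s0": {"a": [("s0", 0.5, 0.0), ("s1", 0.5, 1.0)],
           "b": [("s0", 1.0, 0.2)]},
    "s1": {"a": [("s1", 1.0, 0.0)]},
}

def value_iteration(P, gamma=0.9, eps=1e-10):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best expected return over actions.
            best = max(
                sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:          # converged to within tolerance
            return V

V = value_iteration(P)
```

Here the optimal choice in `s0` is the self-looping action `b`, whose small but repeatable reward outweighs the one-off reward of reaching the absorbing state `s1`.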
Date | Slot 1 (11:00 to 13:00 CET) | Slot 2 (14:00 to 16:00 CET)
---|---|---
Monday 7th November | Markov Decision Processes, discounting, absorbing states, value iteration, policy iteration (Patrizi). Material: Video Slides | Markov Reinforcement Learning, exploration/exploitation, Q-learning, SARSA (Iocchi). Material: Video Slides
Friday 11th November | Linear Temporal Logic on Finite Traces (LTLf) and automata (De Giacomo). Material: Video Slides | Non-Markov Reward Decision Processes (NMRDPs), restraining bolts, reward machines (Patrizi). Material: Video Slides (1) Slides (2)
Monday 14th November | Non-Markov Decision Processes (NMDPs), Regular Decision Processes (RDPs) and their relationship with Partially-Observable Markov Decision Processes (POMDPs) (Ronca and Cipollone). Material: Video Slides (1) Slides (2) | Automata learning and model learning (Paludo). Material: Video Slides
Friday 18th November | Regular Decision Processes (RDPs), reinforcement learning in RDPs (Ronca). Material: Video Slides | Markov Deep Reinforcement Learning (Umili). Material: Video Slides
Monday 21st November | Performance guarantees in non-Markov reinforcement learning (Ronca). Material: Video Slides | Non-Markov Deep Reinforcement Learning (Cipollone). Material: Video Slides
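The restraining-bolt and reward-machine ideas from the schedule above can be sketched in a few lines. In this toy task (the corridor, labels, and machine below are invented for illustration, not taken from the course material), an agent on a 3-cell corridor must first visit cell 2 (label "g") and then return to cell 0 (label "h"); the reward depends on the history, not just the current cell. Tracking the history with a small automaton and learning over the product state (cell, machine-state) restores the Markov property, so plain tabular Q-learning applies.

```python
import random

LABEL = {0: "h", 2: "g"}                      # labelling of corridor cells
RM = {("u0", "g"): ("u1", 0.0),               # u0 --g--> u1, reward 0
      ("u1", "h"): ("u2", 1.0)}               # u1 --h--> u2, reward 1 (done)

def rm_step(u, cell):
    # Advance the reward machine on the label of the cell just entered.
    return RM.get((u, LABEL.get(cell)), (u, 0.0))

def env_step(cell, action):                   # action is -1 (left) or +1 (right)
    return min(2, max(0, cell + action))

def q_learning(episodes=2000, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    rng, Q, acts = random.Random(seed), {}, (-1, +1)
    for _ in range(episodes):
        cell, u = 1, "u0"
        for _ in range(20):                   # episode length cap
            a = (rng.choice(acts) if rng.random() < eps
                 else max(acts, key=lambda x: Q.get((cell, u, x), 0.0)))
            cell2 = env_step(cell, a)
            u2, r = rm_step(u, cell2)
            best2 = 0.0 if u2 == "u2" else max(
                Q.get((cell2, u2, x), 0.0) for x in acts)
            k = (cell, u, a)                  # Q is over the *product* state
            Q[k] = Q.get(k, 0.0) + alpha * (r + gamma * best2 - Q.get(k, 0.0))
            if u2 == "u2":
                break                         # task accomplished
            cell, u = cell2, u2
    return Q

# Greedy rollout with the learned Q: right to "g", then back left to "h".
Q = q_learning()
cell, u, total = 1, "u0", 0.0
for _ in range(10):
    a = max((-1, +1), key=lambda x: Q.get((cell, u, x), 0.0))
    cell = env_step(cell, a)
    u, r = rm_step(u, cell)
    total += r
    if u == "u2":
        break
```

A learner that sees only the cell cannot represent this task, since the same cell demands different behaviour before and after "g" is observed; the automaton state carries exactly the missing piece of history.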
4-8 July 2022
All the students interested in attending the course are invited to register at the following link:
https://forms.gle/G6jj71kE92ZDqWU29
To attend the course online, please visit the following zoom link: https://uniroma1.zoom.us/j/81644497919?pwd=I2zu7FU2uauar9RYRgK1AJKr3Gydso.1
Lecture 1 - De Giacomo: Introduction to planning and synthesis Video - Slides, fixpoints-mucalc
Lecture 3 - De Giacomo: Planning on deterministic and nondeterministic domains Video - Slides
Lecture 5 - De Giacomo: Planning with LTLf goals Video - Slides
Lecture 7 - Di Stasio: Synthesis for LTLf goals under LTL specifications Video - Slides
Lecture 8 - Zhu: Solution for notable cases of LTLf goals under LTL specifications Video - Slides
This course introduces AI planning and program synthesis for tasks (goals) expressed over finite traces instead of states. Specifically, borrowing from Formal Methods, we will consider tasks and environment specifications expressed in LTL and its finite-trace variant LTLf. We will review the main results and algorithmic techniques to handle planning in nondeterministic domains. Then, we will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The key insight is that working with these logics can be reduced to devising suitable two-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: Planning in nondeterministic domains, Temporal Logics, LTL, LTLf, Game-theoretic Techniques, Safety Games, Reachability Games, Games for LTL/LTLf objectives, and Reactive Synthesis. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.
Planning: Introduction to planning and synthesis. Fixpoint calculations. Deterministic and nondeterministic domains. Planning with temporally extended goals. LTLf: linear-time temporal logic over finite traces. From LTLf to finite automata. Deterministic and nondeterministic domains with LTLf goals.
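The fixpoint calculations mentioned above can be sketched for strong planning in a nondeterministic domain (the domain below is invented for illustration). Starting from the goal, we repeatedly add every state that has some action all of whose possible outcomes already lie in the winning set; the least fixpoint is exactly the set of states from which a strong plan exists.

```python
# trans[s][a] = set of possible successor states (nondeterministic domain).
# "dead" has no way to guarantee reaching the goal.
trans = {
    "s0": {"go": {"s1", "s2"}},
    "s1": {"go": {"goal"}},
    "s2": {"go": {"goal"}, "loop": {"s2"}},
    "dead": {"stay": {"dead"}},
    "goal": {},
}

def strong_plan_states(trans, goal):
    win = {goal}
    while True:
        # A state is winning if SOME action leads to the winning set
        # under EVERY resolution of the nondeterminism.
        new = win | {s for s in trans
                     if any(succ and succ <= win
                            for succ in trans[s].values())}
        if new == win:           # least fixpoint reached
            return win
        win = new

win = strong_plan_states(trans, "goal")
```

Note the quantifier pattern: the agent chooses the action (existential), the environment chooses the outcome (universal), which is why the preimage test uses `any` over actions but set inclusion over successors.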
Linear Temporal Logic: Logic-based language for the representation of programs. Model checking with LTL. The Synthesis problem with LTL goals. Automata-theoretic approach to Model Checking and Synthesis.
Games: Introduction to 2-player (turn-based) games on graphs and their relationship with Planning and Synthesis. Solutions to simple objective games: Reachability, Safety, Safe-Reach, Büchi Games. Parity games: a more sophisticated game objective. Solutions to parity games and their relationship with LTL synthesis.
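Reachability games, the simplest of the objectives above, are solved by an attractor computation; here is a minimal sketch on a small turn-based game graph (the graph and node names are invented for illustration). Player 0 wins from a node exactly when it belongs to the attractor of the target: a player-0 node needs some edge into the attractor, a player-1 node needs all of its edges into it.

```python
edges = {"a": {"b", "c"}, "b": {"t"}, "c": {"a", "c"}, "t": set()}
owner = {"a": 0, "b": 1, "c": 1, "t": 0}   # which player moves at each node

def attractor(edges, owner, target):
    attr = set(target)
    changed = True
    while changed:                          # iterate to a fixpoint
        changed = False
        for v, succ in edges.items():
            if v in attr or not succ:
                continue
            # Player 0 needs SOME move into attr; player 1 forces ALL moves.
            if (owner[v] == 0 and succ & attr) or \
               (owner[v] == 1 and succ <= attr):
                attr.add(v)
                changed = True
    return attr

win0 = attractor(edges, owner, {"t"})
```

In this graph player 0 wins from `a` and `b` but not from `c`: at `c` player 1 can keep looping on `c` forever, so the target is never forced. The same attractor operator is the building block for the Büchi and parity game solutions covered later.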
Synthesis under environment specifications: The environment represented as an LTL specification. Solving synthesis under environment specifications as synthesis of an implication formula. Direct and more effective solutions to synthesis under environment specifications for notable cases of LTLf formulas: Safety, Safety & Fairness, Safety & Stability, Safety & GR(1). Symbolic representation and techniques for Planning and Synthesis.
Date | Slot 1 | Slot 2
---|---|---
Monday 4th July | Introduction to planning and synthesis, transition systems, and fixpoints (De Giacomo) | Linear Temporal Logic -- LTL (Perelli) |
Tuesday 5th July | Planning on deterministic and nondeterministic domains (De Giacomo) | LTL Model Checking and Synthesis (Perelli) |
Wednesday 6th July | Planning with LTLf goals (De Giacomo) | Games on Graphs (Perelli) |
Thursday 7th July | Synthesis for LTLf goals under LTL specifications (Di Stasio) | Solutions for notable cases of LTLf goals under LTL specifications (Zhu) |
Friday 8th July | Symbolic techniques (Zhu) | Parity Games: definitions and solutions (Di Stasio) |