Non-Markov Decision Processes and Reinforcement Learning

7-21 November 2022


Registration

All students interested in attending the course are invited to register at the following link:

https://forms.gle/Nua6Sef2xgmtsbGs8


Online Attendance

To attend the course online, please join via the following Zoom link: https://uniroma1.zoom.us/j/95223920362?pwd=b3N3ZkE4Qmxjd0J5UDRoNW1OQU1SZz09


Abstract

We present non-Markov decision processes, where rewards and dynamics can depend on the history of events. This is in contrast with Markov Decision Processes, where the dependency is limited to the last state and action. We study how to specify non-Markov reward functions and dynamics functions using Linear Temporal Logic on finite traces. The resulting decision processes are called Regular Decision Processes, and we show how to solve them by extending solution techniques for Markov Decision Processes. Then, we turn to Reinforcement Learning. First, we study the Restraining Bolt, a device that enables an agent to learn a specified non-Markov behaviour while relying on the Markov property. Second, we study how an agent can achieve optimal behaviour in a non-Markov domain by learning a finite-state automaton that describes rewards and dynamics. Specifically, we will cover the following topics: MDPs with Non-Markov Rewards, Non-Markov Dynamics, Regular Decision Processes, Restraining Bolts, Linear Temporal Logic on finite traces as a reward/dynamics specification language, Reinforcement Learning, Deep Reinforcement Learning, Automata Learning. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.
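
To make the Restraining Bolt idea concrete, the following is a minimal sketch of tabular Q-learning run on the product of the environment state and the state of an automaton that tracks the history. The toy domain, the DFA, and all names are our own illustrative assumptions, not code from the course.

    # Sketch: Q-learning on the product of an environment state and a DFA
    # state, in the spirit of the Restraining Bolt. The task "reach cell 4
    # after having visited cell 0" is non-Markov on cells alone, but Markov
    # on the product (cell, dfa_state).
    import random
    from collections import defaultdict

    ACTIONS = (-1, +1)  # move left / right on a line of cells 0..4

    def dfa_step(q, s):
        # 3-state DFA: q0 --(visit cell 0)--> q1 --(visit cell 4)--> q2 (accepting)
        if q == 0 and s == 0:
            return 1
        if q == 1 and s == 4:
            return 2
        return q

    Q = defaultdict(float)  # keyed by (cell, dfa_state, action)
    alpha, gamma, eps = 0.1, 0.99, 0.2

    for episode in range(5000):
        s, q = 2, 0  # start in the middle cell, DFA in its initial state
        for _ in range(50):
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda b: Q[(s, q, b)]))
            s2 = min(4, max(0, s + a))                # deterministic dynamics
            q2 = dfa_step(q, s2)
            r = 1.0 if (q2 == 2 and q != 2) else 0.0  # reward on acceptance
            best = max(Q[(s2, q2, b)] for b in ACTIONS)
            Q[(s, q, a)] += alpha * (r + gamma * best - Q[(s, q, a)])
            s, q = s2, q2

Because the reward depends only on the product state (cell, DFA state), ordinary Markov Q-learning applies unchanged; the automaton is precisely what turns the non-Markov task into a Markov one.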


Calendar and course content

Slot 1 runs from 11:00 to 13:00 CET; Slot 2 from 14:00 to 16:00 CET.

Monday 7th November
  • Slot 1: Markov Decision Processes, discounting, absorbing states, value iteration, policy iteration (Patrizi). Material: Video - Slides
  • Slot 2: Markov Reinforcement Learning, exploration/exploitation, Q-learning, SARSA (Iocchi). Material: Video - Slides

Friday 11th November
  • Slot 1: Linear Temporal Logic on Finite Traces (LTLf) and automata (De Giacomo). Material: Video - Slides
  • Slot 2: Non-Markov Reward Decision Processes (NMRDPs), restraining bolts, reward machines (Patrizi). Material: Video - Slides (1) - Slides (2)

Monday 14th November
  • Slot 1: Non-Markov Decision Processes (NMDPs), Regular Decision Processes (RDPs) and their relationship with Partially Observable Markov Decision Processes (POMDPs) (Ronca and Cipollone). Material: Video - Slides (1) - Slides (2)
  • Slot 2: Automata learning and model learning (Paludo). Material: Video - Slides

Friday 18th November
  • Slot 1: Regular Decision Processes (RDPs), reinforcement learning in RDPs (Ronca). Material: Video - Slides
  • Slot 2: Markov Deep Reinforcement Learning (Umili). Material: Video - Slides

Monday 21st November
  • Slot 1: Performance guarantees in non-Markov reinforcement learning (Ronca). Material: Video - Slides
  • Slot 2: Non-Markov Deep Reinforcement Learning (Cipollone). Material: Video - Slides




Game-Theoretic Approach to Planning and Synthesis

4-8 July 2022

  • Instructors: Giuseppe De Giacomo, Antonio Di Stasio, Giuseppe Perelli, Shufang Zhu (Sapienza University of Rome)
  • Host Institutions: Sapienza University & ICT-48 TAILOR
  • Level: Postgraduate.
  • Course duration: 20 hours.
  • Course Type: Lecture series.
  • Registration: Free but required (Please register here: https://forms.gle/G6jj71kE92ZDqWU29)
  • Class delivery modality: Blended (online and in person).
  • In-person location: Room A5, Department of Computer, Control, and Management Engineering, Sapienza University of Rome
  • Online Location: Zoom (link: https://uniroma1.zoom.us/j/81644497919?pwd=I2zu7FU2uauar9RYRgK1AJKr3Gydso.1)
  • Schedule: Monday 4th July to Friday 8th July, from 14:00 to 18:00 CEST (Central European Summer Time)
  • Language: English
  • Notes: We, as instructors, will give exams only for Sapienza's PhD curricula; however, we are happy to discuss the course topics informally with anyone interested in them.

Registration

All students interested in attending the course are invited to register at the following link:

https://forms.gle/G6jj71kE92ZDqWU29


Online Attendance

To attend the course online, please join via the following Zoom link: https://uniroma1.zoom.us/j/81644497919?pwd=I2zu7FU2uauar9RYRgK1AJKr3Gydso.1


Lecture recordings and slides

  • Lecture 1 - De Giacomo: Introduction to planning and synthesis Video - Slides, fixpoints-mucalc

  • Lecture 2 - Perelli: Linear Temporal Logic Video - Slides

  • Lecture 3 - De Giacomo: Planning on deterministic and nondeterministic domains Video - Slides

  • Lecture 4 - Perelli: LTL Model Checking Video - Slides

  • Lecture 5 - De Giacomo: Planning with LTLf goals Video - Slides

  • Lecture 6 - Perelli: Games on graphs Video - Slides

  • Lecture 7 - Di Stasio: Synthesis for LTLf goals under LTL specifications Video - Slides

  • Lecture 8 - Zhu: Solution for notable cases of LTLf goals under LTL specifications Video - Slides

  • Lecture 9 - Zhu: Symbolic techniques Video - Slides

  • Lecture 10 - Di Stasio: Parity Games Video - Slides


Abstract

This course introduces AI planning and program synthesis for tasks (goals) expressed over finite traces rather than final states. Specifically, borrowing from Formal Methods, we will consider tasks and environment specifications expressed in LTL and its finite-trace variant LTLf. We will review the main results and algorithmic techniques for planning in nondeterministic domains. Then, we will draw connections with verification and reactive synthesis, together with their game-theoretic solution techniques. The key insight is that working with these logics reduces to devising suitable 2-player games and finding strategies, i.e., plans, to win them. Specifically, we will cover the following topics: Planning in nondeterministic domains, Temporal Logics, LTL, LTLf, Game-theoretic Techniques, Safety Games, Reachability Games, Games for LTL/LTLf objectives, and Reactive Synthesis. This course is partially based on the work carried out in ERC Advanced Grant WhiteMech and EU ICT-48 TAILOR.

Course content

Planning: Introduction to planning and synthesis. Fixpoint calculations. Deterministic and nondeterministic domains. Planning with temporally extended goals. LTLf: linear-time temporal logic over finite traces. From LTLf to finite automata. Deterministic and nondeterministic domains with LTLf goals.
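
As a one-formula illustration of the fixpoint calculations above (standard notation, not taken from the course slides): the set of states from which a goal can be reached is the least fixpoint $\mathit{Reach}(\mathit{Goal}) = \mu Z.\, \mathit{Goal} \cup \mathit{Pre}(Z)$, where $\mathit{Pre}(Z)$ denotes the states with some action leading into $Z$. It is computed by iterating $Z_0 = \emptyset$ and $Z_{i+1} = \mathit{Goal} \cup \mathit{Pre}(Z_i)$ until the sequence stabilizes, which happens after at most as many iterations as there are states.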

Linear Temporal Logic: Logic-based language for the representation of programs. Model checking with LTL. The synthesis problem with LTL goals. Automata-theoretic approach to Model Checking and Synthesis.
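
For instance, the classic response property $\varphi = \Box(\mathit{request} \rightarrow \Diamond\,\mathit{grant})$ ("every request is eventually granted") is an LTL formula; model checking asks whether every infinite trace of a transition system $M$ satisfies $\varphi$, written $M \models \varphi$, and the automata-theoretic approach reduces this to an emptiness check on the product of $M$ with an automaton for $\neg\varphi$. (The formula is a standard textbook example, not one taken from the slides.)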

Games: Introduction to 2-player (turn-based) games on graphs and their relationship with Planning and Synthesis. Solutions to simple objective games: Reachability, Safety, Safe-Reach, Büchi Games. Parity games: a more sophisticated game objective. Solutions to parity games and their relationship with LTL synthesis.
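
To illustrate, the sketch below computes the winning region of player 0 in a turn-based reachability game via the classical backward attractor fixpoint; the code and the graph encoding (owner[v] in {0, 1}, succ[v] listing the successors of node v) are our own assumptions, not material from the course.

    # Winning region of player 0 in a reachability game on a graph.
    def reach_attractor(nodes, owner, succ, target):
        win = set(target)              # player 0 wins trivially on the target
        changed = True
        while changed:                 # least-fixpoint iteration
            changed = False
            for v in nodes:
                if v in win:
                    continue
                if owner[v] == 0 and any(u in win for u in succ[v]):
                    win.add(v); changed = True   # player 0 can move into win
                elif owner[v] == 1 and succ[v] and all(u in win for u in succ[v]):
                    win.add(v); changed = True   # player 1 cannot avoid win
        return win

    # Tiny example: node 3 is the target.
    owner = {0: 0, 1: 1, 2: 0, 3: 0}
    succ = {0: [1, 2], 1: [0, 3], 2: [3], 3: [3]}
    print(reach_attractor([0, 1, 2, 3], owner, succ, {3}))  # -> {0, 1, 2, 3}

Safety games are the dual: the region where the protagonist can stay safe forever is the complement of the opponent's attractor to the unsafe states.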

Synthesis under environment specifications: The environment represented as an LTL specification. The synthesis under environment specifications solved as synthesis of an implication formula. Direct and more effective solutions to the synthesis under environment specifications for notable cases of LTLf formulas: Safety, Safety & Fairness, Safety & Stability, Safety & GR(1). Symbolic representation and techniques for Planning and Synthesis.
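
Concretely, given an environment specification $\mathit{Env}$ (an LTL formula) and an agent task $\varphi$, the implication-formula reduction mentioned above synthesizes a strategy realizing $\mathit{Env} \rightarrow \varphi$: the agent must achieve $\varphi$ on every play in which the environment actually satisfies $\mathit{Env}$, and is unconstrained on the others. (The symbols here are illustrative placeholders, not notation from the slides.)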


Calendar

Each day runs from 14:00 to 18:00 CEST, split into two slots.

Monday 4th July
  • Slot 1: Introduction to planning and synthesis, transition systems, and fixpoints (De Giacomo)
  • Slot 2: Linear Temporal Logic (LTL) (Perelli)

Tuesday 5th July
  • Slot 1: Planning on deterministic and nondeterministic domains (De Giacomo)
  • Slot 2: LTL Model Checking and Synthesis (Perelli)

Wednesday 6th July
  • Slot 1: Planning with LTLf goals (De Giacomo)
  • Slot 2: Games on Graphs (Perelli)

Thursday 7th July
  • Slot 1: Synthesis for LTLf goals under LTL specifications (Di Stasio)
  • Slot 2: Solutions for notable cases of LTLf goals under LTL specifications (Zhu)

Friday 8th July
  • Slot 1: Symbolic techniques (Zhu)
  • Slot 2: Parity Games: definitions and solutions (Di Stasio)



