When: Thursday, May 27, 2021, 3pm (CEST).

Where: Zoom online, check the address in the Google Calendar Event.

Topic: A Unifying Framework for Observer-Aware Planning: Non-Markovian Reward Perspective.

Speaker: Shuwa Miura, PhD student at the University of Massachusetts Amherst, supervised by Shlomo Zilberstein.

Abstract

Being aware of observers and the inferences they make about an agent’s behavior is crucial for successful multi-agent interaction. Existing work on observer-aware planning uses different assumptions and techniques to produce observer-aware behaviors. We introduce a unifying framework for producing observer-aware behaviors called the Observer-Aware MDP (OAMDP). We provide initial empirical evidence that OAMDPs can be used to improve the interpretability of agent behaviors, and we establish complexity results for polynomial-horizon OAMDPs. OAMDPs define a particular class of non-Markovian Reward Decision Processes. Most previous work on non-Markovian rewards has used temporal logic over finite histories to describe them. We show complexity results for polynomial-horizon non-Markovian Reward Decision Processes with temporal logic, analogous to those for OAMDPs. We then discuss possible directions for future research.
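As a toy illustration of the non-Markovian reward idea mentioned in the abstract (this is a hypothetical sketch, not the formalism from the talk): a Markovian reward depends only on the current state and action, while a non-Markovian reward may depend on the entire history. All names below (`markovian_reward`, `non_markovian_reward`, the states `"start"`, `"forbidden"`, `"goal"`) are made up for illustration.

```python
from typing import List, Tuple

State = str
Action = str
History = List[Tuple[State, Action]]

def markovian_reward(state: State, action: Action) -> float:
    """Markovian: reward is a function of the latest (state, action) only."""
    return 1.0 if (state, action) == ("goal", "stay") else 0.0

def non_markovian_reward(history: History) -> float:
    """Non-Markovian: reward depends on the whole trajectory.

    Here (a hypothetical property), the agent is rewarded for ending in
    "goal" only if it never passed through "forbidden" along the way.
    """
    visited = [s for s, _ in history]
    if "forbidden" in visited:
        return 0.0
    return 1.0 if visited and visited[-1] == "goal" else 0.0

# Two trajectories ending in the same state receive different rewards,
# which no state-action reward function alone can express.
history_ok = [("start", "go"), ("mid", "go"), ("goal", "stay")]
history_bad = [("start", "go"), ("forbidden", "go"), ("goal", "stay")]
```

In an OAMDP-like setting, the history dependence would instead arise from the observer's evolving beliefs about the agent; temporal-logic approaches mentioned in the abstract express such properties as formulas over finite histories.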