Quickstart¶
In this section, we cover the main use cases of the Sapientino environment.
The environment is designed to be configurable: at the moment there is no default goal to achieve, so the reward should be customized before using the environment.
Using the Gym registry¶
import gym
import gym_sapientino  # this import registers the Sapientino environments with Gym
env = gym.make("Sapientino-v0")
The initial state is:
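To inspect it yourself, here is a minimal sketch (assuming the classic Gym API, where reset() returns the initial observation and render() accepts the standard "human" mode):
initial_state = env.reset()   # reset returns the initial observation
print(initial_state)
env.render(mode="human")      # opens a window showing the grid
env.close()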
Building the environment programmatically¶
First, we set up an agent configuration:
from gym_sapientino.core.configurations import SapientinoAgentConfiguration
agent_config = SapientinoAgentConfiguration(differential=False)
Next, we define the configuration for the environment:
from gym_sapientino.core.configurations import SapientinoConfiguration
from gym_sapientino import SapientinoDictSpace
agent_configs = [agent_config,]
environment_configuration = SapientinoConfiguration(
agent_configs=agent_configs,
reward_outside_grid=-1.0,
reward_duplicate_beep=-1.0,
reward_per_step=-0.01
)
The description of the arguments:
agent_configs
: the list of agent configurations (provide more than one for a multi-agent setting).

reward_outside_grid
: the reward given when the robot tries to move outside the grid.

reward_duplicate_beep
: the reward given when the robot beeps in a cell where a beep has already been done.

reward_per_step
: the reward given at each step.
Then, instantiate the environment:
env = SapientinoDictSpace(environment_configuration)
print(f"Observation space: {env.observation_space}")
print(f"Action space: {env.action_space}")
initial_state = env.reset()
print(f"Initial state: {initial_state}")
Observation space: Tuple(Dict(beep:Discrete(2), color:Discrete(8), x:Discrete(7), y:Discrete(5)))
Action space: Tuple(Discrete(6))
Initial state: ({'x': 1, 'y': 2, 'theta': 1, 'beep': 0, 'color': 0},)
The observation space of the wrapper SapientinoDictSpace
is a tuple of dictionaries of the following form:
- x, the $x$-coordinate of the robot in the grid.
- y, the $y$-coordinate of the robot in the grid.
- theta, the orientation of the robot in the grid (that is, either $0^\circ$, $90^\circ$, $180^\circ$ or $270^\circ$, discretized so as to lie between $0$ and $3$). This attribute is only present in the differential mode (see below).
- beep, a boolean that tells whether the last action was a beep.
- color, the currently observed color (the blank color is $0$).
In a multi-agent configuration, the observation is a tuple with one such dictionary per agent.
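For example, a minimal sketch of reading the fields of a single-agent observation (reusing the environment built above):
obs, = env.reset()           # single-agent: the tuple contains one dictionary
x, y = obs["x"], obs["y"]    # grid position of the robot
print(f"Robot at ({x}, {y}), color={obs['color']}, beep={obs['beep']}")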
The action space is either "directional" (up, down, left, right)
or "differential" ("turn left", "turn right", "forward", "backward"),
plus a "nop" action and a "beep" action.
The boolean argument differential
in the agent configuration
controls the action space of the associated agent.
Example of a directional agent:
Example of a differential agent:
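As a quick check, a minimal sketch (reusing the classes imported above) showing that a differential agent adds the theta attribute to its observation space:
diff_config = SapientinoAgentConfiguration(differential=True)
diff_env = SapientinoDictSpace(SapientinoConfiguration(agent_configs=[diff_config]))
print(diff_env.observation_space)  # the Dict should now also include theta: Discrete(4)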
Multi-agent setup¶
It is possible to have multiple agents in the same grid.
a1 = SapientinoAgentConfiguration(differential=False)
a2 = SapientinoAgentConfiguration(differential=True)
a3 = SapientinoAgentConfiguration(differential=True)
agent_configs = [a1, a2, a3]
environment_configuration = SapientinoConfiguration(
agent_configs=agent_configs,
)
env = SapientinoDictSpace(environment_configuration)
print(f"Observation space: {env.observation_space}")
print(f"Action space: {env.action_space}")
initial_state = env.reset()
print(f"Initial state: {initial_state}")
Observation space: Tuple(Dict(beep:Discrete(2), color:Discrete(8), x:Discrete(7), y:Discrete(5)), Dict(beep:Discrete(2), color:Discrete(8), theta:Discrete(4), x:Discrete(7), y:Discrete(5)), Dict(beep:Discrete(2), color:Discrete(8), theta:Discrete(4), x:Discrete(7), y:Discrete(5)))
Action space: Tuple(Discrete(6), Discrete(6), Discrete(6))
Initial state: ({'x': 1, 'y': 2, 'theta': 1, 'beep': 0, 'color': 0}, {'x': 3, 'y': 2, 'theta': 1, 'beep': 0, 'color': 0}, {'x': 5, 'y': 2, 'theta': 1, 'beep': 0, 'color': 2})
Here is an example of a random run:
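The animation is not reproduced here; such a run can be replicated with a minimal sketch like the following (assuming the classic Gym step API, where step() returns a 4-tuple, and the standard "human" render mode):
import time

observations = env.reset()
for _ in range(50):
    actions = env.action_space.sample()         # one random action per agent
    observations, reward, done, info = env.step(actions)
    env.render(mode="human")
    time.sleep(0.1)                             # slow down for visualization
    if done:
        break
env.close()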