Quickstart¶
In this section, we cover the main use cases of the Sapientino environment.
The environment is designed to be configurable: at the moment there is no default goal to achieve, so the reward should be customized before using the environment.
Using the Gym registry¶
import gym
import gym_sapientino  # this import registers the Sapientino environments with Gym
env = gym.make("Sapientino-v0")
The initial state is:
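To inspect it yourself, here is a minimal sketch (assuming the classic Gym API, where reset() returns the initial observation and render() accepts the standard "human" mode):
initial_state = env.reset()   # reset returns the initial observation
print(initial_state)
env.render(mode="human")      # opens a window showing the grid
env.close()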
Building the environment programmatically¶
First, we set up an agent configuration:
from gym_sapientino.core.configurations import SapientinoAgentConfiguration
agent_config = SapientinoAgentConfiguration(differential=False)
Next, we define the configuration for the environment:
from gym_sapientino.core.configurations import SapientinoConfiguration
from gym_sapientino import SapientinoDictSpace
agent_configs = [agent_config,]
environment_configuration = SapientinoConfiguration(
agent_configs=agent_configs,
reward_outside_grid=-1.0,
reward_duplicate_beep=-1.0,
reward_per_step=-0.01
)
The description of the arguments:
agent_configs
: the list of agent configurations (provide more than one for a multi-agent setting).

reward_outside_grid
: the reward given when the robot tries to move outside the grid.

reward_duplicate_beep
: the reward given when the robot beeps in a cell where a beep has already been done.

reward_per_step
: the reward given at each step.
Then, instantiate the environment:
env = SapientinoDictSpace(environment_configuration)
print(f"Observation space: {env.observation_space}")
print(f"Action space: {env.action_space}")
initial_state = env.reset()
print(f"Initial state: {initial_state}")
Observation space: Tuple(Dict(beep:Discrete(2), color:Discrete(8), x:Discrete(7), y:Discrete(5)))
Action space: Tuple(Discrete(6))
Initial state: ({'x': 1, 'y': 2, 'theta': 1, 'beep': 0, 'color': 0},)
The observation space of the wrapper SapientinoDictSpace
is a tuple of dictionaries of the following form:
- x, the $x$-coordinate of the robot in the grid.
- y, the $y$-coordinate of the robot in the grid.
- theta, the orientation of the robot in the grid (that is, either $0^\circ$, $90^\circ$, $180^\circ$ or $270^\circ$, discretized so as to lie between $0$ and $3$). This attribute is only present in the differential mode (see below).
- beep, a boolean that tells whether the last action was a beep.
- color, the currently observed color (the blank color is $0$).
In a multi-agent configuration, the observation is a tuple with one such dictionary per agent.
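For example, a minimal sketch of reading the fields of a single-agent observation (reusing the environment built above):
obs, = env.reset()           # single-agent: the tuple contains one dictionary
x, y = obs["x"], obs["y"]    # grid position of the robot
print(f"Robot at ({x}, {y}), color={obs['color']}, beep={obs['beep']}")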
The action space is either "directional" (up, down, left, right)
or "differential" ("turn left", "turn right", "forward", "backward"),
plus a "nop" action and a "beep" action.
The boolean argument differential
in the agent configuration
controls the action space of the associated agent.
Example of a directional agent:
Example of a differential agent:
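As a quick check, a minimal sketch (reusing the classes imported above) showing that a differential agent adds the theta attribute to its observation space:
diff_config = SapientinoAgentConfiguration(differential=True)
diff_env = SapientinoDictSpace(SapientinoConfiguration(agent_configs=[diff_config]))
print(diff_env.observation_space)  # the Dict should now also include theta: Discrete(4)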
Multi-agent setup¶
It is possible to have multiple agents in the same grid.
a1 = SapientinoAgentConfiguration(differential=False)
a2 = SapientinoAgentConfiguration(differential=True)
a3 = SapientinoAgentConfiguration(differential=True)
agent_configs = [a1, a2, a3]
environment_configuration = SapientinoConfiguration(
agent_configs=agent_configs,
)
env = SapientinoDictSpace(environment_configuration)
print(f"Observation space: {env.observation_space}")
print(f"Action space: {env.action_space}")
initial_state = env.reset()
print(f"Initial state: {initial_state}")
Observation space: Tuple(Dict(beep:Discrete(2), color:Discrete(8), x:Discrete(7), y:Discrete(5)), Dict(beep:Discrete(2), color:Discrete(8), theta:Discrete(4), x:Discrete(7), y:Discrete(5)), Dict(beep:Discrete(2), color:Discrete(8), theta:Discrete(4), x:Discrete(7), y:Discrete(5)))
Action space: Tuple(Discrete(6), Discrete(6), Discrete(6))
Initial state: ({'x': 1, 'y': 2, 'theta': 1, 'beep': 0, 'color': 0}, {'x': 3, 'y': 2, 'theta': 1, 'beep': 0, 'color': 0}, {'x': 5, 'y': 2, 'theta': 1, 'beep': 0, 'color': 2})
Here is an example of a random run:
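The animation is not reproduced here; such a run can be replicated with a minimal sketch like the following (assuming the classic Gym step API, where step() returns a 4-tuple, and the standard "human" render mode):
import time

observations = env.reset()
for _ in range(50):
    actions = env.action_space.sample()         # one random action per agent
    observations, reward, done, info = env.step(actions)
    env.render(mode="human")
    time.sleep(0.1)                             # slow down for visualization
    if done:
        break
env.close()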