Agent_config

class conformer_rl.config.agent_config.Config

Configuration object for agents.

Specifies parameters and hyperparameters for agents. See attributes below for details.

tag

Used to identify the training run in saved log files and in TensorBoard.

Type: str, required for all agents

train_env

Wrapper for environments used to train the agents.

Type: wrapper for environments from Task(), required for all agents

eval_env

Wrapper for environment used to evaluate the agent.

Type: wrapper for environments from Task(), optional

network

Neural network to be used by the agent.

Type: pytorch neural network module (torch.nn.Module), required for all agents

optimizer_fn

Function that takes the parameters of a torch.nn.Module (as returned by the module's .parameters() method) and returns a torch.optim.Optimizer constructed from those parameters. For example:

config.optimizer_fn = lambda params : torch.optim.Adam(params, lr=0.001)

Type: lambda(iterable) -> torch.optim.Optimizer, required for all agents

num_workers

Number of parallel environments to sample from during training.

Type: int, required by all agents

rollout_length

Number of environment steps taken by each worker during each sampling iteration.

Type: int, required by all agents

max_steps

Number of environment steps to take before ending agent training.

Type: int, required by all agents
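The arithmetic below illustrates how num_workers, rollout_length, and max_steps interact. It is a rough sketch that assumes max_steps counts environment steps summed over all workers. The function name sampling_budget and the example values are illustrative and not part of conformer_rl.

```python
import math

def sampling_budget(num_workers, rollout_length, max_steps):
    """Return (samples per sampling iteration, iterations needed to reach max_steps).

    Assumption: max_steps counts environment steps summed across all workers.
    """
    # Each worker contributes rollout_length steps per sampling iteration.
    samples_per_iteration = num_workers * rollout_length
    # Round up, since a partial iteration still has to be run in full.
    iterations = math.ceil(max_steps / samples_per_iteration)
    return samples_per_iteration, iterations

samples, iterations = sampling_budget(num_workers=4, rollout_length=5, max_steps=1000)
print(samples, iterations)
```

Larger values of num_workers or rollout_length give more samples per update but fewer updates for the same max_steps.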

save_interval

How often (in environment steps) to save neural network parameters. If set to 0, parameters will not be saved.

Type: int, required by all agents

eval_interval

How often to evaluate the agent on the eval environment.

Type: int, required by all agents

eval_episodes

Number of episodes to run each time the agent is evaluated.

Type: int, required by all agents

recurrence

Number of steps taken before recurrent states are reset when training the agent (updating network weights).

Type: int, required by recurrent agents

optimization_epochs

Number of epochs for training each minibatch. Used for PPO and PPORecurrent agents.

Type: int

mini_batch_size

Size of each minibatch to train on. Used for PPO and PPORecurrent agents.

Type: int
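One common way optimization_epochs and mini_batch_size are used in PPO-style training is sketched below: the collected samples are reshuffled once per epoch and split into minibatches. Whether conformer_rl shuffles and splits in exactly this way is an assumption, and iterate_minibatches is a hypothetical helper.

```python
import random

def iterate_minibatches(num_samples, mini_batch_size, optimization_epochs, seed=0):
    """Yield one list of sample indices per minibatch update."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    for _ in range(optimization_epochs):
        rng.shuffle(indices)  # new random order for each epoch
        for start in range(0, num_samples, mini_batch_size):
            # The final minibatch of an epoch may be smaller than mini_batch_size.
            yield indices[start:start + mini_batch_size]

batches = list(iterate_minibatches(num_samples=20, mini_batch_size=8, optimization_epochs=2))
print(len(batches))
```

In this sketch every sample is visited once per epoch, so the number of gradient updates per sampling iteration is the number of epochs times the number of minibatches per epoch.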

discount

Discount factor (often denoted by γ) used for advantage estimation.

Type: float, required by all agents
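As an illustration of what the discount factor does, the sketch below computes discounted returns for a short reward sequence. It shows the general definition only; discounted_returns is a hypothetical helper and not a conformer_rl function.

```python
def discounted_returns(rewards, discount):
    """Return G_t = r_t + discount * G_{t+1} for every step t of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each return can reuse the return of the following step.
    for reward in reversed(rewards):
        running = reward + discount * running
        returns.append(running)
    returns.reverse()
    return returns

print(discounted_returns([1.0, 1.0, 1.0], discount=0.5))
```

A discount close to 1 weights distant rewards almost as much as immediate ones, while a smaller value makes the agent more short-sighted.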

use_gae

Whether to estimate advantages using generalized advantage estimation (GAE) rather than a SARSA update.

Type: bool, required by all agents

gae_lambda

The λ parameter used by the generalized advantage estimator (GAE). See [1] for details.

Type: float, required by all agents if use_gae is True
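The following sketch shows how discount (γ) and gae_lambda (λ) enter the standard GAE recursion from the GAE paper: the TD error is δ_t = r_t + γ V(s_{t+1}) − V(s_t), and the advantage is A_t = δ_t + γ λ A_{t+1}. It handles a single trajectory with no episode termination, and gae_advantages is a hypothetical helper rather than the library's implementation.

```python
def gae_advantages(rewards, values, last_value, discount, gae_lambda):
    """Compute GAE advantages for one trajectory.

    values[t] is the value estimate V(s_t); last_value is the estimate for the
    state reached after the final step. Episode termination is not handled.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_advantage = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + discount * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        next_advantage = delta + discount * gae_lambda * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages

print(gae_advantages([1.0, 1.0], [0.0, 0.0], last_value=0.0, discount=0.5, gae_lambda=0.5))
```

With gae_lambda set to 0 the estimate reduces to the one-step TD error, and with gae_lambda set to 1 it becomes the discounted return minus the value estimate.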

entropy_weight

Coefficient for the entropy when calculating total loss.

Type: float, required by all agents

value_loss_coefficient

Coefficient for the value loss when calculating total loss.

Type: float, required by all agents
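A common way to combine these coefficients in actor-critic methods is sketched below: the value loss is added with weight value_loss_coefficient and the entropy is subtracted with weight entropy_weight, so higher entropy lowers the loss. The exact form and sign convention used by conformer_rl are assumptions here, and total_loss is a hypothetical helper.

```python
def total_loss(policy_loss, value_loss, entropy, value_loss_coefficient, entropy_weight):
    """Combine the loss terms using the common actor-critic convention (assumed)."""
    # Subtracting the entropy term rewards more exploratory policies.
    return policy_loss + value_loss_coefficient * value_loss - entropy_weight * entropy

print(total_loss(policy_loss=1.0, value_loss=2.0, entropy=4.0,
                 value_loss_coefficient=0.5, entropy_weight=0.25))
```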

gradient_clip

Max norm for clipping gradients for neural network.

Type: float, required by all agents
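Clipping by max norm rescales the whole gradient vector when its L2 norm exceeds the threshold and leaves it unchanged otherwise. The pure-Python sketch below illustrates the idea on a flat list of gradient values. In PyTorch this is usually done with torch.nn.utils.clip_grad_norm_; whether conformer_rl calls that function is an assumption, and clip_by_norm is a hypothetical helper.

```python
import math

def clip_by_norm(gradients, max_norm):
    """Rescale gradients so that their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in gradients))
    if total_norm <= max_norm:
        return list(gradients)  # already within the limit
    scale = max_norm / total_norm
    return [g * scale for g in gradients]

print(clip_by_norm([3.0, 4.0], max_norm=1.0))
```

The direction of the gradient is preserved; only its magnitude is reduced.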

ppo_ratio_clip

Clipping parameter ε for the PPO algorithm. See [2] for details.

Type: float, required by PPO and PPORecurrent agents
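The role of ε can be seen in the clipped surrogate objective from the PPO paper, min(r·A, clip(r, 1 − ε, 1 + ε)·A), where r is the probability ratio between the new and old policies and A is the advantage. The sketch below evaluates that term for a single sample; clipped_surrogate is a hypothetical helper, not a conformer_rl function.

```python
def clipped_surrogate(ratio, advantage, ppo_ratio_clip):
    """Per-sample PPO clipped surrogate objective (to be maximized)."""
    # Restrict the probability ratio to [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - ppo_ratio_clip, min(ratio, 1.0 + ppo_ratio_clip))
    # Taking the minimum yields a pessimistic bound on the unclipped objective.
    return min(ratio * advantage, clipped_ratio * advantage)

print(clipped_surrogate(ratio=1.5, advantage=2.0, ppo_ratio_clip=0.2))
```

A smaller ppo_ratio_clip keeps each update closer to the policy that collected the samples.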

curriculum_agent_buffer_len

The number of most recently completed episodes over which the agent is evaluated for curriculum learning. See update_curriculum() for more details on how curriculum learning is implemented.

Type: int, required by all curriculum agents

curriculum_agent_reward_thresh

The reward threshold for considering the agent to have “succeeded” in an episode. Used for evaluating the agent for curriculum learning. See update_curriculum() for more details on how curriculum learning is implemented.

Type: float, required by all curriculum agents

curriculum_agent_success_rate

The minimum success rate for the agent to signal the environment to increase the level/difficulty for the curriculum. See update_curriculum() for more details on how curriculum learning is implemented.

Type: float, required by all curriculum agents

curriculum_agent_fail_rate

The maximum success rate for the agent to signal the environment to decrease the level/difficulty of the curriculum. See update_curriculum() for more details on how curriculum learning is implemented.

Type: float, required by all curriculum agents

data_dir

Directory path for saving log files.

Type: str, required by all agents

use_tensorboard

Whether to save agent information to TensorBoard.

Type: bool, required by all agents

References

[1] Generalized Advantage Estimation (GAE) paper
[2] PPO paper