Agent_config

class conformer_rl.config.agent_config.Config

Configuration object for agents.

Specifies parameters and hyperparameters for agents. See attributes below for details.

tag

Used to identify the training run in saved log files and in Tensorboard.

Type

str, required for all agents

train_env

Wrapper for environments used to train the agents.

Type

wrapper for environments from Task(), required for all agents

eval_env

Wrapper for environment used to evaluate the agent.

Type

wrapper for environments from Task(), optional

network

Neural network to be used by the agent.

Type

pytorch neural network module (torch.nn.Module), required for all agents

optimizer_fn

Function that takes the parameters of a torch.nn.Module (as returned by the module's .parameters() method) and returns a torch.optim.Optimizer constructed from those parameters. For example:

config.optimizer_fn = lambda params: torch.optim.Adam(params, lr=0.001)

Type

lambda(iterable) -> torch.optim.Optimizer, required for all agents
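The reason optimizer_fn is a function rather than an optimizer instance is that the optimizer can only be built once the network's parameters exist. The sketch below illustrates this deferred construction with plain Python stand-ins. ToyOptimizer and ToyNetwork are hypothetical classes written for this example, not part of conformer_rl or PyTorch, and the final call is an assumption about how an agent might consume optimizer_fn.

```python
class ToyOptimizer:
    """Hypothetical stand-in for a torch.optim.Optimizer subclass."""

    def __init__(self, params, lr):
        self.params = list(params)  # consume the parameter iterable once
        self.lr = lr


class ToyNetwork:
    """Hypothetical stand-in for a torch.nn.Module."""

    def parameters(self):
        # A real module would yield tensors; floats keep the sketch dependency-free.
        return iter([0.1, -0.2, 0.3])


# Same shape as the documented example: parameters in, optimizer out.
optimizer_fn = lambda params: ToyOptimizer(params, lr=0.001)

# Assumed usage inside an agent: build the optimizer from the network's parameters.
network = ToyNetwork()
optimizer = optimizer_fn(network.parameters())
```

With PyTorch installed, ToyOptimizer would be replaced by an optimizer class such as torch.optim.Adam, as in the example above.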

num_workers

Number of parallel environments to sample from during training.

Type

int, required by all agents

rollout_length

Number of environment steps taken by each worker during each sampling iteration.

Type

int, required by all agents

max_steps

Number of environment steps to take before ending agent training.

Type

int, required by all agents
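As a rough illustration of how num_workers, rollout_length, and max_steps interact, the sketch below counts sampling iterations. It assumes that each iteration collects num_workers * rollout_length environment steps and that training stops once the running total reaches max_steps. The helper sampling_iterations is hypothetical and not part of conformer_rl.

```python
import math


def sampling_iterations(num_workers: int, rollout_length: int, max_steps: int) -> int:
    """Sampling iterations needed to reach max_steps under the assumed step accounting."""
    steps_per_iteration = num_workers * rollout_length
    return math.ceil(max_steps / steps_per_iteration)


# 4 workers x 5 steps = 20 environment steps per iteration.
iterations = sampling_iterations(num_workers=4, rollout_length=5, max_steps=100)
```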

save_interval

How often (in environment steps) to save neural network parameters. If set to 0, parameters will not be saved.

Type

int, required by all agents

eval_interval

How often to evaluate the agent on the eval environment.

Type

int, required by all agents

eval_episodes

Number of episodes to run each time the agent is evaluated.

Type

int, required by all agents

recurrence

Number of steps taken before resetting recurrent states when training agent/updating network weights.

Type

int, required by recurrent agents

optimization_epochs

Number of epochs for training each minibatch. Used for PPO and PPORecurrent agents.

Type

int

mini_batch_size

Size of each mini-batch used for training. Used for PPO and PPORecurrent agents.

Type

int
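The following sketch shows one common way optimization_epochs and mini_batch_size are combined in PPO-style training: every epoch reshuffles the collected batch and splits it into mini-batches. It is an illustration under assumptions, not the conformer_rl implementation. The function iterate_mini_batches and its batch_size and seed arguments are hypothetical, with batch_size standing for the number of samples collected in one sampling iteration.

```python
import random


def iterate_mini_batches(batch_size, mini_batch_size, optimization_epochs, seed=0):
    """Yield one list of sample indices per mini-batch update."""
    rng = random.Random(seed)
    indices = list(range(batch_size))
    for _ in range(optimization_epochs):
        rng.shuffle(indices)  # new random order for each epoch
        for start in range(0, batch_size, mini_batch_size):
            # The last mini-batch of an epoch may be smaller than mini_batch_size.
            yield indices[start:start + mini_batch_size]


batches = list(iterate_mini_batches(batch_size=20, mini_batch_size=8, optimization_epochs=2))
```

In this sketch every sample is visited once per epoch, so the number of gradient updates per sampling iteration is optimization_epochs times the number of mini-batches.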

discount

Discount factor (often denoted by γ) used for advantage estimation.

Type

float, required by all agents
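To make the role of the discount factor concrete, the sketch below computes discounted returns for a short reward sequence using the recursion G_t = r_t + γ G_{t+1}. The function discounted_returns is a hypothetical helper for illustration. It ignores episode boundaries and value bootstrapping, which a real agent would need to handle.

```python
def discounted_returns(rewards, discount):
    """Return G_t = r_t + discount * G_{t+1} for every step, with G after the last step = 0."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + discount * running
        returns.append(running)
    returns.reverse()
    return returns


returns = discounted_returns([1.0, 1.0, 1.0], discount=0.5)
```

A discount close to 1 weights distant rewards almost as much as immediate ones, while a smaller discount makes the agent more short-sighted.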

use_gae

Whether to estimate advantages with generalized advantage estimation (GAE). If False, a SARSA update is used instead.

Type

bool, required by all agents

gae_lambda

The λ parameter used by the generalized advantage estimator (GAE). See [1] for details.

Type

float, required by all agents if use_gae is True
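The sketch below shows how discount (γ) and gae_lambda (λ) enter the standard GAE recursion, where δ_t = r_t + γ V(s_{t+1}) - V(s_t) and A_t = δ_t + γ λ A_{t+1}. It is a minimal pure-Python illustration. The function gae_advantages and its argument names are hypothetical, and the sketch assumes a single rollout with no episode termination inside it.

```python
def gae_advantages(rewards, values, last_value, discount, gae_lambda):
    """Compute GAE advantages for one rollout.

    values[t] is the value estimate V(s_t); last_value is the estimate for the
    state reached after the final step of the rollout.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_advantage = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + discount * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        next_advantage = delta + discount * gae_lambda * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages


advantages = gae_advantages(
    rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0,
    discount=0.5, gae_lambda=0.5,
)
```

Setting gae_lambda to 0 reduces the estimate to the one-step TD error, while setting it to 1 gives the discounted return minus the value estimate.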

entropy_weight

Coefficient for the entropy when calculating total loss.

Type

float, required by all agents

value_loss_coefficient

Coefficient for the value loss when calculating total loss.

Type

float, required by all agents
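As an illustration of how entropy_weight and value_loss_coefficient typically combine, the sketch below uses the sign convention common to actor-critic methods: the value loss is added and the entropy bonus is subtracted. The exact formula used by each agent is not stated here, so treat this as an assumption. The function total_loss is a hypothetical helper.

```python
def total_loss(policy_loss, value_loss, entropy, entropy_weight, value_loss_coefficient):
    """Combine loss terms; a higher entropy lowers the loss, which encourages exploration."""
    return policy_loss + value_loss_coefficient * value_loss - entropy_weight * entropy


loss = total_loss(
    policy_loss=1.0, value_loss=2.0, entropy=4.0,
    entropy_weight=0.25, value_loss_coefficient=0.5,
)
```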

gradient_clip

Max norm for clipping gradients for neural network.

Type

float, required by all agents
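Clipping by max norm rescales all gradients together whenever their combined L2 norm exceeds the threshold. The pure-Python sketch below illustrates the idea on a flat list of gradient values. In PyTorch this is usually done with torch.nn.utils.clip_grad_norm_, though whether the agents call that utility is an assumption. The function clip_by_global_norm is a hypothetical helper.

```python
import math


def clip_by_global_norm(gradients, max_norm):
    """Rescale gradients so that their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in gradients))
    if total_norm <= max_norm:
        return list(gradients)  # already within the limit, leave unchanged
    scale = max_norm / total_norm
    return [g * scale for g in gradients]


# The norm of [3, 4] is 5, so every component is scaled by 1/5.
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Because every component is scaled by the same factor, the direction of the gradient is preserved and only its magnitude is limited.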

ppo_ratio_clip

Clipping parameter ε for the PPO algorithm. See [2] for details.

Type

float, required by PPO and PPORecurrent agents
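The sketch below shows how the clipping parameter ε enters the clipped surrogate objective from the PPO paper, min(r A, clip(r, 1 - ε, 1 + ε) A), for a single sample with probability ratio r and advantage A. The function clipped_surrogate is a hypothetical helper for illustration. A real agent would evaluate this over tensors and minimize the negated mean.

```python
def clipped_surrogate(ratio, advantage, ppo_ratio_clip):
    """PPO clipped surrogate objective for one sample (to be maximized)."""
    lower = 1.0 - ppo_ratio_clip
    upper = 1.0 + ppo_ratio_clip
    clipped_ratio = max(lower, min(ratio, upper))
    # Taking the minimum removes the incentive to move the ratio outside [lower, upper].
    return min(ratio * advantage, clipped_ratio * advantage)


objective = clipped_surrogate(ratio=1.5, advantage=2.0, ppo_ratio_clip=0.2)
```

A smaller ppo_ratio_clip keeps each update closer to the policy that collected the data.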

curriculum_agent_buffer_len

Number of most recently completed episodes over which the agent is evaluated for curriculum learning. See update_curriculum() for more details on how curriculum learning is implemented.

Type

int, required by all curriculum agents

curriculum_agent_reward_thresh

The reward threshold for considering the agent to have “succeeded” in an episode. Used for evaluating the agent for curriculum learning. See update_curriculum() for more details on how curriculum learning is implemented.

Type

float, required by all curriculum agents

curriculum_agent_success_rate

The minimum success rate for the agent to signal the environment to increase the level/difficulty for the curriculum. See update_curriculum() for more details on how curriculum learning is implemented.

Type

float, required by all curriculum agents

curriculum_agent_fail_rate

The maximum success rate for the agent to signal the environment to decrease the level/difficulty of the curriculum. See update_curriculum() for more details on how curriculum learning is implemented.

Type

float, required by all curriculum agents
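The four curriculum parameters above can be read together as a simple decision rule. The sketch below is one plausible reading, not the actual update_curriculum() implementation. The function curriculum_signal is hypothetical, and the inclusive comparisons and the choice to wait until the buffer is full are assumptions made for this example.

```python
from collections import deque


def curriculum_signal(episode_rewards, buffer_len, reward_thresh, success_rate, fail_rate):
    """Return "increase", "decrease", or "stay" based on recent episode rewards."""
    # Keep only the buffer_len most recently completed episodes.
    recent = deque(episode_rewards, maxlen=buffer_len)
    if len(recent) < buffer_len:
        return "stay"  # assumption: wait until the buffer is full
    # An episode counts as a success when its reward reaches the threshold.
    rate = sum(reward >= reward_thresh for reward in recent) / len(recent)
    if rate >= success_rate:
        return "increase"
    if rate <= fail_rate:
        return "decrease"
    return "stay"


signal = curriculum_signal(
    [0.0, 1.0, 1.0, 1.0, 1.0], buffer_len=4,
    reward_thresh=0.5, success_rate=0.75, fail_rate=0.25,
)
```

Keeping curriculum_agent_fail_rate well below curriculum_agent_success_rate leaves a band in which the difficulty stays unchanged, which avoids oscillating between levels.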

data_dir

Directory path for saving log files.

Type

str, required by all agents

use_tensorboard

Whether or not to save agent information to Tensorboard.

Type

bool, required by all agents

References

[1] Generalized Advantage Estimation (GAE) paper

[2] PPO paper