Agent_config
- class conformer_rl.config.agent_config.Config
Configuration object for agents.
Specifies parameters and hyperparameters for agents. See attributes below for details.
- tag
Used to identify the training run in saved log files and in Tensorboard.
- Type
str, required for all agents
- train_env
Wrapper for environments used to train the agents.
- Type
wrapper for environments from Task(), required for all agents
- eval_env
Wrapper for environment used to evaluate the agent.
- Type
wrapper for environments from Task(), optional
- network
Neural network to be used by the agent.
- Type
pytorch neural network module (torch.nn.Module), required for all agents
- optimizer_fn
Function (typically a lambda) that takes the parameters of a torch.nn.Module, as returned by the module's .parameters() method, and returns a torch.optim.Optimizer constructed from those parameters. For example:
config.optimizer_fn = lambda params : torch.optim.Adam(params, lr=0.001)
- Type
lambda(iterable) -> torch.optim.Optimizer, required for all agents
- num_workers
Number of parallel environments to sample from during training.
- Type
int, required by all agents
- rollout_length
Number of environment steps taken by each worker during each sampling iteration.
- Type
int, required by all agents
- max_steps
Number of environment steps to take before ending agent training.
- Type
int, required by all agents
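As a rough illustration of how num_workers, rollout_length, and max_steps interact, the sketch below assumes that each sampling iteration collects num_workers * rollout_length environment steps and that training ends once max_steps is reached. That accounting is an assumption made for illustration rather than something this page states, and sampling_iterations is a hypothetical helper, not part of conformer_rl.

```python
import math


def sampling_iterations(num_workers: int, rollout_length: int, max_steps: int) -> int:
    """Estimate how many sampling iterations fit in the step budget.

    Assumes each iteration collects num_workers * rollout_length steps
    (an illustrative assumption, not taken from the library).
    """
    steps_per_iteration = num_workers * rollout_length
    return math.ceil(max_steps / steps_per_iteration)


iterations = sampling_iterations(num_workers=20, rollout_length=20, max_steps=100_000)
print(iterations)
```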
- save_interval
How often (in environment steps) to save neural network parameters. If set to 0, parameters will not be saved.
- Type
int, required by all agents
- eval_interval
How often to evaluate the agent on the eval environment.
- Type
int, required by all agents
- eval_episodes
Number of episodes to run during each evaluation of the agent.
- Type
int, required by all agents
- recurrence
Number of steps taken before resetting recurrent states when training agent/updating network weights.
- Type
int, required by recurrent agents
- optimization_epochs
Number of epochs for training each minibatch. Used for PPO and PPORecurrent agents.
- Type
int
- mini_batch_size
Size of each mini batch to train on. Used for PPO and PPORecurrent agents.
- Type
int
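To make the roles of optimization_epochs and mini_batch_size concrete, here is a minimal sketch of the pattern commonly used in PPO implementations: for each epoch, the collected samples are reshuffled and split into mini batches. Whether conformer_rl iterates in exactly this order is an assumption; iterate_minibatches and its seed argument are hypothetical names introduced for the example.

```python
import random


def iterate_minibatches(num_samples, mini_batch_size, optimization_epochs, seed=0):
    """Yield (epoch, indices) pairs; each epoch reshuffles and splits the samples."""
    rng = random.Random(seed)
    for epoch in range(optimization_epochs):
        indices = list(range(num_samples))
        rng.shuffle(indices)
        # The final mini batch of an epoch may be smaller than mini_batch_size.
        for start in range(0, num_samples, mini_batch_size):
            yield epoch, indices[start:start + mini_batch_size]


batches = list(iterate_minibatches(num_samples=10, mini_batch_size=4, optimization_epochs=2))
for epoch, indices in batches:
    print(epoch, indices)
```

With these settings each sample is visited once per epoch, so the total number of gradient updates grows with optimization_epochs and shrinks as mini_batch_size increases.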
- discount
Discount factor (often denoted by γ) used for advantage estimation.
- Type
float, required by all agents.
- use_gae
Determines whether to use generalized advantage estimation (GAE) for estimating advantages, or SARSA update.
- Type
bool, required by all agents.
- gae_lambda
The λ parameter used by the generalized advantage estimator (GAE). See [1] for details.
- Type
float, required by all agents if use_gae is True
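The way discount (γ) and gae_lambda (λ) enter the advantage estimate can be sketched with the standard GAE recursion: δ_t = r_t + γ V(s_{t+1}) - V(s_t) and A_t = δ_t + γ λ A_{t+1}. The function below is a plain-Python illustration for a single trajectory with no episode boundary inside it; compute_gae is a hypothetical name, and the library's own implementation may differ in details such as how terminal states are masked.

```python
def compute_gae(rewards, values, last_value, discount, gae_lambda):
    """Return GAE advantages for one trajectory.

    rewards[t] and values[t] are per-step quantities; last_value is the
    value estimate of the state reached after the final step (bootstrap).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at step t.
        delta = rewards[t] + discount * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + discount * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


print(compute_gae([1.0, 1.0], [0.5, 0.5], last_value=0.0, discount=0.9, gae_lambda=0.95))
```

Setting gae_lambda to 0 reduces each advantage to the one-step TD error, while values closer to 1 give more weight to rewards further along the trajectory.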
- entropy_weight
Coefficient for the entropy when calculating total loss.
- Type
float, required by all agents
- value_loss_coefficient
Coefficient for the value loss when calculating total loss.
- Type
float, required by all agents
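One common way entropy_weight and value_loss_coefficient are combined in actor-critic methods is shown below. The sign convention (entropy subtracted, so that higher entropy lowers the loss) is an assumption based on typical implementations, not something this page specifies, and total_loss is a hypothetical helper.

```python
def total_loss(policy_loss, value_loss, entropy, value_loss_coefficient, entropy_weight):
    """Combine the loss terms into a single scalar.

    Assumes the usual convention: the value loss is scaled by its
    coefficient and the entropy bonus is subtracted.
    """
    return policy_loss + value_loss_coefficient * value_loss - entropy_weight * entropy


print(total_loss(policy_loss=1.0, value_loss=2.0, entropy=0.5,
                 value_loss_coefficient=0.25, entropy_weight=0.001))
```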
- gradient_clip
Max norm for clipping gradients for neural network.
- Type
float, required by all agents
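Clipping by max norm rescales the whole gradient when its norm exceeds gradient_clip and leaves it untouched otherwise. The sketch below shows the idea on a flat list of floats; it is similar in spirit to torch.nn.utils.clip_grad_norm_, but clip_by_global_norm is a hypothetical pure-Python helper and is not how the library performs the operation.

```python
import math


def clip_by_global_norm(gradients, gradient_clip):
    """Rescale gradients so their L2 norm is at most gradient_clip."""
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= gradient_clip or norm == 0.0:
        return list(gradients)
    scale = gradient_clip / norm
    return [g * scale for g in gradients]


print(clip_by_global_norm([3.0, 4.0], gradient_clip=1.0))
```

The direction of the gradient is preserved; only its magnitude is capped.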
- ppo_ratio_clip
Clipping parameter ε for the PPO algorithm. See [2] for details.
- Type
float, required by PPO and PPORecurrent agents.
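The effect of ppo_ratio_clip can be illustrated with the standard PPO clipped surrogate objective for a single sample, min(r * A, clip(r, 1 - ε, 1 + ε) * A), where r is the probability ratio between the new and old policies and A is the advantage. The function name clipped_surrogate is hypothetical, and the sketch works on scalars rather than tensors.

```python
def clipped_surrogate(ratio, advantage, ppo_ratio_clip):
    """Per-sample PPO clipped surrogate objective (to be maximized)."""
    lower = 1.0 - ppo_ratio_clip
    upper = 1.0 + ppo_ratio_clip
    clipped_ratio = max(lower, min(ratio, upper))
    # Taking the minimum makes the objective a pessimistic bound.
    return min(ratio * advantage, clipped_ratio * advantage)


print(clipped_surrogate(ratio=1.5, advantage=2.0, ppo_ratio_clip=0.2))
print(clipped_surrogate(ratio=1.5, advantage=-1.0, ppo_ratio_clip=0.2))
```

With a positive advantage the objective stops increasing once the ratio exceeds 1 + ε, which removes the incentive to move the policy further in a single update.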
- curriculum_agent_buffer_len
The number of most recently completed episodes over which the agent is evaluated for curriculum learning. See update_curriculum() for more details on how curriculum learning is implemented.
- Type
int, required by all curriculum agents
- curriculum_agent_reward_thresh
The reward threshold for considering the agent to have “succeeded” in an episode. Used for evaluating the agent for curriculum learning. See update_curriculum() for more details on how curriculum learning is implemented.
- Type
float, required by all curriculum agents
- curriculum_agent_success_rate
The minimum success rate for the agent to signal the environment to increase the level/difficulty of the curriculum. See update_curriculum() for more details on how curriculum learning is implemented.
- Type
float, required by all curriculum agents
- curriculum_agent_fail_rate
The maximum success rate for the agent to signal the environment to decrease the level/difficulty of the curriculum. See update_curriculum() for more details on how curriculum learning is implemented.
- Type
float, required by all curriculum agents
- data_dir
Directory path for saving log files.
- Type
str, required by all agents
- use_tensorboard
Whether or not to save agent information to Tensorboard.
- Type
bool, required by all agents
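Because many attributes are marked as required, a quick pre-flight check can catch an incomplete configuration before training starts. The sketch below uses types.SimpleNamespace as a stand-in for the real Config object so that it runs without conformer_rl installed; missing_fields and REQUIRED_FOR_ALL_AGENTS are hypothetical names, and the list is simply transcribed from the attributes described above as required for all agents.

```python
from types import SimpleNamespace

# Attributes this page marks as required for all agents.
REQUIRED_FOR_ALL_AGENTS = [
    "tag", "train_env", "network", "optimizer_fn", "num_workers",
    "rollout_length", "max_steps", "save_interval", "eval_interval",
    "eval_episodes", "discount", "use_gae", "entropy_weight",
    "value_loss_coefficient", "gradient_clip", "data_dir", "use_tensorboard",
]


def missing_fields(config):
    """Return the names of required attributes that are unset (absent or None)."""
    missing = [name for name in REQUIRED_FOR_ALL_AGENTS
               if getattr(config, name, None) is None]
    # gae_lambda is only required when use_gae is True.
    if getattr(config, "use_gae", False) and getattr(config, "gae_lambda", None) is None:
        missing.append("gae_lambda")
    return missing


# Stand-in for a partially filled Config object.
config = SimpleNamespace(tag="example_run", num_workers=4, rollout_length=20, use_gae=True)
print(missing_fields(config))
```

Values such as save_interval = 0 or use_tensorboard = False count as set, since the check only looks for attributes that are absent or None.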
References