Curriculum-Supported Agents

class conformer_rl.agents.curriculum_agents.ExternalCurriculumAgentMixin(config)

Bases: object

General mixin class that enables curriculum learning.

Adds functionality to an existing agent for externally interacting with an environment supporting curriculum learning.

Parameters: config (Config) – Configuration object for the agent. See notes for a list of config parameters used by this agent.

Notes

In addition to the config parameters required for the base agent class, use of this mixin requires the following additional parameters in the Config object:

curriculum_agent_buffer_len
curriculum_agent_reward_thresh
curriculum_agent_success_rate
curriculum_agent_fail_rate
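
As a rough illustration, the four parameters might be set as shown below. The sketch uses a types.SimpleNamespace as a stand-in for the library's Config object so that it runs on its own. The numeric values are placeholder assumptions, not recommended or default settings, and the comments reflect the roles described under update_curriculum().

```python
from types import SimpleNamespace

# Stand-in for the library's Config object (assumption: the real Config
# exposes these parameters as plain attributes).
config = SimpleNamespace()

# Illustrative values only; tune them for your own environment.
config.curriculum_agent_buffer_len = 20      # episodes considered per evaluation
config.curriculum_agent_reward_thresh = 0.5  # reward an episode must exceed to count as a success
config.curriculum_agent_success_rate = 0.7   # success ratio above which difficulty increases
config.curriculum_agent_fail_rate = 0.2      # success ratio below which difficulty decreases
```

Keeping curriculum_agent_fail_rate below curriculum_agent_success_rate leaves a band of ratios in which the difficulty is left unchanged.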

step() → None: Performs one iteration of acquiring samples on the environment and then trains on the acquired samples.

update_curriculum() → None

Evaluates the current performance of the agent and signals the environment to increase the level (difficulty) or decrease it depending on the agent’s performance.

The agent is evaluated only once the number of episodes elapsed since the last evaluation exceeds the curriculum_agent_buffer_len parameter set in the Config object. The evaluation computes the fraction of the last curriculum_agent_buffer_len episodes whose reward exceeds the curriculum_agent_reward_thresh parameter. If this fraction exceeds the curriculum_agent_success_rate parameter, the environment is signaled to increase the difficulty of the curriculum by calling its increase_level method. If the fraction is less than the curriculum_agent_fail_rate parameter, the environment is signaled to decrease the difficulty.
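
The decision rule described above can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: ToyCurriculumEnv, its decrease_level method, the free function update_curriculum, and the episodes_since_eval argument are all hypothetical names introduced for the example. Only increase_level and the four config parameter names come from the documentation.

```python
from collections import deque
from types import SimpleNamespace


class ToyCurriculumEnv:
    """Hypothetical environment that tracks a difficulty level."""

    def __init__(self):
        self.level = 0

    def increase_level(self):
        self.level += 1

    def decrease_level(self):  # assumed counterpart of increase_level
        self.level = max(0, self.level - 1)


def update_curriculum(env, rewards, episodes_since_eval, config):
    """Apply the thresholds once; return True if an evaluation was performed."""
    # Evaluate only after more than buffer_len episodes have elapsed.
    if episodes_since_eval <= config.curriculum_agent_buffer_len:
        return False
    # Consider only the most recent buffer_len episode rewards.
    recent = list(rewards)[-config.curriculum_agent_buffer_len:]
    successes = sum(r > config.curriculum_agent_reward_thresh for r in recent)
    ratio = successes / len(recent)
    if ratio > config.curriculum_agent_success_rate:
        env.increase_level()
    elif ratio < config.curriculum_agent_fail_rate:
        env.decrease_level()
    return True


config = SimpleNamespace(
    curriculum_agent_buffer_len=4,
    curriculum_agent_reward_thresh=0.5,
    curriculum_agent_success_rate=0.7,
    curriculum_agent_fail_rate=0.2,
)
env = ToyCurriculumEnv()
rewards = deque([0.9, 0.8, 0.6, 0.1, 0.7], maxlen=10)
update_curriculum(env, rewards, episodes_since_eval=5, config=config)
print(env.level)  # prints 1: 3 of the last 4 rewards exceed 0.5, and 0.75 > 0.7
```

A ratio that falls between the fail rate and the success rate leaves the level unchanged in this sketch.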

class conformer_rl.agents.curriculum_agents.PPOExternalCurriculumAgent(config)

Bases: conformer_rl.agents.curriculum_agents.ExternalCurriculumAgentMixin, conformer_rl.agents.PPO.PPO_agent.PPOAgent

Implementation of PPOAgent compatible with environments that use curriculum learning. See update_curriculum() for more details.
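
The base-class order listed above (mixin first, agent second) matters because Python resolves methods from left to right. The toy classes below illustrate this general pattern only. BaseAgent, CurriculumMixin, and CurriculumAgent are hypothetical stand-ins, and the point at which the real mixin calls update_curriculum() inside step() is an assumption here and may differ in the library.

```python
class BaseAgent:
    """Hypothetical stand-in for a base agent such as PPOAgent."""

    def __init__(self, config):
        self.config = config
        self.log = []

    def step(self):
        self.log.append("sample and train")


class CurriculumMixin:
    """Hypothetical stand-in for ExternalCurriculumAgentMixin."""

    def step(self):
        super().step()            # delegate to the next class in the MRO
        self.update_curriculum()  # then check the curriculum (assumed order)

    def update_curriculum(self):
        self.log.append("update curriculum")


class CurriculumAgent(CurriculumMixin, BaseAgent):
    """Mixin listed first so that its step() takes precedence in the MRO."""


agent = CurriculumAgent(config=None)
agent.step()
print(agent.log)  # ['sample and train', 'update curriculum']
```

Listing the mixin second would cause BaseAgent.step() to be found first, and the curriculum check would never run.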

evaluate() → None

Evaluates the agent on the evaluation environment.

The information dict returned by the environment’s conformer_rl.environments.conformer_env.ConformerEnv.step() method is logged by the eval_logger and saved.

load(filename: str) → None

Loads saved weights into the neural network.

Parameters: filename (str) – The path where the neural network weights are saved.

run_steps() → None

Trains the agent.

Trains the agent until the maximum number of steps (specified by config) is reached. Also periodically saves neural network parameters and performs evaluations on the agent, if specified in the config.
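
The loop structure described above might look roughly like the sketch below. It is an assumption-laden illustration rather than the library's code: ToyAgent, the config attribute names max_steps, save_interval, and eval_interval, the convention that an interval of 0 disables an action, and the checkpoint filename pattern are all hypothetical.

```python
from types import SimpleNamespace


class ToyAgent:
    """Hypothetical agent; only the loop structure is the point here."""

    def __init__(self, config):
        self.config = config
        self.total_steps = 0
        self.events = []

    def step(self):
        self.total_steps += 1  # a real agent would sample and train here

    def save(self, filename):
        self.events.append(("save", filename))

    def evaluate(self):
        self.events.append(("evaluate", self.total_steps))

    def run_steps(self):
        cfg = self.config
        while self.total_steps < cfg.max_steps:
            self.step()
            # An interval of 0 disables the corresponding periodic action.
            if cfg.save_interval and self.total_steps % cfg.save_interval == 0:
                self.save(f"model-{self.total_steps}.bin")
            if cfg.eval_interval and self.total_steps % cfg.eval_interval == 0:
                self.evaluate()


agent = ToyAgent(SimpleNamespace(max_steps=6, save_interval=3, eval_interval=2))
agent.run_steps()
print(agent.events)
```

With these toy settings the agent evaluates after steps 2, 4, and 6 and saves after steps 3 and 6.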

save(filename: str) → None

Saves the neural network weights to a file.

Parameters: filename (str) – The path where the neural network weights are to be saved.

step() → None: Performs one iteration of acquiring samples on the environment and then trains on the acquired samples.

update_curriculum() → None

Evaluates the current performance of the agent and signals the environment to increase the level (difficulty) or decrease it depending on the agent’s performance.

The agent is evaluated only once the number of episodes elapsed since the last evaluation exceeds the curriculum_agent_buffer_len parameter set in the Config object. The evaluation computes the fraction of the last curriculum_agent_buffer_len episodes whose reward exceeds the curriculum_agent_reward_thresh parameter. If this fraction exceeds the curriculum_agent_success_rate parameter, the environment is signaled to increase the difficulty of the curriculum by calling its increase_level method. If the fraction is less than the curriculum_agent_fail_rate parameter, the environment is signaled to decrease the difficulty.

class conformer_rl.agents.curriculum_agents.PPORecurrentExternalCurriculumAgent(config)

Bases: conformer_rl.agents.curriculum_agents.ExternalCurriculumAgentMixin, conformer_rl.agents.PPO.PPO_recurrent_agent.PPORecurrentAgent

Implementation of PPORecurrentAgent compatible with environments that use curriculum learning. See update_curriculum() for more details.

evaluate() → None

Evaluates the agent on the evaluation environment.

The information dict returned by the environment’s conformer_rl.environments.conformer_env.ConformerEnv.step() method is logged by the eval_logger and saved.

load(filename: str) → None

Loads saved weights into the neural network.

Parameters: filename (str) – The path where the neural network weights are saved.

run_steps() → None

Trains the agent.

Trains the agent until the maximum number of steps (specified by config) is reached. Also periodically saves neural network parameters and performs evaluations on the agent, if specified in the config.

save(filename: str) → None

Saves the neural network weights to a file.

Parameters: filename (str) – The path where the neural network weights are to be saved.

step() → None: Performs one iteration of acquiring samples on the environment and then trains on the acquired samples.

update_curriculum() → None

Evaluates the current performance of the agent and signals the environment to increase the level (difficulty) or decrease it depending on the agent’s performance.

The agent is evaluated only once the number of episodes elapsed since the last evaluation exceeds the curriculum_agent_buffer_len parameter set in the Config object. The evaluation computes the fraction of the last curriculum_agent_buffer_len episodes whose reward exceeds the curriculum_agent_reward_thresh parameter. If this fraction exceeds the curriculum_agent_success_rate parameter, the environment is signaled to increase the difficulty of the curriculum by calling its increase_level method. If the fraction is less than the curriculum_agent_fail_rate parameter, the environment is signaled to decrease the difficulty.

class conformer_rl.agents.curriculum_agents.A2CExternalCurriculumAgent(config)

Bases: conformer_rl.agents.curriculum_agents.ExternalCurriculumAgentMixin, conformer_rl.agents.A2C.A2C_agent.A2CAgent

Implementation of A2CAgent compatible with environments that use curriculum learning. See update_curriculum() for more details.

evaluate() → None

Evaluates the agent on the evaluation environment.

The information dict returned by the environment’s conformer_rl.environments.conformer_env.ConformerEnv.step() method is logged by the eval_logger and saved.

load(filename: str) → None

Loads saved weights into the neural network.

Parameters: filename (str) – The path where the neural network weights are saved.

run_steps() → None

Trains the agent.

Trains the agent until the maximum number of steps (specified by config) is reached. Also periodically saves neural network parameters and performs evaluations on the agent, if specified in the config.

save(filename: str) → None

Saves the neural network weights to a file.

Parameters: filename (str) – The path where the neural network weights are to be saved.

step() → None: Performs one iteration of acquiring samples on the environment and then trains on the acquired samples.

update_curriculum() → None

Evaluates the current performance of the agent and signals the environment to increase the level (difficulty) or decrease it depending on the agent’s performance.

The agent is evaluated only once the number of episodes elapsed since the last evaluation exceeds the curriculum_agent_buffer_len parameter set in the Config object. The evaluation computes the fraction of the last curriculum_agent_buffer_len episodes whose reward exceeds the curriculum_agent_reward_thresh parameter. If this fraction exceeds the curriculum_agent_success_rate parameter, the environment is signaled to increase the difficulty of the curriculum by calling its increase_level method. If the fraction is less than the curriculum_agent_fail_rate parameter, the environment is signaled to decrease the difficulty.

class conformer_rl.agents.curriculum_agents.A2CRecurrentExternalCurriculumAgent(config)

Bases: conformer_rl.agents.curriculum_agents.ExternalCurriculumAgentMixin, conformer_rl.agents.A2C.A2C_recurrent_agent.A2CRecurrentAgent

Implementation of A2CRecurrentAgent compatible with environments that use curriculum learning. See update_curriculum() for more details.

evaluate() → None

Evaluates the agent on the evaluation environment.

The information dict returned by the environment’s conformer_rl.environments.conformer_env.ConformerEnv.step() method is logged by the eval_logger and saved.

load(filename: str) → None

Loads saved weights into the neural network.

Parameters: filename (str) – The path where the neural network weights are saved.

run_steps() → None

Trains the agent.

Trains the agent until the maximum number of steps (specified by config) is reached. Also periodically saves neural network parameters and performs evaluations on the agent, if specified in the config.

save(filename: str) → None

Saves the neural network weights to a file.

Parameters: filename (str) – The path where the neural network weights are to be saved.

step() → None: Performs one iteration of acquiring samples on the environment and then trains on the acquired samples.

update_curriculum() → None

Evaluates the current performance of the agent and signals the environment to increase the level (difficulty) or decrease it depending on the agent’s performance.

The agent is evaluated only once the number of episodes elapsed since the last evaluation exceeds the curriculum_agent_buffer_len parameter set in the Config object. The evaluation computes the fraction of the last curriculum_agent_buffer_len episodes whose reward exceeds the curriculum_agent_reward_thresh parameter. If this fraction exceeds the curriculum_agent_success_rate parameter, the environment is signaled to increase the difficulty of the curriculum by calling its increase_level method. If the fraction is less than the curriculum_agent_fail_rate parameter, the environment is signaled to decrease the difficulty.