Base_ac_agent_recurrent

class conformer_rl.agents.base_ac_agent_recurrent.BaseACAgentRecurrent(config: conformer_rl.config.agent_config.Config)

Bases: conformer_rl.agents.base_ac_agent.BaseACAgent, conformer_rl.agents.base_agent_recurrent.BaseAgentRecurrent

Base interface for building reinforcement learning agents that use actor-critic algorithms with support for recurrent neural networks.

Parameters: config (Config) – Configuration object for the agent. See notes for a list of config parameters used by specific pre-built agents.

_sample() → None: Collects samples from the training environment.

_calculate_advantages() → None

Performs advantage estimation.

Uses either SARSA or generalized advantage estimation (GAE) for estimating advantages, depending on the config.
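As an illustration of the GAE option only, the following is a minimal, framework-free sketch of the standard GAE recursion. It is not the library's implementation: the function name gae_advantages and all of its parameters (rewards, values, last_value, dones, gamma, lam) are hypothetical names chosen for this example.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Hypothetical helper: standard GAE recursion over one rollout.

    delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
    """
    advantages = [0.0] * len(rewards)
    running = 0.0            # A_{t+1}, zero past the end of the rollout
    next_value = last_value  # V(s_{t+1}) for the final step of the rollout
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With lam set to 0 the recursion reduces to the one-step temporal-difference error, and with lam set to 1 it sums discounted TD errors to the end of the episode.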

_eval_episode() → dict

Evaluates the agent on a single episode of the evaluation environment.

Returns: Information from the evaluation environment to be logged by the eval_logger.
Return type: dict

_eval_step(state: object, rstates: Optional[Any] = None) → Tuple[Any, Any]

Evaluates the agent on a single step of an episode of the evaluation environment.

Parameters

state (object) – The current observation from the environment.
rstates (Any) – Recurrent states from the previous iteration of the neural network. If none are supplied, they will be automatically initialized by the neural network.

Returns

prediction['a'] (Any) – The action to be taken in the next step of the environment.
rstates (Any) – The next recurrent states to be inputted into the neural network.
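The way recurrent states are threaded from one step to the next can be sketched with a toy stand-in. ToyRecurrentPolicy and run_episode below are hypothetical and not part of conformer_rl; they only mirror the documented contract that rstates may be omitted on the first call and that the returned rstates are passed back in on the following call.

```python
from typing import Any, List, Optional, Sequence, Tuple


class ToyRecurrentPolicy:
    """Hypothetical stand-in for an agent with a recurrent network."""

    def eval_step(self, state: int, rstates: Optional[Any] = None) -> Tuple[int, Any]:
        if rstates is None:
            rstates = 0  # no state supplied: initialize it here
        rstates = rstates + state  # toy "memory": running sum of observations
        action = rstates % 2       # toy action derived from the memory
        return action, rstates


def run_episode(policy: ToyRecurrentPolicy, observations: Sequence[int]) -> Tuple[List[int], Any]:
    rstates = None  # first step: let the policy initialize its own state
    actions = []
    for obs in observations:
        # Feed the recurrent state from the previous step back in.
        action, rstates = policy.eval_step(obs, rstates)
        actions.append(action)
    return actions, rstates
```

Forgetting to pass rstates back in would reset the memory on every step, so each action would depend only on the current observation.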

evaluate() → None

Evaluates the agent on the evaluation environment.

The information dict returned by the environment’s conformer_rl.environments.conformer_env.ConformerEnv.step() method is logged by the eval_logger and saved.

load(filename: str) → None

Loads the neural network with weights.

Parameters: filename (str) – The path where the neural network weights are saved.

run_steps() → None

Trains the agent.

Trains the agent until the maximum number of steps (specified by config) is reached. Also periodically saves neural network parameters and performs evaluations on the agent, if specified in the config.
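The control flow described above can be sketched with a toy agent. This is an assumption-laden illustration rather than the library's code: ToyAgent and the attribute names max_steps, save_interval, and eval_interval are hypothetical here, and saving and evaluating are replaced by recording events.

```python
from typing import List, Tuple


class ToyAgent:
    """Hypothetical agent illustrating a train / save / evaluate loop."""

    def __init__(self, max_steps: int, save_interval: int, eval_interval: int) -> None:
        self.max_steps = max_steps
        self.save_interval = save_interval  # 0 disables periodic saving
        self.eval_interval = eval_interval  # 0 disables periodic evaluation
        self.total_steps = 0
        self.events: List[Tuple[str, int]] = []

    def step(self) -> None:
        # Placeholder for one round of sampling and training.
        self.total_steps += 1

    def run_steps(self) -> None:
        while self.total_steps < self.max_steps:
            self.step()
            if self.save_interval and self.total_steps % self.save_interval == 0:
                self.events.append(("save", self.total_steps))
            if self.eval_interval and self.total_steps % self.eval_interval == 0:
                self.events.append(("eval", self.total_steps))
```

In this sketch an interval of 0 turns the corresponding periodic action off, which is one simple way to express "if specified in the config".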

save(filename: str) → None

Saves the neural network weights to a file.

Parameters: filename (str) – The path where the neural network weights are to be saved.

step() → None: Performs one iteration of acquiring samples on the environment and then trains on the acquired samples.