Reward_mixins

Pre-built reward handlers.

class conformer_rl.environments.environment_components.reward_mixins.GibbsRewardMixin

Bases: object

Implements the Gibbs Score reward [1], except that similarity between conformers is not measured with TFD (Torsional Fingerprint Deviation). Instead, two conformers are treated as duplicates if they were produced by the same action input.

References

[1] TorsionNet paper

_reward() → float

Notes

Logged parameters:

energy (float): the energy of the current conformer
repeat (int): total number of repeated actions so far in the episode
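A minimal sketch of the idea described above, not the library's implementation. It assumes the per-conformer Gibbs term has the form exp(-(E - E0) / kT) / Z0 and that a repeated action earns zero reward. The names `gibbs_term`, `RepeatAwareReward`, `e0`, `z0`, and `kt` are hypothetical and chosen only for illustration.

```python
import math


def gibbs_term(energy, e0, z0, kt=1.0):
    """Boltzmann weight of one conformer relative to e0, normalized by z0.

    e0, z0 and kt are assumed normalization constants, not library names.
    """
    return math.exp(-(energy - e0) / kt) / z0


class RepeatAwareReward:
    """Hypothetical stand-in for a reward handler that detects duplicates
    by comparing action inputs instead of computing TFD."""

    def __init__(self, e0, z0, kt=1.0):
        self.e0, self.z0, self.kt = e0, z0, kt
        self.seen_actions = set()
        self.repeats = 0  # mirrors the logged 'repeat' counter

    def reward(self, action, energy):
        key = tuple(action)
        if key in self.seen_actions:
            # Same action input as before: count it and give no reward
            # (zero reward for repeats is an assumption of this sketch).
            self.repeats += 1
            return 0.0
        self.seen_actions.add(key)
        return gibbs_term(energy, self.e0, self.z0, self.kt)
```

Comparing action tuples is much cheaper than a geometric comparison, at the cost of missing conformers that are similar but were reached through different actions.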

class conformer_rl.environments.environment_components.reward_mixins.GibbsEndPruningRewardMixin

Bases: object

Implements the Gibbs Score reward [1], except that overly similar conformers are pruned only at the end of an episode, so the effect of pruning is reflected only in the final reward of each episode.

_reward() → float

Notes

Logged parameters:

energy (float): the energy of the current conformer
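The end-of-episode behaviour can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the library: each step earns an unpruned Gibbs term, and the final step's reward is reduced so that the episode total equals the score of the pruned conformer set. The functions `gibbs_term`, `greedy_prune`, and `episode_rewards` are hypothetical, and `distance` is a stand-in for a metric such as TFD.

```python
import math


def gibbs_term(energy, e0, z0, kt=1.0):
    """Assumed per-conformer term: exp(-(E - e0) / kt) / z0."""
    return math.exp(-(energy - e0) / kt) / z0


def greedy_prune(conformers, distance, threshold):
    """Visit conformers by ascending energy and keep one only if it is
    farther than `threshold` from every conformer kept so far."""
    kept = []
    for energy, geom in sorted(conformers, key=lambda c: c[0]):
        if all(distance(geom, g) > threshold for _, g in kept):
            kept.append((energy, geom))
    return kept


def episode_rewards(conformers, distance, threshold, e0, z0, kt=1.0):
    """Per-step rewards for one episode; only the last entry reflects pruning."""
    rewards = [gibbs_term(e, e0, z0, kt) for e, _ in conformers]
    if not rewards:
        return rewards
    kept = greedy_prune(conformers, distance, threshold)
    pruned_total = sum(gibbs_term(e, e0, z0, kt) for e, _ in kept)
    # Contribution of the conformers removed by pruning.
    penalty = sum(rewards) - pruned_total
    rewards[-1] -= penalty
    return rewards
```

With this scheme the final reward can be negative when many near-duplicate conformers were generated during the episode.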

class conformer_rl.environments.environment_components.reward_mixins.GibbsPruningRewardMixin

Bases: object

Implements the Gibbs Score reward [1].

_reward() → float

Notes

Logged parameters:

energy (float): the energy of the current conformer
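As a rough illustration of pruning at every step, the sketch below keeps a set of mutually dissimilar conformers and returns the change in the Gibbs score after each new conformer. This is an assumption about how such a reward could be structured, not a description of the library's code. `StepwisePruningReward`, `boltzmann_sum`, `distance`, and `threshold` are hypothetical names, with `distance` standing in for TFD.

```python
import math


def boltzmann_sum(energies, e0, z0, kt=1.0):
    """Assumed Gibbs score: sum of exp(-(E - e0) / kt) over energies, divided by z0."""
    return sum(math.exp(-(e - e0) / kt) for e in energies) / z0


class StepwisePruningReward:
    """Hypothetical handler that prunes similar conformers as they arrive."""

    def __init__(self, distance, threshold, e0, z0, kt=1.0):
        self.distance, self.threshold = distance, threshold
        self.e0, self.z0, self.kt = e0, z0, kt
        self.kept = []    # list of (energy, geometry) pairs
        self.score = 0.0  # Gibbs score of the kept set

    def reward(self, energy, geometry):
        close = [c for c in self.kept
                 if self.distance(c[1], geometry) <= self.threshold]
        if any(c[0] <= energy for c in close):
            # A similar conformer with equal or lower energy is already kept.
            return 0.0
        # The new conformer replaces every similar, higher-energy one.
        self.kept = [c for c in self.kept
                     if self.distance(c[1], geometry) > self.threshold]
        self.kept.append((energy, geometry))
        new_score = boltzmann_sum([e for e, _ in self.kept],
                                  self.e0, self.z0, self.kt)
        delta = new_score - self.score
        self.score = new_score
        return delta
```

Because each reward is a difference of scores, the rewards in an episode sum to the Gibbs score of the final pruned set.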

class conformer_rl.environments.environment_components.reward_mixins.GibbsLogPruningRewardMixin

Bases: conformer_rl.environments.environment_components.reward_mixins.GibbsPruningRewardMixin

Implements the log of the Gibbs Score reward [1].

_reward() → float

Notes

Logged parameters:

energy (float): the energy of the current conformer
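Taking the logarithm of a sum of Boltzmann weights can underflow when energies are far above the reference energy. The sketch below shows one numerically stable way to compute the log of a Gibbs score using the log-sum-exp trick. Whether the library applies the log per step or to an accumulated total is not stated here, so this is an illustration only. `log_gibbs_score`, `e0`, `z0`, and `kt` are hypothetical names.

```python
import math


def log_gibbs_score(energies, e0, z0, kt=1.0):
    """Return log( sum_i exp(-(E_i - e0) / kt) / z0 ) without underflow.

    Assumes the Gibbs score has the form above; e0, z0 and kt are
    illustrative normalization constants.
    """
    if not energies:
        return float("-inf")  # log of an empty (zero) sum
    exponents = [-(e - e0) / kt for e in energies]
    m = max(exponents)
    # Factor out the largest exponent so every exp() argument is <= 0
    # and at least one term equals exp(0) = 1.
    return m + math.log(sum(math.exp(x - m) for x in exponents)) - math.log(z0)
```

A naive `math.log(sum(math.exp(...)))` would fail for energies such as 1000 above `e0`, because every term underflows to zero before the log is taken.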