Analysis

Functions for analyzing and visualizing logged environment data in a Jupyter/IPython notebook. The visualization functions here provide a basic set of tools that illustrate the format of the logged environment data. Users are encouraged to build their own plots and visualizations for their specific needs.

conformer_rl.analysis.analysis._load_from_pickle(filename: str) → Any

Loads an object from a .pickle file.

conformer_rl.analysis.analysis.load_data_from_pickle(paths: List[str], indices: Optional[List[str]] = None) → dict

Loads saved pickled environment data from multiple runs into a combined data dict.

Parameters

paths (list of str) – List of paths to .pickle files corresponding to the environment data from the runs of interest.
indices (list of str, optional) – Specifies custom indices/labels to be displayed in generated Seaborn graphs for each run. Should be the same length as paths. If not specified, the labels default to test0, test1, test2, ....

Returns

A dict in which each str key is a key from the original pickled dict objects, and each list holds the values for that key from the environment data sets specified in paths, in the same order they were given in paths. An additional 'indices' key holds the label for each run.

Return type

dict mapping from str to list

Notes

The .pickle files specified by paths should be dumped directly by EnvLogger, and should correspond to a single evaluation episode. See conformer_rl.logging.env_logger.EnvLogger.save_episode() for more details on the dumped format.

As an example of how the function operates, suppose that paths is:

['data1.pickle', 'data2.pickle', 'data3.pickle']

And each pickle object contains corresponding data:

data1 = {
    'total_rewards': data1_total_rewards,
    'mol': data1_molecule,
    'rewards': [data1_step1_rewards, data1_step2_rewards, data1_step3_rewards, data1_step4_rewards]
}
data2 = {
    'total_rewards': data2_total_rewards,
    'mol': data2_molecule,
    'rewards': [data2_step1_rewards, data2_step2_rewards, data2_step3_rewards, data2_step4_rewards]
}
data3 = {
    'total_rewards': data3_total_rewards,
    'mol': data3_molecule,
    'rewards': [data3_step1_rewards, data3_step2_rewards, data3_step3_rewards, data3_step4_rewards]
}

Suppose that data1 is eval data obtained from training with the PPO agent, data2 was obtained from the PPORecurrent agent, and data3 was obtained from training with the A2C agent. Then we can pass custom indices to label each data set:

indices = ['PPO', 'PPO_recurrent', 'A2C']

Given these data and indices, load_data_from_pickle() would return the following dict:

{
    'indices': ['PPO', 'PPO_recurrent', 'A2C'],
    'total_rewards': [
        data1_total_rewards,
        data2_total_rewards,
        data3_total_rewards
    ],
    'mol': [
        data1_molecule,
        data2_molecule,
        data3_molecule
    ],
    'rewards': [
        [data1_step1_rewards, data1_step2_rewards, data1_step3_rewards, data1_step4_rewards],
        [data2_step1_rewards, data2_step2_rewards, data2_step3_rewards, data2_step4_rewards],
        [data3_step1_rewards, data3_step2_rewards, data3_step3_rewards, data3_step4_rewards]
    ],
}

This format consolidates all the data into a single dict and is compatible with the other visualization functions in this module. Furthermore, it is also easy to convert a dict of this format into a Pandas dataframe or other tabular formats if needed.
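The merging behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: combine_episode_pickles and to_records are hypothetical names, the episode contents are made-up numbers, and the sketch assumes every pickled dict has the same keys.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


def combine_episode_pickles(paths: List[str],
                            indices: Optional[List[str]] = None) -> Dict[str, List[Any]]:
    """Hypothetical stand-in that mirrors the documented merging behavior."""
    if indices is None:
        # Default labels as documented: test0, test1, test2, ...
        indices = [f"test{i}" for i in range(len(paths))]
    if len(indices) != len(paths):
        raise ValueError("indices must have the same length as paths")

    combined: Dict[str, List[Any]] = {"indices": list(indices)}
    for path in paths:
        with open(path, "rb") as f:
            episode = pickle.load(f)
        # Assumes every episode dict has the same keys, so lists stay aligned.
        for key, value in episode.items():
            combined.setdefault(key, []).append(value)
    return combined


def to_records(combined: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn the combined dict into one row (dict) per run, a simple tabular form."""
    keys = list(combined)
    return [dict(zip(keys, row)) for row in zip(*(combined[k] for k in keys))]


# Write two made-up episode dicts to temporary .pickle files.
tmp_dir = Path(tempfile.mkdtemp())
episodes = [
    {"total_rewards": 1.5, "rewards": [0.5, 1.0]},
    {"total_rewards": 2.0, "rewards": [1.0, 1.0]},
]
paths = []
for i, episode in enumerate(episodes):
    path = tmp_dir / f"data{i + 1}.pickle"
    with open(path, "wb") as f:
        pickle.dump(episode, f)
    paths.append(str(path))

combined = combine_episode_pickles(paths, indices=["PPO", "A2C"])
records = to_records(combined)
print(combined)
print(records)
```

Because every list in the combined dict has one entry per run, a list of per-run records like the one above is also a shape that tabular libraries generally accept.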

conformer_rl.analysis.analysis.list_keys(data: dict) → List[str]

Return a list of all keys in a dict.

Parameters: data (dict) – The dictionary to retrieve keys from.

conformer_rl.analysis.analysis.bar_plot_episodic(key: str, data: dict) → matplotlib.axes._axes.Axes

Plots a bar plot comparing a scalar value across all episodes loaded in data.

Parameters

key (str) – The key for the values to be compared across all data sets/episodes.
data (dict) – Data dictionary generated by load_data_from_pickle().
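To make the "scalar value across all episodes" requirement concrete, the sketch below pairs each run label with its scalar and rejects keys that hold per-step lists. The helper episodic_scalars is hypothetical and the numbers are invented; it only illustrates which keys suit a bar plot and does not reproduce how bar_plot_episodic() is implemented.

```python
from numbers import Real
from typing import Dict, List, Tuple


def episodic_scalars(key: str, data: Dict[str, list]) -> List[Tuple[str, float]]:
    """Hypothetical helper: return (label, value) pairs, one per episode."""
    labels = data["indices"]
    values = data[key]
    for label, value in zip(labels, values):
        # A bar plot needs exactly one number per episode.
        if not isinstance(value, Real):
            raise TypeError(f"{key!r} for {label!r} is not a scalar: {value!r}")
    return list(zip(labels, values))


# Made-up combined data in the format returned by load_data_from_pickle().
data = {
    "indices": ["PPO", "PPO_recurrent", "A2C"],
    "total_rewards": [3.2, 4.1, 2.7],
    "rewards": [[1.0, 2.2], [2.0, 2.1], [1.2, 1.5]],
}

pairs = episodic_scalars("total_rewards", data)
print(pairs)
```

In this made-up data, 'total_rewards' qualifies because it holds one number per episode, while 'rewards' holds a list per episode and is better suited to the histogram functions.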

conformer_rl.analysis.analysis.histogram_select_episodes(key: str, data: dict, episodes: Optional[List[int]] = None, binwidth: float = 10, figsize: Tuple[float, float] = (8.0, 6.0)) → matplotlib.axes._axes.Axes

Plots a single histogram in which the data for each episode in episodes are overlaid.

Parameters

key (str) – The key for the values to be compared across all data sets/episodes.
data (dict) – Data dictionary generated by load_data_from_pickle().
episodes (list of int, optional) – Specifies the indices in data for the episodes to be shown. If not specified, all episodes are shown.
binwidth (float) – The width of each bin in the histogram.
figsize (2-tuple of float) – Specifies the size of the plot.
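The roles of episodes and binwidth can be illustrated without any plotting library. The sketch below counts values into fixed-width bins for the selected episodes. The function binned_counts is hypothetical, the energies are invented, and aligning bins to multiples of binwidth is a simplification made here; the plotting backend may place its bin edges differently.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def binned_counts(key: str,
                  data: Dict[str, list],
                  episodes: Optional[List[int]] = None,
                  binwidth: float = 10.0) -> Dict[str, Counter]:
    """Hypothetical helper: per-episode counts keyed by the left edge of each bin."""
    if episodes is None:
        # Mirror the documented default: use every loaded episode.
        episodes = list(range(len(data[key])))
    counts: Dict[str, Counter] = {}
    for i in episodes:
        label = data["indices"][i]
        # The bin with left edge e covers the half-open interval [e, e + binwidth).
        counts[label] = Counter(
            math.floor(value / binwidth) * binwidth for value in data[key][i]
        )
    return counts


# Made-up per-step values for two episodes.
data = {
    "indices": ["PPO", "A2C"],
    "energies": [[3.0, 12.5, 14.0, 27.0], [8.0, 9.5, 31.0]],
}

all_counts = binned_counts("energies", data, binwidth=10.0)
ppo_only = binned_counts("energies", data, episodes=[0], binwidth=10.0)
print(all_counts)
```

Overlaying episodes on one axis is easiest to read when the value ranges overlap; a smaller binwidth gives finer resolution at the cost of noisier bars.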

conformer_rl.analysis.analysis.histogram_episodic(key: str, data: dict, binwidth: int = 10, figsize: Tuple[float, float] = (8.0, 6.0)) → matplotlib.axes._axes.Axes

Plots a histogram on separate axes for each of the episode data sets in data.

Parameters

key (str) – The key for the values to be compared across all data sets/episodes.
data (dict) – Data dictionary generated by load_data_from_pickle().
binwidth (float) – The width of each bin in the histogram.
figsize (2-tuple of float) – Specifies the size of the plot.

conformer_rl.analysis.analysis.heatmap_episodic(key: str, data: dict, figsize: Tuple[float, float] = (8.0, 6.0)) → matplotlib.axes._axes.Axes

Plots heatmap(s) for matrix data corresponding to key across all episodes loaded in data.

Parameters

key (str) – The key for the values to be compared across all data sets/episodes.
data (dict) – Data dictionary generated by load_data_from_pickle().
figsize (2-tuple of float) – Specifies the size of the plot.

conformer_rl.analysis.analysis.calculate_tfd(data: dict) → None

Updates data with the TFD (Torsion Fingerprint Deviation) matrix (with key ‘tfd_matrix’) and sum of the TFD matrix (with key ‘tfd_total’) for the molecule conformers across each episode loaded in data.

Parameters: data (dict) – Data dictionary generated by load_data_from_pickle().
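Since calculate_tfd() returns None, its results are only visible through the mutated data dict. The sketch below shows that in-place pattern under loud assumptions: add_pairwise_summary and abs_difference_matrix are hypothetical, each "molecule" is just a list of numbers standing in for conformers, and the absolute difference is a placeholder for TFD, which in practice requires RDKit. Whether the library sums the full symmetric matrix or only one triangle is not stated here; this sketch sums the full matrix.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


def abs_difference_matrix(conformers: List[float]) -> Matrix:
    """Placeholder pairwise 'distance' between conformers (NOT real TFD)."""
    return [[abs(a - b) for b in conformers] for a in conformers]


def add_pairwise_summary(data: Dict[str, list],
                         pairwise: Callable[[List[float]], Matrix]) -> None:
    """Hypothetical helper: add a matrix and its total per episode, in place."""
    matrices = [pairwise(mol) for mol in data["mol"]]
    data["tfd_matrix"] = matrices
    # Assumption: the total is the sum over the full symmetric matrix.
    data["tfd_total"] = [sum(sum(row) for row in matrix) for matrix in matrices]


# Made-up data: each 'mol' entry is a list of numbers standing in for conformers.
data = {
    "indices": ["PPO", "A2C"],
    "mol": [[0.0, 1.0, 3.0], [2.0, 2.5]],
}

result = add_pairwise_summary(data, abs_difference_matrix)
print(result)             # the function returns nothing
print(sorted(data))       # the new keys now sit alongside the original ones
print(data["tfd_total"])
```

After such a call, the matrix key holds one matrix per episode and the total key holds one scalar per episode, which are the shapes that heatmap_episodic() and bar_plot_episodic() respectively describe as their input.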

conformer_rl.analysis.analysis.drawConformer(mol: rdkit.Chem.Mol, confId: int = -1, size: Tuple[int, int] = (300, 300), style: str = 'stick') → py3Dmol.view

Displays interactive 3-dimensional representation of specified conformer.

Parameters

mol (RDKit Mol object) – The molecule containing the conformer to be displayed.
confId (int) – The ID of the conformer to be displayed.
size (Tuple[int, int]) – The size of the display (width, height).
style (str) – The drawing style for displaying the molecule. One of sphere, stick, line, cross, cartoon, or surface.

conformer_rl.analysis.analysis.drawConformer_episodic(data: dict, confIds: List[int], size: Tuple[int, int] = (300, 300), style: str = 'stick') → py3Dmol.view

Displays a specified conformer for each episode loaded in data.

Parameters

data (dict from string to list) – Contains the loaded episode information. ‘mol’ must be a key in data and the corresponding list must contain RDKit Mol objects.
confIds (list of int) – The indices for the conformers to be displayed (for each episode loaded in data).
size (Tuple[int, int]) – The size of the display for each individual molecule (width, height).
style (str) – The drawing style for displaying the molecule. One of sphere, stick, line, cross, cartoon, or surface.