Graph Syntax

FGUtils has its own graph description language. Its syntax is closely related to the SMILES format for molecules and reactions and extends SMILES to support the modeling of ITS graphs and reaction patterns. To convert a SMILES-like description into a graph object, use the Parser class. The molecular graph of caffeine can be obtained as follows:

import matplotlib.pyplot as plt
from fgutils import Parser
from fgutils.vis import plot_as_mol

parser = Parser()
mol = parser("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

fig, ax = plt.subplots(1, 1)
plot_as_mol(mol, ax)
plt.show()
Figure: Molecular graph of caffeine (_images/caffeine_example.png)

Besides parsing common SMILES, the parser can generate molecule-like graphs with more abstract nodes, i.e., nodes with arbitrary labels. Arbitrary node labels are enclosed in curly braces (e.g. {label}). Such labeled nodes can later be substituted with specific patterns. In this context, the labels are the group names of ProxyGroup objects. A ProxyGroup defines a set of subgraphs that can replace the labeled node, and a Proxy (here a MolProxy) performs the replacement. For example, propyl acetate can be created by replacing the labeled node with the propyl group:

import matplotlib.pyplot as plt
from fgutils import Parser
from fgutils.proxy import MolProxy, ProxyGroup
from fgutils.vis import GraphVisualizer

pattern = "CC(=O)O{propyl}"
propyl_group = ProxyGroup("propyl", "CCC")
parser = Parser()
proxy = MolProxy(pattern, propyl_group, parser=parser)

g = parser(pattern)
mol = next(proxy)

vis = GraphVisualizer()
fig, ax = plt.subplots(1, 2, dpi=100, figsize=(12, 3))
vis.plot(g, ax[0], title="Core Pattern")
vis.plot(mol, ax[1], title="Generated Molecule")
plt.show()
Figure: Core pattern (left) and generated molecule (right) (_images/labeled_node_example.png)

Note

A node can have more than one label. Separate the labels with a comma, e.g. {label_1,label_2}.
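To make the label syntax concrete, the following stdlib-only sketch pulls the labels out of a pattern string. It is not the FGUtils Parser; the helper name extract_labels is hypothetical, and the regular expression assumes that labels never contain curly braces themselves.

```python
import re
from typing import List

# Matches the content between a pair of curly braces, e.g. "{propyl}" -> "propyl".
# Assumption: labels never contain "{" or "}" themselves.
LABEL_PATTERN = re.compile(r"\{([^{}]*)\}")


def extract_labels(pattern: str) -> List[List[str]]:
    """Hypothetical helper: return the labels of every labeled node, in order.

    Each labeled node yields a list because a node may carry several
    comma-separated labels, e.g. "{label_1,label_2}".
    """
    return [
        [label.strip() for label in match.split(",")]
        for match in LABEL_PATTERN.findall(pattern)
    ]


print(extract_labels("CC(=O)O{propyl}"))      # [['propyl']]
print(extract_labels("N{g}C{g}(O)C"))         # [['g'], ['g']]
print(extract_labels("C{label_1,label_2}O"))  # [['label_1', 'label_2']]
```

Bond-change tokens such as <2,1> use angle brackets and are therefore ignored by this pattern.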

In the example above, the ProxyGroup has only one subgraph pattern. In general, a ProxyGroup is a collection of several possible subgraphs, one of which is selected whenever a new sample is instantiated. For more information on how graphs are sampled, see the GraphSampler class and the ProxyGroup constructor. By default, a pattern has a single anchor at index 0. If you need more control over how a subgraph is inserted into a parent graph, instantiate the ProxyGraph class, which accepts a list of anchor node indices. The insertion of the subgraph into the parent depends on the number of anchor nodes in the subgraph and the number of edges to the labeled node in the parent: the first edge in the parent connects to the first anchor node in the subgraph, the second edge to the second anchor node, and so forth. The following example demonstrates the insertion with multiple anchor nodes:

import matplotlib.pyplot as plt
from fgutils import Parser
from fgutils.proxy import MolProxy, ProxyGroup, ProxyGraph
from fgutils.vis import GraphVisualizer

pattern = "N{g}C{g}(O)C"
g_1 = ProxyGroup("g", ProxyGraph("C1CCCCC1", anchor=[1, 3]))
g_2 = ProxyGroup("g", ProxyGraph("C1CCCCC1", anchor=[1, 3, 4]))

parser = Parser()
proxy1 = MolProxy(pattern, g_1)
proxy2 = MolProxy(pattern, g_2)

parent_graph = parser(pattern)
mol1 = next(proxy1)
mol2 = next(proxy2)

vis = GraphVisualizer(show_edge_labels=False)
fig, ax = plt.subplots(1, 3, dpi=200, figsize=(20, 3))
vis.plot(parent_graph, ax[0], title="parent")
vis.plot(mol1, ax[1], title="2 anchor nodes")
vis.plot(mol2, ax[2], title="3 anchor nodes")
plt.show()
Figure: Parent graph, insertion with 2 anchor nodes, and insertion with 3 anchor nodes (_images/multiple_anchor_example.png)
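The anchor rule can be illustrated without FGUtils on a plain node/edge representation. The sketch below is a simplified model, not the library's implementation: insert_subgraph is a hypothetical helper, "edge order" is taken to be the order of the parent's edge list, and the sketch simply rejects a mismatch between the number of parent edges and anchors, since how FGUtils handles that case is not described here.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Graph = Tuple[Dict[int, str], List[Edge]]  # (node labels, edge list)


def insert_subgraph(parent: Graph, labeled: int, sub: Graph, anchors: List[int]) -> Graph:
    """Hypothetical helper: replace node `labeled` in `parent` by `sub`.

    The i-th parent edge incident to `labeled` (in edge-list order) is
    reconnected to the i-th anchor node of `sub`.
    """
    parent_nodes, parent_edges = parent
    sub_nodes, sub_edges = sub
    # Neighbors of the labeled node, in the order their edges appear.
    neighbors = [v if u == labeled else u for u, v in parent_edges if labeled in (u, v)]
    if len(neighbors) != len(anchors):
        # Assumption of this sketch: the counts must match exactly.
        raise ValueError(f"{len(neighbors)} parent edges but {len(anchors)} anchors")
    # Shift subgraph indices so they cannot collide with parent indices.
    offset = max(parent_nodes) + 1
    nodes = {n: lbl for n, lbl in parent_nodes.items() if n != labeled}
    nodes.update({offset + n: lbl for n, lbl in sub_nodes.items()})
    edges = [e for e in parent_edges if labeled not in e]
    edges += [(offset + u, offset + v) for u, v in sub_edges]
    # First parent edge -> first anchor, second edge -> second anchor, ...
    edges += [(nb, offset + a) for nb, a in zip(neighbors, anchors)]
    return nodes, edges


# Parent "N{g}C": node 1 is the labeled node with two edges.
parent = ({0: "N", 1: "{g}", 2: "C"}, [(0, 1), (1, 2)])
# Cyclohexane "C1CCCCC1" as a 6-ring.
ring = ({i: "C" for i in range(6)}, [(i, (i + 1) % 6) for i in range(6)])
nodes, edges = insert_subgraph(parent, 1, ring, anchors=[1, 3])
print(sorted(e for e in edges if 0 in e or 2 in e))  # [(0, 4), (2, 6)]
```

With anchors=[1, 3], N is bonded to ring atom 1 and C to ring atom 3 (shifted to indices 4 and 6), i.e. the two substituents end up in meta positions on the ring.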

Another extension to the SMILES notation is the encoding of bond changes. This feature is required to model reaction mechanisms as ITS graphs. Changing bonds are enclosed in angle brackets (e.g. <1,2> for the formation of a double bond from a single bond). The extended notation allows the automated generation of reaction examples with complete atom-to-atom maps. The following code snippet demonstrates the generation of a few Diels-Alder reactions. The diene and dienophile groups can, of course, be extended to increase the variety of the samples:

import random
import matplotlib.pyplot as plt
from fgutils.proxy import ProxyGroup, ProxyGraph, ReactionProxy
from fgutils.proxy_collection.common import common_groups
from fgutils.vis import plot_reaction, plot_its
from fgutils.chem.its import get_its


electron_donating_group = ProxyGroup(
    "electron_donating_group",
    ["{methyl}", "{ethyl}", "{propyl}", "{aryl}", "{amine}"],
)
electron_withdrawing_group = ProxyGroup(
    "electron_withdrawing_group",
    ["{alkohol}", "{ether}", "{aldehyde}", "{ester}", "{nitrile}"],
)
diene_group = ProxyGroup(
    "diene",
    ProxyGraph("C<2,1>C<1,2>C<2,1>C{electron_donating_group}", anchor=[0, 3]),
)
dienophile_group = ProxyGroup(
    "dienophile",
    ProxyGraph("C<2,1>C{electron_withdrawing_group}", anchor=[0, 1]),
)
groups = common_groups + [
    electron_donating_group,
    electron_withdrawing_group,
    diene_group,
    dienophile_group,
]

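# The two <0,1> bonds (one of them closing ring 1) are the single bonds
# formed between diene and dienophile.
proxy = ReactionProxy("{diene}1<0,1>{dienophile}<0,1>1", groups)

n = 4  # number of reactions to plot
fig, ax = plt.subplots(n, 2, width_ratios=[2, 1], figsize=(20, n * 4))
# list(proxy) enumerates every instantiation before n of them are sampled.
for i, (g, h) in enumerate(random.sample(list(proxy), n)):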
    plot_reaction(g, h, ax[i, 0], title="Reaction")
    plot_its(get_its(g, h), ax[i, 1], title="ITS Graph")
plt.tight_layout()
plt.show()
Figure: Sampled Diels-Alder reactions (left) and their ITS graphs (right) (_images/diels_alder_example.png)

This proxy can now generate Diels-Alder reaction samples. A few of the results are shown in the figure above, with the reaction on the left and the resulting ITS graph on the right. The results are balanced and have complete atom-to-atom maps. The atom-to-atom maps are correct as long as the configuration makes sense in the chemical domain. Note that the synthesizability of the generated samples cannot be guaranteed; it depends solely on which ProxyGroups and ProxyGraphs are configured. For a comprehensive Diels-Alder reaction proxy, take a look at the DielsAlderProxy class and the section TODO. This class can also generate negative Diels-Alder reaction samples, i.e., reactions where a Diels-Alder graph transformation rule is theoretically applicable but the reaction would never occur in reality.
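Reading a bond-change token <a,b> as "bond order a in the reactant, bond order b in the product", with 0 meaning no bond, is consistent with the <1,2> example given earlier. Under that reading, the following sketch (not FGUtils code; bond_changes and split_its_edges are hypothetical helpers) extracts the tokens from a pattern and splits ITS edge annotations into reactant and product bonds. The node indices of the Diels-Alder ring core are illustrative only.

```python
import re
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

# Matches a bond-change token such as "<2,1>" or "<1, 2>".
BOND_CHANGE = re.compile(r"<\s*(\d+)\s*,\s*(\d+)\s*>")


def bond_changes(pattern: str) -> List[Tuple[int, int]]:
    """Return (reactant order, product order) for every <a,b> token, in order."""
    return [(int(a), int(b)) for a, b in BOND_CHANGE.findall(pattern)]


def split_its_edges(its: Dict[Edge, Tuple[int, int]]):
    """Hypothetical helper: derive reactant and product bonds from ITS edges.

    An ITS edge labeled (a, b) has bond order a on the reactant side and b on
    the product side; order 0 means the bond is absent on that side.
    """
    reactant = {e: a for e, (a, b) in its.items() if a > 0}
    product = {e: b for e, (a, b) in its.items() if b > 0}
    return reactant, product


print(bond_changes("C<2,1>C<1,2>C<2,1>C{electron_donating_group}"))
# [(2, 1), (1, 2), (2, 1)]

# Diels-Alder ring core with illustrative indices: diene 0-3, dienophile 4-5.
its = {
    (0, 1): (2, 1), (1, 2): (1, 2), (2, 3): (2, 1),  # diene
    (4, 5): (2, 1),                                  # dienophile
    (3, 4): (0, 1), (5, 0): (0, 1),                  # bonds formed
}
reactant, product = split_its_edges(its)
print(len(reactant), len(product))  # 4 6
```

The reactant side keeps the four bonds of diene and dienophile, while the product side contains all six ring bonds with a single remaining double bond between nodes 1 and 2.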

Note

The electron_donating_group and electron_withdrawing_group serve as collections of other groups to simplify the notation. Each of them effectively acts as a single node with multiple labels. When the next sample is drawn from the proxy, the labeled nodes are replaced by the pattern from one of the groups. The group (label) is chosen uniformly at random.
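The nested, uniformly random substitution can be imitated at the string level. The sketch below is a simplification, not how FGUtils works: GROUPS and expand are hypothetical, text substitution only matches a graph insertion for groups that attach through their first atom at a position where a plain atom could stand (such as terminal groups), and choosing the pattern within a group uniformly is an assumption of this sketch.

```python
import random
import re
from typing import Dict, List

# Hypothetical group table (group name -> candidate patterns); these are plain
# strings, not FGUtils ProxyGroup objects.
GROUPS: Dict[str, List[str]] = {
    "methyl": ["C"],
    "ethyl": ["CC"],
    "electron_donating_group": ["{methyl}", "{ethyl}"],
}

# A labeled node: everything between "{" and "}".
LABEL = re.compile(r"\{([^{}]*)\}")


def expand(pattern: str, groups: Dict[str, List[str]], rng: random.Random) -> str:
    """Replace labeled nodes by text substitution until no label is left."""

    def substitute(match):
        labels = [label.strip() for label in match.group(1).split(",")]
        name = rng.choice(labels)  # uniform choice among the node's labels
        return rng.choice(groups[name])  # uniform choice of pattern (assumption)

    # Nested groups such as {electron_donating_group} -> {methyl} -> C need
    # several passes; cyclic group definitions would loop forever.
    while LABEL.search(pattern):
        pattern = LABEL.sub(substitute, pattern)
    return pattern


rng = random.Random(0)
print(expand("CC(=O)O{electron_donating_group}", GROUPS, rng))  # methyl or ethyl acetate
print(expand("O{methyl,ethyl}", GROUPS, rng))  # "OC" or "OCC"
```

Both forms of grouping, a group whose patterns are labeled nodes and a single node with several labels, lead to the same two possible outcomes here.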

Warning

The call list(proxy) generates all possible instantiations at once. Depending on the configuration, this can take a long time to complete. If the graph sampling of the core ProxyGroup is not unique, it can even result in an endless loop.
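One way to avoid materializing every instantiation is to draw only as many samples as needed. The sketch below uses itertools.islice on a stand-in endless generator. Applying it to a proxy assumes the proxy behaves like a standard Python iterator, which the use of next(proxy) in the examples suggests; take and endless_samples are hypothetical names. Unlike random.sample(list(proxy), n), this returns the first n samples rather than a random subset of all of them.

```python
import itertools
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take(samples: Iterable[T], n: int) -> List[T]:
    """Return at most n items without consuming the rest of the iterable."""
    return list(itertools.islice(samples, n))


def endless_samples() -> Iterator[int]:
    # Stand-in for a sampler that never runs out (not an FGUtils proxy).
    rng = random.Random(42)
    while True:
        yield rng.randint(0, 9)


first_four = take(endless_samples(), 4)
print(len(first_four))  # 4
```

With a proxy, take(proxy, n) would stop after n samples even if the proxy itself never terminates.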