References

algorithm

fgutils.algorithm.subgraph.map_anchored_subgraph(graph: Graph, anchor: int, subgraph: Graph, subgraph_anchor: int, mapper: PermutationMapper) tuple[bool, list[tuple[int, int]], tuple[set[int], set[int]]]

Map an anchored subgraph into an anchored parent graph. The pair formed by the subgraph anchor and the parent graph anchor is the first fixed mapping, and the subgraph is aligned to the parent graph starting from this pair. The function finds only one alignment, not all possible alignments, and returns as soon as a solution is found.

Parameters:
  • graph – The parent graph in which to align the subgraph.

  • anchor – The node index for the anchor in the parent graph.

  • subgraph – The subgraph to align in the parent.

  • subgraph_anchor – The node index for the anchor in the subgraph.

  • mapper – A PermutationMapper instance. The permutation mapper specifies which nodes in the subgraph can align with nodes in the parent.

Returns:

The function returns a 3-tuple of the form (is_valid, mapping, visited_nodes). The first entry is_valid is a boolean indicating if a mapping was found. The second entry mapping is a list of integer tuples. Each integer tuple encodes one node mapping where the first element is the node index in the parent graph and the second entry is the node index in the subgraph. The third element is a tuple of sets with node indices for the visited nodes during graph alignment in the subgraph and parent graph, respectively.
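The idea of an anchored alignment can be illustrated with a small backtracking search over plain adjacency dictionaries. This is only a sketch and not the fgutils implementation: the function name map_anchored, the dictionary-based graph encoding and the single wildcard label "R" are assumptions made for the example, and the visited-node sets of the real return value are omitted.

```python
def map_anchored(parent_adj, parent_lbl, anchor, sub_adj, sub_lbl, sub_anchor, wildcard="R"):
    """Return (is_valid, mapping); mapping holds (parent_node, subgraph_node) pairs."""

    def labels_match(s, p):
        # The wildcard is only honoured on the subgraph (pattern) side.
        return sub_lbl[s] == wildcard or sub_lbl[s] == parent_lbl[p]

    def extend(assign):
        # assign maps subgraph node -> parent node
        if len(assign) == len(sub_adj):
            return assign
        for s in sub_adj:
            mapped = [n for n in sub_adj[s] if n in assign]
            if s in assign or not mapped:
                continue
            # Candidates are parent neighbours of an already mapped neighbour.
            for p in parent_adj[assign[mapped[0]]]:
                if p in assign.values() or not labels_match(s, p):
                    continue
                if all(assign[n] in parent_adj[p] for n in mapped):
                    found = extend({**assign, s: p})
                    if found is not None:
                        return found
            return None  # s cannot be placed, backtrack
        return None  # remaining subgraph nodes are not reachable from the anchor

    if not labels_match(sub_anchor, anchor):
        return False, []
    found = extend({sub_anchor: anchor})
    if found is None:
        return False, []
    return True, sorted((p, s) for s, p in found.items())


# Parent: acetic acid skeleton C0-C1(-O2)-O3, pattern: R0-C1-O2
parent_adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
parent_lbl = {0: "C", 1: "C", 2: "O", 3: "O"}
sub_adj = {0: [1], 1: [0, 2], 2: [1]}
sub_lbl = {0: "R", 1: "C", 2: "O"}

ok, mapping = map_anchored(parent_adj, parent_lbl, 1, sub_adj, sub_lbl, 1)
missed, _ = map_anchored(parent_adj, parent_lbl, 0, sub_adj, sub_lbl, 1)
print(ok, mapping, missed)
```

Like the documented function, the sketch stops at the first alignment it finds. Anchoring the pattern carbon at parent node 0 fails because that node has only one neighbour.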

fgutils.algorithm.subgraph.map_subgraph(graph: Graph, anchor: int, subgraph: Graph, mapper: PermutationMapper, subgraph_anchor: None | int = None) list[tuple[bool, list[tuple[int, int]]]]

Finds an anchored map of a subgraph in a parent graph.

Parameters:
  • graph – The parent graph in which to map the subgraph.

  • anchor – The node index in the parent graph where the subgraph is anchored.

  • subgraph – The subgraph to align at the anchor node.

  • mapper – A PermutationMapper instance. The permutation mapper specifies which nodes in the subgraph can align with nodes in the parent.

  • subgraph_anchor – (Optional) A node index in the subgraph for a fixed anchor. If provided this is the first fixed mapping between subgraph and parent.

Returns:

Returns a list of tuples: [(is_valid, mapping)]. Each list entry corresponds to one subgraph mapping. The first element in the tuple is a boolean indicating if a mapping was found. The second element is the mapping. The mapping is a list of integer 2-tuples where the first element is the node index in the subgraph and the second element is the node index in the parent graph.

fgutils.algorithm.subgraph_enumeration.node_induced_connected_subgraphs(G, anchor, DAG=None)

Iterates over all node induced anchored connected subgraphs. The generation of new subgraphs has linear delay.

Parameters:
  • G – The networkx.Graph to get connected subgraphs from.

  • anchor – The node index to use as anchor.

  • DAG – (optional) An empty networkx.DiGraph where the extension DAG is generated.

Returns:

Returns a generator to get all connected induced subgraphs. The return type of a single iteration is a list of node indices that form a connected induced subgraph.
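To make the output concrete, the following naive enumerator yields every connected node set that contains the anchor. It is a sketch under stated assumptions: the graph is a plain adjacency dictionary, the function name is hypothetical, and the approach deduplicates with a set instead of using the linear-delay extension scheme described above.

```python
def anchored_connected_subgraphs(adj, anchor):
    """Yield sorted node lists of all connected induced subgraphs containing anchor."""
    start = frozenset([anchor])
    seen = {start}
    stack = [start]
    while stack:
        nodes = stack.pop()
        yield sorted(nodes)
        # Every neighbour of the current node set gives one larger connected set.
        frontier = {n for v in nodes for n in adj[v]} - nodes
        for n in frontier:
            bigger = nodes | {n}
            if bigger not in seen:
                seen.add(bigger)
                stack.append(bigger)


# Path graph 0 - 1 - 2, anchored at the middle node
path = {0: [1], 1: [0, 2], 2: [1]}
subgraphs = sorted(anchored_connected_subgraphs(path, 1))
print(subgraphs)
```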

data

class fgutils.data.loader.CSVLoader(file, columns: str | list[str], delimiter=',')

The base class for loading CSV files. This class is a generator and can directly be used in a for loop. The returned column values are in the order in which they were specified in the columns argument.

Parameters:
  • file – The path to the CSV file.

  • columns – Single column or list of columns to read from CSV file.

  • delimiter – (optional) The CSV delimiter. Default: “,”
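The column ordering described above can be sketched with the standard csv module. This is not the CSVLoader implementation; the helper name iter_columns is hypothetical, and the sketch always yields tuples, which may differ from what the class returns for a single column.

```python
import csv
import io


def iter_columns(handle, columns, delimiter=","):
    """Yield the requested column values of each row, in the requested order."""
    names = [columns] if isinstance(columns, str) else list(columns)
    for row in csv.DictReader(handle, delimiter=delimiter):
        yield tuple(row[name] for name in names)


# In-memory stand-in for a CSV file with ';' as delimiter
data = io.StringIO("id;smiles;year\n7;CCO;2020\n8;CC(O)=O;2021\n")
rows = list(iter_columns(data, ["smiles", "id"], delimiter=";"))
print(rows)
```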

class fgutils.data.loader.CSVReactionLoader(file, reaction_col, id_col=None, mode='GH', delimiter=',')

Load SMILES reactions from CSV file.

Parameters:
  • file – The path to the CSV file.

  • reaction_col – The column where to find the SMILES reaction.

  • id_col – (optional) An id column. Default: None

  • mode

    (optional) Defines how reactions and molecules are loaded. Available modes are:

    • "GH": Loads the reactant and product molecules as graphs. The reaction is returned in the form G → H.

    • "SMILES": Loads the reaction as SMILES.

    • "ITS": Loads the reaction as ITS graph. The SMILES must have annotated atom-atom maps.

fgconfig

class fgutils.fgconfig.FGConfig(name: str | None = None, pattern: str | None = None, parser: Parser | None = None, group_atoms: list[int] | None = None, anti_pattern: str | list[str] = [], depth: int | None = None, len_exclude_nodes: list[str] = ['R'])

Functional group configuration class.

Parameters:
  • name – The name of the functional group.

  • pattern – The structural description of the functional group.

  • parser – (optional) A parser to use to convert the pattern into a structure.

  • group_atoms – (optional) A list of indices indicating which nodes in the pattern belong to the functional group. A pattern might have some wildcard nodes attached that are required to match but do not belong to the group. (Default = all nodes)

  • anti_pattern – (optional) A list of anti-patterns that must not be matched. (Default = [], i.e., no anti-patterns)

  • depth – (optional) The maximal depth to check the patterns. (Default = max(pattern, anti_pattern))

  • len_exclude_nodes – (optional) Node types that should be excluded in the pattern length. (Default = [“R”] - wildcard pattern)

property pattern_len: int

The number of nodes of the functional group structure. Nodes specified in len_exclude_nodes are not included.
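As a rough illustration of how excluded node types affect the length, the snippet below counts the symbols of a hand-written node list. The helper and the explicit symbol list are assumptions for this example; the real property works on the parsed pattern graph.

```python
def pattern_len(symbols, len_exclude_nodes=("R",)):
    """Count the nodes whose symbol is not in the exclusion list."""
    return sum(1 for symbol in symbols if symbol not in len_exclude_nodes)


# Node symbols of the acid pattern RC(=O)OH, written out by hand
acid_symbols = ["R", "C", "O", "O", "H"]
print(pattern_len(acid_symbols))
print(pattern_len(acid_symbols, len_exclude_nodes=()))
```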

class fgutils.fgconfig.FGConfigProvider(config: FGConfig | list[dict] | list[FGConfig] | None = None, mapper: PermutationMapper | None = None)

Provider for functional group configs.

Parameters:
  • config – A FGConfig object or a list of config objects. The configurations can also be passed as dictionaries.

  • mapper – (optional) A PermutationMapper to use.

get_by_name(name: str) FGConfig

Get the functional group config by name.

Returns:

Returns the FGConfig instance.

get_tree() list[FGTreeNode]

Get the functional groups hierarchically organized in a tree. Functional groups are ordered based on their structure. A group is another group's child if its structure is more specific, i.e., the parent structure is a subgraph of the child. A child can have multiple parents and a parent can have multiple children.

Returns:

Returns the list of root groups.

its

class fgutils.its.ITS(graph: Graph)

Imaginary Transition State graph class. Superposition graph of reactants and products in a chemical reaction.

Parameters:

graph – The raw ITS graph.

classmethod from_smiles(smiles: str)

Construct an ITS graph from an atom-atom mapped reaction smiles.

Parameters:

smiles – An atom-atom mapped reaction smiles.

Returns:

Returns the ITS graph of the reaction.

prune(radius=1, insert_hydrogens=True)

Prune the ITS graph to its reaction center and some context. The context is defined by a radius. Radius 0 gives only the reaction center.

Parameters:
  • radius – The size of the context around the reaction center. (Default: 1)

  • insert_hydrogens – If true, removed nodes will be replaced by hydrogen atoms if they were adjacent to kept nodes. This ensures that the result is still a valid molecular graph. (Default: True)

remove_reagents()

Remove all reagents from the ITS graph. These are all compounds that are not connected to the reaction center.

Returns:

Returns the new ITS graph.

split() tuple[Graph, Graph]

Split the ITS graph into reactant graph G and product graph H.

Returns:

Returns G and H as tuple.

standardize()

Convert ITS graph into standard format and fix all type issues.

to_smiles(ignore_aam=False, implicit_h=False) str

Convert the ITS graph into a reaction smiles.

Parameters:
  • ignore_aam – If set to True the returned SMILES has no atom-atom map.

  • implicit_h – Flag to add all Hydrogen atoms. (Default: False)

Returns:

Returns the reaction smiles.

fgutils.its.get_its(G: Graph, H: Graph) Graph

Get the ITS graph of reaction G → H. G and H must be molecular graphs with node labels ‘aam’ and ‘symbol’ and bond label ‘bond’.

Parameters:
  • G – Reactant molecular graph.

  • H – Product molecular graph.

Returns:

Returns the ITS graph.
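The superposition can be pictured with plain dictionaries. In this sketch, bonds are keyed by the pair of atom-atom map numbers, and each ITS bond stores a (before, after) tuple with None for a missing bond. Both the encoding and the helper names are assumptions for illustration and need not match the labels fgutils uses internally.

```python
def bond(u, v):
    """Order-independent key for a bond between two atom-map numbers."""
    return frozenset((u, v))


def superpose(g_bonds, h_bonds):
    """Combine reactant and product bond tables into (before, after) labels."""
    return {
        pair: (g_bonds.get(pair), h_bonds.get(pair))
        for pair in set(g_bonds) | set(h_bonds)
    }


# Toy bond tables: bond 2-3 changes order, bond 3-4 is formed
g_bonds = {bond(1, 2): 1, bond(2, 3): 2}
h_bonds = {bond(1, 2): 1, bond(2, 3): 1, bond(3, 4): 1}
its_bonds = superpose(g_bonds, h_bonds)
print(its_bonds)
```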

fgutils.its.get_rc(ITS: Graph) Graph

Get the reaction center (RC) graph from an ITS graph.

Parameters:
  • ITS – The ITS graph to get the RC from.

  • symbol_key – (optional) The node label that encodes atom symbols on the molecular graph.

  • bond_key – (optional) The edge label that encodes bond order on the molecular graph.

Returns:

Returns the reaction center (RC) graph.
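Conceptually, the reaction center consists of the bonds that differ between reactant and product, plus the atoms they touch. The sketch below assumes ITS bonds are stored as (before, after) tuples keyed by node pairs, which is an illustrative encoding rather than the confirmed fgutils data layout.

```python
# Toy ITS bond table: (bond order in G, bond order in H), None = no bond
its_bonds = {(1, 2): (1, 1), (2, 3): (2, 1), (3, 4): (None, 1)}

# Keep only the bonds that change during the reaction
rc_bonds = {edge: orders for edge, orders in its_bonds.items() if orders[0] != orders[1]}

# The reaction center nodes are the endpoints of the changing bonds
rc_nodes = sorted({node for edge in rc_bonds for node in edge})
print(rc_bonds, rc_nodes)
```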

fgutils.its.prune_its_to_rc(its, radius=0, insert_hydrogens=True)

Prune an ITS graph to its reaction center and some context. The context is defined by a radius. Radius 0 gives only the reaction center.

Parameters:
  • its – The ITS graph to prune.

  • radius – The size of the context around the reaction center. (Default: 0)

  • insert_hydrogens – If true, removed nodes will be replaced by hydrogen atoms if they were adjacent to kept nodes. This ensures that the result is still a valid molecular graph. (Default: True)

Returns:

Returns the pruned ITS graph.
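The effect of the radius can be sketched as a breadth-first search that starts at the reaction center nodes and stops after radius steps. The adjacency-dictionary graph and the function name are assumptions for this example, and hydrogen insertion is not modelled.

```python
from collections import deque


def nodes_within_radius(adj, center_nodes, radius):
    """Return all nodes at most `radius` edges away from any center node."""
    dist = {node: 0 for node in center_nodes}
    queue = deque(center_nodes)
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not expand beyond the requested context
        for n in adj[v]:
            if n not in dist:
                dist[n] = dist[v] + 1
                queue.append(n)
    return set(dist)


# Chain 0-1-2-3-4-5 with reaction center {2, 3}
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(nodes_within_radius(chain, [2, 3], 0))
print(nodes_within_radius(chain, [2, 3], 1))
```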

fgutils.its.remove_reagents(its)

Remove all reagents from the ITS graph. These are all compounds that are not connected to the reaction center.

Returns:

Returns the new ITS graph.

fgutils.its.split_its(graph: Graph) tuple[Graph, Graph]

Split an ITS graph into reactant graph G and product graph H. Required labels on the ITS graph are BOND_KEY.

Parameters:

graph – ITS graph to split up.

Returns:

Tuple of two graphs (G, H).

parse

class fgutils.parse.Parser(use_multigraph=False, init_aam=False, verbose=False)

Class to convert a SMILES like graph description into a NetworkX graph.

Example for parsing acetic acid:

>>> from fgutils.parse import Parser
>>> parser = Parser()
>>> g = parser("CC(O)=O")
>>> print(g)
Graph with 4 nodes and 3 edges

Parameters:
  • use_multigraph – Flag to specify if the resulting graph object should be of type networkx.MultiGraph or networkx.Graph. The difference is that a MultiGraph can have more than one edge between two nodes. For parsing molecule like graphs this is not necessary because bond types are encoded as edge labels. (Default = False)

  • verbose – Flag to print information during parsing. (Default = False)

parse(pattern: str, idx_offset: int = 0)

Method to parse a SMILES like graph pattern.

Parameters:
  • pattern – The pattern to convert into a graph. The pattern is a tree-like description of the graph. It is strongly oriented at the SMILES notation.

  • idx_offset – The index offset argument provides the starting value for the consecutive node numbering. (Default = 0)

Returns:

Returns the converted graph object.

permutation

class fgutils.permutation.MappingMatrix(pattern_symbols: list[str], structure_symbols: list[str], mapper: PermutationMapper)

A class for fast label mapping evaluation according to a PermutationMapper.

Parameters:
  • pattern_symbols – A complete list of all possible pattern labels.

  • structure_symbols – A complete list of all possible symbol labels.

  • mapper – A PermutationMapper instance. The matrix is constructed based on these mappings.

is_mapping(pattern_symbol: str, structure_symbol: str) bool

Check if two symbols can match, i.e., two nodes with respective label can be mapped.

Parameters:
  • pattern_symbol – The pattern symbol to check if it matches the structure symbol.

  • structure_symbol – The structure symbol to check if it matches the pattern symbol.

Returns:

True if the symbols can be mapped and False otherwise.

min_mapping_symbol(pattern_symbols: list[str], structure_symbols: list[str]) tuple[str, str] | None

This function finds the symbol with the lowest permutation count. Take [A, A, B] and [A, A, A, B, B] as pattern and symbol lists for example. The function will return B, because there are only 2 possible mappings whereas for A there are 6. This is useful to find suitable anchor nodes in pattern matching. It is 3x as fast to anchor B and check the alignment than to anchor A. The minimum excludes symbols with zero mappings. If no mapping is possible at all, None is returned.

Parameters:
  • pattern_symbols – The list of pattern symbols.

  • structure_symbols – The list of structure symbols.

Returns:

The tuple of symbols with the lowest permutation count. The first entry is the pattern symbol and the second entry is the structure symbol. If no mapping is found at all None is returned.
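The counting argument from the description can be reproduced in a few lines. This sketch simplifies the real method by assuming that a pattern symbol only maps to the identical structure symbol, so no PermutationMapper and no wildcard are involved; the function below is a stand-alone illustration, not the MappingMatrix method.

```python
from collections import Counter


def min_mapping_symbol(pattern_symbols, structure_symbols):
    """Return the (pattern, structure) symbol pair with the fewest node pairings."""
    pattern_count = Counter(pattern_symbols)
    structure_count = Counter(structure_symbols)
    best = None
    for symbol, n_pattern in pattern_count.items():
        pairings = n_pattern * structure_count.get(symbol, 0)
        # Symbols without any possible pairing are skipped.
        if pairings > 0 and (best is None or pairings < best[0]):
            best = (pairings, symbol)
    return None if best is None else (best[1], best[1])


print(min_mapping_symbol(["A", "A", "B"], ["A", "A", "A", "B", "B"]))
print(min_mapping_symbol(["A"], ["B"]))
```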

class fgutils.permutation.PermutationMapper(wildcard=None, ignore_case=False, can_map_to_nothing=[])

The PermutationMapper class specifies how nodes can match. The wildcard is defined for pattern labels, not structure labels, so it only matches in the direction pattern -> structure. For example, with R as the wildcard and C as a normal label, R -> C will match but C -> R will not.

Parameters:
  • wildcard – (optional) The wildcard is a specific pattern node label that can match to any other node. In the chemical notation this would be the R group, e.g., RC(=O)OH for an acid.

  • ignore_case – (optional) Flag if the matching of node labels is case insensitive. Default: False

  • can_map_to_nothing – (optional) A list of characters that can map to nothing. A node with a label in this list does not need a mapping at all, for example an optional node that may or may not have a mapping. This differs from the wildcard in that a wildcard must map to a node (with an arbitrary label though).
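The one-directional nature of the wildcard can be captured in a small predicate. The function can_map is a hypothetical stand-in written for this example, not a method of PermutationMapper, and it leaves out can_map_to_nothing.

```python
def can_map(pattern_label, structure_label, wildcard=None, ignore_case=False):
    """Decide whether a pattern node label may be mapped onto a structure node label."""
    # The wildcard is only checked on the pattern side.
    if wildcard is not None and pattern_label == wildcard:
        return True
    if ignore_case:
        return pattern_label.lower() == structure_label.lower()
    return pattern_label == structure_label


print(can_map("R", "C", wildcard="R"))   # pattern wildcard onto carbon
print(can_map("C", "R", wildcard="R"))   # carbon onto a structure node labeled R
print(can_map("c", "C", ignore_case=True))
```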

proxy

class fgutils.proxy.GraphSampler(unique=False)

Base class for sampling ProxyGraphs.

Parameters:

unique – If set to true each graph is only returned once. (Default = False)

sample(graphs: list[ProxyGraph], group_name=None) list[ProxyGraph] | None

Method to retrieve a new sample from a list of graphs. If unique is set to true, the first graph that was not yet returned is selected, and None is returned once all graphs have been selected. If unique is false, all graphs are returned each time the function is called.

Parameters:
  • graphs – A list of graphs to sample from.

  • group_name – (optional) The group name is an optional argument. It’s not necessary to specify it if it’s not needed.

Returns:

Returns one or more graphs from the list or None if sampling should stop.
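The two sampling modes can be mimicked with a small stateful class. This is a sketch, assuming that already returned graphs are tracked by their list position; the class name SketchSampler is hypothetical and plain strings stand in for ProxyGraph objects.

```python
class SketchSampler:
    """Toy sampler that mirrors the documented unique / non-unique behaviour."""

    def __init__(self, unique=False):
        self.unique = unique
        self._returned = set()  # positions that were already handed out

    def sample(self, graphs):
        if not self.unique:
            return list(graphs)
        for index, graph in enumerate(graphs):
            if index not in self._returned:
                self._returned.add(index)
                return [graph]
        return None  # everything was returned once, sampling should stop


graphs = ["C", "O"]
sampler = SketchSampler(unique=True)
draws = [sampler.sample(graphs) for _ in range(3)]
print(draws)
```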

class fgutils.proxy.MolProxy(core: str | list[str] | ProxyGroup, groups: ProxyGroup | list[ProxyGroup] | dict[str, ProxyGroup], parser: Parser | None = None)

Proxy to generate molecules.

Parameters:
  • core – A pattern string or ProxyGroup representing the core graph. For example a specific functional group.

  • groups – A list of groups to expand the core graph with.

  • parser – (optional) The parser to convert patterns into structures.

class fgutils.proxy.Proxy(core: str | list[str] | ProxyGroup, groups: ProxyGroup | list[ProxyGroup] | dict[str, ProxyGroup], enable_aam: bool = True, parser: Parser | None = None)

Proxy is a generator class. It extends a specific core graph by a set of subgraphs (groups). This class implements the iterator interface so it can be used in a for loop to generate samples:

>>> from fgutils.proxy import Proxy, ProxyGroup
>>> proxy = Proxy("C{g}", ProxyGroup("g", ["C", "O", "N"]))
>>> for graph in proxy:
...     print([d["symbol"] for n, d in graph.nodes(data=True)])
['C', 'C']
['C', 'O']
['C', 'N']

Parameters:
  • core – A pattern string or ProxyGroup representing the core graph. For example a specific functional group or a reaction center.

  • groups – A list of groups to expand the core graph with.

  • enable_aam – Flag to specify if the ‘aam’ label is set in the result graph. (Default = True)

  • parser – (optional) The parser to convert patterns into structures.

static from_dict(config: dict) Proxy

Load Proxy from dict config. The expected JSON format is:

{
    "core": <pattern>,
    "groups": <groups> # for expected format take a
    # look at ProxyGroup.from_dict()
}

Parameters:

config – The configuration dictionary. E.g. loaded from JSON

Returns:

The instantiated Proxy.
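A concrete configuration in the expected shape might look as follows. The values reuse the "C{g}" core and the group "g" from the class example above; loading it through the json module is just one possible way to obtain the dictionary.

```python
import json

raw = """
{
    "core": "C{g}",
    "groups": {"g": ["C", "O", "N"]}
}
"""

config = json.loads(raw)
print(config["core"], config["groups"])
# With fgutils installed, this dictionary could then be passed to Proxy.from_dict(config).
```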

get_next()

Get the next sample.

Returns:

A generated graph.

property groups: list[ProxyGroup]

The list of ProxyGroups. This can be set with values of the following type: ProxyGroup | list[ProxyGroup] | dict[str, ProxyGroup]

class fgutils.proxy.ProxyGraph(pattern: str, anchor: list[int] = [0], name: str | None = None, **kwargs)

ProxyGraph is essentially a subgraph used to expand molecules. If the node that is replaced by the pattern has more edges than the pattern has anchors, the last anchor will be used multiple times. In the default case the first node in the pattern will connect to all the neighboring nodes of the replaced node.

Parameters:
  • pattern – String representation of the graph.

  • anchor – A list of indices in the pattern that are used to connect to the parent graph. (Default = [0])

  • name – A name for the graph. This is just for visualization and debugging.

  • kwargs – Keyword arguments are used as graph properties. Specify whatever you need.
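The anchor reuse rule can be sketched as a simple pairing of neighbours with anchors. The helper assign_anchors is hypothetical, and the order in which neighbours are paired with anchors is an assumption of this example.

```python
def assign_anchors(neighbors, anchors=(0,)):
    """Pair each neighbour of the replaced node with an anchor index.

    If there are more neighbours than anchors, the last anchor is reused.
    """
    last = len(anchors) - 1
    return [(nb, anchors[min(i, last)]) for i, nb in enumerate(neighbors)]


# Replaced node had three neighbours, pattern offers two anchors
print(assign_anchors([10, 11, 12], anchors=[0, 2]))
# Default case: the first pattern node connects to all neighbours
print(assign_anchors([10, 11, 12]))
```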

class fgutils.proxy.ProxyGroup(name, graphs: str | list[str] | ProxyGraph | list[ProxyGraph], sampler=None, unique=False)

ProxyGroup is a collection of patterns that can replace a labeled node in a graph. The node label is the name of the group from which one of the patterns is inserted.

Parameters:
  • name – The name of the group.

  • graphs – (optional) A list of subgraphs or a list of graph descriptions. The patterns are converted to ProxyGraphs with one anchor at index 0. Use ProxyGraph objects if you need more control over how subgraphs are instantiated.

  • sampler – (optional) An object or a function to retrieve individual graphs from the list. The expected function interface is: func(list[ProxyGraph]) -> list[ProxyGraph]. Implement the __call__ method if you use a class. The function can have an optional keyword argument group_name.

  • unique – Argument to specify if graphs can be returned multiple times. This only takes effect if sampler is not set. (Default = False)
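A custom sampler following the documented interface could be written as a callable class. The class below is a hypothetical example that picks one graph at random; strings stand in for ProxyGraph objects so that the snippet runs without fgutils.

```python
import random


class RandomSampler:
    """Callable sampler: takes the list of graphs, returns a list with one random pick."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def __call__(self, graphs, group_name=None):
        # group_name is accepted because the interface allows it; it is unused here.
        return [self._rng.choice(graphs)]


sampler = RandomSampler(seed=0)
picked = sampler(["C", "O", "N"], group_name="g")
print(picked)
```

Such an object would be passed through the sampler argument of ProxyGroup. Because this sampler never returns None, it never signals that sampling should stop.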

static from_dict(config: dict) dict[str, ProxyGroup]

Load ProxyGroups from dict config. The expected JSON format is:

{
    "group_name_1": <pattern>,
    "group_name_2": [<pattern>],
    "group_name_3": <group_config> # for expected format take a
    # look at ProxyGroup.from_dict_single()
}

Parameters:

config – The configuration dictionary. E.g. loaded from JSON file.

Returns:

Returns a mapping dictionary of ProxyGroups where the key is the group name and the value is the ProxyGroup object.

static from_dict_single(name: str, config: dict) ProxyGroup

Load a single ProxyGroup object from configuration. The expected JSON format is one of the following:

{
    # short form
    "graphs": <pattern>,
    "graphs": [<pattern>],
    "graphs": {"pattern": <pattern>, "anchor": list[int]},
    # complete config
    "graphs": [{
            "pattern": <pattern>,
            "anchor": list[int],
            <any_key>: <any_value>
        }],

    # <pattern> is the SMILES-like graph description of type str
}

It’s possible to specify additional properties on graphs. These can be used for example by custom samplers to implement some logic or dependencies between groups.

Parameters:
  • name – The name of the ProxyGroup.

  • config – The configuration dictionary. E.g. loaded from JSON file.

Returns:

The instantiated ProxyGroup.
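The relation between the short forms and the complete form can be shown by normalising all of them to a list of dictionaries. The function normalize_graphs is a hypothetical helper for illustration; the default anchor [0] follows the description of the graphs parameter above.

```python
def normalize_graphs(graphs):
    """Turn any accepted 'graphs' value into a list of complete graph configs."""
    if isinstance(graphs, (str, dict)):
        graphs = [graphs]
    normalized = []
    for entry in graphs:
        if isinstance(entry, str):
            entry = {"pattern": entry}
        # Explicit keys in the entry override the default anchor.
        normalized.append({"anchor": [0], **entry})
    return normalized


print(normalize_graphs("C"))
print(normalize_graphs(["C", "O"]))
print(normalize_graphs({"pattern": "C=O", "anchor": [1], "weight": 2}))
```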

property graphs: list[ProxyGraph]

The list of ProxyGraphs. This can be set with values of the following type: str | list[str] | ProxyGraph | list[ProxyGraph]

sample_graphs() list[ProxyGraph] | None

Sample graphs. This method uses the sampler to select a list of graphs.

Returns:

A list of ProxyGraph objects or None if there is nothing more to sample.

class fgutils.proxy.ReactionProxy(core: str | list[str] | ProxyGroup, groups: ProxyGroup | list[ProxyGroup] | dict[str, ProxyGroup], enable_aam: bool = True, parser: Parser | None = None)

Proxy to generate reactions.

Parameters:
  • core – A pattern string or ProxyGroup representing the core graph. For example a specific functional group or a reaction center.

  • groups – A list of groups to expand the core graph with.

  • enable_aam – Flag to specify if the ‘aam’ label is set in the result graph. (Default = True)

  • parser – (optional) The parser to convert patterns into structures.

get_next()

Generate a new reaction sample. The reaction proxy returns two graphs G and H. G is the reactant graph and H is the product graph.

Returns:

A tuple of two graphs (G, H) representing the reaction G → H.

fgutils.proxy.build_graphs(core: ProxyGraph, groups: dict[str, ProxyGroup], parser: Parser)

Replace labeled nodes in the core graph with groups. For each labeled node the respective group is used to replace the node by the specified subgraphs.

Parameters:
  • core – The parent graph with labeled nodes.

  • groups – A list of groups to replace the labeled nodes in the core graph with. The dictionary keys must be the group names.

  • parser – The parser that is used to convert graph patterns into graphs.

Returns:

Returns a list of graphs with replaced nodes.
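The replacement process can be imitated on the string level by expanding one "{name}" placeholder at a time. This is only an analogy under stated assumptions: the real functions operate on graphs and anchors, whereas this hypothetical expand helper concatenates pattern strings and ignores branching.

```python
import re


def expand(pattern, groups):
    """Recursively replace the first {name} placeholder with each group pattern."""
    match = re.search(r"\{(\w+)\}", pattern)
    if match is None:
        return [pattern]  # nothing left to replace
    results = []
    for replacement in groups[match.group(1)]:
        expanded = pattern[:match.start()] + replacement + pattern[match.end():]
        results.extend(expand(expanded, groups))
    return results


print(expand("C{g}", {"g": ["C", "O", "N"]}))
print(expand("{a}C{b}", {"a": ["N", "O"], "b": ["C", "S"]}))
```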

fgutils.proxy.build_group_tree(core: ProxyGroup, groups: ProxyGroup | list[ProxyGroup], parser=None) Graph

Constructs a tree of all possible graph instantiations. The number of leaf nodes in this tree is the number of possible samples.

Parameters:
  • core – The ProxyGroup that serves as core group.

  • groups – A list of groups to replace labeled nodes.

  • parser – (optional) A parser to use for conversion from pattern to structures.

Returns:

Returns the construction tree as nx.Graph object.

fgutils.proxy.replace_next_node(graph, groups: dict[str, ProxyGroup], parser: Parser)

Replace the next labeled node in graph with the respective group.

Parameters:
  • graph – The graph where a node should be replaced by a subgraph.

  • groups – A mapping dictionary of groups to replace the labeled nodes in the parent with. The dictionary keys must be the group name.

  • parser – The parser to use to convert the pattern into structure.

Returns:

Returns a list of new graphs where the first labeled node is replaced. None is returned if no replaceable labeled node is left.

fgutils.proxy.replace_node(graph, node, replacement_graph: ProxyGraph, parser: Parser)

Replace node in graph with replacement_graph converted by the parser.

Parameters:
  • graph – The graph where a node should be replaced by a subgraph.

  • node – The node to replace in graph.

  • replacement_graph – The subgraph that is inserted instead of the node.

  • parser – The parser to convert the pattern into structure.

Returns:

Returns a new graph with node replaced by replacement_graph.

proxy_collection

class fgutils.proxy_collection.diels_alder_proxy.DielsAlderProxy(enable_aam=True, neg_sample=False)

A proxy for the generation of Diels-Alder reaction samples and counter-samples. The proxy returns two graphs G and H as tuple. G is the reactant graph and H is the product graph. For a comprehensive description of the proxy configuration read section Diels-Alder Reaction Proxy.

Parameters:
  • enable_aam – Flag to specify if the aam label is set in the result graphs. (Default = True)

  • neg_sample – If set to true the proxy will exclusively generate negative samples, i.e., reactions where a Diels-Alder graph transformation rule is theoretically applicable but the reaction will never happen in reality. (Default = False)

query

class fgutils.query.FGQuery(mapper: PermutationMapper | None = None, config: FGConfig | list[FGConfig] | FGConfigProvider | None = None, require_implicit_hydrogen: bool = True)

Class to get functional groups from a molecule.

Parameters:
  • mapper – (optional) The permutation mapper to use.

  • config – (optional) A functional group config, a list of configs, or a functional group config provider. If not specified the default functional group collection will be used.

  • require_implicit_hydrogen – Flag to specify if implicit hydrogens are required for the query. This usually depends on the provided FGConfigs. If the configured functional group patterns do not require hydrogens this can be set to false. (Default = True)

get(value) list[tuple[str, list[int]]]

Get all functional groups from a molecule. The query returns two functional groups for acetylsalicylic acid:

>>> from fgutils.query import FGQuery
>>> smiles = "O=C(C)Oc1ccccc1C(=O)O" # acetylsalicylic acid
>>> query = FGQuery()
>>> query.get(smiles)
[('ester', [0, 1, 3]), ('carboxylic_acid', [10, 11, 12])]

Parameters:

value – This is either a graph or SMILES as string.

Returns:

Returns a list of tuples. The first element in a tuple is the functional group name and the second element is a list of node indices that belong to this functional group (functional_group_name, [idx_1, idx_2, ...]).
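The returned list is easy to post-process with standard tools. The snippet below takes the acetylsalicylic acid result from the example above as a literal and groups the atom indices by functional group name; the variable names are chosen for this sketch.

```python
from collections import defaultdict

# Result copied from the documented query example
result = [("ester", [0, 1, 3]), ("carboxylic_acid", [10, 11, 12])]

# A group name can occur several times, so collect one index list per match
atoms_by_group = defaultdict(list)
for name, indices in result:
    atoms_by_group[name].append(indices)

# All atoms that belong to at least one functional group
covered_atoms = sorted({idx for _, indices in result for idx in indices})
print(dict(atoms_by_group), covered_atoms)
```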

rdkit

fgutils.rdkit.get_mol_coords(g: Graph, scale=1) dict[int, tuple[float, float]]

Try to get a molecule like coordinate representation of the graph.

Parameters:
  • g – The graph to get the coordinates for.

  • scale – (optional) A scale for the coordinates. (Default: 1)

Returns:

Returns a dict of coordinates. The keys are the node indices and the values are the 2 coordinates x and y.

fgutils.rdkit.graph_to_mol(g: Graph, ignore_aam=False) Mol

Convert a graph to an RDKit molecule.

Parameters:
  • g – The molecule as node and edge labeled graph. The graph requires SYMBOL node labels and BOND_KEY edge labels. The node label AAM_KEY is optional to annotate the molecule with an atom-atom map.

  • ignore_aam – If set to true the atom-atom map will not be initialized.

Returns:

Returns the graph as RDKit molecule.

fgutils.rdkit.graph_to_smiles(g: Graph, implicit_h=False, ignore_aam=False) str

Convert a molecular graph into a SMILES string. This function uses RDKit for SMILES generation.

Parameters:
  • g – Graph to convert to SMILES representation.

  • implicit_h – Flag to add all Hydrogen atoms. (Default: False)

  • ignore_aam – If set to True the returned SMILES has no atom-atom map.

Returns:

Returns the SMILES.

fgutils.rdkit.mol_smiles_to_graph(smiles: str, implicit_h=False) Graph

Converts a SMILES to a graph.

Parameters:
  • smiles – SMILES to convert to graph(s).

  • implicit_h – Flag to add all Hydrogen atoms. (Default: False)

Returns:

A node and edge labeled molecular graph.

fgutils.rdkit.mol_to_graph(mol: Mol, implicit_h=False) Graph

Convert an RDKit molecule to a graph.

Parameters:
  • mol – An RDKit molecule.

  • implicit_h – Flag to add all Hydrogen atoms. (Default: False)

Returns:

The molecule as node and edge labeled graph.

fgutils.rdkit.reaction_smiles_to_graph(smiles: str, implicit_h=False) tuple[Graph, Graph]

Converts a reaction SMILES to the graph representation G → H, where G is the reactant graph and H is the product graph.

Parameters:
  • smiles – Reaction SMILES to convert to graph tuple.

  • implicit_h – Flag to add all Hydrogen atoms. (Default: False)

Returns:

Returns the graphs G and H as tuple.

fgutils.rdkit.smiles_to_graph(smiles: str, implicit_h=False) Graph | tuple[Graph, Graph]

Converts a SMILES to a graph. If the SMILES encodes a reaction a graph tuple is returned.

Parameters:
  • smiles – SMILES to convert to graph(s).

  • implicit_h – Flag to add all Hydrogen atoms. (Default: False)

Returns:

A molecular graph or graph tuple if SMILES is a reaction SMILES.
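The distinction between molecule and reaction input can be sketched on the string level: a reaction SMILES has the form reactants>agents>products. The helper split_smiles is hypothetical and only separates the parts; it does not build graphs, and how fgutils itself detects a reaction is not shown here.

```python
def split_smiles(smiles):
    """Return the SMILES unchanged for a molecule, or (reactants, products) for a reaction."""
    if ">" not in smiles:
        return smiles
    # A reaction SMILES contains exactly two '>' separators.
    reactants, _agents, products = smiles.split(">")
    return reactants, products


print(split_smiles("CC(=O)O"))
print(split_smiles("CC(=O)O.CO>>CC(=O)OC.O"))
```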

synthesis

class fgutils.synthesis.rule_application.DPORule(rule_id: str, left: Graph, context: Graph, right: Graph)

Double Pushout Rule class.

Parameters:
  • rule_id – The id of the rule. This can also be seen as the rule name.

  • left – The left graph L in the DPO rule L ← C → R.

  • context – The context graph C in the DPO rule L ← C → R.

  • right – The right graph R in the DPO rule L ← C → R.

to_rc_graph() Graph

Method to convert the DPO rule representation to a reaction center graph, i.e., the superposition of L and R on C. The reaction center graph is a node and edge labeled graph. The BOND_KEY edge label encodes the bond change in the reaction.

Returns:

Returns the reaction center graph.

class fgutils.synthesis.rule_application.ReactionRule(rc_graph: Graph, name='unnamed rule')

Class describing a reaction rule.

Parameters:
  • rc_graph – A reaction center graph.

  • name – (optional) The name of the reaction rule.

static from_gml(src: str)

Load reaction rule from a GML file. Supported format is the MØD GML rule format.

Parameters:

src – This can be either a file path or a GML string.

Returns:

Returns an instance of the ReactionRule class.

fgutils.synthesis.rule_application.apply_rule(g: Graph, rule: ReactionRule, n: int | None = None, unique=True, connected_only=False) list[ITS]

Apply a reaction rule to a reactant graph G. The function returns the generated ITS graphs.

Parameters:
  • g – The reactant graph. This can be a disconnected graph for multiple reactant molecules.

  • rule – The reaction rule to apply to the reactant graph.

  • n – (optional) Limits the maximum number of solutions. If n is set to an integer it will return once n solutions are found. (Default: None)

  • unique – (optional) Flag to specify if isomorphic solutions should be returned as one solution. Isomorphism is checked with 3 iterations of the Weisfeiler-Lehman (WL) test.

  • connected_only – (optional) Flag to specify if the ITS graph must be connected. Setting this to true means that all reactants in g must take part in the reaction.

Returns:

Returns a list of ITS graphs.
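The uniqueness check mentioned for the unique flag relies on Weisfeiler-Lehman colour refinement. A minimal pure-Python version is sketched below, assuming node labels only and an adjacency-dictionary graph; it is not the code used by apply_rule. Equal signatures are a necessary but not a sufficient condition for isomorphism.

```python
import hashlib
from collections import Counter


def wl_signature(adj, labels, iterations=3):
    """Return a sorted multiset of node colours after WL refinement."""
    colors = {v: str(labels[v]) for v in adj}
    for _ in range(iterations):
        refined = {}
        for v in adj:
            neighborhood = sorted(colors[n] for n in adj[v])
            text = colors[v] + "|" + ",".join(neighborhood)
            refined[v] = hashlib.sha256(text.encode()).hexdigest()
        colors = refined
    return sorted(Counter(colors.values()).items())


# Two differently numbered copies of C-C-O and one C-O-C graph
g1 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
g2 = ({"x": ["y"], "y": ["x", "z"], "z": ["y"]}, {"x": "O", "y": "C", "z": "C"})
g3 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "O", 2: "C"})
print(wl_signature(*g1) == wl_signature(*g2))
print(wl_signature(*g1) == wl_signature(*g3))
```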

fgutils.synthesis.rule_application.parse_gml_dpo_rule(lines: list[str]) DPORule

Parse a GML string into a DPO rule object.

Parameters:

lines – The lines of the GML string.

Returns:

Returns the parsed DPO rule.

torch

class fgutils.torch.ITSDataset.ITSDataset(data: list[Graph | ITS], y: list[int] | None = None, ids: list[int] | None = None, node_feature_transform=None, edge_feature_transform=None, pre_transform=None, transform=None)

Instantiates a torch.Dataset from a list of ITS graphs.

Parameters:
  • data – The list of ITS graphs to create the dataset from.

  • y – (optional) The list of ITS labels. These are the prediction targets. This list must be of the same length as the ITS graphs.

  • ids – (optional) A list of IDs. The number of IDs must be equal to the number of graphs. Default: None

  • node_feature_transform – (optional) A transform function to convert node attributes to a node feature vector. Default: None

  • edge_feature_transform – (optional) A transform function to convert edge attributes to an edge feature vector. Default: None

  • pre_transform – (optional) A callback function to transform the Tensor representation of the ITS graph before it is stored. This function is executed only once on initialization.

  • transform – (optional) A callback function to transform the Tensor representation of the ITS graph when accessed. This function is executed on every access.
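The difference between pre_transform and transform is when they run. The toy container below is a sketch of that timing only, using integers instead of tensors; the class TinyDataset is hypothetical and shares nothing with ITSDataset beyond the two parameter names.

```python
class TinyDataset:
    """Toy dataset: pre_transform runs once at construction, transform on every access."""

    def __init__(self, data, pre_transform=None, transform=None):
        self._items = [pre_transform(d) if pre_transform else d for d in data]
        self._transform = transform

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        item = self._items[index]
        return self._transform(item) if self._transform else item


calls = {"pre": 0, "access": 0}


def double(value):
    calls["pre"] += 1
    return value * 2


def add_one(value):
    calls["access"] += 1
    return value + 1


dataset = TinyDataset([1, 2], pre_transform=double, transform=add_one)
first, again = dataset[0], dataset[0]
print(first, again, calls)
```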

class fgutils.torch.TUDataset.TUDataset(root, name, transform=None, pre_transform=None, pre_filter=None, use_node_attr: bool = True, use_edge_attr: bool = True)

Class for reading datasets in the [TUDataset](https://chrsmrrs.github.io/datasets/docs/format/) file format. This implementation is copied and modified from [torch_geometric.datasets.TUDataset](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.TUDataset.html) to also support instance IDs for easier identification of samples. IDs are loaded from an additional ids.txt file where the i-th line is the id of the i-th graph.

Parameters:
  • root – Root directory where the dataset is saved.

  • name – The name of the dataset.

  • transform – A function/transform that takes in a data object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform – A function/transform that takes in a data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter – A function that takes in a data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

  • force_reload – Whether to re-process the dataset. (default: False)

  • use_node_attr – If True, the dataset will contain additional continuous node attributes (if present). (default: True)

  • use_edge_attr – If True, the dataset will contain additional continuous edge attributes (if present). (default: True)

download()

Downloads the dataset to the self.raw_dir folder.

process() None

Processes the dataset to the self.processed_dir folder.

property processed_file_names

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

fgutils.torch.graph.edge_induced_subgraph(graph, edges)

Get an edge induced subgraph from the torch graph representation. Keep in mind that the returned node indices (e.g. in the edge_index tensor) are not the same as in the input graph. The node indices are implicit in the row number of the node features, so a relabeling is necessary.

Parameters:
  • graph – The graph from which to get an edge induced subgraph.

  • edges – A list of edge indices for the induced subgraph.

Returns:

Edge induced subgraph.

fgutils.torch.graph.node_induced_subgraph(graph, nodes)

Get a node induced subgraph from the torch graph representation. Keep in mind that the returned node indices (e.g. in the edge_index tensor) are not the same as in the input graph. The node indices are implicit in the row number of the node features, so a relabeling is necessary.

Parameters:
  • graph – The graph from which to get a node induced subgraph.

  • nodes – A list of node indices for the induced subgraph.

Returns:

Node induced subgraph.
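The relabeling mentioned above can be illustrated with plain Python lists in place of tensors. This is a minimal sketch, not the fgutils implementation: the function name node_induced_subgraph_sketch is hypothetical, and keeping the selected nodes in ascending order is an assumption.

```python
def node_induced_subgraph_sketch(x, edge_index, nodes):
    """x: node feature rows; edge_index: [sources, targets]; nodes: indices to keep."""
    # Assumption: kept nodes are renumbered 0..k-1 in ascending order of their old index.
    keep = sorted(set(nodes))
    new_index = {old: new for new, old in enumerate(keep)}
    # The row position in sub_x is the new (implicit) node index.
    sub_x = [x[old] for old in keep]
    sub_src, sub_dst = [], []
    for u, v in zip(edge_index[0], edge_index[1]):
        # Keep an edge only if both endpoints survive, and rewrite its indices.
        if u in new_index and v in new_index:
            sub_src.append(new_index[u])
            sub_dst.append(new_index[v])
    return sub_x, [sub_src, sub_dst], new_index


x = [["C"], ["O"], ["N"], ["H"]]
# Undirected path 0-1-2-3, stored as directed edge pairs.
edge_index = [[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]]
sub_x, sub_edges, mapping = node_induced_subgraph_sketch(x, edge_index, [1, 2, 3])
```

Here the old node 1 becomes node 0 in the subgraph, so any index taken from the result has to be translated through the mapping before it is compared with the input graph.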

fgutils.torch.utils.get_adjacency_matrix(sample: Data) Tensor

Get the adjacency matrix from a torch data sample. The sample must provide the properties x and edge_index.

Parameters:

sample – The torch data sample.

Returns:

Returns the adjacency matrix as tensor.
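Why both properties are needed can be seen in a small sketch with nested lists instead of tensors: x determines the number of nodes (one row per node) and edge_index lists the connected pairs. The function adjacency_matrix_sketch is hypothetical, and the 0/1 entries are an assumption, since the entry values of the returned tensor are not specified here.

```python
def adjacency_matrix_sketch(x, edge_index):
    """Build a dense 0/1 adjacency matrix from node rows and an edge list."""
    num_nodes = len(x)  # one feature row per node
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in zip(edge_index[0], edge_index[1]):
        adj[u][v] = 1  # directed entry; undirected graphs list both directions
    return adj


x = [[6], [8], [6]]
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
adj = adjacency_matrix_sketch(x, edge_index)
```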

fgutils.torch.utils.its_from_torch(data, node_feature_transform: Callable[[Tensor], str] | None = None)

Convert an ITS in PyTorch data format back into the NetworkX ITS format.

Parameters:
  • data – The PyTorch data to convert. This can be a single sample or a batch of samples.

  • node_feature_transform – (optional) A transform function to convert the node features back into the atom symbol. The function gets the node feature tensor as argument and is expected to return the atom symbol for that node.

Returns:

The sample ITS graph or a list of ITS graphs if data is a batch.

fgutils.torch.utils.its_to_torch(its: Graph | ITS | list[Graph | ITS], node_feature_transform=None, edge_feature_transform=None) Data | Batch

Convert an ITS graph or a list of ITS graphs to the PyTorch data format.

Parameters:
  • its – The ITS graph to convert. If it is a single ITS graph the return type will be Data. For a list of ITS graphs a Batch is returned.

  • node_feature_transform – (optional) A transform function to convert the node annotations into a node feature vector. The function gets the node dict as argument and is expected to return a torch.Tensor node feature vector.

  • edge_feature_transform – (optional) A transform function to convert the edge annotations into an edge feature vector. The function gets the edge dict as argument and is expected to return a torch.Tensor edge feature vector.

Returns:

A Data object if its is a single ITS graph or a Batch if its is a list of ITS graphs.

fgutils.torch.utils.prune(sample: Data, start_nodes: Tensor, radius=1) Data

Prune the graph represented as torch data sample. This function removes all nodes whose shortest path to the nearest node in start_nodes is longer than radius.

Parameters:
  • sample – The torch data sample.

  • start_nodes – A list of start node indices.

  • radius – (optional) The maximum path length of nodes to keep around the start nodes. Every node whose shortest path to the nearest start node is longer is removed from the graph. Default: 1

Returns:

Returns the pruned graph as torch data sample.

fgutils.torch.utils.prune_rc(sample, radius=1)

Prune the ITS graph represented as torch data sample. This function removes all nodes whose shortest path to the reaction center is longer than radius.

Parameters:
  • sample – The torch data sample.

  • radius – (optional) The maximum path length of nodes to keep around the reaction center. Every node whose shortest path to the nearest node in the reaction center is longer is removed from the graph. Default: 1

Returns:

Returns the pruned graph as torch data sample.

utils

fgutils.utils.add_implicit_hydrogens(graph: Graph, inplace=True) Graph

Add implicit hydrogens as dedicated nodes to graph.

Parameters:
  • graph – The mol graph to add hydrogens to.

  • inplace – (optional) Inplace will change the input instance. When set to False the returned instance is a copy.

Returns:

Returns the molecular graph with implicit hydrogens or a copy if inplace is False.

fgutils.utils.complete_aam(graph: Graph, offset: None | int | str = None)

Complete the atom-atom map on a graph based on node indices. This function does not override an existing atom-atom map. It extends the existing atom-atom map to all nodes. The numbering of the new nodes starts at 1 or offset and skips all existing mapping numbers.

Parameters:
  • graph – The graph where to complete the atom-atom map.

  • offset – (optional) The mapping offset. Offset is the first value used for numbering. If set to "min" the offset will be set to the lowest existing number.
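The numbering rule can be sketched on a plain dictionary instead of a graph. This is an illustration under stated assumptions, not the fgutils implementation: the function complete_numbering_sketch is hypothetical, unmapped nodes are assumed to be processed in ascending node index order, and "min" is assumed to fall back to 1 when no mapping exists yet.

```python
def complete_numbering_sketch(existing, offset=None):
    """existing: dict of node index -> map number, or None for unmapped nodes."""
    used = {m for m in existing.values() if m is not None}
    if offset == "min":
        # Assumption: start at the lowest existing number, or 1 if there is none.
        next_number = min(used) if used else 1
    elif offset is None:
        next_number = 1
    else:
        next_number = offset
    completed = dict(existing)  # existing numbers are never overridden
    for node in sorted(existing):
        if completed[node] is not None:
            continue
        # Skip every number that is already taken.
        while next_number in used:
            next_number += 1
        completed[node] = next_number
        used.add(next_number)
    return completed


aam = {0: None, 1: 2, 2: None, 3: 1, 4: None}
from_one = complete_numbering_sketch(aam)
from_ten = complete_numbering_sketch(aam, offset=10)
```

With the default offset the numbers 1 and 2 are already taken, so the new nodes receive 3, 4 and 5; with offset=10 they receive 10, 11 and 12.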

fgutils.utils.get_unreachable_nodes(g, start_nodes, radius=1)

Get the list of nodes that cannot be reached from the start nodes within a given distance.

Parameters:
  • g – The graph for which to get unreachable nodes.

  • start_nodes – A list of nodes to start from. Start nodes count as radius 0. For a reachable node there must exist a path from a start node with at most radius number of steps.

  • radius – The maximum number of hops from start_nodes. (Default: 1)

Returns:

Returns the list of unreachable nodes.
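The reachability rule (start nodes count as radius 0, every hop adds 1) corresponds to a breadth-first search that stops expanding at depth radius. The following is a minimal sketch on an adjacency dictionary; unreachable_nodes_sketch is a hypothetical name and the code is not taken from fgutils.

```python
from collections import deque


def unreachable_nodes_sketch(adjacency, start_nodes, radius=1):
    """adjacency: dict of node -> list of neighbours."""
    distance = {node: 0 for node in start_nodes}  # start nodes are at radius 0
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the allowed number of hops
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    # Everything that never received a distance is out of range.
    return [node for node in adjacency if node not in distance]


# Path graph 0-1-2-3-4
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
far_from_zero = unreachable_nodes_sketch(chain, [0], radius=2)
far_from_ends = unreachable_nodes_sketch(chain, [0, 4], radius=1)
```

A pruning step such as the one described for fgutils.torch.utils.prune can then be thought of as removing exactly the nodes in this list.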

fgutils.utils.initialize_aam(graph: Graph, offset=1)

Initialize atom-atom map on a graph based on node indices.

Parameters:
  • graph – The graph where to initialize the atom-atom map.

  • offset – (optional) The mapping offset. Offset is the first value used for numbering.

fgutils.utils.mol_equal(candidate: Graph, target: Graph, compare_mode='both', iterations=3, ignore_hydrogens=False, min_atoms=0) bool

Compare a candidate molecule to a specific target molecule. The result is True if the candidate matches the target. The comparison uses the Weisfeiler-Leman (WL) algorithm with 3 iterations by default.

Parameters:
  • candidate – A candidate molecule.

  • target – A target molecule to compare to.

  • compare_mode – (optional) The compare mode: “target”, “candidate”, “largest_target”, “largest_candidate”, or “both”. For “target” it is sufficient that the candidate matches all components in the target, for “candidate” it is sufficient that the target matches all candidate components, and for “both” a complete match must be found. For the two largest_* compare modes only the respective largest component needs to find a match. This is helpful to ignore additional small compounds. (Default: both)

  • iterations – (optional) The number of Weisfeiler-Leman iterations. (Default: 3)

  • ignore_hydrogens – (optional) Flag to ignore hydrogens in the comparison. Molecules with unequal hydrogen configurations will count as identical. (Default: False)

  • min_atoms – (optional) The minimum number of atoms a component needs to have to be considered in the comparison. This can be useful to exclude small molecules like water. (Default: 0)

Returns:

True if the candidate matches the target and False otherwise.
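The idea behind a Weisfeiler-Leman comparison can be sketched in plain Python. This is a simplified illustration, not the fgutils implementation: wl_signature is a hypothetical helper, bond labels are ignored, and only atom symbols are used as initial colors. Note that WL is a necessary test for graph isomorphism but not a sufficient one, so equal signatures do not strictly prove that two graphs are identical.

```python
from collections import Counter


def wl_signature(labels, adjacency, iterations=3):
    """Return the histogram of node colors over all refinement rounds."""
    colors = dict(labels)  # round 0: atom symbols
    histogram = Counter(colors.values())
    for _ in range(iterations):
        # New color = own color plus the sorted multiset of neighbour colors.
        colors = {
            node: (colors[node], tuple(sorted(colors[nb] for nb in adjacency[node])))
            for node in adjacency
        }
        histogram.update(colors.values())
    return histogram


path = {0: [1], 1: [0, 2], 2: [1]}
ethanol = {0: "C", 1: "C", 2: "O"}           # C-C-O (heavy atoms only)
ethanol_reversed = {0: "O", 1: "C", 2: "C"}  # same molecule, other node order
dimethyl_ether = {0: "C", 1: "O", 2: "C"}    # C-O-C

same = wl_signature(ethanol, path) == wl_signature(ethanol_reversed, path)
different = wl_signature(ethanol, path) != wl_signature(dimethyl_ether, path)
```

Both molecules have the same atom counts, so the round 0 histograms agree; the difference only shows up once the neighbour colors are taken into account, which is why at least one iteration is needed.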

fgutils.utils.remove_implicit_hydrogens(graph: Graph, inplace=True) Graph

Remove implicit hydrogens that are stored as dedicated nodes from graph.

Parameters:
  • graph – The mol graph to remove hydrogens from.

  • inplace – (optional) Inplace will change the input instance. When set to False the returned instance is a copy.

Returns:

Returns the molecular graph with implicit hydrogens removed or a copy if inplace is False.

fgutils.utils.split_its(graph: Graph) tuple[Graph, Graph]

Warning

Deprecated since v0.1.6. This function was moved to module fgutils.its. It will be removed from fgutils.utils in the future.

Split an ITS graph into reactant graph G and product graph H. Required labels on the ITS graph are BOND_KEY.

Parameters:

graph – ITS graph to split up.

Returns:

Tuple of two graphs (G, H).

vis