ProtoCaller.Wrappers.rdkitwrapper module

This is an extensive wrapper around RDKit’s functionality. Some of the featured functions are high-level wrappers around RDKit’s file parsers. Most of the functionality in this module is centered around refining RDKit’s maximum common substructure (MCS) algorithm and using a custom alignment algorithm. Since this is the most challenging part of the code, the user is strongly encouraged to always validate the resulting mappings and alignments.

ProtoCaller.Wrappers.rdkitwrapper.AssignBondOrdersFromTemplate(ref, mol)

A modification of rdkit.Chem.AllChem.AssignBondOrdersFromTemplate()

Parameters
  • ref (rdkit.Chem.rdchem.Mol) – The template molecule.

  • mol (rdkit.Chem.rdchem.Mol) – The input molecule.

Returns

mol – The new molecule.

Return type

rdkit.Chem.rdchem.Mol

ProtoCaller.Wrappers.rdkitwrapper.alignTwoMolecules(ref, mol, n_min=-1, two_way_matching=True, mcs=None, minimise_score=False, mcs_parameters=None, minimiser_parameters=None)

Aligns two molecules based on an input MCS. The algorithm freezes the atoms of the common core and minimises the rest with a force field. If minimise_score is True, additional minimisation using minimiseAlignmentScore() is then performed.

If there is a choice between several equally long MCSs, the ones with the most matching atom pairs are selected first (e.g. a C->C mapping trumps a C->H mapping). Of these, the first structure with the lowest MSD is selected.

Parameters
  • ref (rdkit.Chem.rdchem.Mol) – The reference molecule.

  • mol (rdkit.Chem.rdchem.Mol) – The molecule to be aligned.

  • two_way_matching (bool) – Whether to treat ref and mol equally in terms of matching.

  • n_min (int) – Minimum number of force field minimisation iterations. -1 is no limit.

  • mcs ([tuple] or None) – The maximum common substructure. None means the one generated by getMCSMap().

  • minimise_score (bool) – Whether to minimise an additional score, whose settings can be passed via minimiser_parameters. For more information, see the documentation for minimiseAlignmentScore().

  • mcs_parameters (dict) – A dictionary of the parameters to be passed on to getMCSMap().

  • minimiser_parameters (dict) – A dictionary of the parameters to be passed on to minimiseAlignmentScore().

Returns

  • mol (rdkit.Chem.rdchem.Mol) – The aligned molecule.

  • mcs ([tuple]) – The maximum common substructure.
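
The selection rule described above (most same-element pairs first, then lowest MSD) can be sketched in plain Python. This is only an illustration under stated assumptions, not ProtoCaller's implementation: the helper names are hypothetical, "MSD" is assumed to mean the mean squared deviation between mapped atoms, and molecules are reduced to lists of element symbols and coordinates.

```python
def matching_atom_score(elements_ref, elements_mol, mapping):
    """Count mapped (ref, mol) index pairs whose atoms share the same element."""
    return sum(1 for i, j in mapping if elements_ref[i] == elements_mol[j])


def mean_squared_deviation(coords_ref, coords_mol, mapping):
    """Mean squared distance between mapped atom pairs (assumed meaning of MSD)."""
    total = 0.0
    for i, j in mapping:
        total += sum((a - b) ** 2 for a, b in zip(coords_ref[i], coords_mol[j]))
    return total / len(mapping)


def pick_mapping(candidates, elements_ref, elements_mol, coords_ref, coords_mol):
    """Keep the candidates with the best element score, then take the first with the lowest MSD."""
    scores = [matching_atom_score(elements_ref, elements_mol, m) for m in candidates]
    best = [m for m, s in zip(candidates, scores) if s == max(scores)]
    # min() returns the first minimal item, which mirrors "the first structure".
    return min(best, key=lambda m: mean_squared_deviation(coords_ref, coords_mol, m))


# Hypothetical toy data: a three-atom reference and a four-atom molecule.
elements_ref = ["C", "C", "O"]
coords_ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
elements_mol = ["C", "H", "C", "O"]
coords_mol = [(0.0, 0.0, 0.0), (1.4, 0.5, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]

candidate_a = [(0, 0), (1, 1), (2, 3)]  # maps a ref carbon onto a hydrogen
candidate_b = [(0, 0), (1, 2), (2, 3)]  # maps every atom onto the same element
chosen = pick_mapping([candidate_a, candidate_b],
                      elements_ref, elements_mol, coords_ref, coords_mol)
```

Both candidates have the same length, so the element score decides here; the MSD only acts as a tie-breaker among mappings with equal element scores.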

ProtoCaller.Wrappers.rdkitwrapper.getAlignmentScore(ref, mol, mcs=None, confId1=-1, confId2=-1)

Returns the alignment score between two molecules. The score is computed by calculating all possible distances between atom i of ref and atom j of mol and summing their squares. To penalise unfavourable clashes, all atoms from mol that are within 1 Angstrom of atom j also contribute inverse square distances to the score.

Parameters
  • ref (rdkit.Chem.rdchem.Mol) – The reference molecule.

  • mol (rdkit.Chem.rdchem.Mol) – The molecule to be aligned.

  • mcs ([(tuple)] or None) – The maximum common substructure of the two molecules.

  • confId1 (int) – The conformer number of ref.

  • confId2 (int) – The conformer number of mol.

Returns

alignment_score – The alignment score of the two conformers.

Return type

float
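
One possible reading of the score described above is sketched below. It is not the library's code: the function name, the explicit pairs argument and the way clashes are detected (any two mol atoms closer than the cutoff) are assumptions made for illustration, and coordinates are plain tuples rather than RDKit conformers.

```python
import math


def alignment_score_sketch(coords_ref, coords_mol, pairs, clash_cutoff=1.0):
    """Sum of squared ref-mol distances plus inverse-square clash penalties."""
    score = 0.0
    # Alignment term: squared distance for every supplied (ref atom i, mol atom j) pair.
    for i, j in pairs:
        score += math.dist(coords_ref[i], coords_mol[j]) ** 2
    # Clash term (assumed form): mol atoms closer than the cutoff add 1 / d**2.
    for a in range(len(coords_mol)):
        for b in range(a + 1, len(coords_mol)):
            d = math.dist(coords_mol[a], coords_mol[b])
            if 0.0 < d < clash_cutoff:
                score += 1.0 / d ** 2
    return score


# No clash: only the alignment term contributes.
aligned = alignment_score_sketch(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    pairs=[(0, 0), (1, 1)],
)

# Two mol atoms 0.5 Angstrom apart: the clash term dominates.
clashing = alignment_score_sketch(
    [(0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
    pairs=[(0, 0)],
)
```

The inverse-square term grows quickly as atoms approach each other, which is why a minimiser driven by such a score avoids conformations with overlapping atoms.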

ProtoCaller.Wrappers.rdkitwrapper.getEZStereochemistry(mol, carbonify=True, confId=-1, extra_bonds=None)

Determines the stereochemistry of double bonds, esters and amides. Note that the assigned E/Z labels might not match the IUPAC E/Z labels. A label of None corresponds to a symmetric bond.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – The input molecule.

  • carbonify (bool) – Whether to ignore atom type when determining the stereochemistry.

  • confId (int) – The conformer ID to be used for stereochemistry determination.

  • extra_bonds ([(int, int)]) – Extra bonds to be treated as double.

Returns

bonds – The relevant double bonds and their labels.

Return type

dict(frozenset(int, int), str)

ProtoCaller.Wrappers.rdkitwrapper.getFixedMCS(ref, mol, match_ref, match_mol, break_recursively=True, valid_mcs=False, two_way_matching=True, keep_EZ=True, keep_stereo=True, **kwargs)

A helper function to getMCSMap() which expands on RDKit’s strict MCS algorithm. The first addition is recursive bond breaking to find a connected MCS where aliphatic atoms can be mapped to ring atoms. The second addition is the detection of atoms with different stereochemistry and finding the biggest fragment that doesn’t contain them. The algorithm does not yet support mapping two very similar atoms of different stereochemistry; this might be added in the future.

Parameters
  • ref (rdkit.Chem.Mol) – The reference molecule.

  • mol (rdkit.Chem.Mol) – The molecule to be aligned.

  • match_ref (list) – A list of indices corresponding to the MCS for the reference molecule.

  • match_mol (list) – A list of indices corresponding to the MCS for the molecule to be aligned.

  • break_recursively (bool) – Whether to use the recursive bond breaking algorithm to improve on RDKit’s functionality.

  • keep_EZ (bool) – Whether to only keep substructures that satisfy E/Z geometries.

  • keep_stereo (bool) – Whether to only keep substructures that satisfy R/S isomers.

  • valid_mcs (bool) – Whether the input MCS is ordered, unique and valid.

  • two_way_matching (bool) – This algorithm matches certain aliphatic mol atoms to certain ring ref atoms by default. This parameter determines whether the opposite matching is also allowed. A value of True is good for matching ligand B to ligand A and a value of False is good for matching ligand A to the binding pocket ligand.

  • kwargs – Keyword arguments to be supplied to _matchAndReturnMatches()

Returns

matches – A set of frozensets corresponding to the largest unique MCSs given the input.

Return type

{frozenset([tuple])}
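
The set-of-frozensets return type makes it easy to discard duplicate mappings that differ only in the order of their atom pairs. The following is a minimal sketch of that idea with a hypothetical helper; it does not reproduce the bond-breaking or stereochemistry checks of getFixedMCS().

```python
def unique_largest(mappings):
    """Deduplicate mappings and keep only those with the most atom pairs."""
    # A frozenset ignores pair order, so reordered duplicates collapse into one entry.
    unique = {frozenset(mapping) for mapping in mappings}
    if not unique:
        return set()
    largest = max(len(mapping) for mapping in unique)
    return {mapping for mapping in unique if len(mapping) == largest}


# Hypothetical candidate mappings as lists of (ref index, mol index) tuples.
candidates = [
    [(0, 0), (1, 1)],
    [(1, 1), (0, 0)],  # same mapping as above, listed in a different order
    [(0, 0), (1, 2)],
    [(0, 0)],          # shorter mapping, discarded
]
matches = unique_largest(candidates)
```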

ProtoCaller.Wrappers.rdkitwrapper.getMCSMap(ref, mol, atomCompare='any', bondCompare='any', **kwargs)

Generates the Maximum Common Substructure (MCS) mapping between two molecules. This algorithm calls the getFixedMCS() function, which improves on RDKit’s default MCS algorithm. It does so recursively until the MCS has been found.

Parameters
  • ref (rdkit.Chem.rdchem.Mol) – The reference molecule.

  • mol (rdkit.Chem.rdchem.Mol) – The molecule to be aligned.

  • atomCompare (str) – One of “any” (matches any pairs of atoms) and “elements” (matches only the same atoms).

  • bondCompare (str) – One of “any” (matches any bonds) and “elements” (matches only the same bonds).

  • kwargs – Additional keyword arguments passed on to getFixedMCS() or to rdkit.Chem.MCS.FindMCS().

Returns

mcs – A list of lists of tuples corresponding to the atom index matches between the reference and the other molecule.

Return type

[[tuple]]

ProtoCaller.Wrappers.rdkitwrapper.getMatchingAtomScore(mol1, mol2, matches)

Returns the matching atom score between two molecules. This is simply the number of atoms that are mapped onto the same element from the other molecule.

Parameters
  • mol1 (rdkit.Chem.rdchem.Mol) – The first molecule.

  • mol2 (rdkit.Chem.rdchem.Mol) – The second molecule.

  • matches ([tuple]) – The maximum common substructure matches.

Returns

matching_atoms – The matching atom score of the two molecules.

Return type

float

ProtoCaller.Wrappers.rdkitwrapper.minimiseAlignmentScore(ref, mol, mcs=None, confId1=-1, confId2=-1, minimisation_algorithm=scipy.optimize.minimize, scoring_algorithm=<function getAlignmentScore>, **kwargs)

Rotates all of the rotatable dihedrals outside of the MCS until a balance between good alignment to the reference and minimal clashing with other atoms in the same molecule is achieved.

Parameters
  • ref (rdkit.Chem.rdchem.Mol) – The reference molecule.

  • mol (rdkit.Chem.rdchem.Mol) – The molecule to be aligned.

  • mcs ([(tuple)] or None) – The maximum common substructure of the two molecules.

  • confId1 (int) – The conformer number of ref.

  • confId2 (int) – The conformer number of mol.

  • minimisation_algorithm (function) – A scipy algorithm to be used for minimisation. Default is regular local minimisation using scipy.optimize.minimize. Custom algorithms can also be used as long as they obey the same format as the ones in scipy.optimize.

  • scoring_algorithm (function) – The function used to score the alignment. Default is getAlignmentScore().

  • kwargs – Keyword arguments to be supplied to the minimiser algorithm.

Returns

  • mol (rdkit.Chem.rdchem.Mol) – The aligned molecule.

  • alignment_score (float) – The alignment score after alignment obtained from getAlignmentScore().
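
A custom minimiser that follows the scipy.optimize calling convention could look like the sketch below. The interpretation of "the same format" is an assumption: a callable that accepts an objective function and a starting vector, and returns an object with x and fun attributes, as scipy.optimize.minimize does. The pattern_search name and its step, tol and max_iter parameters are hypothetical.

```python
from types import SimpleNamespace


def pattern_search(fun, x0, step=0.5, tol=1e-6, max_iter=10000, **kwargs):
    """Derivative-free coordinate search with a scipy-like call signature and result."""
    x = list(x0)
    best = fun(x)
    iterations = 0
    while step > tol and iterations < max_iter:
        improved = False
        for k in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[k] += delta
                value = fun(trial)
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            # No neighbouring point is better: refine the step size.
            step /= 2.0
        iterations += 1
    # Mimic the attributes of scipy's OptimizeResult that callers commonly read.
    return SimpleNamespace(x=x, fun=best, success=step <= tol)


def toy_objective(angles):
    """Stand-in for an alignment score that depends on two dihedral angles."""
    return (angles[0] - 1.0) ** 2 + (angles[1] + 2.0) ** 2


result = pattern_search(toy_objective, [0.0, 0.0])
```

Extra keyword arguments are accepted and ignored so that options intended for other minimisers do not raise an error in this sketch.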

ProtoCaller.Wrappers.rdkitwrapper.openAsRdkit(val, minimise=None, template=None, **kwargs)

A general wrapper which can convert a variety of representations for a molecule into an rdkit.Chem.rdchem.Mol object.

Parameters
  • val (str) – The input value: a SMILES string, an InChI string or a filename.

  • minimise (bool or None) – Whether to perform a GAFF minimisation using OpenBabel. None means minimisation for molecules initialised from strings and no minimisation for molecules initialised from files.

  • template (str) – A SMILES string, an InChI string or a filename for a template from which bonds will be assigned. Only used when needed.

  • kwargs – Keyword arguments to be passed to the more specialised RDKit functions.

Returns

mol – The input string opened as an RDKit Mol object.

Return type

rdkit.Chem.rdchem.Mol
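
The input handling described above can be illustrated with a small dispatch sketch. The helper names and the detection order are assumptions, not the wrapper's actual logic: an existing file path is treated as a file, a string starting with the standard "InChI=" prefix as InChI, and anything else as SMILES.

```python
import os
import tempfile


def classify_input(val):
    """Guess whether val is a filename, an InChI string or a SMILES string."""
    if os.path.isfile(val):
        return "file"
    if val.startswith("InChI="):
        return "inchi"
    return "smiles"


def resolve_minimise(kind, minimise=None):
    """Apply the documented default: minimise strings, leave files untouched."""
    if minimise is None:
        return kind != "file"
    return bool(minimise)


smiles_kind = classify_input("CCO")
inchi_kind = classify_input("InChI=1S/CH4/h1H4")

# A temporary file stands in for a real structure file in this sketch.
with tempfile.NamedTemporaryFile(suffix=".sdf") as handle:
    file_kind = classify_input(handle.name)
```

An explicit minimise value always overrides the default, so resolve_minimise("file", True) would still request a minimisation.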

ProtoCaller.Wrappers.rdkitwrapper.openFileAsRdkit(filename, **kwargs)

Opens an input file and returns an RDKit molecule.

Parameters
  • filename (str) – The name of the input file.

  • kwargs – Keyword arguments to be supplied to the more specialised RDKit functions.

Returns

mol – The file loaded as an RDKit Mol object.

Return type

rdkit.Chem.rdchem.Mol

ProtoCaller.Wrappers.rdkitwrapper.saveFromRdkit(mol, filename, **kwargs)

Saves an RDKit Mol object to a file.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – The input molecule.

  • filename (str) – The name of the output file.

  • kwargs – Keyword arguments to be passed to more specialised RDKit functions.

Returns

filename – The absolute path to the written file.

Return type

str

ProtoCaller.Wrappers.rdkitwrapper.translateMolecule(mol, vector)

Translates an input molecule.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – The molecule to be translated.

  • vector ((float, float, float)) – A 3D vector.

Returns

mol – The translated molecule.

Return type

rdkit.Chem.rdchem.Mol