- Poster presentation
- Open Access
Two-step hierarchical assignments on molecular graphs
Chemistry Central Journal volume 3, Article number: P13 (2009)
Similarity measures for molecules are of interest for several in silico tasks such as virtual screening and de novo structure design. The Optimal Assignment Kernel (OAK) [1] is a successful similarity measure, although it is not a valid kernel because the function is not positive definite [2]. Careful investigation of the assignment at the atom level reveals that the optimal assignment computed with the Hungarian algorithm may contain topological errors. These errors are mappings of atoms from one chemical substructure, such as a ring system, to atoms of the other molecule that belong to different substructures or are even scattered across the molecule. Such mappings yield a higher overall similarity score but are problematic from a chemical point of view. To avoid these topological errors, we developed a two-step hierarchical assignment method and compared it with the OAK.
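The optimal assignment described above can be illustrated with a minimal sketch. It is not the OAK implementation: the atom similarity is a toy function that compares element symbols only, the function names are hypothetical, and a brute-force search over permutations stands in for the Hungarian algorithm, so it is usable only for very small molecules.

```python
from itertools import permutations


def atom_similarity(a, b):
    """Toy atom kernel (an assumption): 1.0 for equal element symbols, else 0.0."""
    return 1.0 if a == b else 0.0


def optimal_assignment(atoms_a, atoms_b, sim=atom_similarity):
    """Return (score, pairs) maximizing the summed atom similarity.

    Brute force over permutations replaces the Hungarian algorithm here.
    pairs is a list of (index_in_a, index_in_b) tuples.
    """
    # The atoms of the smaller molecule are assigned to atoms of the larger one.
    swapped = len(atoms_a) > len(atoms_b)
    small, large = (atoms_b, atoms_a) if swapped else (atoms_a, atoms_b)
    best_score, best_perm = -1.0, ()
    for perm in permutations(range(len(large)), len(small)):
        score = sum(sim(small[i], large[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    pairs = [(j, i) if swapped else (i, j) for i, j in enumerate(best_perm)]
    return best_score, pairs


def normalized_similarity(atoms_a, atoms_b, sim=atom_similarity):
    """Scale the assignment score by the two self-similarities."""
    k_ab, _ = optimal_assignment(atoms_a, atoms_b, sim)
    k_aa, _ = optimal_assignment(atoms_a, atoms_a, sim)
    k_bb, _ = optimal_assignment(atoms_b, atoms_b, sim)
    return k_ab / (k_aa * k_bb) ** 0.5


# Hypothetical molecules given as plain lists of element symbols.
mol_a = ["C", "C", "N", "O"]
mol_b = ["C", "N", "O", "C", "S"]
score, mapping = optimal_assignment(mol_a, mol_b)
print(score, mapping)
print(round(normalized_similarity(mol_a, mol_b), 3))
```

Because the score depends only on pairwise atom similarities, nothing in this search keeps atoms of one ring system together, which is how the topological errors arise.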
As a pre-processing step, our method separates the molecules into disjoint molecular fragments such as aromatic systems, rings and conjugated environments. The first assignment step maps the fragments of one molecule to corresponding fragments of the other, which guarantees a substructure-preserving mapping at the atom level in the second assignment step. A special penalty function penalizes mappings between atoms from different substructures that were not mapped onto each other in the first step. This function also adjusts the similarity score of mappings from atoms included in fragments to atoms that are not part of any fragment. These modifications reduce the probability of topological errors and produce assignments with a reasonable mapping between molecular substructures.
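The two assignment steps can be sketched as follows. This is an illustration under stated assumptions, not the method's actual implementation: fragments are supplied by hand rather than detected, the fragment and atom similarities are toy functions, the two penalty values are arbitrary, all names are hypothetical, and brute-force search again replaces the Hungarian algorithm. The sketch also assumes that the first molecule has no more fragments and no more atoms than the second.

```python
from itertools import permutations


def best_assignment(rows, cols, score):
    """Brute-force maximum-weight assignment of each row to a distinct column.

    Returns (total, {row_index: col_index}); requires len(rows) <= len(cols).
    """
    if len(rows) > len(cols):
        raise ValueError("rows must not outnumber cols")
    best_total, best_map = float("-inf"), {}
    for perm in permutations(range(len(cols)), len(rows)):
        total = sum(score(rows[i], cols[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_map = total, dict(enumerate(perm))
    return best_total, best_map


def fragment_similarity(kind_a, kind_b):
    """Toy fragment kernel (an assumption): fragments match if of the same kind."""
    return 1.0 if kind_a == kind_b else 0.0


def hierarchical_assignment(atoms_a, frags_a, atoms_b, frags_b,
                            mismatch_penalty=1.0, free_penalty=0.5):
    """Atoms are (element, fragment_name or None); frags map name -> kind."""
    # Step 1: map fragments of A to fragments of B, keeping only similar pairs.
    names_a, names_b = list(frags_a), list(frags_b)
    _, idx_map = best_assignment(
        names_a, names_b,
        lambda x, y: fragment_similarity(frags_a[x], frags_b[y]))
    frag_map = {
        names_a[i]: names_b[j] for i, j in idx_map.items()
        if fragment_similarity(frags_a[names_a[i]], frags_b[names_b[j]]) > 0
    }

    # Step 2: atom assignment with a penalty that depends on the step 1 result.
    def atom_score(a, b):
        base = 1.0 if a[0] == b[0] else 0.0
        frag_a, frag_b = a[1], b[1]
        if frag_a is None and frag_b is None:
            return base                      # both atoms outside any fragment
        if frag_a is None or frag_b is None:
            return base - free_penalty       # fragment atom to non-fragment atom
        if frag_map.get(frag_a) == frag_b:
            return base                      # fragments were mapped in step 1
        return base - mismatch_penalty       # fragments were not mapped in step 1

    total, atom_map = best_assignment(atoms_a, atoms_b, atom_score)
    return frag_map, total, atom_map


# Hypothetical molecules: A has one ring, B has one ring plus free C, N and O.
atoms_a = [("C", "r0"), ("C", "r0"), ("N", "r0"), ("O", None)]
frags_a = {"r0": "ring"}
atoms_b = [("C", None), ("N", None), ("C", "r1"), ("C", "r1"), ("N", "r1"), ("O", None)]
frags_b = {"r1": "ring"}

frag_map, total, atom_map = hierarchical_assignment(atoms_a, frags_a, atoms_b, frags_b)
print(frag_map, total, atom_map)
```

Setting both penalties to 0.0 reduces the second step to a plain atom-level assignment. For these toy molecules, that plain assignment reaches the same score while sending ring atoms of A to non-ring atoms of B, whereas the penalized version keeps the ring atoms inside the mapped ring.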
Virtual screening results for the OAK and the hierarchical assignment approach on several datasets from the Directory of Useful Decoys (DUD) [3] showed that the hierarchical assignment achieved better BEDROC scores [4]. This performance gain results from penalizing topological errors, which improves the distinction between biologically active and inactive compounds.
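For readers unfamiliar with the metric, the BEDROC score of a ranked screening list can be computed as sketched below. The function name and the default α = 20 are assumptions; the parameter value used in this study is not stated here. The score is obtained by rescaling the robust initial enhancement (RIE) between its minimum and maximum attainable values, so that 1 corresponds to all actives ranked first and 0 to all actives ranked last.

```python
import math


def bedroc(ranked_labels, alpha=20.0):
    """BEDROC of a ranked list of booleans (True = active), best-scored first.

    alpha controls how strongly early ranks are weighted (20.0 is an assumption).
    """
    n_total = len(ranked_labels)
    n_active = sum(1 for label in ranked_labels if label)
    if n_active == 0 or n_active == n_total:
        raise ValueError("need at least one active and one inactive compound")
    ratio = n_active / n_total

    # Exponentially weighted sum over the 1-based ranks of the actives.
    weighted = sum(
        math.exp(-alpha * rank / n_total)
        for rank, label in enumerate(ranked_labels, start=1) if label
    )
    # Expected weight of a single active placed at a uniformly random rank.
    expected = (1.0 - math.exp(-alpha)) / (n_total * (math.exp(alpha / n_total) - 1.0))
    rie = weighted / (n_active * expected)

    # Bounds of RIE: all actives ranked first (max) or last (min).
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


# Hypothetical rankings of 100 compounds containing 5 actives.
actives_first = [True] * 5 + [False] * 95
actives_last = [False] * 95 + [True] * 5
actives_mixed = [True, False, True, False, False, True] + [False] * 92 + [True, True]
print(bedroc(actives_first), bedroc(actives_mixed), bedroc(actives_last))
```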
1. Fröhlich H, et al: QSAR Comb Sci. 2006, 25: 317-326. 10.1002/qsar.200510135.
2. Vert J-P: Technical Report HAL-00218278. 2008.
3. Huang N, et al: J Med Chem. 2006, 49: 6789-6801. 10.1021/jm0608356.
4. Truchon J-F, Bayly CI: J Chem Inf Model. 2007, 47: 488-508. 10.1021/ci600426e.