- Poster presentation
- Open Access
Distance phenomena in high-dimensional chemical descriptor spaces: consequences for similarity-based approaches
Chemistry Central Journal volume 3, Article number: P28 (2009)
Measuring the (dis)similarity of molecules is, alongside descriptor selection, an important factor in many cheminformatics applications, such as compound ranking, clustering, and property prediction. In this work, we focus on real-valued vector spaces (as opposed to the binary spaces of, e.g., fingerprints). We demonstrate how severely the choice of (dis)similarity measure can influence the results of cheminformatics applications, and we provide recommendations for such choices.
We briefly review the mathematical concepts [1] used to measure (dis)similarity in vector spaces, namely norms, metrics, inner products, and similarity coefficients, and the relationships between them, using (dis)similarity measures common in cheminformatics [2][3] as examples.
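As an illustration of the concepts named above, the following minimal Python sketch implements a few textbook measures for real-valued descriptor vectors: the Minkowski distance (the metric induced by a p-norm), the standard inner product, the cosine similarity, and the real-valued Tanimoto coefficient. The function names and the example vectors `x` and `y` are assumptions made for this sketch and are not taken from the poster; the measures shown are not necessarily the ones the authors examined.

```python
import math


def minkowski_distance(x, y, p=2.0):
    """Metric induced by the p-norm (p >= 1); p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def inner_product(x, y):
    """Standard inner product (dot product) of two real-valued vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Inner product normalized by the Euclidean norms of both vectors."""
    return inner_product(x, y) / math.sqrt(inner_product(x, x) * inner_product(y, y))


def tanimoto_coefficient(x, y):
    """Real-valued Tanimoto coefficient: <x,y> / (<x,x> + <y,y> - <x,y>)."""
    xy = inner_product(x, y)
    return xy / (inner_product(x, x) + inner_product(y, y) - xy)


# Hypothetical descriptor vectors, chosen only for illustration.
x = [1.0, 2.0, 0.0]
y = [2.0, 1.0, 1.0]

print("Euclidean distance:", minkowski_distance(x, y, p=2))
print("Manhattan distance:", minkowski_distance(x, y, p=1))
print("Cosine similarity: ", cosine_similarity(x, y))
print("Tanimoto:          ", tanimoto_coefficient(x, y))

# Relationship between norm and inner product:
# ||x - y||^2 = <x,x> + <y,y> - 2<x,y>
squared_from_inner = inner_product(x, x) + inner_product(y, y) - 2 * inner_product(x, y)
print("Squared Euclidean distance via inner products:", squared_from_inner)
```

The last lines show one of the relationships between these concepts: the squared Euclidean distance can be written entirely in terms of inner products.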
Then, we present several phenomena (empty space phenomenon, sphere volume related phenomena, distance concentration [4][5][6]) in high-dimensional descriptor spaces that are not encountered in two and three dimensions. These phenomena are characterized theoretically and illustrated with examples from both artificial and real (bioactivity) data.
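Two of these phenomena can be reproduced with a few lines of Python. The sketch below is an illustration under stated assumptions, not the authors' experimental setup: it draws points uniformly at random from the unit hypercube, measures the relative contrast (d_max - d_min) / d_min of Euclidean distances to a random query point, and computes the fraction of a hypercube's volume occupied by its inscribed ball. The function names, sample size, seed, and choice of dimensions are all hypothetical.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Relative contrast (d_max - d_min) / d_min of Euclidean distances
    from one random query point to n_points uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum((q - c) ** 2 for q, c in zip(query, point))))
    return (max(dists) - min(dists)) / min(dists)


def ball_to_cube_volume_ratio(dim):
    """Volume of the unit ball divided by the volume of its bounding cube [-1, 1]^dim.
    Computed in log space (lgamma) to avoid overflow for large dim."""
    log_ball = (dim / 2.0) * math.log(math.pi) - math.lgamma(dim / 2.0 + 1.0)
    log_cube = dim * math.log(2.0)
    return math.exp(log_ball - log_cube)


# Dimensions chosen only for illustration.
dims = (2, 10, 100, 1000)
contrasts = {d: relative_contrast(d) for d in dims}
ratios = {d: ball_to_cube_volume_ratio(d) for d in dims}

for d in dims:
    print(f"dim={d:5d}  relative contrast={contrasts[d]:.3f}  "
          f"ball/cube volume ratio={ratios[d]:.3e}")
```

For uniformly distributed data of this kind, the relative contrast shrinks as the dimension grows, so the nearest and farthest neighbors of a query become hard to tell apart by distance alone. The volume ratio tends to zero, meaning almost all of the cube's volume lies in its corners, outside the inscribed ball.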
References
1. Meyer C: Matrix Analysis and Applied Linear Algebra. 2001, SIAM, Philadelphia.
2. Leach A, Gillet V: An Introduction to Chemoinformatics. 2003, Springer Netherlands.
3. Willett P: J Chem Inf Comput Sci. 1998, 38: 983-996.
4. Aggarwal C, Hinneburg A, Keim D: ICDT 2001 Proceedings, LNCS 1973. 2001, 420-434.
5. Beyer K, Goldstein J, Ramakrishnan R, Shaft U: ICDT 1999 Proceedings, LNCS 1540. 1999, 217-235.
6. Francois D, Wertz V, Verleysen M: IEEE Trans Knowl Data Eng. 2007, 19: 873-886. 10.1109/TKDE.2007.1037.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Rupp, M., Schneider, G. Distance phenomena in high-dimensional chemical descriptor spaces: consequences for similarity-based approaches. Chemistry Central Journal 3 (Suppl 1), P28 (2009). https://doi.org/10.1186/1752-153X-3-S1-P28
DOI: https://doi.org/10.1186/1752-153X-3-S1-P28
Keywords
- Vector Space
- Similarity Measure
- Similarity Coefficient
- Mathematical Concept
- Empty Space