Exploring benchmark dataset bias in ligand based virtual screening

Baumann, Knut; Rohrer, SG

doi:10.1186/1752-153X-2-S1-P1

Volume 2 Supplement 1

3rd German Conference on Chemoinformatics: 21. CIC-Workshop

Poster presentation
Open access
Published: 26 March 2008

Knut Baumann¹ &
SG Rohrer¹

Chemistry Central Journal volume 2, Article number: P1 (2008)

A common finding of many reports evaluating virtual screening (VS) methods is that validation results vary considerably from one dataset to the next, i.e. with the region of chemical space covered by the active ligands. It is assumed that these dataset-specific effects are caused by the self-similarity and cluster structure inherent to these datasets.

As a first step, an experimental setup was developed that isolated dataset composition as the sole source of variance influencing VS performance. The Hert-Willett benchmark datasets have been widely used for the validation of ligand-based VS protocols. Various sampling strategies (D-optimum design, onion design, minimum distance design) were employed to generate archetypal subsamples from these datasets: (1) maximum diversity subsets, (2) space-filling samples and (3) subsets with minimum intra-set diversity. The analysis of the varying VS performance on these prototype datasets showed that dataset composition does indeed exert a critical influence on VS validation. It identified local clustering and the global spread of the datasets relative to the set of decoys as the factors with the highest impact on VS performance.
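To make the contrast between the archetypal subsample types more concrete, here is a minimal sketch. It is not the authors' procedure: it assumes compounds are represented as points in a Euclidean descriptor space (random 2D points here, purely hypothetical), and it substitutes simple greedy heuristics for the designs named above. Greedy MaxMin picking stands in for a maximum diversity subset, and a seed compound plus its nearest neighbours stands in for a subset with minimum intra-set diversity. All function names are illustrative.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def maxmin_subset(D, k, seed_index=0):
    """Greedy MaxMin picking: repeatedly add the point that is
    farthest from its closest already-selected point."""
    selected = [seed_index]
    min_dist = D[seed_index].copy()  # distance of each point to the selection
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, D[nxt])
    return selected

def compact_subset(D, k, seed_index=0):
    """Low-diversity subset: the seed plus its k-1 nearest neighbours."""
    order = np.argsort(D[seed_index], kind="stable")
    return [int(i) for i in order[:k]]

def mean_intra_set_distance(D, idx):
    """Average pairwise distance within the subset (diagonal excluded)."""
    sub = D[np.ix_(idx, idx)]
    n = len(idx)
    return sub.sum() / (n * (n - 1))

# Hypothetical stand-in for a descriptor matrix of active compounds.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
D = pairwise_distances(X)

diverse = maxmin_subset(D, k=10)
compact = compact_subset(D, k=10)
print("mean intra-set distance, MaxMin subset: ", mean_intra_set_distance(D, diverse))
print("mean intra-set distance, compact subset:", mean_intra_set_distance(D, compact))
```

Running a fixed VS protocol on such contrasting subsets of the same actives is one way to vary dataset composition while holding everything else constant.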

Given the concept of chemical space, it is natural to draw on the field of spatial statistics, which offers a wealth of methods for analysing the clustering, patchiness and dispersion of datasets. By employing these, we were able to analyse the spatial composition of the benchmark datasets in more detail and derive several rules of thumb for choosing unbiased datasets for evaluating ligand-based VS methods.
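The abstract does not say which spatial statistics were applied. As one hedged illustration of the kind of tool the field offers, the sketch below computes an empirical nearest-neighbour distribution function G(t), the fraction of points whose nearest neighbour lies within distance t. The data are hypothetical 2D points rather than real descriptors, and the function names are illustrative only.

```python
import numpy as np

def nearest_neighbour_distances(A, B=None):
    """Distance from each row of A to its nearest row of B.
    If B is None, use the nearest *other* row of A."""
    same = B is None
    if same:
        B = A
    diff = A[:, None, :] - B[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    if same:
        np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    return D.min(axis=1)

def empirical_G(nn_dist, t):
    """Fraction of points whose nearest-neighbour distance is <= t,
    evaluated for each threshold in t."""
    nn_dist = np.asarray(nn_dist)
    t = np.atleast_1d(t)
    return (nn_dist[None, :] <= t[:, None]).mean(axis=1)

# Hypothetical data: tightly clustered actives, dispersed actives, and decoys.
rng = np.random.default_rng(0)
clustered_actives = rng.normal(0.5, 0.02, size=(50, 2))
dispersed_actives = rng.random((50, 2))
decoys = rng.random((500, 2))

t = np.linspace(0.0, 0.2, 5)
print("G(t), clustered actives:", empirical_G(nearest_neighbour_distances(clustered_actives), t))
print("G(t), dispersed actives:", empirical_G(nearest_neighbour_distances(dispersed_actives), t))
# Active-to-decoy distances describe how the actives sit relative to the decoy set.
print("G(t), actives to decoys:", empirical_G(nearest_neighbour_distances(clustered_actives, decoys), t))
```

A curve that rises steeply at small t indicates local clustering among the actives. Comparing it with the active-to-decoy curve gives a rough picture of how the actives are spread relative to the decoys.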

Author information

Authors and Affiliations

Institute of Pharmaceutical Chemistry, Braunschweig University of Technology, Beethovenstr. 55, 38106, Braunschweig, Germany
Knut Baumann & SG Rohrer

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Baumann, K., Rohrer, S. Exploring benchmark dataset bias in ligand based virtual screening. Chemistry Central Journal 2 (Suppl 1), P1 (2008). https://doi.org/10.1186/1752-153X-2-S1-P1

