In silico prediction of aqueous solubility – classification models

Kramer, C; Beck, B; Clark, T

doi:10.1186/1752-153X-2-S1-P23

Volume 2 Supplement 1

3rd German Conference on Chemoinformatics: 21. CIC-Workshop

Poster presentation
Open access
Published: 26 March 2008

C Kramer¹,², B Beck² & T Clark¹

Chemistry Central Journal volume 2, Article number: P23 (2008)

Solubility is a very important parameter in pharmaceutical research, especially in the early phase of drug discovery: in fully automated high-throughput screening, compound pool extension, and the measurement of SAR and ADME-Tox parameters. In recent years, a multitude of models concerned with the exact prediction of aqueous solubility has been published. Still, almost all tools that have since become commercially available suffer from comparatively poor R²y values when predicting the solubility of pharmaceutically relevant molecules [1]. This might be attributed first to the poor quality of the available data, as the experimental conditions under which literature solubility data were obtained differ considerably, and second to the fact that many compounds with solubility values extracted from the literature are not druglike. But even with high-quality data measured in a single lab, the R²y values obtained with the latest high-end algorithms are often unsatisfactory. In a very careful recent study, Müller and co-workers obtained an R²y value of 0.53 with a Gaussian process model on a separate dataset derived from in-house shake-flask experiments [1].

However, for many applications the exact value is not really important; what matters is whether a given compound will be insoluble under the test conditions used and should therefore be excluded from the experiment.

To address this question, we built classification models based on two datasets measured in-house at Boehringer-Ingelheim at pH 7.4: a kinetic set of solubility measurements based on nephelometry and a thermodynamic set based on shake-flask experiments. Each dataset was divided into three classes: a well-soluble class, an insoluble class, and a buffer class in between to compensate for noisy data. For these datasets, we built classification models using support vector machines (SVM) and Bayesian regularized neural networks (BRANN), trying several different descriptor sets. In each case, MOE2D descriptors combined with an SVM model gave the best raw results, with an overall accuracy of ~70% in triple cross-validation. Leaving out the predictions for and of the buffer class, i.e. considering only strong outliers, the overall accuracy is ~88.5%.
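The three-class scheme and the two accuracy figures can be illustrated with a minimal sketch. The class boundaries below (log S of -5.0 and -4.0) and all function names are assumptions chosen for illustration; the abstract does not state the thresholds actually used.

```python
INSOLUBLE, BUFFER, SOLUBLE = "insoluble", "buffer", "soluble"


def assign_class(log_s, insoluble_max=-5.0, soluble_min=-4.0):
    """Bin a log-solubility value into one of three classes.

    The default thresholds are hypothetical, not taken from the abstract.
    """
    if log_s <= insoluble_max:
        return INSOLUBLE
    if log_s >= soluble_min:
        return SOLUBLE
    return BUFFER


def overall_accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted class equals the measured class."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def accuracy_without_buffer(y_true, y_pred):
    """Accuracy after dropping compounds measured or predicted as buffer.

    The errors that remain are soluble/insoluble confusions only.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred)
             if t != BUFFER and p != BUFFER]
    if not pairs:
        raise ValueError("no compounds left after removing the buffer class")
    return sum(t == p for t, p in pairs) / len(pairs)


# Toy data, invented for illustration only.
measured = [-6.0, -4.5, -3.0, -5.5, -2.0]
y_true = [assign_class(v) for v in measured]
y_pred = [INSOLUBLE, SOLUBLE, SOLUBLE, BUFFER, INSOLUBLE]

print(overall_accuracy(y_true, y_pred))
print(accuracy_without_buffer(y_true, y_pred))
```

Dropping the buffer class on both sides removes the near-boundary cases, which is why the second figure is typically higher than the first.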

We evaluated classifier fusion and model applicability domain (MAD) considerations for this dataset. Applying these, we achieved accuracies of ~93% for ~80% of the dataset.
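The abstract does not say how fusion or the applicability domain were implemented. One simple way to trade coverage for accuracy, shown here purely as an assumption, is to accept a prediction only when two classifiers agree and the compound lies close to the training set in descriptor space. The agreement rule, the Euclidean nearest-neighbour distance, the cutoff value, and all names are hypothetical.

```python
import math


def nearest_distance(x, training_set):
    """Euclidean distance from x to its closest training compound."""
    return min(math.dist(x, t) for t in training_set)


def fused_prediction(pred_a, pred_b):
    """Return the shared label if both classifiers agree, else None."""
    return pred_a if pred_a == pred_b else None


def evaluate(y_true, preds_a, preds_b, descriptors, training_set, max_distance):
    """Return (accuracy, coverage) over the accepted predictions only."""
    accepted = []
    for truth, a, b, x in zip(y_true, preds_a, preds_b, descriptors):
        label = fused_prediction(a, b)
        if label is None:
            continue  # classifiers disagree: no prediction is made
        if nearest_distance(x, training_set) > max_distance:
            continue  # outside the assumed applicability domain
        accepted.append(truth == label)
    coverage = len(accepted) / len(y_true)
    accuracy = sum(accepted) / len(accepted) if accepted else float("nan")
    return accuracy, coverage


# Toy data, invented for illustration only.
training_set = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.0, 0.1), (1.0, 1.2), (5.0, 5.0), (0.9, 1.0)]
y_true = ["soluble", "insoluble", "soluble", "insoluble"]
preds_a = ["soluble", "insoluble", "soluble", "soluble"]
preds_b = ["soluble", "insoluble", "soluble", "insoluble"]

accuracy, coverage = evaluate(y_true, preds_a, preds_b,
                              descriptors, training_set, max_distance=0.5)
print(accuracy, coverage)
```

Reporting accuracy together with coverage mirrors the form of the result above, where the stated accuracy applies only to the part of the dataset for which a prediction is retained.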

References

1. Schwaighofer A, Schroeter T, Mika S, Laub J, ter Laak A, Sülzle D, Ganzer U, Heinrich N, Müller K-R: J Chem Inf Model. 2007, 47: 407-424. 10.1021/ci600205g.

Author information

Authors and Affiliations

Computer-Chemie-Centrum and Interdisciplinary Center for Molecular Materials, Friedrich-Alexander Universität Erlangen-Nürnberg, Nägelsbachstrasse 25, 91052, Erlangen, Germany
C Kramer & T Clark
Boehringer-Ingelheim Pharma GmbH & Co KG, Department of Lead Discovery, Birkendorferstr. 65, 88397, Biberach, Germany
C Kramer & B Beck

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Kramer, C., Beck, B. & Clark, T. In silico prediction of aqueous solubility – classification models. Chemistry Central Journal 2 (Suppl 1), P23 (2008). https://doi.org/10.1186/1752-153X-2-S1-P23
