A benchmark data set for in silico prediction of Ames mutagenicity

In silico prediction tools for Ames mutagenicity (Salmonella typhimurium reverse mutation assay) represent a cost-effective, high-throughput approach for prioritizing compounds before they are submitted to experimental testing. Various modeling approaches have been pursued in this field during the last few years. However, the publicly available data sets used for modeling are mostly very limited in size and chemical coverage. Hence, as for most QSAR problems, a reasonable comparison of the different modeling methodologies has so far been impossible.

In this work we describe a collection of about 6000 non-confidential compounds together with their biological activity in the Ames mutagenicity test. This very large, unique and valuable data set, built from public sources, is made available in machine-readable form (SMILES strings) to be used as a benchmark by other researchers. From these data we built three statistical prediction models for Ames mutagenicity using CORINA and DRAGON descriptors. The methods used are a support vector machine, a random forest and Gaussian processes. All three approaches are evaluated within the same cross-validation setting. To facilitate this benchmark, the exact validation protocol, including the exact random splits, will be made publicly available. The results show that all three methods yield satisfactory results, reaching sensitivity and specificity values of greater than 70% or 80%, respectively. The application of Gaussian processes, previously not applied to Ames mutagenicity prediction, proves slightly superior to the other two methods.
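
The two evaluation ingredients named above, fixed random splits and sensitivity/specificity, can be illustrated with a minimal sketch. It is not the published validation protocol: the function names `make_folds` and `sensitivity_specificity`, the fold count, the seed, and the label encoding (1 = mutagenic, 0 = non-mutagenic) are all assumptions chosen for illustration.

```python
import random


def make_folds(n_items, n_folds=5, seed=0):
    """Split indices 0..n_items-1 into n_folds disjoint folds.

    A fixed seed makes the shuffle, and hence the folds, repeatable,
    so different models can be scored on identical splits.
    The fold count and seed here are illustrative assumptions.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into the folds.
    return [indices[i::n_folds] for i in range(n_folds)]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding: 1 = mutagenic (positive), 0 = non-mutagenic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against classes that are absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Sensitivity is the fraction of mutagenic compounds predicted as mutagenic, and specificity is the fraction of non-mutagenic compounds predicted as non-mutagenic. Reusing the same seed for every model keeps the comparison between methods on equal footing.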

Author information

Correspondence to K Hansen.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Keywords

  • Support Vector Machine
  • Random Forest
  • Gaussian Process
  • Mutagenicity Test
  • Mutation Assay