Comparison of various methods for validity evaluation of QSAR models



Quantitative structure–activity relationship (QSAR) modeling is one of the most important computational tools employed in drug discovery and development. External validation is the main way to check the reliability of developed QSAR models for predicting the activity of compounds that have not yet been synthesized. Different criteria have been used for this purpose in the literature.


In this study, 44 QSAR models for biologically active compounds reported in scientific papers were collected. Various statistical parameters for the external validation of each QSAR model were calculated, and the results were discussed.


The findings revealed that employing the coefficient of determination (r2) alone could not indicate the validity of a QSAR model. The established criteria for external validation have some advantages and disadvantages which should be considered in QSAR studies.


This study showed that none of these methods alone is enough to indicate the validity/invalidity of a QSAR model.


Quantitative structure–activity relationship (QSAR) modeling is a numerical method for finding relationships between chemical structure and drug properties, e.g., biological activity, in drug discovery processes [1]. Developing a QSAR model comprises several stages: (1) collecting data from the literature; (2) calculating parameters with software packages such as Dragon or by image analysis (2D-QSAR), force field calculations based on three-dimensional structures (3D-QSAR), etc.; (3) developing the QSAR model by various statistical techniques, e.g., multiple linear regression, artificial neural networks and partial least squares; and (4) validating the model by internal (leave-one-out and leave-many-out) and external validation [2]. There are various critical points in QSAR studies that should be considered by researchers [3]. However, challenges in selecting appropriate parameters for external validation have been reported in the literature [4, 5].

In QSAR studies, training a model by linear or non-linear methods is not enough to confirm its prediction capability. In virtual screening and the design of new drug compounds, the developed model must be applied to compounds that have not yet been synthesized. A QSAR model can therefore be called acceptable only when it predicts the activity of other compounds with reasonable accuracy. Consequently, external validation (splitting data into training and test sets) is one of the major challenges in QSAR studies [6,7,8]. Various types of cross-validation analysis, i.e., leave-one-out, leave-many-out and repeated double cross-validation, are recommended in QSAR studies, especially when the available sample size is small [9, 10]. However, external validation is one of the most common criteria for evaluating the validity of a QSAR model [11,12,13].

Different criteria and rules have been proposed for evaluating the validity of QSAR models, most of which focus on external validation [13, 14]. Five criteria proposed in reputable journals were selected in this study, and their details are described in the methods section. They are highly cited, and several researchers have used them to evaluate the validity of QSAR models [15,16,17,18]. The designers of each criterion have shown its advantages over the others for external validation of QSAR models [5, 6, 19,20,21]. Some criteria have certain defects from the statistical viewpoint, and different results are obtained depending on the software applied, e.g., for the coefficient of determination (r2) of regression through origin [5]. Nevertheless, there is no comprehensive comparison between them for the evaluation of the external validity of QSAR models. The aim of this study is to compare these criteria for the external validation of QSAR models in order to find the advantages and disadvantages of each method.


Forty-four data sets (training and test sets) composed of experimental biological activity and the corresponding calculated activity (re-substitution values for the training data set) from QSAR models built with various statistical approaches were collected from published articles [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48] indexed in the Scopus database (see Additional file 1 and Table 1). The absolute error (AE) of each datum (the absolute difference between experimental and calculated data) was calculated. External validation of these data sets was assessed with the following methods:

Table 1 The numerical values of the statistical parameters needed to calculate the mentioned criteria for external validation of the 44 developed QSAR models

Proposed criteria by Golbraikh and Tropsha

I. r2 > 0.6, where r2 is the coefficient of determination between the experimental activity and predicted values based on regression analysis.

II. 0.85 < K < 1.15 or 0.85 < K' < 1.15.

K and K' are the slopes of the regression lines through the origin for experimental versus predicted activity and for predicted versus experimental activity, respectively.

III. \(\frac{{\text{r}}^{2}-{\text{r}}_{0}^{2}}{{\text{r}}^{2}}\text{<0.1 or }\frac{{\text{r}}^{2}-{\text{r}}_{0}^{^{\prime}2}}{{\text{r}}^{2}}\text{<0.1}\)

\({\text{r}}_{0}^{2}\) and \({\text{r}}_{0}^{^{\prime}2}\) are the coefficients of determination for experimental versus predicted activity and for predicted versus experimental activity, respectively, based on regression through the origin (linear regression by the least squares method without a constant term) [19].
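The three rules can be checked directly from the experimental and predicted activities of a test set. The following is a minimal sketch in plain Python, not the authors' implementation: the function name `golbraikh_tropsha` and the returned keys are hypothetical, \(r_0^2\) uses the residual-based definition applied to methods 1 and 2 in the Results section, and \(r_0'^2\) is assumed to be the same quantity with the roles of experimental and predicted values swapped.

```python
from statistics import mean


def golbraikh_tropsha(y_exp, y_pred):
    """Check rules I-III for one external test set (illustrative sketch)."""
    m_e, m_p = mean(y_exp), mean(y_pred)
    s_ee = sum((e - m_e) ** 2 for e in y_exp)
    s_pp = sum((p - m_p) ** 2 for p in y_pred)
    s_ep = sum((e - m_e) * (p - m_p) for e, p in zip(y_exp, y_pred))
    r2 = s_ep ** 2 / (s_ee * s_pp)  # squared Pearson correlation

    # Slopes of the least-squares lines forced through the origin.
    cross = sum(e * p for e, p in zip(y_exp, y_pred))
    k = cross / sum(p * p for p in y_pred)       # y_exp  ~ k  * y_pred
    k_prime = cross / sum(e * e for e in y_exp)  # y_pred ~ k' * y_exp

    # Residual-based r0^2; the swapped form used for r0'^2 is an assumption.
    r0_2 = 1 - sum((e - k * p) ** 2 for e, p in zip(y_exp, y_pred)) / s_ee
    r0p_2 = 1 - sum((p - k_prime * e) ** 2 for e, p in zip(y_exp, y_pred)) / s_pp

    rule1 = r2 > 0.6
    rule2 = 0.85 < k < 1.15 or 0.85 < k_prime < 1.15
    rule3 = (r2 - r0_2) / r2 < 0.1 or (r2 - r0p_2) / r2 < 0.1
    return {"r2": r2, "k": k, "k_prime": k_prime, "r0_2": r0_2,
            "r0p_2": r0p_2, "valid": rule1 and rule2 and rule3}
```

Because r2 is a squared correlation, a perfectly anti-correlated prediction still passes rule I; in this sketch it is rejected by the slope rule instead.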

Proposed criteria by Roy based on regression through origin (RTO)

Roy and coworkers suggested \({\text{r}}_{{\text{m}}}^{{2}}\), which is calculated by Eq. 1 and is one of the most widely used metrics among QSAR experts in the literature [20, 49]:

$$r_{m}^{2} = r^{2} \left( {1 - \sqrt {r^{2} - r_{0}^{2} } } \right) \tag{1}$$

In this equation, \(r_{0}^{2}\) is computed using regression through origin (RTO), i.e., linear regression by the least squares method without a constant term.
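As a small worked illustration of Eq. 1, the sketch below evaluates \(r_m^2\) from given r2 and \(r_0^2\) values. The function name `rm_squared` is hypothetical, and returning `None` when \(r^2 - r_0^2\) is negative is a choice made here only to flag that the square root is undefined in the form written above.

```python
import math


def rm_squared(r2, r0_2):
    """Return r_m^2 from Eq. 1, or None when r2 - r0_2 is negative."""
    diff = r2 - r0_2
    if diff < 0:
        return None  # square root undefined for the equation as written
    return r2 * (1 - math.sqrt(diff))
```

For example, r2 = 0.81 with \(r_0^2\) = 0.80 gives 0.81 × (1 − 0.1) ≈ 0.729, whereas r2 = 0.70 with \(r_0^2\) = 0.90 cannot be evaluated.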

Concordance correlation coefficient (CCC)

Gramatica and coworker [4] suggested the concordance correlation coefficient (CCC) for external validation of a QSAR model:

$${\text{CCC}} = \frac{2\sum\limits_{i = 1}^{n_{\text{EXT}}} \left( Y_{i} - \overline{Y} \right)\left( Y_{i^{\prime}} - \overline{Y}^{\prime} \right)}{\sum\limits_{i = 1}^{n_{\text{EXT}}} \left( Y_{i} - \overline{Y} \right)^{2} + \sum\limits_{i = 1}^{n_{\text{EXT}}} \left( Y_{i^{\prime}} - \overline{Y}^{\prime} \right)^{2} + n_{\text{EXT}} \left( \overline{Y} - \overline{Y}^{\prime} \right)^{2}} \tag{2}$$

\(Y_{i}\) is the experimental value, \(\overline{Y}\) is the average of the experimental values, \(Y_{i^{\prime}}\) is the predicted value of activity and \(\overline{Y}^{\prime}\) is the average of the predicted values. EXT denotes the external prediction set (test set), and \(n_{\text{EXT}}\) is its number of compounds. A model with CCC > 0.8 is considered valid.
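The coefficient can be computed in a few lines. This is a minimal sketch using the standard form of the concordance correlation coefficient, in which the last term of the denominator is the squared difference between the mean experimental and mean predicted values; the function name `ccc` and the toy numbers are illustrative only.

```python
from statistics import mean


def ccc(y_exp, y_pred):
    """Concordance correlation coefficient for an external test set."""
    n = len(y_exp)
    m_e, m_p = mean(y_exp), mean(y_pred)
    s_ep = sum((e - m_e) * (p - m_p) for e, p in zip(y_exp, y_pred))
    s_ee = sum((e - m_e) ** 2 for e in y_exp)
    s_pp = sum((p - m_p) ** 2 for p in y_pred)
    # The mean-difference term penalises systematic over- or underprediction.
    return 2 * s_ep / (s_ee + s_pp + n * (m_e - m_p) ** 2)


perfect = ccc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In the second call every prediction is too high by one unit: the correlation between the two series is still perfect, but CCC drops to 4/7 (about 0.57), below the 0.8 threshold.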

Statistical significance of the difference between deviations of experimental activity and calculated data

In 2014, our research group challenged the regression through origin and proposed calculating the model errors for the training and test sets and comparing them as a reliable method for external validation of QSAR models [5].
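The comparison of training and test errors can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of [5]: the function names are hypothetical, and the equal-variance (pooled) form of the two-sample t statistic is assumed. The p-value would then be read from a t distribution with the returned degrees of freedom, for example with a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def absolute_errors(y_exp, y_calc):
    """Absolute error (AE) of each compound."""
    return [abs(e - c) for e, c in zip(y_exp, y_calc)]


def pooled_t(ae_train, ae_test):
    """Two-sample t statistic and degrees of freedom for two AE lists."""
    n1, n2 = len(ae_train), len(ae_test)
    # Pooled variance (equal-variance form; this choice is an assumption).
    sp2 = ((n1 - 1) * variance(ae_train)
           + (n2 - 1) * variance(ae_test)) / (n1 + n2 - 2)
    t = (mean(ae_train) - mean(ae_test)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

A large absolute t value (small p) indicates that the model predicts the training compounds noticeably better or worse than the test compounds.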

Criteria based on training set range and the deviation between experimental and calculated data

Roy and coworkers [21], similarly to our method (method 4), proposed new principles based on the training set range and the absolute average error (AAE), i.e., the average absolute difference between the experimental and predicted values of the test set, and the corresponding standard deviation (SD) for training and test sets, as follows:

Good prediction: AAE ≤ 0.1 × training set range and AAE + 3 × SD ≤ 0.2 × training set range

Bad prediction: AAE > 0.15 × training set range or AAE + 3 × SD > 0.25 × training set range

A good model should pass both of the criteria for a good prediction. However, predictions which fall into one of the conditions could be considered moderately acceptable.
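The thresholds above translate into a short classification routine. The sketch below is one reading of the rules, not code from [21]: the function name is hypothetical, SD is taken as the sample standard deviation of the test-set absolute errors, and any prediction that is neither good nor bad is labelled moderate.

```python
from statistics import mean, stdev


def classify_prediction(y_train_exp, y_test_exp, y_test_pred):
    """Label test-set predictions GOOD, MODERATE or BAD (illustrative sketch)."""
    train_range = max(y_train_exp) - min(y_train_exp)
    ae = [abs(e - p) for e, p in zip(y_test_exp, y_test_pred)]
    aae = mean(ae)
    sd = stdev(ae)  # sample SD of the absolute errors (assumed definition)

    if aae <= 0.10 * train_range and aae + 3 * sd <= 0.20 * train_range:
        return "GOOD"
    if aae > 0.15 * train_range or aae + 3 * sd > 0.25 * train_range:
        return "BAD"
    return "MODERATE"
```

With a training range of 5 units, for instance, a test set whose absolute errors are all about 0.1 is labelled GOOD, errors of about 0.6 give MODERATE, and errors of 1.0 give BAD.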

Results and discussion

Table 1 lists the numerical values of the statistical parameters needed to calculate the mentioned criteria for external validation of the 44 developed QSAR models.

From a statistical point of view, the main issue in the validation of QSAR models is that different equations are used even to calculate simple parameters such as r2 and \(r_{0}^{2}\) [22, 50]. These different equations affect the comparison. In this work, r2 was calculated by SPSS software based on the correlation between experimental and calculated values. However, among the criteria studied in this work, there is a controversy over the calculation of \(r_{0}^{2}\). The following equations are applied to calculate \(r_{0}^{2}\) and \({\text{r}}_{0}^{^{\prime}2}\) in methods 1 and 2 and in Excel software [21]:

$${\text{r}}_{0}^{2} = 1 - \frac{\sum \left( Y_{i} - Y_{fit} \right)^{2}}{\sum \left( Y_{i} - \overline{Y}_{i} \right)^{2}}, \quad Y_{fit} = K Y_{i^{\prime}} \tag{3}$$
$${\text{r}}_{0}^{^{\prime}2} = 1 - \frac{\sum \left( Y_{i} - Y_{fit} \right)^{2}}{\sum \left( Y_{i} - \overline{Y}_{i} \right)^{2}}, \quad Y_{fit} = K^{\prime} Y_{i^{\prime}} \tag{4}$$

Because of statistical defects in the calculation of r2 for RTO by Eqs. 3 and 4, an alternative formula, which is recommended by statistics textbooks [51, 52], was proposed instead [5, 22]:

$${\text{r}}_{0}^{2} = {\text{r}}_{0}^{^{\prime}2} = \frac{\sum Y_{fit}^{2}}{\sum Y_{i}^{2}} \tag{5}$$

In addition to the statistical defects of Eqs. (3) and (4) for the calculation of \(r_{0}^{2}\) and \({\text{r}}_{0}^{^{\prime}2}\), QSAR researchers may apply Eq. (5), which has been proposed as the appropriate equation for \(r_{0}^{2}\) and is used by official statistical packages such as SPSS, and this does not give reasonable results for these criteria. Calculation of \({\text{r}}_{{\text{m}}}^{{2}}\) based on \(r_{0}^{2}\) computed by Eq. (5) (or SPSS software) is not possible because r2 is commonly less than \(r_{0}^{2}\), so that \({\text{r}}^{2} - {\text{r}}_{0}^{2} < 0\) and its square root is undefined. This is the main defect of methods 1 and 2 for the external validation of QSAR models.
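The disagreement between the two definitions is easy to reproduce numerically. The sketch below computes r2 together with \(r_0^2\) from the residual form (Eq. 3) and from the sum-of-squares form (Eq. 5) for a small invented test set; the function name and the four data points are illustrative and are not taken from the 44 collected models.

```python
from statistics import mean


def r0_squared_both(y_exp, y_pred):
    """Return (r2, r0^2 by the residual form, r0^2 by the sum-of-squares form)."""
    m_e, m_p = mean(y_exp), mean(y_pred)
    s_ee = sum((e - m_e) ** 2 for e in y_exp)
    s_pp = sum((p - m_p) ** 2 for p in y_pred)
    s_ep = sum((e - m_e) * (p - m_p) for e, p in zip(y_exp, y_pred))
    r2 = s_ep ** 2 / (s_ee * s_pp)

    # Slope through the origin and fitted values Y_fit = K * Y_pred.
    k = sum(e * p for e, p in zip(y_exp, y_pred)) / sum(p * p for p in y_pred)
    fit = [k * p for p in y_pred]

    r0_resid = 1 - sum((e - f) ** 2 for e, f in zip(y_exp, fit)) / s_ee  # Eq. (3)
    r0_ss = sum(f * f for f in fit) / sum(e * e for e in y_exp)          # Eq. (5)
    return r2, r0_resid, r0_ss


# Invented activities on a pIC50-like scale.
r2, r0_eq3, r0_eq5 = r0_squared_both([5.0, 6.0, 7.0, 8.0], [5.5, 5.8, 7.4, 7.6])
```

For these numbers r2 is about 0.895, the residual form gives about 0.881 and the sum-of-squares form about 0.997, so \(r^2 - r_0^2\) is positive with Eq. (3) but negative with Eq. (5).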

Seven of the studied models have r2 < 0.6 (Table 2). Therefore, they cannot be counted as valid models. r2 is a simple parameter for evaluating the correlation between experimental and predicted values in QSAR studies and for estimating the correlation between concentration and response in analytical chemistry. It is only a primary criterion: a QSAR model or a developed analytical method with a high r2 value does not necessarily have acceptable validity [53, 54]. In addition, squared factors, e.g., r2, make it impossible to distinguish the direction of errors, i.e., overpredicted or underpredicted values; these two kinds of errors differ greatly in importance in toxicity and regulatory evaluation.

Table 2 Values of the proposed criteria (methods 1–5) for external validation of QSAR models

The numerical values of the other criteria in method 1 show that all models have K or K' between 0.85 and 1.15. The third rule (\(\frac{{\text{r}}^{2}-{\text{r}}_{0}^{2}}{{\text{r}}^{2}}\text{<0.1 or }\frac{{\text{r}}^{2}-{\text{r}}_{0}^{^{\prime}2}}{{\text{r}}^{2}}\text{<0.1}\)) is not satisfied by only 7 models, 3 of which also have r2 < 0.6. Therefore, based on the principles suggested in method 1, 11 models are not valid (7 with r2 < 0.6 plus 4 further models that fail only the third rule).

Method 2 is based on RTO, with \(r_{0}^{2}\) calculated by Eq. (3). Twenty-six models have \({\text{r}}_{{\text{m}}}^{{2}}\) > 0.5, and the results are similar to those of method 1 (both are based on RTO). The models that are valid by method 1 and have r2 > 0.75 also have \({\text{r}}_{{\text{m}}}^{{2}}\) > 0.5, except model 27 with \(r_{0}^{2}\) = 0.101 (close to the threshold of 0.1).

The third method studied, CCC, was proposed by Gramatica [4]. Twenty-nine models have CCC > 0.8, and all of them are valid models based on method 1. The results of methods 2 and 3 are very similar. Only two models (20 and 27) have CCC > 0.8 while their \({\text{r}}_{{\text{m}}}^{{2}}\) values are near the threshold, i.e., 0.4 < \({\text{r}}_{{\text{m}}}^{{2}}\) < 0.5. Method 3 is comparable to the methods based on RTO. However, it has neither their statistical defects nor the inconsistent values of \(r_{0}^{2}\) that result from the different proposed equations (Eqs. (3) and (4) or Eq. (5)) or software (e.g., Excel or SPSS).

Method 4 is based on calculating the model errors for the training and test sets and comparing them as a possible reliable method for external validation of models with r2 > 0.6 for the test set. The aim of developing a QSAR model is the prediction and elucidation of mechanisms of drug action. The prediction capability for the training and test sets should obviously be similar. If the training set is not considered, the statistical parameters for external validation of the test set may be acceptable, but a significant difference (independent t-test) between the prediction power for the training and test sets might be a weakness of the model. Twenty-six models have r2 > 0.6 and no significant difference between the absolute errors (AE) of the training and test sets (p > 0.05). Twenty-three of them are also selected by CCC as valid models (CCC > 0.8 and p > 0.05). Model 16 has CCC = 0.55, and the AAEs of its training and test sets are 0.412 ± 0.352 and 0.645 ± 0.489 (p = 0.16), respectively. The high SD values caused by outlier data are the possible reason for the non-significant difference between AEs, which therefore cannot establish the validity of the developed model. On the other hand, models 5, 24 and 25 have CCC > 0.9 and p < 0.01. The relative frequencies of AEs for models 5, 24 and 25, sorted into three subgroups (< 0.1, 0.1–0.2 and > 0.2), are illustrated in Figure 1. In these models, AAE values are low; however, there is a 50–250% difference between the AAEs of the training and test sets. For example, in model 5, 48% of the training set and 10% of the test set have AE less than 0.1, while 15% of the training set and 60% of the test set have AE more than 0.2. Similar patterns are observed in models 24 and 25. In addition, residual plots for those models are illustrated in Figure 2. These plots confirm that there is a significant difference between the prediction capability of the developed models for the training and test sets, which is not acceptable for confirming the prediction capability of a QSAR model.
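The three-subgroup summary of absolute errors described above can be sketched as a short helper. The function name is hypothetical, and treating the middle subgroup as the closed interval from 0.1 to 0.2 is an assumption, since the text does not say how values exactly on a boundary are assigned.

```python
def ae_relative_frequency(abs_errors, low=0.1, high=0.2):
    """Fractions of absolute errors below `low`, within [low, high], above `high`."""
    n = len(abs_errors)
    below = sum(1 for ae in abs_errors if ae < low)
    above = sum(1 for ae in abs_errors if ae > high)
    return below / n, (n - below - above) / n, above / n
```

Applying the helper separately to the training and test errors of a model gives the pair of distributions that are compared in the text, e.g., 48% versus 10% of compounds in the lowest subgroup for model 5.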

Fig. 1 Relative frequency of individual deviation (absolute error) for model 5 (a), model 24 (b) and model 25 (c)

Fig. 2 Residual plots for model 5 (a), model 24 (b) and model 25 (c)

The last method (method 5) was proposed by Roy’s research group based on the training set range and the mean and standard deviation of the test set data [21]. The models can be classified as GOOD, MODERATELY GOOD and BAD according to the proposed parameters. Most of the models were categorized as BAD (45%) or GOOD (39%), and a few were MODERATELY GOOD (Table 2). The first point that should be considered is r2 > 0.6 as a necessary criterion: all models with r2 < 0.6 were classified as BAD. Moreover, a good correlation is observed between CCC and the GOOD classification of method 5. However, model 11 is a GOOD model although CCC = 0.75 and there is a significant difference between the AEs of the training and test sets (the AAEs of the training and test sets are 0.05 and 0.13, respectively, and p = 0.01). In comparison with method 4, models 5, 24 and 25 (GOOD models) have a large difference between the AAEs of the training and test sets (Figure 1), which the principles proposed in method 5 could not detect. A model with a statistically significant difference between the AEs of the training and test sets may not be a valid model.

Furthermore, model 3 is a BAD model although CCC = 0.84 and the p-value for the difference between the AEs of the training and test sets is 0.18. The AAEs of the training and test sets are 0.167 ± 0.171 and 0.266 ± 0.244 (AAE ± SD), respectively. The high SD values of the training and test sets indicate that there are outlier data, which could be taken into account in the external validation of QSAR models by using statistical parameters, e.g., the SD of mean errors.

Typographic errors, non-uniformity of the data set applied for QSAR modeling, or mistakes in the determination of the biological activity of the studied compounds are common reasons for outlier data, which can decrease the prediction capability of a model. Docking studies of outlier cases and comparison with other compounds can help researchers to detect outlier data when developing a QSAR model [55].

These results confirm the results of previous studies, in which more than a single criterion is recommended to assess the real external predictivity of QSAR models [56]. Moreover, other recommended guidelines for developing QSAR models, such as cross-validation, appropriate splitting of training and test sets, variable allocation and correlation coefficients adjusted by degrees of freedom, are important issues in QSAR studies which should be considered by researchers [10, 57,58,59]. In addition, cross (internal) validation analyses, e.g., leave-many-out and leave-one-out, are recommended in QSAR studies, especially when the sample size is small [9, 10], and some reports have shown their superiority over external validation [60]. Therefore, both internal and external validation analyses, considering various criteria, are necessary to check the validity of a QSAR model.


The aim of developing a QSAR model is an acceptable prediction of the activity of a compound before synthesis and biological evaluation. Therefore, external validation is necessary. All of the developed methods for external validation of a QSAR model are useful, and a good correlation was observed between the studied methods for the selected models. However, some differences were detected between the established methods. Methods 1 and 2 are valuable, but there are some questionable points in the equation applied for the \(r_{0}^{2}\) calculation. CCC is a valuable parameter, though in some cases it cannot detect outlier data. As in methods 1 and 2, the training data set is not included in CCC. Methods 4 and 5 are based on both training and test sets. They detected most invalid models, but method 5 classified some models as GOOD although the difference between the AEs of the training and test sets is substantial (p < 0.05). On the other hand, a model with high SD values in both training and test sets may pass the criterion of method 4 even though it should be counted as invalid because of outlier data in the training and test sets. Finally, evaluation of a model with any of the established methods is useful, but none of them alone necessarily establishes the validity/invalidity of a QSAR model. The results of this study show the importance of calculating the errors of the training and test sets and detecting outliers for checking the validity of a model.

Availability of data and materials

All data are available in the supplementary information.



QSAR: Quantitative structure–activity relationship

AE: Absolute error

RTO: Regression through origin

CCC: Concordance correlation coefficient

AAE: Absolute average error

SD: Standard deviation


  1. Norouzi S, Farahani M, Nejad Ebrahimi S. The Integration of pharmacophore-based 3D-QSAR modeling and virtual screening in identification of natural product inhibitors against SARS-CoV-2. Pharm Sci. 2021;27:S94–108.
  2. Dearden JC. Whither QSAR? Pharm Sci. 2017;23(2):82–3.
  3. Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, et al. QSAR modeling: Where have you been? Where are you going to? J Med Chem. 2014;57(12):4977–5010.
  4. Chirico N, Gramatica P. Real external predictivity of QSAR models: How to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. J Chem Inf Model. 2011;51(9):2320–35.
  5. Shayanfar A, Shayanfar S. Is regression through origin useful in external validation of QSAR models? Eur J Pharm Sci. 2014;59(1):31–5.
  6. Gramatica P, Cassani S, Roy PP, Kovarich S, Yap CW, Papa E. QSAR modeling is not “push a button and find a correlation”: a case study of toxicity of (Benzo-)triazoles on Algae. Mol Informatics. 2012;31(11–12):817–35.
  7. Veselinović JB, Veselinović AM, Toropova AP, Toropov AA. The Monte Carlo technique as a tool to predict LOAEL. Eur J Med Chem. 2016;116:71–5.
  8. Zivkovic M, Zlatanovic M, Zlatanovic N, Golubović M, Veselinović AM. The application of the combination of monte carlo optimization method based QSAR modeling and molecular docking in drug design and development. Mini-Rev Med Chem. 2020;20(14):1389–402.
  9. Hawkins DM, Basak SC, Mills D. Assessing model fit by cross-validation. J Chem Inf Comput Sci. 2003;43(2):579–86.
  10. Gütlein M, Helma C, Karwath A, Kramer S. A large-scale empirical evaluation of cross-validation and external test set validation in (Q)SAR. Mol Informatics. 2013;32(5–6):516–28.
  11. Filzmoser P, Liebmann B, Varmuza K. Repeated double cross validation. J Chemometr. 2009;23(4):160–71.
  12. Esbensen KH, Geladi P. Principles of proper validation: use and abuse of re-sampling for validation. J Chemometr. 2010;24(3–4):168–87.
  13. Gramatica P. External evaluation of QSAR models, in addition to cross-validation: verification of predictive capability on totally new chemicals. Mol Informatics. 2014;33(4):311–4.
  14. Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, et al. QSAR without borders. Chem Soc Rev. 2020;49(11):3525–64.
  15. Đorđević V, Pešić S, Živković J, Nikolić GM, Veselinović AM. Development of novel antipsychotic agents by inhibiting dopamine transporter: in silico approach. New J Chem. 2022;46(6):2687–96.
  16. Perić V, Golubović M, Lazarević M, Marjanović V, Kostić T, Đorđević M, Milić D, Veselinović AM. Development of potential therapeutics for pain treatment by inducing Sigma 1 receptor antagonism: in silico approach. New J Chem. 2021;45(27):12286–95.
  17. Živković JV, Trutić NV, Veselinović JB, Nikolić GM, Veselinović AM. Monte Carlo method based QSAR modeling of maleimide derivatives as glycogen synthase kinase-3β inhibitors. Comput Biol Med. 2015;64:276–82.
  18. Hamzeh-Mivehroud M, Khoshravan-Azar Z, Dastmalchi S. QSAR and molecular docking studies on non-imidazole-based histamine h3 receptor antagonists. Pharm Sci. 2020;26(2):165–74.
  19. Golbraikh A, Tropsha A. Beware of q2! J Mol Graph Model. 2002;20(4):269–76.
  20. Roy PP, Roy K. On some aspects of variable selection for partial least squares regression models. QSAR Comb Sci. 2008;27(3):302–13.
  21. Roy K, Das RN, Ambure P, Aher RB. Be aware of error measures. Further studies on validation of predictive QSAR models. Chemometr Intell Lab Syst. 2016;152:18–33.
  22. Eisenhauer JG. Regression through the origin. Teach Stat. 2003;25(3):76–80.
  23. Zhang X, Zhang H. 3D-QSAR studies on 1,2,4-triazolyl 5-azaspiro [2.4]-heptanes as D3R antagonists. Chem Phys Lett. 2018;704:11–20.
  24. Patil RB, Barbosa EG, Sangshetti JN, Sawant SD, Zambre VP. LQTA-R: A new 3D-QSAR methodology applied to a set of DGAT1 inhibitors. Comput Biol Chem. 2018;74:123–31.
  25. Aouidate A, Ghaleb A, Ghamali M, Ousaa A, Choukrad M, Sbai A, Bouachrine M, Lakhlifi T. 3D QSAR studies, molecular docking and ADMET evaluation, using thiazolidine derivatives as template to obtain new inhibitors of PIM1 kinase. Comput Biol Chem. 2018;74:201–11.
  26. Gao J, Sun J, Wang T, Sheng S, Huang T. Combined 3D-QSAR modeling and molecular docking study on spiro-derivatives as inhibitors of acetyl-CoA carboxylase. Med Chem Res. 2017;26(2):361–71.
  27. Arthur DE, Uzairu A, Mamza P, Abechi SE, Shallangwa G. Activity and toxicity modelling of some NCI selected compounds against leukemia P388ADR cell line using genetic algorithm-multiple linear regressions. J King Saud Univ Sci. 2020;32(1):324–31.
  28. González MP, Teran Moldes MDC, Fall Y, Dias LC, Helguera AM. A topological sub-structural approach to the mutagenic activity in dental monomers. 3. Heterogeneous set of compounds. Polymer. 2005;46(8):2783–90.
  29. Xu F, Yang ZZ, Ke ZL, Xi LM, Yan QD, Yang WQ, Zhu LQ, Lin FL, Lv WK, Wu HG, et al. Synthesis, antitumor evaluation and 3D-QSAR studies of [1,2,4]triazolo[4,3-b][1,2,4,5]tetrazine derivatives. Bioorg Med Chem Lett. 2016;26(19):4580–6.
  30. Ugale VG, Patel HM, Surana SJ. Molecular modeling studies of quinoline derivatives as VEGFR-2 tyrosine kinase inhibitors using pharmacophore based 3D QSAR and docking approach. Arab J Chem. 2017;10:S1980–2003.
  31. Arthur DE, Uzairu A, Mamza P, Abechi SE, Shallangwa G. Insilco study on the toxicity of anti-cancer compounds tested against MOLT-4 and p388 cell lines using GA-MLR technique. BeniSuef Univ J Basic Appl Sci. 2016;5(4):320–33.
  32. Bhatia MS, Pakhare KD, Choudhari PB, Jadhav SD, Dhavale RP, Bhatia NM. Pharmacophore modeling and 3D QSAR studies of aryl amine derivatives as potential lumazine synthase inhibitors. Arab J Chem. 2017;10:S100–4.
  33. Aouidate A, Ghaleb A, Ghamali M, Chtita S, Ousaa A, Choukrad M, Sbai A, Bouachrine M, Lakhlifi T. QSAR study and rustic ligand-based virtual screening in a search for aminooxadiazole derivatives as PIM1 inhibitors. Chem Cent J. 2018;12:32.
  34. Sharma MC, Jain S, Sharma R. Trifluorophenyl-based inhibitors of dipeptidyl peptidase-IV as antidiabetic agents: 3D-QSAR COMFA, CoMSIA methodologies. Netw Model Anal Health Inform Bioinform. 2018;7:1.
  35. Tong J, Lei S, Qin S, Wang Y. QSAR studies of TIBO derivatives as HIV-1 reverse transcriptase inhibitors using HQSAR, CoMFA and CoMSIA. J Mol Struct. 2018;1168:56–64.
  36. Liu G, Wang W, Wan Y, Ju X, Gu S. Application of 3D-QSAR, pharmacophore, and molecular docking in the molecular design of diarylpyrimidine derivatives as HIV-1 nonnucleoside reverse transcriptase inhibitors. Int J Mol Sci. 2018;19(5):1436.
  37. Behgozin SM, Fatemi MH. 3D-QSAR modeling of maximum steady-state fluxes of some substituted benzenes and quinolone derivatives through polydimethylsiloxane membrane. J Iran Chem Soc. 2018;15(6):1293–300.
  38. Kaczor AA, Żuk J, Matosiuk D. Comparative molecular field analysis and molecular dynamics studies of the dopamine D2 receptor antagonists without a protonatable nitrogen atom. Med Chem Res. 2018;27(4):1149–66.
  39. Wang ZZ, Ma CY, Yang J, Gao QB, Sun XD, Ding L, Liu HM. Investigating the binding mechanism of (4-Cyanophenyl)glycine derivatives as reversible LSD1 by 3D-QSAR, molecular docking and molecular dynamics simulations. J Mol Struct. 2019;1175:698–707.
  40. Singh U, Gangwal RP, Dhoke GV, Prajapati R, Damre M, Sangamwar AT. 3D-QSAR and molecular docking analysis of (4-piperidinyl)-piperazines as acetyl-CoA carboxylases inhibitors. Arab J Chem. 2017;10:S617–26.
  41. Türkmenoğlu B, Güzel Y. Molecular docking and 4D-QSAR studies of metastatic cancer inhibitor thiazoles. Comput Biol Chem. 2018;76:327–37.
  42. Chun-Zhi H, Shu-Wei X, Hu W, Jun X, Liangmin Y. Using 3D-QSAR and molecular docking insight into inhibitors binding with complex-associated kinases CDK8. J Mol Struct. 2018;1173:498–511.
  43. Ajay Kumar TV, Athavan AAS, Loganathan C, Saravanan K, Kabilan S, Parthasathy V. Design, 3D QSAR modeling and docking of TGF-β type I inhibitors to target cancer. Comput Biol Chem. 2018;76:232–44.
  44. Ounissi M, Kameli A, Tigrine C, Rachedi FZ. Computer-aided identification of natural lead compounds as cyclooxygenase-2 inhibitors using virtual screening and molecular dynamic simulation. Comput Biol Chem. 2018;77:1–16.
  45. Ghasemi JB, Davoudian V. 3D-QSAR and docking studies of a series of β-carboline derivatives as antitumor agents of PLK1. J Chem. 2014;2014:10.
  46. Zheng J, Kong H, Wilson JM, Guo J, Chang Y, Yang M, Xiao G, Sun P. Insight into the interactions between novel isoquinolin-1,3-dione derivatives and cyclin-dependent kinase 4 combining QSAR and molecular docking. PLoS ONE. 2014;9(4):e93704.
  47. Li Y, Ning J, Wang Y, Wang C, Sun C, Huo X, Yu Z, Feng L, Zhang B, Tian X, et al. Drug interaction study of flavonoids toward CYP3A4 and their quantitative structure activity relationship (QSAR) analysis for predicting potential effects. Toxicol Lett. 2018;294:27–36.
  48. Hao M, Ren H, Luo F, Zhang S, Qiu J, Ji M, Si H, Li G. A computational study on thiourea analogs as potent MK-2 inhibitors. Int J Mol Sci. 2012;13(6):7057–79.
  49. Ojha PK, Mitra I, Das RN, Roy K. Further exploring rm2 metrics for validation of QSPR models. Chemometr Intell Lab Syst. 2011;107(1):194–205.
  50. Avdeef A. Do you know your r2? ADMET DMPK. 2021;9(1):69–74.
  51. Chatterjee S, Hadi AS. Regression analysis by example. 4th ed. Hoboken: Wiley; 2006.
  52. Hulsizer MR, Woolf LM. A guide to teaching statistics: innovations and best practices. Oxford: Wiley; 2009.
  53. Kaneko H. Beware of r2 even for test datasets: using the latest measured y-values (r2 LM) in time series data analysis. J Chemometr. 2019;33(2):e3093.
  54. Shayanfar A, Ershadi S. Developing new criteria for validity evaluation of analytical methods. J AOAC Int. 2019;102(6):1908–16.
  55. Ghandadi M, Shayanfar A, Hamzeh-Mivehroud M, Jouyban A. Quantitative structure activity relationship and docking studies of imidazole-based derivatives as P-glycoprotein inhibitors. Med Chem Res. 2014;23(11):4700–12.
  56. Gramatica P, Sangion A. A historical excursus on the statistical validation parameters for QSAR models: a clarification concerning metrics and terminology. J Chem Inf Model. 2016;56(6):1127–31.
  57. Rácz A, Bajusz D, Héberger K. Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters. SAR QSAR Environ Res. 2015;26(7–9):683–700.
  58. Tóth G, Király P, Kovács D. Effect of variable allocation on validation and optimality parameters and on cross-optimization perspectives. Chemometr Intell Lab Syst. 2020;204:104106.
  59. Dearden JC, Cronin MTD, Kaiser KLE. How not to develop a quantitative structure-activity or structure-property relationship (QSAR/QSPR). SAR QSAR Environ Res. 2009;20(3–4):241–66.
  60. Majumdar S, Basak SC. Beware of external validation!-a comparative study of several validation techniques used in qsar modelling. Curr Comput-Aided Drug Des. 2018;14(4):284–91.


Not applicable.


The authors would like to thank Tabriz University of Medical Sciences for the financial support (65369) of the project.

Author information

Authors and Affiliations



SS and AS performed data collection, analysis and manuscript writing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ali Shayanfar.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Forty-four data sets (training and test sets) composed of experimental biological activity and corresponding calculated activity.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Shayanfar, S., Shayanfar, A. Comparison of various methods for validity evaluation of QSAR models. BMC Chemistry 16, 63 (2022).

  • Biological activity
  • External validation
  • QSAR
  • Statistical parameters