Optimal partner wavelength combination method applied to NIR spectroscopic analysis of human serum globulin

Han, Yun; Zhong, Yun; Zhou, Huihui; Kuang, Xuesong

doi:10.1186/s13065-020-00689-z

Research article
Open access
Published: 24 May 2020

Yun Han¹, Yun Zhong², Huihui Zhou¹ & Xuesong Kuang¹

BMC Chemistry volume 14, Article number: 37 (2020)

Abstract

Human serum globulin (GLB), which contains various antibodies in healthy human serum, is of great significance for clinical trials and disease diagnosis. In this study, the GLB in human serum was rapidly analyzed by near infrared (NIR) spectroscopy without chemical reagents. The optimal partner wavelength combination (OPWC) method was employed to select discrete information wavelengths. In the OPWC method, redundant wavelengths were removed by repeated projection iteration based on binary linear regression, and the result converged to a stable number of wavelengths. The convergence of the algorithm was also proved theoretically. Moving window partial least squares (MW-PLS) and Monte Carlo uninformative variable elimination PLS (MC-UVE-PLS), two well-performing wavelength selection methods, were also applied for comparison. The optimal models were obtained by the three methods. The corresponding root-mean-square error of cross validation (SECV) and correlation coefficient of prediction (R_P,CV) were 0.813 g L⁻¹ and 0.978 with OPWC combined with PLS (OPWC-PLS), 0.804 g L⁻¹ and 0.979 with MW-PLS, and 1.153 g L⁻¹ and 0.948 with MC-UVE-PLS, respectively. The OPWC-PLS and MW-PLS methods achieved almost the same good results. However, the OPWC contained only 28 wavelengths, so its model complexity was clearly lower. The OPWC-PLS method therefore has good prediction performance for GLB, and its algorithm is convergent and rapid. The results provide important technical support for the rapid detection of serum.

Introduction

Near infrared (NIR) spectroscopy is a green and developing analytical technique that has been widely used in life sciences [1,2,3,4,5,6,7], agricultural products and food [8,9,10,11], soil [12,13,14], and other fields [15, 16]. For NIR spectroscopic analysis of a complex system, wavelength selection is both necessary and difficult. So far, many methods, covering both continuous and discrete modes of wavelength selection, have been successfully used in NIR spectroscopic analysis, but a general and effective method has not been found. Moving window partial least squares (MW-PLS) is a widely used and well-performing wavelength selection method. It uses a moving window whose position and size can be changed to identify and select continuous wavebands according to their prediction effect, and such a waveband can correspond to the absorption of specific functional groups [13, 15, 16]. This method achieves a high prediction effect on most spectral data sets, so it is often used as the comparison method when evaluating the performance of a new method. However, as can be seen from [16,17,18], it is a traversal algorithm in which all possible continuous wavebands are screened, so it is time-consuming on a large data set. Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS) is a popular method for discrete wavelength selection [19], which introduces noise to eliminate uninformative variables, but it cannot achieve satisfactory prediction results for some data sets.

Serum globulin (GLB), which is synthesized by the human monocyte-phagocyte system, contains various antibodies in the serum of healthy people, so it can enhance the body's resistance and help prevent infection. It is mainly used for immunodeficiency diseases as well as for the prevention and treatment of viral and bacterial infections such as infectious hepatitis, measles, chickenpox, mumps and herpes zoster. In addition, it can be used in asthma, allergic rhinitis, eczema and other endogenous allergic diseases. Therefore, the GLB in human serum is very important for clinical trials and disease diagnosis. In previous studies [20, 21], FTIR/ATR spectroscopy was used for the determination of GLB. Earlier work found that, for blood indicators, NIR spectroscopy has higher quantitative analysis accuracy than FTIR/ATR spectroscopy [6, 22]. The experimental results show that the molecular absorption information of GLB can be captured by NIR spectroscopy without reagents.

Optimal partner wavelength combination (OPWC) is an iterative method for selecting discrete information wavelengths. In this method, the best partner of each wavelength in a predetermined wavelength region is determined based on binary linear regression (BLR), which yields a partner wavelength subset (PWS); the best partner of each wavelength in the PWS is then obtained in the same way. The iterative process is continued until convergence, and the last wavelength subset obtained is called the OPWC. A PLS model is then established on the basis of the OPWC. In order to make full use of the samples, leave-one-out cross validation (LOOCV) was adopted.

Because human serum is a complex multi-component system in which the absorption of other components interferes strongly, it is difficult to extract the characteristic information of GLB. Therefore, the OPWC-PLS method was employed to remove redundant wavelengths and establish a high-precision quantitative model. The MW-PLS and MC-UVE-PLS methods were also applied for comparison. Experimental results showed that OPWC-PLS has good prediction performance and that the algorithm is convergent and rapid.

Materials and methods

Experiment

A total of 230 human serum samples were collected in this experiment, and their GLB values were determined using routine clinical biochemical tests. The obtained results were used as reference values in the NIR spectroscopic analysis. This work was supported by Youth Innovation Talents Project of Colleges and Universities in Guangdong Province (No. Q18285), and all individual participants provided written informed consent. The study protocol was performed in accordance with relevant laws and institutional guidelines and was approved by local medical institutions and ethics committee. The statistical analysis of the measured GLB values of the 230 samples is given in Table 1.

Table 1 Statistical analysis of measured GLB values of 230 samples

The spectroscopy instrument was an XDS Rapid Content™ Liquid Grating Spectrometer (FOSS, Denmark) equipped with a transmission accessory and a 2 mm cuvette. The spectral scanning range was 780–2498 nm with a 2 nm wavelength interval; the detectors were Si (780–1100 nm) and PbS (1100–2498 nm). The temperature and relative humidity of the laboratory were 25 ± 1 °C and 46 ± 1% RH, respectively. Each sample was measured three times, and the mean of the three measurements was used for modeling.

Modeling process

Leave-one-out cross validation (LOOCV) is commonly used as the objective function for model selection because it makes full use of the sample information. In this study, LOOCV was used in the modeling process as follows. One sample was left out of the modeling set for prediction, and the remaining samples were used as the calibration set. This process was repeated until a predicted value had been obtained for every modeling sample. The measured and predicted values of the ith sample in the modeling set were denoted as $ C_{\text{M},i} $ and $ \tilde{C}_{\text{M},i} $, $ i = 1, 2, \ldots, n_{\text{M}} $, where $ n_{\text{M}} $ is the number of modeling samples. Over all samples, the mean measured value was denoted as $ C_{\text{M,Ave}} $ and the mean predicted value as $ \tilde{C}_{\text{M,Ave}} $. The prediction accuracy was evaluated by the root-mean-square error of cross validation and the correlation coefficient of prediction, denoted as SECV and R_P,CV, respectively. They were calculated as follows:

$$ \text{SECV} = \sqrt{\frac{\sum\nolimits_{i=1}^{n_{\text{M}}} \left( \tilde{C}_{\text{M},i} - C_{\text{M},i} \right)^{2}}{n_{\text{M}}}}, $$

(1)

$$ \text{R}_{\text{P,CV}} = \frac{\sum\nolimits_{i=1}^{n_{\text{M}}} \left( C_{\text{M},i} - C_{\text{M,Ave}} \right)\left( \tilde{C}_{\text{M},i} - \tilde{C}_{\text{M,Ave}} \right)}{\sqrt{\sum\nolimits_{i=1}^{n_{\text{M}}} \left( C_{\text{M},i} - C_{\text{M,Ave}} \right)^{2} \sum\nolimits_{i=1}^{n_{\text{M}}} \left( \tilde{C}_{\text{M},i} - \tilde{C}_{\text{M,Ave}} \right)^{2}}} $$

(2)

The model parameters were selected to achieve minimum SECV.
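The two evaluation measures and the leave-one-out loop can be sketched in a few lines of Python. This is only an illustration using the standard library: the function names (`secv`, `r_pcv`, `loocv_predictions`) and the `fit`/`predict` callables are assumptions introduced here, not part of the original MATLAB implementation, and the numbers in the example are made up.

```python
import math

def secv(measured, predicted):
    """Root-mean-square error between measured and cross-validation predicted values."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)

def r_pcv(measured, predicted):
    """Pearson correlation between measured and cross-validation predicted values."""
    n = len(measured)
    m_ave = sum(measured) / n
    p_ave = sum(predicted) / n
    cov = sum((m - m_ave) * (p - p_ave) for m, p in zip(measured, predicted))
    ss_m = sum((m - m_ave) ** 2 for m in measured)
    ss_p = sum((p - p_ave) ** 2 for p in predicted)
    return cov / math.sqrt(ss_m * ss_p)

def loocv_predictions(fit, predict, X, y):
    """Leave each sample out once, fit on the rest, and predict the held-out sample.

    fit(X_cal, y_cal) returns a model; predict(model, x) returns one value.
    Both callables are placeholders for whatever regression is being validated.
    """
    preds = []
    for i in range(len(y)):
        X_cal = X[:i] + X[i + 1:]
        y_cal = y[:i] + y[i + 1:]
        preds.append(predict(fit(X_cal, y_cal), X[i]))
    return preds

# Made-up values, only to show the call pattern.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(secv(measured, predicted), r_pcv(measured, predicted))
```

Any model (BLR, PLS with a given number of factors) can be plugged in through `fit` and `predict`; the parameter setting with the smallest `secv` value would then be retained.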

MW-PLS method

MW-PLS is a time-tested and popular method for screening continuous wavebands. This method uses several continuous wavelengths as a window, the size and position of which can be changed, and the PLS models are established for all possible windows in a predetermined search region of the spectrum. The information waveband was selected according to the minimum SECV. In this study, the search range of the MW-PLS was full spectrum region (780–2498 nm) with 860 wavelengths, and the initial wavelength (I) and number of wavelengths (N) of window as well as the number of PLS factors (F) were set as $ I \in \{ 780,\;782, \ldots ,\;2498\} $, $ N \in \{ 1,\;2, \ldots ,\;200\} \cup \{ 210,\;220, \ldots ,\;860\} $, and $ F \in \{ 1,\;2, \ldots ,20\} $. The LOOCV for PLS models was performed in each combination of (I, N, F), and the corresponding SECV and R_P,CV were calculated. The optimal waveband with minimum SECV was selected to achieve the best prediction accuracy.
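To give a feeling for the size of this search, the following sketch enumerates the window grid described above. It assumes, since the text does not say so explicitly, that a window is only evaluated when it fits entirely inside the 780–2498 nm region; the helper names are hypothetical.

```python
# Wavelength grid of the instrument: 780-2498 nm in 2 nm steps.
wavelengths = list(range(780, 2500, 2))
n_total = len(wavelengths)

# Window sizes N as listed in the text: 1..200, then 210, 220, ..., 860.
window_sizes = list(range(1, 201)) + list(range(210, 861, 10))

def count_windows(n_total, window_sizes):
    """Number of (I, N) pairs whose window fits inside the search region.

    Assumption: a window starting at index i with N wavelengths is only
    evaluated if i + N <= n_total.
    """
    return sum(n_total - n + 1 for n in window_sizes if n <= n_total)

def window(wavelengths, start_nm, n):
    """Return the n consecutive wavelengths starting at start_nm."""
    i = wavelengths.index(start_nm)
    return wavelengths[i:i + n]

n_windows = count_windows(n_total, window_sizes)
print(n_total, len(window_sizes), n_windows)
```

Under this assumption every candidate window still requires its own LOOCV run over the PLS factors, which is why the method becomes slow on large data sets.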

MC-UVE-PLS method

MC-UVE-PLS is a representative method for screening discrete wavelengths. In this method, a large number of models are established with randomly selected calibration samples, the stability of each regression coefficient across these models is calculated, and each variable is evaluated by the stability of its coefficient [19]. In this study, the MC-UVE method was applied to the full spectrum region, and Monte Carlo sampling was performed 500 times. The number of variables was determined using the method in Ref. [19]. MC-UVE-PLS was run 50 times and the best result was recorded for further analysis. The number of PLS factors F was set to $ F \in \{ 1, 2, \ldots, 30\} $.
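As a rough illustration of the stability criterion, the sketch below ranks variables by the ratio of the mean to the standard deviation of their regression coefficients over the Monte Carlo runs, which is the usual definition in the MC-UVE literature. The coefficient values are invented, the function names are hypothetical, and the use of the sample standard deviation is an assumption; in practice the coefficients would come from PLS models fitted to random calibration subsets.

```python
import statistics

def stability(coef_runs):
    """coef_runs[m][j] is the regression coefficient of variable j in run m.

    Returns mean/std per variable. The sample standard deviation is used
    here (an assumption); a coefficient that is constant across runs would
    give a zero denominator and is not handled in this sketch.
    """
    n_vars = len(coef_runs[0])
    out = []
    for j in range(n_vars):
        col = [run[j] for run in coef_runs]
        out.append(statistics.mean(col) / statistics.stdev(col))
    return out

def select_variables(coef_runs, n_keep):
    """Indices of the n_keep variables with the largest absolute stability."""
    s = stability(coef_runs)
    ranked = sorted(range(len(s)), key=lambda j: abs(s[j]), reverse=True)
    return sorted(ranked[:n_keep])

# Invented coefficients for three variables over three Monte Carlo runs.
coef_runs = [
    [1.0, 0.5, -2.0],
    [1.2, -0.4, -2.1],
    [0.9, 0.1, -1.9],
]
print(select_variables(coef_runs, 2))
```

The second variable changes sign between runs, so its stability is small and it is the one discarded in this toy example.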

OPWC-PLS method

Based on BLR, the best partner of each wavelength is screened over the entire scanning region, and a partner wavelength subset (PWS) is determined. Then a new PWS, consisting of the best partners of all wavelengths in the current PWS, is determined from the correspondence already obtained. The same procedure is repeated until the result converges to the optimal partner wavelength combination (OPWC). The specific steps are as follows:

Step 1 Assume that there are N wavelengths in the wavelength screening region $ \Delta $, namely, $ \Delta = \{ \lambda_{1}, \lambda_{2}, \ldots, \lambda_{N} \} $. For any fixed $ \lambda_{i} \in \Delta $ and every $ \lambda_{k} \in \Delta $ with $ k \ne i $, LOOCV was performed based on the binary linear regression of the wavelength combination $ (\lambda_{i}, \lambda_{k}) $. The best partner of $ \lambda_{i} $, denoted as $ f(\lambda_{i}) $, was identified as the wavelength giving the minimum $ \text{SECV}(\lambda_{i}, \lambda_{k}) $, that is,

$$ \text{SECV}(\lambda_{i}, f(\lambda_{i})) = \min_{\substack{k = 1, 2, \ldots, N \\ k \ne i}} \text{SECV}(\lambda_{i}, \lambda_{k}) $$

The image $ f(\Delta) $ is the partner wavelength subset (PWS⁽¹⁾) of $ \Delta $, and its number of wavelengths is denoted by N⁽¹⁾. Theoretically, the best partner $ f(\lambda_{i}) $ of each wavelength $ \lambda_{i} $ is unique, but several different wavelengths may have the same best partner. If some $ \lambda $ is not the best partner of any wavelength, then $ \lambda \notin $ PWS⁽¹⁾, and N⁽¹⁾ < N.

Step 2 Using the projection $ f $ defined above, the partner wavelength subset (PWS⁽²⁾) of PWS⁽¹⁾ is obtained. It is proved below that the PWS converges to a stable number of wavelengths after a finite number of projection iterations. Suppose that the PWS converges after s iterations, i.e. $ N^{(s)} = N^{(s+1)} $. Then PWS⁽ˢ⁾ is called the optimal partner wavelength combination (OPWC). In the OPWC, each wavelength is the best partner of some other wavelength.
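Step 1 can be illustrated with a small, self-contained sketch. It fits $ y = b_0 + b_1 x_1 + b_2 x_2 $ by ordinary least squares through the normal equations and evaluates each wavelength pair by LOOCV. The function names and the synthetic data are assumptions made for illustration; the original work used MATLAB and real spectra, and a practical implementation would rely on a linear algebra library and guard against collinear pairs and ties.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * d for a, d in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]

def fit_blr(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve3(A, rhs)

def secv_blr(x1, x2, y):
    """LOOCV root-mean-square error for one wavelength pair."""
    n = len(y)
    sq = 0.0
    for i in range(n):
        keep = [k for k in range(n) if k != i]
        b0, b1, b2 = fit_blr([x1[k] for k in keep], [x2[k] for k in keep],
                             [y[k] for k in keep])
        sq += (b0 + b1 * x1[i] + b2 * x2[i] - y[i]) ** 2
    return math.sqrt(sq / n)

def best_partner_map(spectra, y):
    """spectra[j] holds the absorbances of all samples at wavelength index j.

    Returns f as a dict: index i -> index of its best partner (ties go to
    the lowest index, an arbitrary choice made for this sketch).
    """
    n = len(spectra)
    return {i: min((k for k in range(n) if k != i),
                   key=lambda k: secv_blr(spectra[i], spectra[k], y))
            for i in range(n)}

# Synthetic data: 4 "wavelengths", 6 samples; y depends only on the first two.
spectra = [
    [0.1, 0.4, 0.2, 0.9, 0.5, 0.7],
    [0.3, 0.1, 0.8, 0.2, 0.6, 0.4],
    [0.5, 0.2, 0.1, 0.3, 0.9, 0.6],
    [0.2, 0.7, 0.4, 0.1, 0.3, 0.8],
]
y = [2 * a + 3 * b for a, b in zip(spectra[0], spectra[1])]
f = best_partner_map(spectra, y)
print(f)
```

Because the fit for $ (\lambda_i, \lambda_k) $ is the same as for $ (\lambda_k, \lambda_i) $, the SECV is symmetric in the pair, so only $ N(N-1)/2 $ distinct pairs need to be evaluated (369,370 pairs for N = 860).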

The proof of convergence of algorithm

Proof

(1) If $ f(\lambda_{i}) \ne f(\lambda_{j}) $ for all $ i \ne j $ (so that $ \lambda_{i} \ne \lambda_{j} $), then the projection $ f $ is a one-to-one mapping defined on $ \Delta $ and $ f(\Delta) = \Delta $, i.e. the PWS stops shrinking after this projection.

(2) If there exist $ i \ne j $ with $ \lambda_{i} \ne \lambda_{j} $ and $ f(\lambda_{i}) = f(\lambda_{j}) $, then $ f(\Delta) $ is a proper subset of $ \Delta $, written as $ f(\Delta) = \{ f(\lambda_{i}) \mid \lambda_{i} \in \Delta \} = \{ \lambda_{1}^{(1)}, \lambda_{2}^{(1)}, \ldots, \lambda_{N^{(1)}}^{(1)} \} $, with N⁽¹⁾ < N. Next consider the projection of $ f(\Delta) $, i.e. $ f^{(2)}(\Delta) $: (a) If $ f(\lambda_{i}^{(1)}) \ne f(\lambda_{j}^{(1)}) $ for all $ i \ne j $, then $ f $ is a one-to-one mapping defined on $ f(\Delta) $ and $ f^{(2)}(\Delta) = f(\Delta) $, i.e. the PWS stops shrinking after this projection. (b) If there exist $ i \ne j $ with $ \lambda_{i}^{(1)} \ne \lambda_{j}^{(1)} $ and $ f(\lambda_{i}^{(1)}) = f(\lambda_{j}^{(1)}) $, then $ f^{(2)}(\Delta) $ is a proper subset of $ f(\Delta) $, written as $ f^{(2)}(\Delta) = \{ f(\lambda_{i}^{(1)}) \mid \lambda_{i}^{(1)} \in f(\Delta) \} = \{ \lambda_{1}^{(2)}, \lambda_{2}^{(2)}, \ldots, \lambda_{N^{(2)}}^{(2)} \} $, with N⁽²⁾ < N⁽¹⁾ < N.

Similarly, consider the projection of $ f^{(s-1)}(\Delta) $, i.e. $ f^{(s)}(\Delta) $: (a) If $ f(\lambda_{i}^{(s-1)}) \ne f(\lambda_{j}^{(s-1)}) $ for all $ i \ne j $, then $ f $ is a one-to-one mapping defined on $ f^{(s-1)}(\Delta) $ and $ f^{(s)}(\Delta) = f^{(s-1)}(\Delta) $, i.e. the PWS stops shrinking after this projection. (b) If there exist $ i \ne j $ with $ \lambda_{i}^{(s-1)} \ne \lambda_{j}^{(s-1)} $ and $ f(\lambda_{i}^{(s-1)}) = f(\lambda_{j}^{(s-1)}) $, then $ f^{(s)}(\Delta) $ is a proper subset of $ f^{(s-1)}(\Delta) $, written as $ f^{(s)}(\Delta) = \{ f(\lambda_{i}^{(s-1)}) \mid \lambda_{i}^{(s-1)} \in f^{(s-1)}(\Delta) \} = \{ \lambda_{1}^{(s)}, \lambda_{2}^{(s)}, \ldots, \lambda_{N^{(s)}}^{(s)} \} $, with $ N^{(s)} < N^{(s-1)} < \cdots < N $. Because the total number of wavelengths N is finite and $ N^{(s)} $ strictly decreases each time case (b) occurs, case (b) can occur only finitely many times. Case (a) is therefore reached after a finite number of projections, and the PWS converges.
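The projection iteration itself is short once the best-partner map $ f $ is known. The sketch below uses a made-up map on seven hypothetical wavelengths (not values from this study) and stops as soon as the subset size no longer changes, mirroring the argument in the proof.

```python
def iterate_projection(f, start):
    """Apply the best-partner map f repeatedly until the subset stops shrinking.

    f: dict mapping each wavelength to its best partner (fixed, from Step 1).
    start: iterable of wavelengths in the screening region.
    Returns the list of subsets [start, PWS(1), ..., PWS(s)]; the last one
    plays the role of the OPWC.
    """
    subsets = [set(start)]
    while True:
        nxt = {f[w] for w in subsets[-1]}
        # Each image is contained in the previous subset, so equal sizes
        # mean equal sets and the iteration has converged.
        if len(nxt) == len(subsets[-1]):
            return subsets
        subsets.append(nxt)

# Hypothetical best-partner map on seven wavelengths (nm), for illustration only.
f = {780: 782, 782: 780, 784: 780, 786: 784, 788: 790, 790: 788, 792: 786}
subsets = iterate_projection(f, f.keys())
opwc = subsets[-1]
print([len(s) for s in subsets], sorted(opwc))
```

In this toy example the sizes shrink from 7 to 4, and the surviving wavelengths form mutual best-partner pairs. With a symmetric SECV and no exact ties, longer cycles cannot occur, which agrees with the pairwise grouping reported in the results.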

In this study, the wavelength screening region for GLB spanned the entire scanning region (780–2498 nm), i.e. $ \Delta = \{ 780, 782, \ldots, 2498 \} $, with 860 wavelengths. The number of PLS factors F was set to $ F \in \{ 1, 2, \ldots, 20\} $.

The computer algorithms for the three methods discussed above were designed using MATLAB version 7.6.

Results and discussion

Results with MW-PLS

The NIR spectra of the 230 human serum samples in the scanning region (780–2498 nm) are shown in Fig. 1. As can be seen from the figure, the absorption at about 2000 nm and 2400 nm is clearly affected by strong noise. In order to obtain satisfactory results, wavelength selection must be carried out to overcome this noise interference. For comparison, a PLS model of the full spectrum region was first established. The corresponding SECV and R_P,CV were 1.423 g L⁻¹ and 0.935, respectively.

The MW-PLS method was applied to optimize the waveband and improve prediction accuracy. The optimal MW-PLS model was selected according to the minimum SECV value. The corresponding waveband was 1504 to 1820 nm, located in the long-NIR region (1100 to 2498 nm). The prediction effects (SECV and R_P,CV) and parameters of the above two methods are summarized in Table 2. The results show that the predicted values were highly correlated with the clinical measurements for both methods and that, compared with the optimal PLS model in the full spectrum region, the optimal MW-PLS model achieved a better prediction effect with fewer wavelengths.

Table 2 Prediction effects of three methods

Results with MC-UVE-PLS

The MC-UVE method was applied to eliminate the uninformative variables. Based on the parameter settings in section “MC-UVE-PLS method”, 180 wavelengths were selected, and the SECV and R_P,CV of the corresponding PLS model were 1.153 g L⁻¹ and 0.948, respectively. Compared with the result of PLS in the full spectrum range, the prediction ability of this method was not significantly improved. This may be because the method only eliminates uninformative variables without considering the influence of interfering variables, while serum is a complex system with multiple interfering variables.

Results with OPWC-PLS

The OPWC method was applied to screen information wavelengths following the steps described in section “OPWC-PLS method”. First, 104 best partners for all 860 wavelengths were determined from the results of the LOOCV-BLR analysis, and PWS⁽¹⁾ with 104 wavelengths was obtained. Thus, the number of wavelengths was greatly reduced after the first projection. The correspondence between all 860 wavelengths and their best partners is shown in Fig. 2. As shown in the figure, some wavelengths had the same best partner; for example, 2156 nm and 2190 nm appeared as the best partners of other wavelengths 3 and 8 times, respectively. The projection $ f $ was therefore not a one-to-one mapping on the whole spectral region $ \Delta $, so $ f(\Delta) $ was a proper subset of $ \Delta $ and the projection continued.

Based on the correspondence determined above, the best partner of each $ \lambda_{i}^{(1)} $ was readily selected, and PWS⁽²⁾ was obtained. The same process was repeated for PWS⁽²⁾, and PWS⁽³⁾ was obtained. As the projection progressed, the number of wavelengths decreased gradually until the number of wavelengths of PWS⁽⁶⁾ no longer changed. PWS⁽⁶⁾ was the OPWC, and it had only 28 wavelengths. Figure 3 shows the 28 wavelengths and their best partners. As the figure shows, the 28 wavelengths are divided into 14 groups, and the two wavelengths in each group are the best partners of each other.

LOOCV based on PLS was performed for every PWS, and the corresponding minimum SECV value and number of wavelengths (N^(s)) are shown in Fig. 4. As shown in the figure, N^(s) and the minimum SECV follow almost the same trend. Both decrease rapidly after the first projection, which may be due to the removal of a large amount of noise and background information from the original spectrum. The partner wavelength subset that remains contains less redundant information and its wavelengths are more important, so N^(s) and the minimum SECV decrease only slowly in the later projection iterations.

Comparison of OPWC-PLS and MW-PLS methods

Screening the information wavelengths of GLB in human serum, a complex multi-component system, is difficult and complicated. The wavelengths selected by the OPWC-PLS and MW-PLS methods, which correspond to the information of GLB, are shown in Fig. 5. As indicated in Fig. 5, the wavelengths selected by the OPWC method have a wider distribution range and partially coincide with the wavelengths selected by MW-PLS. This may be because the local character of the MW-PLS method prevents some wavelengths from being detected, which reflects the complexity of NIR model optimization and the commonness and difference of different methods.

Figure 6 shows the relationship between the predicted and measured GLB values based on the MW-PLS and OPWC-PLS methods, respectively. The prediction effect and the corresponding parameters N and F are summarized in Table 2. The SECV and R_P,CV were 0.813 g L⁻¹ and 0.978 with OPWC-PLS, and 0.804 g L⁻¹ and 0.979 with MW-PLS, respectively. The results show that, like MW-PLS, OPWC-PLS predicted clearly better than the full-spectrum PLS, so the OPWC is an effective method for screening wavelengths. Better prediction results can thus be achieved with fewer wavelengths, which indicates that wavelength selection should be performed before building a calibration model. The two methods achieved almost the same good prediction results (SECV and R_P,CV). However, the optimal OPWC-PLS model adopted only 28 wavelengths, while the optimal MW-PLS model adopted 159 wavelengths. Therefore, the OPWC method performs well as a wavelength selection method.

The differences between the OPWC-PLS and MW-PLS predictions for GLB illustrate that MW-PLS can achieve slightly higher prediction accuracy, but it is time-consuming and employs more wavelengths, while OPWC-PLS can achieve prediction results similar to those of MW-PLS in less time. In addition, MW-PLS, as a continuous wavelength screening method, is more suitable for analytes with relatively concentrated molecular absorption bands, while OPWC-PLS, as a discrete wavelength screening method, may be more suitable for analytes with relatively fragmented molecular absorption bands.

Conclusion

The change of GLB content in human serum has important reference value for clinical trials and disease diagnosis. In this study, the OPWC-PLS method was employed for rapid analysis of GLB based on NIR spectroscopy. The MW-PLS and MC-UVE-PLS methods were also employed for comparison. The results indicate that the OPWC-PLS and MW-PLS methods achieved satisfactory prediction results, while the MC-UVE-PLS method was not suitable for the data set of this study and did not significantly improve the prediction effect of the model. The optimal OPWC-PLS model adopted 28 wavelengths, and the corresponding SECV and R_P,CV were 0.813 g L⁻¹ and 0.978, respectively. The optimal MW-PLS model adopted 159 wavelengths, and the corresponding SECV and R_P,CV were 0.804 g L⁻¹ and 0.979, respectively. OPWC-PLS achieved almost the same prediction effect as MW-PLS with faster speed and fewer wavelengths. Therefore, OPWC is an efficient approach for information wavelength selection.

The predicted GLB values obtained by MW-PLS and OPWC-PLS were highly correlated with the reference values. Compared with the traditional method, the method based on NIR spectroscopy is rapid and simple and requires no chemical reagents. Therefore, the results have important reference value for the rapid determination of GLB. In addition, the wavelengths selected by the two methods are partially the same, reflecting the commonness and difference of different methods.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

GLB: Globulin
NIR: Near infrared
OPWC: Optimal partner wavelength combination
MW-PLS: Moving window partial least squares
MC-UVE: Monte Carlo uninformative variable elimination
SECV: Root-mean-square error of cross validation
R_P,CV: Correlation coefficient of prediction
BLR: Binary linear regression
PWS: Partner wavelength subset
LOOCV: Leave-one-out cross validation
SD: Standard deviation

References

1. Chen JM, Peng LJ, Han Y et al (2018) A rapid quantification method for the screening indicator for β-thalassemia with near-infrared spectroscopy. Spectrochim Acta A 193:499–506
2. Han Y, Pan T, Zhou HH, Yuan R (2018) ATR-FTIR spectroscopy with equidistant combination PLS method applied for rapid determination of glycated hemoglobin. Anal Methods 10:3455–3461
3. Yao LJ, Tang Y, Yin ZW et al (2017) Repetition rate priority combination method based on equidistant wavelengths screening with application to NIR analysis of serum albumin. Chemom Intell Lab Syst 162:191–196
4. Han Y, Chen JM, Pan T, Liu GS (2015) Determination of glycated hemoglobin using near-infrared spectroscopy combined with equidistant combination partial least squares. Chemom Intell Lab Syst 145:84–92
5. Lee Y, Lee S, In JY et al (2008) Prediction of plasma hemoglobin concentration by near-infrared spectroscopy. J Korean Med Sci 23:674–677
6. Pan T, Liu JM, Chen JM et al (2013) Rapid determination of preliminary thalassaemia screening indicators based on near-infrared spectroscopy with wavelength selection stability. Anal Methods 5(17):4355–4362
7. Yao LJ, Lyu N, Chen JM et al (2016) Joint analyses model for total cholesterol and triglyceride in human serum with near-infrared spectroscopy. Spectrochim Acta A 159:53–59
8. Lyu N, Chen JM, Pan T et al (2016) Near-infrared spectroscopy combined with equidistant combination partial least squares applied to multi-index analysis of corn. Infrared Phys Technol 76:648–654
9. Guo HS, Chen JM, Pan T et al (2014) Vis-NIR wavelength selection for non-destructive discriminant analysis of breed screening of transgenic sugarcane. Anal Methods 6(10):8810–8816
10. Chen JY, Iyo C, Kawano S (2002) Effect of multiplicative scatter correction on wavelength selection for near infrared calibration to determine fat content in raw milk. J Near Infrared Spec 10(4):301–307
11. Liu ZY, Liu B, Pan T et al (2013) Determination of amino acid nitrogen in tuber mustard using near-infrared spectroscopy with waveband selection stability. Spectrochim Acta A 102:269–274
12. Pan T, Li MM, Chen JM (2014) Selection method of quasi-continuous wavelength combination with applications to the near-infrared spectroscopic analysis of soil organic matter. Appl Spectrosc 68(3):263–271
13. Pan T, Han Y, Chen JM et al (2016) Optimal partner wavelength combination method with application to near-infrared spectroscopic analysis. Chemom Intell Lab Syst 156:217–223
14. Chen JM, Pan T, Liu GS et al (2014) Selection of stable equivalent wavebands for near-infrared spectroscopic analysis of total nitrogen in soil. J Innov Opt Health Sci 7(4):1–9
15. Pan T, Chen ZH, Chen JM et al (2012) Near-infrared spectroscopy with waveband selection stability for the determination of COD in sugar refinery wastewater. Anal Methods 4(4):1046–1052
16. Li HD, Liang YZ, Xu QS et al (2009) Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal Chim Acta 648:77–84
17. Jiang JH, Berry RJ, Siesler HW et al (2002) Wavelength interval selection in multicomponent spectral analysis by moving window partial least-squares regression with applications to mid-infrared and near-infrared spectroscopic data. Anal Chem 74:3555–3565
18. Du YP, Liang YZ, Jiang JH et al (2004) Spectral regions selection to improve prediction ability of PLS models by changeable size moving window partial least squares and searching combination moving window partial least squares. Anal Chim Acta 501(2):183–191
19. Cai WS, Li YK, Shao XG (2008) A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra. Chemometr Intell Lab 90:188–194
20. Chen YF, Chen JM, Pan T et al (2015) Correlation coefficient optimization in partial least squares regression with application to ATR-FTIR spectroscopic analysis. Anal Methods 7:5780–5786
21. Kim YJ, Yoon G (2002) Multicomponent assay for human serum using mid-infrared transmission spectroscopy based on component-optimized spectral region selected by a first loading vector analysis in partial least-squares regression. Appl Spectrosc 56(5):625–632
22. Long XL, Liu GS, Pan T et al (2014) Waveband selection of reagent-free determination for thalassemia screening indicators using Fourier transform infrared spectroscopy with attenuated total reflection. J Biomed Opt 19(8):087004

Acknowledgements

Not applicable.

Funding

This work was supported by Youth Innovation Talents Project of Colleges and Universities in Guangdong Province (No. Q18285) and Guangdong Ocean University Scientific Research Start-up Funding for the Doctoral Program (No. R17057).

Author information

Authors and Affiliations

Department of Data Science, Guangdong Ocean University, Haida Road 1, Mazhang District, Zhanjiang, 524088, China
Yun Han, Huihui Zhou & Xuesong Kuang
Zhanjiang No. 2 High School Hai Dong, Potou District, Zhanjiang, 524057, China
Yun Zhong

Contributions

YH analyzed the spectral data of human serum samples and optimized the wavelength model, and was a major contributor in writing the manuscript. YZ and HZ carried out the spectrum experiment. XK performed model validation. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xuesong Kuang.

Ethics declarations

Consent statement

This study was approved by Experimental Animal Management Committee of Guangdong Ocean University, and every individual participant provided written informed consent. All individual participants took part voluntarily, and all their information is confidential. The study protocol was performed in accordance with relevant laws and institutional guidelines.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Han, Y., Zhong, Y., Zhou, H. et al. Optimal partner wavelength combination method applied to NIR spectroscopic analysis of human serum globulin. BMC Chemistry 14, 37 (2020). https://doi.org/10.1186/s13065-020-00689-z

Received: 24 December 2019
Accepted: 16 May 2020
Published: 24 May 2020
DOI: https://doi.org/10.1186/s13065-020-00689-z
