This paper proposes a chemometric method for evaluating the viability of spinach seeds using near-infrared (NIR) spectroscopy and the successive projections algorithm (SPA). An essential step of the procedure is applying the SPA to optimize the choice of variables for multivariate classification. Variable selection with SPA has been described as an optimization problem in which a cost function is minimized. Selecting the correct variables makes the chemometric models more complete, precise, and accurate, and less complex. The NIR spectra were preprocessed using the Savitzky-Golay and multiplicative scatter correction (MSC) techniques. The best wavelength subset was then selected using SPA, and different classification techniques were applied to the dimension-reduced data to determine seed viability. The results show that the proposed method, with a miscalculation error of 1.7%, is less complex than existing canonical variance methods and easier to implement.
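The multiplicative scatter correction step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes spectra are stored as rows of a NumPy array, and the function name `msc` and the choice of the mean spectrum as the default reference are assumptions made here.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction (rows = spectra, columns = wavelengths).

    Each spectrum is regressed on a reference spectrum; the fitted offset is
    subtracted and the result is divided by the fitted slope.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # Assumption: use the mean spectrum as reference when none is given.
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Least-squares line: row ~ slope * reference + intercept
        slope, intercept = np.polyfit(reference, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected
```

Spectra that differ from the reference only by a scale factor and an offset collapse onto the reference after this correction, which is the scatter effect the technique is meant to remove.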
Chlorophyll content and hardness are critical indicators for evaluating vegetable quality. To overcome the drawbacks of traditional detection methods, Raman spectroscopy was investigated for determining chlorophyll content and hardness in cucumbers. Cucumbers at different storage periods were analyzed, and a successive projections algorithm-extreme learning machine (SPA-ELM) method was employed to establish models for chlorophyll content and hardness. The Raman spectra were preprocessed to reduce noise and minimize background fluorescence. SPA was then used to select characteristic wavelengths, yielding 19 for chlorophyll content and 26 for hardness. ELM models were built on the selected wavelengths, and their predictions were compared with those obtained using partial least squares (PLS) and support vector machine (SVM). The best accuracy was obtained with the SPA-ELM algorithm. The coefficients of determination (R²) of the SPA-ELM models for chlorophyll content and hardness were 0.9569 and 0.9659, and the root mean square error (RMSE) values were 0.0038 and 0.3570, respectively. The high coefficients of determination and small RMSE values indicate that the results are accurate and reliable. Raman spectroscopy combined with the SPA-ELM method was shown to evaluate the chlorophyll content and hardness of cucumbers rapidly and accurately.
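For readers unfamiliar with ELM, the regression step described in this abstract can be sketched as a random hidden layer followed by a least-squares fit of the output weights. The sketch below is illustrative only: the function names, the tanh activation, the hidden-layer size and the random seed are assumptions made here, not details taken from the paper.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, seed=0):
    """Fit a basic ELM regressor: random hidden layer, least-squares output."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.normal(size=n_hidden)                # random hidden biases (never trained)
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                 # output weights via pseudoinverse
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root mean square error."""
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return r2, rmse
```

In an SPA-ELM workflow, `X` would hold only the columns (wavelengths) chosen by SPA, and `r2_rmse` would be evaluated on a held-out prediction set rather than on the training data.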
The aim of the successive projections algorithm (SPA) is to enhance the accuracy of multiple linear regression (MLR) by minimizing the impact of collinearity in the calibration data set. Combining SPA with MLR as a variable selection approach has resulted in the SPA-MLR method, which has been reported in the literature to produce models with good prediction ability compared to conventional full-spectrum models obtained with partial least squares (PLS) in some cases. This paper proposes adding a filter step to the current version of the SPA algorithm to reduce the number of uninformative variables before the projection phase and to help the algorithm select the best variables in subsequent steps. The proposed fSPA-MLR algorithm is evaluated in two case studies involving the near-infrared spectrometric analysis of pharmaceutical tablet and diesel/biodiesel mixture samples. Compared to PLS, the fSPA-MLR models demonstrate similar or better performance. Moreover, the fSPA-MLR models outperform the original SPA-MLR in both cross-validation and external prediction. The fSPA-MLR models deliver superior results regardless of the pre-processing algorithm tested, including first-derivative Savitzky-Golay (SG) and Standard Normal Variate (SNV), and even on raw spectral data.
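The projection phase that gives SPA its name can be sketched as a Gram-Schmidt-style loop: starting from one column, each step removes the direction of the last selected column from all others and picks the column with the largest remaining norm, so that strongly collinear variables are skipped. This is a simplified illustration of the projection phase only; the function name `spa_select`, the mean-centering of columns and the fixed starting column are assumptions made here, and the filter step of fSPA and the validation-based choice among candidate subsets are not shown.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Select n_select column indices of X by successive orthogonal projections."""
    X = np.asarray(X, dtype=float)
    Xp = X - X.mean(axis=0)          # assumption: work on mean-centered columns
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]      # last selected (already projected) column
        # Project every column onto the subspace orthogonal to v.
        # Assumes v is not a zero vector (e.g. not a constant column).
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0       # never re-select a chosen column
        selected.append(int(np.argmax(norms)))
    return selected
```

In a full SPA-MLR procedure the loop is typically repeated for different starting columns and subset sizes, and the subset giving the lowest validation error of the MLR model is retained.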
This paper proposes a new variable selection method for nonlinear multivariate calibration, combining the successive projections algorithm for interval selection (iSPA) with the Kernel Partial Least Squares (Kernel-PLS) modelling technique. The proposed iSPA-Kernel-PLS algorithm is employed in a case study involving a Vis-NIR spectrometric dataset with complex nonlinear features. The analytical problem consists of determining Brix and sucrose content in samples from a sugar production system, on the basis of transflectance spectra. As compared to full-spectrum Kernel-PLS, the iSPA-Kernel-PLS models involve a smaller number of variables and display statistically significant superiority in terms of accuracy and/or bias in the predictions.
Determining fat content in hamburgers is very important for minimizing or controlling the negative effects of fat on human health, such as cardiovascular diseases and obesity, which are caused by high consumption of saturated fatty acids and cholesterol. This study proposed an alternative analytical method based on near-infrared (NIR) spectroscopy and the successive projections algorithm for interval selection in partial least squares regression (iSPA-PLS) for fat content determination in commercial chicken hamburgers. For this, 70 hamburger samples with a fat content ranging from 14.27 to 32.12 mg kg⁻¹ were prepared based on the upper limit recommended by the Argentinean Food Codex, which is 20% (w w⁻¹). NIR spectra were then recorded and preprocessed using different approaches: baseline correction, SNV, MSC, and Savitzky-Golay smoothing. For comparison, full-spectrum PLS and interval PLS were also used. The best performance for the prediction set was obtained with first-derivative Savitzky-Golay smoothing using a second-order polynomial and a window size of 19 points, achieving a coefficient of correlation of 0.94, RMSEP of 1.59 mg kg⁻¹, REP of 7.69% and RPD of 3.02. The proposed methodology represents an excellent alternative to the conventional Soxhlet extraction method, since it avoids waste generation and uses neither chemical reagents nor solvents, in line with the primary principles of Green Chemistry. The new method was successfully applied to chicken hamburger analysis, and the results agreed with the reference values at a 95% confidence level, making it very attractive for routine analysis. (C) 2017 Elsevier B.V. All rights reserved.
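The three figures of merit quoted in this abstract can be computed as in the sketch below. The abstract does not state its exact formulas, so the definitions used here are assumptions based on common chemometric usage: RMSEP as the root mean square prediction error, REP as RMSEP relative to the mean reference value (in percent), and RPD as the sample standard deviation of the reference values divided by RMSEP. The function name is hypothetical.

```python
import math
from statistics import mean, stdev

def prediction_metrics(y_ref, y_pred):
    """Return (RMSEP, REP in percent, RPD) for a prediction set.

    Assumed definitions: REP = 100 * RMSEP / mean(y_ref),
    RPD = sample standard deviation of y_ref / RMSEP.
    """
    n = len(y_ref)
    rmsep = math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / n)
    rep = 100.0 * rmsep / mean(y_ref)
    rpd = stdev(y_ref) / rmsep
    return rmsep, rep, rpd
```

Under these assumed definitions, the reported RMSEP of 1.59 and REP of 7.69% would correspond to a mean reference value of about 20.7, which lies inside the stated range of 14.27 to 32.12.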
Randomization based methods for training neural networks have gained increasing attention in recent years and achieved remarkable performances on a wide variety of tasks. The interest in such methods relies on the fact that standard gradient based learning algorithms may often converge to local minima and are usually time consuming. Despite the good performance achieved by Randomization Based Neural Networks (RNNs), the random feature mapping procedure may generate redundant information, leading to suboptimal solutions. To overcome this problem, some strategies have been used such as feature selection, hidden neuron pruning and ensemble methods. Feature selection methods discard redundant information from the original dataset. Pruning methods eliminate hidden nodes with redundant information. Ensemble methods combine multiple models to generate a single one. Selective ensemble methods select a subset of all available models to generate the final model. In this paper, we propose a selective ensemble of RNNs based on the successive projections algorithm (SPA), for regression problems. The proposed method, named Selective Ensemble of RNNs using the successive projections algorithm (SERS), employs the SPA for three distinct tasks: feature selection, pruning and ensemble selection. SPA was originally developed as a feature selection technique and has been recently employed for RNN pruning. Herein, we show that it can also be employed for ensemble selection. The proposed framework was used to develop three selective ensemble models based on the three RNNs: Extreme Learning Machines (ELM), Feedforward Neural Network with Random Weights (FNNRW) and Random Vector Functional Link (RVFL). The performances of SERS-ELM, SERS-FNNRW and SERS-RVFL were assessed in terms of model accuracy and model complexity in several real world benchmark problems. Comparisons to related methods showed that SERS variants achieved similar accuracies with significant model complexity reduction. Amon
Multivariate models have been widely used in analytical problems involving quantitative and qualitative analyses. However, there are cases in which a model is not applicable to spectra of samples obtained under new experimental conditions or on an instrument not involved in the modeling step. A solution to this problem is the transfer of multivariate models, usually performed by standardizing the spectral responses or by enhancing the robustness of the model. This paper proposes two new criteria for selecting robust variables for classification transfer using the successive projections algorithm (SPA). These variables are then used to build models based on linear discriminant analysis (LDA) with low sensitivity to the differences between the responses of the instruments involved. For this purpose, transfer samples are included in the calculation of the cost for each subset of variables under consideration. The proposed methods are evaluated in two case studies involving identification of adulteration of extra virgin olive oil (EVOO) and hydrated ethyl alcohol fuel (HEAF) using UV-Vis and NIR spectroscopy, respectively. In both cases, the two criteria gave similar or better classification transfer results (for a test set measured on the secondary instrument) than direct standardization (DS) and piecewise direct standardization (PDS). For the UV-Vis data, both proposed criteria achieved a correct classification rate (CCR) of 85%, while the best CCR obtained with the standardization methods was 81%, for DS. For the NIR data, a CCR of 92.5% was obtained by both criteria as well as by DS. The results demonstrate that either of the proposed criteria can be used to build robust models as an alternative to standardizing spectral responses for classification transfer. (C) 2017 Elsevier B.V. All rights reserved.
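The direct standardization (DS) baseline mentioned in this abstract can be illustrated with a short sketch. It shows the usual textbook form of DS, in which a transformation matrix is estimated from transfer samples measured on both instruments; it is not the paper's proposed SPA criteria, and the function names and array layout (rows = samples, columns = wavelengths) are assumptions made here.

```python
import numpy as np

def ds_fit(X_secondary, X_primary):
    """Estimate the DS matrix F so that X_secondary @ F approximates X_primary.

    Both arrays hold the same transfer samples, measured on the secondary
    and primary instruments respectively.
    """
    return np.linalg.pinv(X_secondary) @ X_primary

def ds_apply(X_new_secondary, F):
    """Map new secondary-instrument spectra into the primary instrument's domain."""
    return X_new_secondary @ F
```

After this mapping, a classifier trained on primary-instrument spectra can be applied to the transformed secondary spectra. The approach in the abstract avoids this step by selecting variables that are already insensitive to instrument differences.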
Authors: Wang, Junjie; Shi, Tiezhu; Liu, Huizeng; Wu, Guofeng
Shenzhen Univ, Natl Adm Surveying Mapping & Geoinformat Key Lab Geoenvironm Monitoring Coastal Zone, Shenzhen 518060, Peoples R China
Shenzhen Univ, Shenzhen Key Lab Spatial Temporal Smart Sensing &, Shenzhen 518060, Peoples R China
Shenzhen Univ, Coll Life & Marine Sci, Shenzhen 518060, Peoples R China
Phosphorus (P) is essential for plant growth and development. Very few studies have reported the use of hyperspectral three-band vegetation indices (TBVIs) in foliar P estimation. Further, the optimal TBVI is generally chosen from millions of possible band combinations. This study aimed to investigate resampling and two wavelength selection methods (genetic algorithm (GA) and successive projections algorithm (SPA)) for deriving TBVIs for foliar P estimation, and to compare the performance of the newly developed TBVIs with that of published VIs. A total of 137 field-based canopy hyperspectral reflectance spectra (350-2500 nm) of Carex (C. cinerascens) were obtained and reduced to 1603 wavelengths because of spectral noise. Considering both the original and first-derivative reflectance spectra, their resampled wavelengths and the wavelengths selected by GA and SPA were employed to derive TBVIs. A total of 24 selected TBVI models were calibrated for foliar P estimation with the training dataset and independently validated with the test dataset. The root mean square error of validation (RMSEval), determination coefficient of validation (R²val) and residual prediction deviation (RPD) were calculated to evaluate the performance of each model. The results demonstrated that calculating all possible TBVIs took on average 5474, 1972 and 1.2 s using resampling, GA and SPA, respectively. Two SPA-based TBVIs, i.e. (ρ760 − ρ2387)/(ρ723 − ρ2387) (ρλ, original reflectance) and (ρ′728 − ρ′319 + 2ρ′714)/(ρ′729 + ρ′1319 − 2ρ714) (ρ′λ, first-derivative reflectance), had the best model performances (R²val = 0.680, RMSEval = 0.040%, RPD = 1.75; R²val = 0.692, RMSEval = 0.039%, RPD = 1.80) in foliar P estimation among the 24 TBVIs. Compared with 15 published VIs (R²val < 0.64, RPD < 1.64), the two SPA-based TBVIs exhibited better validation performance. We concluded that SPA has great potential for TBVI derivation due to the reduc
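The first SPA-based index quoted in the abstract above has the form (ρa − ρb)/(ρc − ρb), with a = 760, b = 2387 and c = 723 nm. The sketch below shows how an index of that form could be evaluated for one spectrum. The function name, the dictionary layout (wavelength in nm mapped to reflectance) and the example reflectance values are hypothetical and are not data from the study.

```python
def tbvi(reflectance, a, b, c):
    """Three-band index of the form (rho_a - rho_b) / (rho_c - rho_b).

    `reflectance` maps wavelength (nm) to reflectance for one spectrum.
    """
    denominator = reflectance[c] - reflectance[b]
    if denominator == 0:
        raise ZeroDivisionError("bands b and c have identical reflectance")
    return (reflectance[a] - reflectance[b]) / denominator

# Hypothetical reflectance values, for illustration only.
spectrum = {723: 0.30, 760: 0.45, 2387: 0.05}
index_value = tbvi(spectrum, a=760, b=2387, c=723)
```

Evaluating this function for every candidate triple of bands is what makes an exhaustive search expensive. Restricting the candidates to a few wavelengths chosen by SPA or GA reduces the number of triples to evaluate.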
OBJECTIVE: To develop a simple and efficient spectrophotometric technique combined with chemometrics for the simultaneous determination of methyl paraben (MP) and hydroquinone (HQ) in cosmetic products, and specifically, to: (i) evaluate the potential of applying the successive projections algorithm (SPA) to derivative spectrophotometric data to provide sufficient accuracy and model robustness and (ii) determine MP and HQ concentrations in cosmetics without tedious pre-treatments such as derivatization or extraction, which are time-consuming and require hazardous solvents. METHODS: The absorption spectra were measured in the wavelength range of 200-350 nm. Before building the chemometric models, the original and first-derivative absorption spectra of binary mixtures were used as calibration matrices. Variables selected by the SPA were used to obtain multiple linear regression (MLR) models based on a small subset of wavelengths. The number of wavelengths and the starting vector were optimized, and the root mean square errors of calibration (RMSEC) and cross-validation (RMSECV) were compared to select effective wavelengths with the least collinearity and redundancy. Principal component regression (PCR) and partial least squares (PLS) models were also developed for comparison. The concentrations in the calibration matrix ranged from 0.1 to 20 µg mL⁻¹ for MP and from 0.1 to 25 µg mL⁻¹ for HQ. The constructed models were tested on an external validation data set and finally on cosmetic samples. RESULTS: The results indicated that successive projections algorithm-multiple linear regression (SPA-MLR), applied to the first-derivative spectra, achieved the best performance for both compounds when compared with full-spectrum PCR and PLS. The root mean square error of prediction (RMSEP) was 0.083 and 0.314 for MP and HQ, respectively. To verify the accuracy of the proposed method, a recovery study on real cosmetic samples was ca
Extreme Learning Machine (ELM) is a recently proposed machine learning method with successful applications in many domains. The key strengths of ELM are its simple formulation and its small number of hyper-parameters. Among these hyper-parameters, the number of hidden nodes has a significant impact on ELM performance, since too few or too many hidden nodes may lead to underfitting or overfitting, respectively. In this work, we propose a pruning strategy for ELM that uses the successive projections algorithm (SPA) to find the number of hidden nodes automatically. The SPA was originally proposed for variable selection; here it is adapted to prune ELMs. The proposed method is compared to the Optimally Pruned Extreme Learning Machine algorithm (OP-ELM), which is considered a state-of-the-art method. Real-world datasets were used to assess the performance of the proposed method on regression and classification problems. The proposed method produced much simpler models with performance similar to that of OP-ELM. For some classification instances, the proposed method outperformed OP-ELM.