ISBN (print): 9781479938414
Clustering has long played a vital role in many disciplines because it is an important tool for analyzing sets of unknown input patterns. However, some important issues related to clustering, such as automatically determining the number of clusters and partitioning non-linearly separable data, have never been fully solved, even though many researchers have worked on the subject for a long time. This paper therefore presents a novel method, based on the so-called elastic net clustering algorithm, that addresses exactly these two issues: partitioning non-linearly separable data and automatically determining the number of clusters. Several well-known datasets are used to evaluate the performance of the proposed algorithm. The experimental results show that the proposed algorithm not only finds the appropriate number of clusters but also achieves a higher accuracy rate than all the other methods compared in this study on most datasets.
Background: Small nucleolar RNAs (snoRNAs) are small RNA molecules approximately 60-300 nucleotides in length. They have been shown to play important roles in cancer occurrence and progression, so identifying new snoRNAs as quickly and accurately as possible is of great clinical importance. Objective: A novel algorithm, ESDA (elastically Sparse Partial Least Squares Discriminant Analysis), was proposed to improve the speed and performance of distinguishing snoRNAs from other RNAs in human genomes. Methods: In the ESDA algorithm, kernel features were selected from variables extracted from both primary sequences and secondary structures in order to optimize the extracted information. These features were then used as input variables to the SPLSDA (sparse partial least squares discriminant analysis) algorithm to train the final classification model that distinguishes snoRNA sequences from other human RNAs. Because no prior biological knowledge is required to optimize the classification model, ESDA is very practical, especially for completely new sequences. Results: 89 human H/ACA snoRNAs and 269 human C/D snoRNAs were used as positive samples and 3403 non-snoRNAs as negative samples to test the identification performance of ESDA. For H/ACA snoRNA identification, sensitivity and specificity reached 99.6% and 98.8%, respectively; for C/D snoRNAs, they were 96.1% and 98.3%, respectively. We also compared ESDA with other widely used algorithms and classifiers: SnoReport, RF (Random Forest), DWD (Distance Weighted Discrimination), and SVM (Support Vector Machine). The largest accuracy improvement achieved by ESDA was 25.1%. Conclusion: These results strongly demonstrate the superior performance of ESDA and make it a promising tool for identifying snoRNAs, supporting the further development of precision medicine for cancers.
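The sensitivity and specificity figures above follow the usual definitions for a binary classifier: sensitivity is the fraction of true snoRNAs that are recovered, and specificity is the fraction of non-snoRNAs that are correctly rejected. A minimal Python sketch, assuming snoRNAs are labelled 1 and non-snoRNAs 0 (the labels and the function name `sensitivity_specificity` are illustrative, not taken from the paper):

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels; 1 = snoRNA (positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # snoRNAs found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # snoRNAs missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-snoRNAs rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-snoRNAs misclassified
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Toy example: 4 positives (one missed) and 5 negatives (one false alarm).
sens, spec = sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0, 0],
                                     [1, 1, 1, 0, 0, 0, 0, 0, 1])
print(sens, spec)  # 0.75 0.8
```

With a heavily imbalanced test set such as 358 positives against 3403 negatives, reporting both rates matters more than a single accuracy figure, because a classifier that rejects everything would still score high accuracy.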
As an outstanding discriminant analysis technique, Fisher discriminant analysis (FDA) has gained extensive attention in the fields of supervised dimensionality reduction and fault diagnosis. However, it typically ignores the multimodality within the measured data, which can make it infeasible in practice. In addition, when modeling complex processes, it generally incorporates all process variables without emphasizing the key faulty ones, which degrades fault classification capability and model interpretability. To address these two drawbacks of conventional FDA, this brief presents a sparse local FDA (SLFDA) model. It first preserves within-class multimodality by introducing local weighting factors into the scatter matrix. The responsible faulty variables are then identified automatically through the elastic net algorithm, and the resulting optimization problem is solved with the feasible gradient direction method. In this way, local data-structure characteristics are exploited along both the sample dimension and the variable dimension, significantly enhancing fault diagnosis performance and model interpretability. We further extend SLFDA to a nonlinear variant, sparse kernel local FDA, via the kernel trick; this variant is substantially more robust to strong nonlinearity. Simulation studies on the Tennessee Eastman (TE) benchmark process and a real-world diesel engine working process both show that the new diagnosis strategy is more accurate and reliable than existing state-of-the-art methods.
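The abstract does not give SLFDA's exact local weighting factors. One well-known way to put local weights into the within-class scatter matrix is the local Fisher discriminant analysis (LFDA) construction, sketched below in plain Python as an assumption: S_w = 1/2 Σ_ij W_ij (x_i − x_j)(x_i − x_j)^T, with W_ij = A_ij / n_c for same-class pairs and 0 otherwise, where A_ij is a heat-kernel affinity. For simplicity the sketch uses one global bandwidth `sigma` (LFDA itself uses local scaling), and the function name `local_within_scatter` is hypothetical. The elastic net variable selection and the feasible gradient direction solver are not shown.

```python
import math


def local_within_scatter(X, labels, sigma=1.0):
    """Locally weighted within-class scatter matrix (LFDA-style sketch).

    X: list of n samples, each a list of d floats; labels: list of n class labels.
    Same-class pairs are weighted by exp(-||x_i - x_j||^2 / sigma^2) / n_c, so
    pairs from distant modes of one class contribute little.
    """
    n, d = len(X), len(X[0])
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1  # n_c: number of samples in class c
    S = [[0.0] * d for _ in range(d)]
    for i in range(n):
        for j in range(n):
            if labels[i] != labels[j]:
                continue  # only same-class pairs enter the within-class scatter
            diff = [X[i][k] - X[j][k] for k in range(d)]
            dist2 = sum(v * v for v in diff)
            w = math.exp(-dist2 / sigma ** 2) / counts[labels[i]]
            for a in range(d):
                for b in range(d):
                    S[a][b] += 0.5 * w * diff[a] * diff[b]
    return S


# Two same-class points one unit apart along the first axis.
S = local_within_scatter([[0.0, 0.0], [1.0, 0.0]], [0, 0], sigma=1.0)
```

Compared with the ordinary within-class scatter, the affinity term lets a class made of several separated clusters keep its multimodal shape instead of being pulled toward a single mean.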
To address the insufficient accuracy and timeliness of transmission line parameters in the parameter library of the grid energy management system (EMS), a dynamic optimization method for transmission line parameters based on grey support vector regression is proposed. First, the influence of operating conditions and meteorological factors on parameter changes is analyzed. On this basis, a Pearson-coefficient-based method for quantifying the correlation of transmission line parameters is designed, yielding influence coefficient values. Then, with the influence coefficient as a constraint, a method based on an improved elastic net is proposed for selecting the features that strongly influence line parameters. Finally, based on grey prediction theory, a grey support vector regression (GM-SVR) parameter optimization model is constructed to dynamically optimize line parameter values under the operating state of the power grid. The effectiveness and feasibility of the proposed method are verified by commissioning the reactance parameters of an actual local loop-network transmission line.
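The abstract names grey prediction theory but not the specific grey model. The most common one is GM(1,1), sketched below in Python under that assumption. It covers only the grey forecasting step (accumulation, least-squares estimation of the development coefficient `a` and grey input `b`, and inverse accumulation), not the SVR component or the elastic net feature selection; the function name `gm11_forecast` is illustrative.

```python
import math


def gm11_forecast(x0, steps=1):
    """Fit a GM(1,1) grey model to a positive series x0 and forecast ahead.

    Returns len(x0) fitted values followed by `steps` forecasts.
    """
    n = len(x0)
    # 1-AGO: accumulated generating operation (running sum of the raw series).
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values z1(k): mean of consecutive accumulated values.
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    m = len(z1)
    # Least squares for x0(k) + a*z1(k) = b via the 2x2 normal equations
    # of the design matrix whose rows are [-z1(k), 1].
    s_zz = sum(z * z for z in z1)
    s_z = sum(z1)
    s_zy = sum(z * v for z, v in zip(z1, y))
    s_y = sum(y)
    det = s_zz * m - s_z * s_z
    a = (s_z * s_y - m * s_zy) / det
    b = (s_zz * s_y - s_z * s_zy) / det
    if abs(a) < 1e-12:
        raise ValueError("development coefficient a is ~0; GM(1,1) is degenerate")

    def x1_hat(k):
        # Time-response function of the whitened equation dx1/dt + a*x1 = b.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse AGO: x0_hat(k) = x1_hat(k) - x1_hat(k - 1); the first value is kept.
    return [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, n + steps)]


series = [1.0, 1.1, 1.21, 1.331, 1.4641]  # toy series growing by 10% per step
out = gm11_forecast(series, steps=1)
print(round(out[-1], 3))  # one-step forecast, close to 1.1**5 = 1.61051
```

GM(1,1) needs only a handful of observations, which is why it is often paired with a learner such as SVR when parameter histories are short.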
Background: In a recent trial of milk oral immunotherapy (MOIT) with or without omalizumab in 55 patients with milk allergy treated for 28 months, 44 of 55 subjects passed a 10-g desensitization milk protein challenge; 23 of 55 subjects passed the 10-g sustained unresponsiveness (SU) challenge 8 weeks after discontinuing MOIT. Objective: We sought to determine whether IgE and IgG4 antibody binding to allergenic milk protein epitopes changes with MOIT and whether this binding could predict the development of SU. Methods: Using a novel high-throughput Luminex-based assay to quantitate IgE and IgG4 antibody binding to 66 sequential epitopes on 5 milk proteins, we evaluated serum samples from 47 subjects before and after MOIT. Machine learning strategies were used to predict whether a subject would have SU 8 weeks after MOIT discontinuation. Results: MOIT profoundly altered IgE and IgG4 binding to epitopes, regardless of treatment outcome. At the initiation of MOIT, subjects achieving SU exhibited significantly less antibody binding to 40 allergenic epitopes than subjects who were desensitized only (false discovery rate ≤ 0.05 and fold change > 1.5). Based on baseline epitope-specific antibody binding, we developed predictive models of SU. Using simulations, we show that, on average, models using IgE-binding epitopes alone perform significantly better than models using standard serum component proteins (average area under the curve > 97% vs 80%). The optimum model, using 6 IgE-binding epitopes, achieved a 95% area under the curve and 87% accuracy. Conclusion: Despite the relatively small sample size, we have shown that by measuring the epitope repertoire, we can build reliable models to predict the probability of SU after MOIT. Baseline epitope profiles appear more predictive of MOIT response than profiles based on serum component proteins.
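The abstract selects epitopes with a false discovery rate ≤ 0.05 and a fold change > 1.5 but does not name the FDR procedure. The sketch below assumes the Benjamini-Hochberg adjustment and takes per-epitope p-values and fold changes (expressed as ratios ≥ 1 between the two outcome groups) as already computed; deriving them from raw Luminex intensities is not shown, and the function names are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_epitopes(pvals, fold_changes, fdr=0.05, min_fold=1.5):
    """Indices of epitopes passing both the FDR and the fold-change thresholds."""
    qvals = benjamini_hochberg(pvals)
    return [i for i, (q, fc) in enumerate(zip(qvals, fold_changes))
            if q <= fdr and fc > min_fold]


# Toy data for four epitopes: only the first survives both filters.
print(select_epitopes([0.01, 0.04, 0.03, 0.5], [2.0, 3.0, 1.2, 5.0]))  # [0]
```

Note that the second epitope has a raw p-value of 0.04 and a large fold change but is dropped, because its adjusted q-value (about 0.053) exceeds the 0.05 threshold.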
The objectives of this Perspective paper are to review some recent advances in sparse feature selection for regression and classification, as well as in compressed sensing, and to discuss how these might be used to develop tools that advance personalized cancer therapy. As an illustration of the possibilities, a new algorithm for sparse regression is presented and applied to predict the time to tumour recurrence in ovarian cancer. A new algorithm for sparse feature selection in classification problems is also presented, and its validation in endometrial cancer is briefly discussed. Some open problems are also presented.
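The abstract does not describe the new sparse regression algorithm itself. As a point of reference only, the sketch below shows a standard sparse regression method, the elastic net (the lasso when `alpha=1`), fitted by cyclic coordinate descent with soft thresholding; it is not the paper's algorithm, and the function names are illustrative. It minimizes (1/(2n))·‖y − Xw‖² + λ(α‖w‖₁ + (1 − α)/2·‖w‖₂²).

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values inside [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(X, y, lam, alpha=1.0, n_iter=100):
    """Cyclic coordinate descent for the elastic net objective above.

    X: list of n rows with p features each; y: list of n targets.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # (1/n) * feature j dotted with the partial residual
            z = 0.0    # (1/n) * squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            denom = z + lam * (1.0 - alpha)
            # L1 part thresholds the update; L2 part shrinks it via the denominator.
            w[j] = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
    return w


# Feature 0 carries a strong signal, feature 1 only a weak one.
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [2.0, 2.0, 0.1, 0.1]
print(elastic_net_cd(X, y, lam=0.5))  # [1.0, 0.0]: the weak feature is set exactly to zero
```

The exact zero for the weak feature is what makes such penalties useful for feature selection: the fitted model names a short list of predictors rather than assigning small weights to all of them.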
Predicting a protein's conformation from its amino-acid sequence is one of the most prominent problems in computational biology, and it is NP-hard. Here, we focus on a widely studied abstraction of this problem, the two-dimensional hydrophobic-polar protein folding problem (2D HP PFP), and establish a mathematical optimization model of the protein's free energy. Native conformations are often sought using stochastic sampling methods, but these are slow. The elastic net (EN) algorithm is one of the fast deterministic methods for the travelling salesman problem (TSP). However, it cannot be applied directly to the protein folding problem because of fundamental differences between the two types of problems. This paper shows how the 2D HP protein folding problem can be framed in terms of the TSP, and combines a modified elastic net algorithm with a novel local search method to solve it. To our knowledge, this is the first application of the EN algorithm to the 2D HP model. The results indicate that our approach finds more optimal conformations and is simple to implement, computationally efficient, and fast. (c) 2006 Elsevier B.V. All rights reserved.
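In the standard 2D HP model, the free energy of a conformation is minus the number of H-H topological contacts: pairs of hydrophobic residues that occupy neighbouring square-lattice sites without being neighbours along the chain. The sketch below evaluates that energy for a given conformation, assuming this standard formulation is the one the abstract's free-energy model builds on; it does not implement the elastic net or the TSP mapping, and the function name `hp_energy` is illustrative.

```python
def hp_energy(sequence, coords):
    """Free energy of a 2D HP conformation on the square lattice.

    sequence: string of 'H'/'P'; coords: list of (x, y) integer lattice points.
    Energy = -(number of H-H lattice neighbours that are not chain neighbours).
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")
    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != 'H':
            continue
        # Look only right and up so each lattice-adjacent pair is counted once.
        for nb in ((x + 1, y), (x, y + 1)):
            j = index.get(nb)
            if j is not None and abs(i - j) > 1 and sequence[j] == 'H':
                contacts += 1
    return -contacts


# "HPPH" folded into a unit square: residues 0 and 3 touch, giving one contact.
print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # -1
```

A folding search, whether stochastic sampling or the elastic-net-plus-local-search approach described above, then seeks the self-avoiding conformation that minimizes this value.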