Due to the nature of microarray data, analyzing genes/features for disease diagnosis is a challenging task. The data generally come in the form of a 2D matrix in which rows represent genes and columns represent experimental conditions. Bi-clustering is an emerging technique that can efficiently reveal gene expression patterns because it operates simultaneously on a subset of genes and a subset of conditions. Inspired by this, a dynamic bi-clustering method based on an improved genetic algorithm (GA) is proposed. An efficient chromosome encoding is designed, and the fitness function is derived from multiple conflicting objectives that measure the quality of a bi-cluster. A novel mutation operator is designed using a correlation technique, and the crossover and mutation rates change dynamically. The results of the proposed approach are compared with those of several existing approaches, such as the traditional GA, the dynamic deme parallel GA, the evolutionary local search algorithm, bi-phase evolutionary searching, and the evolutionary bi-clustering algorithm. Further, statistical tests such as analysis of variance (ANOVA) and the Friedman test are performed to show the significance of the proposed model. A biological analysis is also performed.
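As a minimal sketch of one bi-cluster quality objective, the snippet below computes the mean squared residue (MSR) of Cheng and Church, a common coherence measure in GA-based bi-clustering. The abstract does not state which objectives the proposed fitness function combines, so using MSR here is an assumption. The linear `dynamic_rate` schedule is likewise a hypothetical stand-in for the dynamically changing crossover and mutation rates.

```python
import numpy as np

def mean_squared_residue(data, rows, cols):
    """Mean squared residue H(I, J) of the bi-cluster data[rows][:, cols] (lower = more coherent)."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)   # a_iJ: mean of each selected gene
    col_means = sub.mean(axis=0, keepdims=True)   # a_Ij: mean of each selected condition
    overall = sub.mean()                          # a_IJ: mean of the whole bi-cluster
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())

def dynamic_rate(start, end, generation, max_generations):
    """Hypothetical linear schedule moving a GA rate from `start` to `end` over the run."""
    frac = generation / max(1, max_generations)
    return start + (end - start) * frac

rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 10))                  # toy genes x conditions matrix
print(mean_squared_residue(expr, [0, 3, 7], [1, 2, 5]))
```

A purely additive bi-cluster (each entry equal to a gene effect plus a condition effect) has an MSR of zero, so a GA would typically minimize MSR while rewarding larger bi-clusters; MSR alone is only one of several conflicting objectives.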
Background and Objective: The limited number of samples and the high dimensionality of features in microarray data make selecting a small number of features for disease diagnosis a challenging problem. When dealing with high-dimensional feature selection, traditional feature selection methods based on evolutionary algorithms struggle to find the optimal feature subset within a limited time. New solutions are proposed to address these problems. Methods: In this paper, we propose a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data, which combines clustering with an improved binary particle swarm optimization and incorporates an embedded feature elimination strategy. First, an adaptive redundant-feature judgment method based on correlation clustering is proposed for feature screening, reducing the search space for the subsequent stage. Second, we propose an improved flipping-probability-based binary particle swarm optimization (IFBPSO) that is better suited to the binary particle swarm optimization setting. Finally, we design a new feature elimination (FE) strategy embedded in the binary particle swarm optimization algorithm; this strategy gradually removes poorer features during the iterations to reduce the number of features and improve accuracy. Results: We compared C-IFBPFE with other published hybrid feature selection methods on eight public datasets and analyzed the impact of each improvement. The proposed method outperforms other state-of-the-art feature selection methods in terms of accuracy, number of selected features, sensitivity, and specificity. The ablation study validates the efficacy of each component; in particular, the proposed feature elimination strategy significantly improves the performance of the algorithm. Conclusions: The hybrid feature selection method proposed in this paper helps address the problem of high-dimensional microarray data with few samples. It can select a small subset of
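The flipping-probability idea can be illustrated with a standard binary PSO step that uses a V-shaped transfer function, where |tanh(v)| gives the probability of flipping each bit. This is a well-known BPSO variant, not the paper's IFBPSO, whose exact update is not given here. `eliminate_rarely_selected` is a hypothetical feature-elimination rule (deactivate features that few personal-best positions select), included only to show where an embedded FE step could sit in the loop.

```python
import numpy as np

def bpso_flip_step(position, velocity, pbest, gbest, rng,
                   w=0.7, c1=1.5, c2=1.5, v_max=6.0):
    """One binary PSO step with a V-shaped (flip-probability) transfer function."""
    r1 = rng.random(position.shape)
    r2 = rng.random(position.shape)
    # Standard velocity update; bits are treated as 0/1 numbers.
    velocity = (w * velocity
                + c1 * r1 * (pbest - position)
                + c2 * r2 * (gbest - position))
    velocity = np.clip(velocity, -v_max, v_max)
    # V-shaped transfer: |tanh(v)| is the probability of flipping each bit.
    flip_prob = np.abs(np.tanh(velocity))
    flip = rng.random(position.shape) < flip_prob
    new_position = np.where(flip, 1 - position, position)
    return new_position, velocity

def eliminate_rarely_selected(pbests, active, min_freq=0.1):
    """Hypothetical FE rule: deactivate features chosen by fewer than min_freq of the pbests."""
    freq = pbests.mean(axis=0)
    return active & (freq >= min_freq)
```

With a V-shaped function, a zero velocity means "keep the current bit" rather than "set the bit to 0 with probability 0.5", which is the usual argument for flip-based binary PSO variants.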
Cancer remains a significant cause of mortality, and microarray technology has opened new avenues for cancer diagnosis and treatment. However, because samples are difficult to acquire, the number of genes in microarray data exceeds the number of samples, resulting in high-dimensional, small-sample data. Effective feature selection is crucial for identifying biomarkers and facilitating further analysis. However, existing methods struggle to fully exploit the interdependencies among genes, such as regulatory networks and pathways, to guide the feature selection process and construct efficient classification models. In this paper, we propose a novel feature selection algorithm and classification model based on graph neural networks to address these challenges. Our method employs a multidimensional graph to capture intricate gene interactions. We use link prediction techniques to enhance the graph structure, and employ a multidimensional node evaluator together with a supernode discovery algorithm based on spectral clustering for initial node filtering. Subsequently, a hierarchical graph pooling technique based on downsampling further refines node selection for feature extraction and model building. We evaluate the proposed method on nine publicly available microarray datasets, and the results demonstrate its superiority over both classical and advanced feature selection techniques across various evaluation metrics. These results highlight the effectiveness and advantages of our approach in addressing the complexities of microarray data analysis and cancer classification.
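To make the downsampling-based pooling step concrete, here is a minimal NumPy sketch of top-k graph pooling in the style of gPool (Graph U-Nets): nodes (genes) are scored by projection onto a vector, the top fraction is kept, their features are gated by tanh of the score, and the adjacency matrix is restricted to the induced subgraph. The learnable projection vector `proj` is passed in as a fixed array here, and the paper's hierarchical pooling may differ in its scoring and layering.

```python
import numpy as np

def topk_graph_pool(adj, feats, proj, ratio=0.5):
    """gPool-style downsampling: keep the top-k nodes by projection score."""
    n = feats.shape[0]
    k = max(1, int(np.ceil(ratio * n)))
    scores = feats @ proj / np.linalg.norm(proj)   # one scalar score per node
    keep = np.argsort(-scores)[:k]                 # indices of the k highest-scoring nodes
    gate = np.tanh(scores[keep])[:, None]          # gate the retained features by their score
    pooled_feats = feats[keep] * gate
    pooled_adj = adj[np.ix_(keep, keep)]           # adjacency of the induced subgraph
    return pooled_adj, pooled_feats, keep
```

Stacking several such layers gives a hierarchy of progressively smaller gene graphs; the indices in `keep` trace which genes survive, which is what makes pooling usable as a feature selector.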
Gene selection is the process of selecting discriminative genes from microarray data to help diagnose and classify cancer samples effectively. Gene selection algorithms based on swarm intelligence and evolutionary computation can never fully avoid the tendency of the population to become trapped in local optima. To tackle this challenge, previous research has focused primarily on two aspects: mitigating premature convergence to local optima and escaping from local optima. In contrast, this paper introduces a novel perspective based on reverse thinking, treating local optima as an opportunity rather than an obstacle. On this foundation, we propose MOMOGS-PCE, a novel gene selection approach that exploits the advantageous characteristics of populations trapped in local optima to uncover globally optimal solutions. Specifically, MOMOGS-PCE employs a novel population initialization strategy that initializes multiple populations exploring diverse directions, so that each develops distinct characteristics. Next, an enhanced NSGA-II algorithm is used to amplify the advantageous characteristics of each population. Finally, a novel exchange strategy transfers characteristics between populations that have nearly matured in evolution, promoting further evolution and improving the search for better gene subsets. The experimental results demonstrate that MOMOGS-PCE has significant advantages in comprehensive indicators over six competitive multi-objective gene selection algorithms. This confirms that the "reverse-thinking" approach not only avoids local optima but also leverages them to uncover superior gene subsets for cancer diagnosis.
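The non-dominated sorting at the core of NSGA-II can be sketched as follows, assuming each gene subset is scored by two objectives to minimize, for example the number of selected genes and the classification error. This is the standard fast non-dominated sort from Deb et al. (2002), not the enhanced NSGA-II or the exchange strategy of MOMOGS-PCE.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(objectives):
    """Return the Pareto fronts as lists of indices into `objectives`."""
    n = len(objectives)
    dominated_by = [[] for _ in range(n)]  # dominated_by[i]: solutions that i dominates
    domination_count = [0] * n             # domination_count[i]: how many solutions dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objectives[i], objectives[j]):
                dominated_by[i].append(j)
            elif dominates(objectives[j], objectives[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)            # nobody dominates i: first Pareto front
    current = 0
    while fronts[current]:
        next_front = []
        for i in fronts[current]:
            for j in dominated_by[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        current += 1
        fronts.append(next_front)
    return fronts[:-1]                     # drop the trailing empty front

# (number of genes, error rate) for four candidate gene subsets
print(fast_non_dominated_sort([(3, 0.10), (5, 0.05), (4, 0.12), (6, 0.20)]))
```

Here subsets 0 and 1 trade off size against error, so neither dominates the other and both sit on the first front.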
Background: EndoPredict® (EP) is a multigene assay that predicts the risk of distant recurrence in luminal breast cancer. EP measures the expression of 12 genes in the primary tumor by qRT-PCR on formalin-fixed paraffin-embedded (FFPE) tissue and calculates an EP risk score that indicates the risk of distant recurrence. We evaluated the performance of EP in predicting distant recurrence risk using microarray data from fresh frozen (FF) tissues, and we examined whether EP can be applied to microarray data from FFPE tissues. Methods: We analyzed publicly available data from 431 node-negative and 270 node-positive patients with luminal breast cancer who received endocrine therapy alone, and evaluated the prognostic value of EP using microarray data from FF tissues. Next, we created an algorithm to calculate the EP risk score from microarray data of FFPE tissues. In a validation set of 39 patients with paired FFPE/FF microarray data, we examined the correlation coefficient between EP risk scores and the concordance rate of the EP high/low risk classification. Results: Among the 431 node-negative patients, the distant recurrence-free survival (DRFS) rate was significantly worse in those with high EP risk scores (P = 3.68 × 10⁻⁶, log-rank test). The 5-year DRFS was 95.2% in those with low EP risk scores. In the validation set, the correlation coefficient of EP risk scores was 0.93, and the concordance rate of EP high/low risk was 91.7%. Conclusions: EP using microarray data from FF tissues was useful for predicting distant recurrence risk in luminal breast cancer, and EP might also be applicable to microarray data from FFPE tissues.
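The FFPE/FF agreement analysis can be sketched as a correlation coefficient plus a high/low concordance rate over paired scores. The abstract does not name the correlation measure, so Pearson's r is an assumption, and the risk cutoff is left as a caller-supplied parameter rather than guessing the assay's threshold.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def concordance_rate(x, y, cutoff):
    """Fraction of pairs that the cutoff assigns to the same high/low risk group."""
    same = sum((a >= cutoff) == (b >= cutoff) for a, b in zip(x, y))
    return same / len(x)
```

Both numbers are needed: scores can correlate strongly yet still disagree on risk group for samples lying close to the cutoff.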
Microarray data have become an integral part of clinical practice and the drug discovery process. Because of their voluminous and heterogeneous nature, the interpretability and stability of traditional gene selection methods come into question. To improve the stability of gene selection, and thus make the results more explicable, an improved Extended ReliefF gene selection algorithm is proposed. It encodes gene affinity information using a new mathematical formula based on Bayes' theorem and on Manhattan distance, which is used to find the nearest neighbors in a pooled sample. It proceeds in four steps: initializing sample gene weights, improving gene weights, maximizing sample gene weights, and finally applying a mutation operation. The proposed method selects the most informative genes, which are highly relevant to the prognosis of the disease. Further, to assess the accuracy and stability of the algorithm, soft classification is performed on the genes selected by Relieved_F, STIR, VLS-ReliefF, I-RelieF, conventional ReliefF, and the proposed extended ReliefF algorithm, using three classifiers, namely Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Random Forest (RF), on ten microarray datasets. According to the findings, MLP training times are much longer than those of RF and SVM. From a network perspective, SVM is much faster to train, whereas MLP excels in accuracy. The more similar the gene subsets selected from multiple training sets are, the more stable the approach is. Overall, the proposed gene selection algorithm greatly outperforms the other feature selection methods in terms of accuracy and stability.
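For reference, the classic ReliefF weight update (Kononenko's formulation) with Manhattan distance can be sketched as below: for each sampled instance, feature weights decrease by the average difference to its k nearest same-class neighbors (hits) and increase by the prior-weighted average difference to its k nearest neighbors in every other class (misses). This is the conventional baseline, not the proposed Bayes-based Extended ReliefF, whose formula is not given in the abstract.

```python
import numpy as np

def relieff(X, y, n_neighbors=5, n_samples=None, rng=None):
    """Classic ReliefF feature weights using Manhattan (L1) distance; higher = more relevant."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                         # avoid division by zero for constant genes
    Xn = (X - X.min(axis=0)) / span               # scale to [0, 1] so diff() is |a - b|
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    m = n if n_samples is None else n_samples
    idx = np.arange(n) if n_samples is None else rng.choice(n, m, replace=False)
    w = np.zeros(d)
    for i in idx:
        dist = np.abs(Xn - Xn[i]).sum(axis=1)     # Manhattan distance to every sample
        dist[i] = np.inf                          # never pick the sample itself
        for c in classes:
            mask = y == c
            order = np.argsort(np.where(mask, dist, np.inf))
            k = min(n_neighbors, int(mask.sum()) - int(c == y[i]))
            if k <= 0:
                continue
            near = order[:k]                      # k nearest neighbours within class c
            contrib = np.abs(Xn[near] - Xn[i]).mean(axis=0)
            if c == y[i]:
                w -= contrib / m                  # near hits: penalize differing genes
            else:
                w += prior[c] / (1 - prior[y[i]]) * contrib / m   # near misses: reward them
    return w
```

Ranking genes by the returned weights and keeping the top ones gives the usual ReliefF filter; stability can then be checked by repeating this on resampled training sets and comparing the selected subsets.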
In feature selection research, simultaneous multi-class feature selection techniques are popular because they select informative features for all classes at once. Recursive feature elimination (RFE) methods are state-of-the-art feature selection algorithms for binary (two-class) tasks. However, extending existing RFE algorithms to multi-class tasks may increase the computational cost and degrade performance. With this motivation, we introduce a unified multi-class feature selection (UFS) framework for randomization-based neural networks to address these challenges. First, we propose a new multi-class feature ranking criterion based on the output weights of neural networks. The heuristic underlying this criterion is that "the importance of a feature should be related to the magnitude of the output weights of a neural network". The UFS framework then trains a randomization-based neural network on the original features, ranks the features by the norm of the output weights, and recursively removes the feature with the lowest ranking score. Extensive experiments on 15 real-world datasets suggest that our proposed framework outperforms state-of-the-art algorithms. The code of UFS is available at https://***/SVMrelated/***.
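A minimal sketch of the recursive elimination loop is shown below, assuming a random vector functional link (RVFL) network, one kind of randomization-based neural network, whose direct input-to-output links give each original feature its own row of output weights across all classes. Ranking features by the L2 norm of that row is our reading of the criterion, not the published UFS code (which is at the URL above); features are assumed to be standardized so that weight magnitudes are comparable.

```python
import numpy as np

def rvfl_output_weights(X, Y, n_hidden=50, reg=1e-2, rng=None):
    """Fit an RVFL net (random tanh hidden layer + direct links) by closed-form ridge regression."""
    rng = np.random.default_rng(0) if rng is None else rng
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, untrained input weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    D = np.hstack([X, H])                         # direct-link features first, then hidden units
    beta = np.linalg.solve(D.T @ D + reg * np.eye(D.shape[1]), D.T @ Y)
    return beta                                   # shape (n_features + n_hidden, n_classes)

def rfe_rvfl(X, y, n_keep, **kw):
    """Recursively drop the feature whose direct-link output weights have the smallest norm."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)   # one-hot targets for all classes
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        beta = rvfl_output_weights(X[:, remaining], Y, **kw)
        scores = np.linalg.norm(beta[:len(remaining)], axis=1)  # one multi-class score per feature
        remaining.pop(int(np.argmin(scores)))
    return remaining
```

Because the output weights come from a single linear solve, each elimination round costs one ridge regression covering all classes at once, which is the efficiency argument for doing RFE with randomization-based networks.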
Microarray data contain a large number of genes with multiple expression values but only a small number of samples. Therefore, selecting genes from microarray data is an extremely challenging and important problem for analyzing the biological behavior of features. In this context, a differential evolution (DE) algorithm with a dynamic scaling factor, combined with a multi-layer perceptron (MLP), is designed to select genes from the pathway information of microarray data. First, DE is employed to select a small number of relevant genes. Then, an MLP is used to build a classifier on the selected genes. A suitable and efficient vector representation is designed for DE. The fitness function is derived separately as the T-score, as the classification accuracy, and as a weighted sum of both. Simulations and further analyses are performed in terms of sensitivity, specificity, accuracy, and F-score. Moreover, statistical and biological analyses are also conducted.
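The DE-based selection stage can be sketched as DE/rand/1/bin over continuous vectors in [0, 1], where gene j counts as selected when its component exceeds 0.5. Several details are assumptions for illustration: the linearly decreasing scaling factor F stands in for the paper's dynamic scaling factor, the 0.5 threshold is our choice of representation, and the fitness (mean T-score of the selected genes minus a size penalty) is hypothetical; the MLP accuracy term is omitted to keep the example self-contained.

```python
import numpy as np

def t_scores(X, y):
    """Two-sample (Welch-form) t-score per gene for binary labels 0/1."""
    a, b = X[y == 0], X[y == 1]
    num = np.abs(a.mean(0) - b.mean(0))
    den = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b)) + 1e-12
    return num / den

def de_select(scores, n_pop=20, n_gen=50, cr=0.9, f_max=0.9, f_min=0.4,
              penalty=0.05, rng=None):
    """DE/rand/1/bin gene selection; gene j is selected when x_j > 0.5."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = len(scores)

    def fitness(x):
        mask = x > 0.5
        if not mask.any():
            return -np.inf
        # Hypothetical objective: mean t-score of selected genes minus a size penalty.
        return scores[mask].mean() - penalty * mask.sum()

    pop = rng.random((n_pop, d))
    fit = np.array([fitness(p) for p in pop])
    for g in range(n_gen):
        F = f_max - (f_max - f_min) * g / max(1, n_gen - 1)   # assumed dynamic scaling factor
        for i in range(n_pop):
            r1, r2, r3 = rng.choice([j for j in range(n_pop) if j != i], 3, replace=False)
            mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), 0.0, 1.0)
            cross = rng.random(d) < cr
            cross[rng.integers(d)] = True                     # guarantee at least one component changes
            trial = np.where(cross, mutant, pop[i])
            f_trial = fitness(trial)
            if f_trial >= fit[i]:                             # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = pop[np.argmax(fit)]
    return np.flatnonzero(best > 0.5)
```

A large F early on favors exploration of very different gene subsets, and a smaller F later refines the best ones; the selected indices would then be passed to the MLP classifier.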
Background: Microarray data have been widely used for cancer classification. The main characteristic of microarray data is "large p and small n": the data contain a small number of subjects but a large number of genes, which may affect the validity of the classification. Thus, there is a pressing demand for techniques able to select genes relevant to cancer *** study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA embeds the manifold learning algorithm Isomap in the genetic algorithm (GA) to account for the latent nonlinear structure of gene expression in microarray data. The Davies-Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the likelihood of genes being selected at random by the GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmark gene selection methods, achieving good classification accuracy with fewer critical genes *** proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.
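The classifier-independent evaluation can be sketched with the Davies-Bouldin index, which averages, over classes, the worst ratio of within-class scatter to between-centroid distance (lower is better). In an Iso-GA-style loop, `Z` would presumably be the Isomap embedding of the samples restricted to a candidate gene subset (for example from `sklearn.manifold.Isomap`), and the GA would minimize the index; that wiring is an assumption, and scikit-learn's `davies_bouldin_score` computes the same quantity.

```python
import numpy as np

def davies_bouldin(Z, labels):
    """Davies-Bouldin index of points Z (n x d) grouped by labels; lower is better."""
    classes = np.unique(labels)
    cents = np.array([Z[labels == c].mean(axis=0) for c in classes])
    # Average Euclidean distance of each class's points to its centroid.
    scatter = np.array([np.linalg.norm(Z[labels == c] - cents[i], axis=1).mean()
                        for i, c in enumerate(classes)])
    k = len(classes)
    worst = []
    for i in range(k):
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(k) if j != i]
        worst.append(max(ratios))            # most-confusable other class
    return float(np.mean(worst))
```

Because it uses only the class labels and the geometry of the embedding, this fitness does not depend on any particular classifier, which matches the stated motivation of avoiding classifier dependency.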
Over the last decades, data have grown exponentially in both the number of samples and the number of features, which makes feature selection (FS) more challenging. In this paper, a multimodal optimization (MMO) technique is employed to find multiple optimal solutions instead of a single one. By exploiting the hidden information in the data and building an ensemble of classifiers, the method uses the multiple answers provided by MMO to address FS on microarray data. After the data are pre-processed, a firefly-based MMO algorithm obtains multiple optimal feature subsets, with mutual information used as the fitness function to evaluate each candidate subset. Each feature subset found by the MMO algorithm is then used to train a classifier, and these classifiers form an ensemble. A particle swarm optimization algorithm is used to select a proper combination of the classifiers. Finally, the method is evaluated for cancer diagnosis on 11 microarray datasets. The results indicate the superiority of the multimodal FS method over other methods.
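A minimal sketch of a mutual-information fitness is shown below, assuming the features have already been discretized during pre-processing. Scoring a subset by the mean MI between each selected feature and the class label is a hypothetical simplification; the abstract does not give the exact subset-level formula, and the firefly MMO search and PSO ensemble selection are not shown.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Empirical I(X; Y) in nats for two discrete sequences of equal length."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) p(y))) with p = count / n
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def subset_fitness(columns, labels, subset):
    """Hypothetical subset score: mean MI between each selected feature column and the class."""
    return sum(mutual_information(columns[j], labels) for j in subset) / len(subset)

labels = [0, 0, 1, 1]
columns = [[0, 0, 1, 1],    # identical to the class: MI = ln 2
           [0, 1, 0, 1]]    # independent of the class: MI = 0
print(subset_fitness(columns, labels, [0, 1]))
```

Per-feature MI rewards relevance but not non-redundancy, so in a multimodal search several distinct subsets can score similarly well, which is what the ensemble then exploits.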