Search results - Inner Mongolia University Library

An integrative gene selection with association analysis for microarray data classification (vol 18, pg 739, 2014)

INTELLIGENT DATA ANALYSIS, 2014, vol. 18, no. 5, p. 993

Authors: Fang, Ong Huey; Mustapha, Norwati; Sulaiman, Md. Nasir. Affiliations: UCSI Univ FOBIS Sch Informat Technol Kuala Lumpur Malaysia; Univ Putra Malaysia FSKTM Dept Comp Sci Serdang 43400 Selangor Malaysia

The rising interest in integrative approaches has shifted gene selection from a purely data-centric task to one that incorporates additional biological knowledge. Integrative gene selection is viewed as a promising approach in microarray data classification because it takes into consideration the complex relationships among genes. However, in most existing methods, the selection of genes is still based on expression values alone, and biological knowledge is integrated only at the end of the analysis to verify experimental results or to gain biological insights. This paper therefore proposes an integrative gene selection method, based on a filter method and association analysis, for selecting genes that are not only differentially expressed but also informative for classification. Association analysis is employed to integrate microarray data with multiple types of biological knowledge simultaneously, and to identify groups of genes that frequently co-occur in target samples. The method has been tested on four cancer-related datasets, and two types of biological knowledge are incorporated, namely Gene Ontology (GO) and KEGG Pathways (KEGG). The experimental results show that the recommended GO-based, KEGG-based, and GO-KEGG-based models outperformed the expression-only models, attaining better classification accuracies with fewer genes. The performance of the integrative models verified the efficiency and scalability of association analysis in mining microarray data.

Keywords: Association analysis; classification; gene selection; integrative; microarray data
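
The abstract's idea of finding groups of genes that frequently co-occur in target samples can be illustrated with a minimal frequent-pair count. This is only a sketch: the gene names, the `min_support` threshold, and the restriction to pairs are assumptions for illustration, and the GO/KEGG integration described in the paper is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def frequent_gene_pairs(samples, min_support):
    """Return gene pairs whose co-occurrence support is at least min_support.

    samples: list of sets, each holding the genes flagged in one sample
             (hypothetical toy input, not data from the paper).
    min_support: minimum fraction of samples in which a pair must co-occur.
    """
    counts = Counter()
    for genes in samples:
        # Sort so that each unordered pair maps to one canonical tuple.
        for pair in combinations(sorted(genes), 2):
            counts[pair] += 1
    n = len(samples)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Toy samples: each set lists genes marked as differentially expressed.
samples = [
    {"TP53", "BRCA1", "MYC"},
    {"TP53", "BRCA1"},
    {"TP53", "MYC"},
    {"BRCA1", "TP53", "EGFR"},
]
pairs = frequent_gene_pairs(samples, min_support=0.75)
```

A full association analysis would extend this to larger itemsets and to rules with confidence measures; pairs are used here only to keep the example short.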

Methodology to identify a gene expression signature by merging microarray datasets

COMPUTERS IN BIOLOGY AND MEDICINE, 2023, vol. 159, p. 106867

Authors: Fajarda, Olga; Almeida, Joao Rafael; Duarte-Pereira, Sara; Silva, Raquel M.; Oliveira, Jose Luis. Affiliations: Univ Aveiro DETI IEETA LASI Aveiro Portugal; Univ A Coruna Dept Computat La Coruna Spain; Univ Aveiro Dept Med Sci Aveiro Portugal; Univ Aveiro iBiMED Inst Biomed Aveiro Portugal; Univ Catolica Portuguesa Fac Dent Med FMD Ctr Interdisciplinary Res Hlth CIIS Viseu Portugal

A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets contain only a small number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. The methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. The methodology was implemented in the R language and is available, under the MIT licence, at https://***/bioinformatics-ua/MicroGES.

Keywords: microarray data; Gene expression signature; Random forest; LSVM; Neural network; Heart failure; Autism spectrum disorder

Feature selection using guided population based genetic algorithm with modified crossover and parent selection

APPLIED SOFT COMPUTING, 2025, vol. 172

Authors: Naskar, Anurup; Ghosh, Soumyajit; Kundu, Mahantapas; Sarkar, Ram. Affiliations: Jadavpur Univ Dept Comp Sci & Engn Kolkata India; Calcutta Univ AK Choudhury Sch Informat Technol Kolkata India

In the contemporary landscape, the imperative for cost-effective solutions is paramount, especially when dealing with very high-dimensional datasets such as gene expression datasets. The use of machine learning and data mining techniques in processing these voluminous and complex datasets presents a significant challenge in terms of time and resource consumption. A notable obstacle in dataset analysis is the prevalence of extraneous features or attributes. This is particularly evident in numerous medical datasets, which are often burdened with unnecessary attributes, complicating the task of classification or prediction algorithms in obtaining precise results. However, the application of metaheuristic optimization algorithms shows remarkable proficiency in isolating pertinent feature vectors, thus markedly improving the efficiency and cost-effectiveness of data processing endeavors. We propose a novel feature selection method using a Genetic Algorithm (GA) that enhances initial population diversity by clustering features during initialization. The paper also introduces a modified crossover technique for generating offspring and employs an adaptive threshold-based Roulette Wheel for parent selection, ensuring effective feature selection. We evaluate the proposed feature selection method on 17 UCI datasets, 3 of which have a very high number of features, and the results obtained are better than those of many state-of-the-art methods in terms of both classification accuracy and reduction in the number of features. We also apply our method to 5 microarray-based gene expression datasets, used for the prediction of cancer, in order to ensure the scalability and robustness of our method as a feature selector in real-life scenarios. This link provides the source code of the proposed method.

Keywords: Genetic algorithm; Modified crossover; Feature selection; Adaptive threshold based roulette; UCI data; microarray data
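
For readers unfamiliar with roulette-wheel parent selection, the sketch below shows the standard fitness-proportional version. It is not the adaptive threshold-based variant proposed in the paper, whose details the abstract does not give; the function name and the toy fitness values are assumptions for illustration.

```python
import random


def roulette_wheel_select(fitnesses, rng):
    """Pick one index with probability proportional to its fitness.

    fitnesses: non-negative values with a positive sum (toy values here).
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    cumulative = 0.0
    for index, fitness in enumerate(fitnesses):
        cumulative += fitness
        if pick <= cumulative:
            return index
    # Guard against floating-point round-off at the upper end of the wheel.
    return len(fitnesses) - 1


# Draw 1000 parents from a toy population of three individuals.
rng = random.Random(0)
toy_fitnesses = [1.0, 1.0, 8.0]
counts = [0, 0, 0]
for _ in range(1000):
    counts[roulette_wheel_select(toy_fitnesses, rng)] += 1
```

With these toy values the third individual holds 80% of the wheel, so it should be drawn far more often than the other two.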

ieGENES: A machine learning method for selecting differentially expressed genes in cancer studies

JOURNAL OF BIOMEDICAL INFORMATICS, 2025, vol. 164, p. 104803

Authors: Xia, Xiao-Lei; Zhou, Shang-Ming; Liu, Yunguang; Lin, Na; Overton, M. Affiliations: Queens Univ Belfast Inst Elect Commun & Informat Technol Ctr Secure Informat Technol Belfast BT3 9DT North Ireland; Queens Univ Belfast Hlth Data Res Wales & Northern Ireland 97 Lisburn Rd Belfast BT9 7AE North Ireland; Univ Plymouth Fac Hlth Ctr Hlth Technol Plymouth PL4 8AA England; Youjiang Med Univ Nationalities Affiliated Hosp Dept Pediat Baise Peoples R China; Queens Univ Belfast Patrick G Johnston Ctr Canc Res 97 Lisburn Rd Belfast BT9 7AE North Ireland

Gene selection is crucial for cancer classification using microarray data. To improve cancer classification accuracy, in this paper we developed a new wrapper method for gene selection called ieGENES. First, we proposed a parsimonious kernel machine regularization (PKMR) model, which uses ridge regularization in kernel machine driven classification to tackle multi-collinearity and obtain stable estimates in high-dimensional settings. Then the ieGENES algorithm was developed to optimally identify relevant genes while iteratively eliminating redundant ones based on leave-one-out cross-validation accuracy. In particular, we developed a new methodology to optimally update model parameters upon gene removal. The ieGENES algorithm was evaluated on six cancer microarray datasets and compared to existing methods. Classification accuracy and the number of differentially expressed genes (DEGs) identified were assessed. In terms of gene selection accuracy, ieGENES outperformed multiple wrapper methods on 5 out of 6 datasets (Colon, Leukemia, Hepato, Glioma, and Breast Cancers), with statistically significant improvements (p<0.001). For the Colon dataset, ieGENES achieved 96.21% accuracy with 167 DEGs. The proposed ieGENES technique demonstrated superior performance in identifying DEGs for cancer diagnosis compared with existing techniques. It offers a promising tool for identifying biologically relevant genes in microarray data analysis and biomarker discovery for cancer research.

Keywords: Differentially expressed genes; Cancer; Gene detection; microarray data; Kernel machines; Machine learning
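
The wrapper idea in this abstract, removing genes one at a time while tracking leave-one-out cross-validation (LOOCV) accuracy, can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier is swapped in for the paper's PKMR kernel machine, and the parameter-update scheme mentioned in the abstract is not reproduced; all function names and the toy data are assumptions.

```python
def nearest_centroid_predict(train_x, train_y, x, features):
    """Predict the label whose class centroid (over `features`) is closest to x."""
    sums, counts = {}, {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((acc[k] / counts[label] - x[f]) ** 2
                   for k, f in enumerate(features))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(xs, ys, features):
    """Leave-one-out accuracy using only the given feature indices."""
    correct = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i], features) == ys[i]:
            correct += 1
    return correct / len(xs)


def backward_eliminate(xs, ys):
    """Drop a feature whenever doing so does not lower LOOCV accuracy."""
    features = list(range(len(xs[0])))
    best = loocv_accuracy(xs, ys, features)
    improved = True
    while improved and len(features) > 1:
        improved = False
        for f in list(features):
            candidate = [g for g in features if g != f]
            score = loocv_accuracy(xs, ys, candidate)
            if score >= best:
                features, best, improved = candidate, score, True
                break
    return features, best


# Toy data: column 0 separates the classes, column 1 is uninformative.
xs = [[0.0, 0.3], [0.1, 0.1], [0.2, 0.5], [1.0, 0.2], [1.1, 0.4], [0.9, 0.15]]
ys = [0, 0, 0, 1, 1, 1]
selected, accuracy = backward_eliminate(xs, ys)
```

On this toy input the uninformative column is removed and the informative one is kept, because dropping the latter would lower LOOCV accuracy.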

Classification of breast cancer using microarray gene expression data: A survey

JOURNAL OF BIOMEDICAL INFORMATICS, 2021, vol. 117, p. 103764

Authors: Abd-Elnaby, Muhammed; Alfonse, Marco; Roushdy, Mohamed. Affiliations: Ain Shams Univ Fac Comp & Informat Sci Cairo Egypt; Future Univ Fac Comp & Informat Technol New Cairo Egypt

Cancer, in particular breast cancer, is considered one of the most common causes of death worldwide according to the World Health Organization. For this reason, extensive research efforts have been made in the area of accurate and early diagnosis of cancer in order to increase the likelihood of cure. Among the available tools for diagnosing cancer, microarray technology has been proven to be effective. Microarray technology analyzes the expression level of thousands of genes simultaneously. Although the huge number of features or genes in the microarray data may seem advantageous, many of these features are irrelevant or redundant, resulting in the deterioration of classification accuracy. To overcome this challenge, feature selection techniques are a mandatory preprocessing step before the classification process. In this paper, the main feature selection and classification techniques introduced in the literature for cancer (particularly breast cancer) are reviewed to improve microarray-based classification.

Keywords: Feature selection; Machine learning; Cancer classification; microarray data

Efficient Hybrid-Robust Approach for Cancer Biomarker Discovery Using Omics Data

IEEE ACCESS, 2025, vol. 13, pp. 51130-51149

Authors: Sid, Karima; Zertal, Soumia; Batouche, Mohamed; Zerabi, Soumeya. Affiliations: Oum El Bouaghi Univ Dept Math & Comp Sci Oum El Bouaghi 04000 Algeria; Abdelhamid Mehri Constantine 2 Univ Lab Data Sci Comp & Artificial Intelligence LISIA Constantine 25016 Algeria; Oum El Bouaghi Univ Dept Math & Comp Sci Lab Artificial Intelligence & Autonomous Things LI Oum El Bouaghi 04000 Algeria; Princess Nourah bint Abdulrahman Univ Dept Informat Technol CCIS RC Riyadh 11671 Saudi Arabia; Abdelhamid Mehri Constantine 2 Univ Dept Comp Sci & Its Applicat Lab Data Sci Comp & Artificial Intelligence LISIA Constantine 25016 Algeria

DNA microarray datasets, also known as "omics" data, are important for the diagnosis of numerous diseases, including cancer and tumors. In the analysis of these data, feature selection techniques and classification algorithms are the workhorses for choosing candidate genes that serve as cancer biomarkers. However, microarray datasets present a challenge: they contain more features than samples, which affects the performance of the algorithms used in the analysis process. In order to extract precise information, it is necessary to employ a method that is both robust and performant. This paper emphasizes the importance of accurate and stable gene selection for the discovery of knowledge derived from high-dimensional data. A novel hybrid framework is proposed, comprising three distinct stages: Clustering, Parallel Filtering, and Hybrid-Parallel Optimization. In each stage, a combination of techniques and algorithms is used to improve the results in terms of stability and/or accuracy. The proposal is evaluated and tested under different scenarios, using thirteen gene expression datasets and two classifiers: Artificial Neural Network (ANN) and Naïve Bayes (NB). Comparison with related work demonstrates the efficacy of this approach, which enhances classification accuracy and stability while reducing the number of selected genes.

Keywords: Feature extraction; Stability criteria; Classification algorithms; Cancer; Clustering algorithms; Gene expression; Accuracy; Approximation algorithms; Metaheuristics; Training; Biomarker discovery; microarray data; feature selection; classification; optimization algorithms; Markov blanket approximation

A recursive framework for improving the performance of multi-objective differential evolution algorithms for gene selection

SWARM AND EVOLUTIONARY COMPUTATION, 2024, vol. 87

Authors: Li, Min; Zhao, Yangfan; Cao, Rutun; Wang, Junke; Wu, Depeng. Affiliation: Nanchang Inst Technol Sch Informat Engn 289 Tianxiang Rd Nanchang Jiangxi Peoples R China

Gene selection is a pivotal process in machine-learning-driven medical diagnostics, where the goal is to identify a subset of genes from microarray expression profiles that can enhance the predictive accuracy of classifiers for disease diagnosis. The two key objectives of gene selection are to reduce the dimensionality of the data and to improve the accuracy of disease diagnosis, which makes it a multi-objective optimization problem. In recent years, multi-objective evolutionary algorithms (MOEAs) have gained wide attention in feature selection research, and several related algorithms have been proposed. However, most algorithms tend to get stuck in local optima when searching for solutions in a high-dimensional space. To solve the gene selection problem effectively, this study introduces a recursive multi-objective differential evolution algorithm with elite recursive strategy (RMODE-E) and a recursive multi-objective differential evolution algorithm with Pareto front recursive strategy (RMODE-P). RMODE-E amalgamates the features selected by the top E elite individuals, while RMODE-P consolidates the features selected by the Pareto front set; the combined features then serve as the foundation for subsequent recursive rounds of searching. The proposed feature subspace combination strategy not only reduces the recursive search space but also improves the capacity to find globally optimal feature subsets. Extensive experiments were conducted to compare the proposed algorithms with eight state-of-the-art evolutionary algorithms to validate their effectiveness. Experimental results demonstrate that RMODE-P has better global search capability, as it achieves a better best classification accuracy, mean classification accuracy, and minimal gene subset size.

Keywords: Gene selection; Feature selection; microarray data; Differential evolution algorithm; Multi-objective optimization
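
The Pareto-front consolidation step described for RMODE-P can be sketched as below: keep the non-dominated solutions (higher accuracy, fewer genes) and take the union of their selected genes as the search space for the next round. The data layout, function names, and toy values are assumptions for illustration; the differential evolution search itself is not shown.

```python
def dominates(a, b):
    """True if solution a is at least as good as b on both objectives
    (maximize accuracy, minimize gene count) and strictly better on one."""
    acc_a, n_a = a["accuracy"], len(a["genes"])
    acc_b, n_b = b["accuracy"], len(b["genes"])
    return acc_a >= acc_b and n_a <= n_b and (acc_a > acc_b or n_a < n_b)


def pareto_front(solutions):
    """Return the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


def combined_subspace(solutions):
    """Union of the genes selected by the Pareto-front solutions."""
    return set().union(*(s["genes"] for s in pareto_front(solutions)))


# Toy population: gene indices and accuracies are made up for illustration.
population = [
    {"accuracy": 0.90, "genes": {1, 2, 3}},
    {"accuracy": 0.95, "genes": {2, 3, 4, 5}},
    {"accuracy": 0.90, "genes": {1, 2, 3, 6}},  # dominated by the first
    {"accuracy": 0.85, "genes": {7}},
    {"accuracy": 0.80, "genes": {8, 9}},        # dominated by the fourth
]
front = pareto_front(population)
subspace = combined_subspace(population)
```

Genes that appear only in dominated solutions (6, 8 and 9 here) drop out of the subspace, which is how the recursive search space shrinks from round to round.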

Optimizing gene selection for Alzheimer's disease classification: A Bayesian approach to filter and embedded techniques

APPLIED SOFT COMPUTING, 2024, vol. 167

Authors: Guelib, Bouchra; Bounab, Rayene; Aliouane, Salah Eddine; Hermessi, Haithem; Khlifa, Nawres; Zarour, Karim. Affiliations: Univ Abdelhamid MEHRI Constantine 2 LIRE Lab Constantine Algeria; Constantine 1 Frere Mentouri Univ Lab Microbiol Engn & Applicat Constantine Algeria; Univ Tunis El Manar Higher Inst Comp Sci Lab Informat Modeling & Informat & Knowledge Proc Intelligent Syst Imaging & Artificial Vis SIIVA Ariana Tunisia; Univ Tunis El Manar Higher Inst Med Technol Tunis Res Lab Biophys & Med Technol Tunis Tunisia

Alzheimer's disease (AD) classification, which is crucial for identifying AD-associated genes, relies heavily on effective feature selection (FS) to tackle the curse of dimensionality. Traditional methods such as filter, wrapper, and embedded techniques have their drawbacks, including ignoring feature independence, sensitivity to classifier choices, and high computational costs. Hybrid approaches combining these methods seek to harness their collective strengths but face challenges, particularly in selecting the optimal number of features from each method. This selection is typically manual or requires time-intensive k-fold cross-validation (KFCV), which significantly increases computational demands and complicates the process with the need for extensive parameter optimization across families, thereby escalating the complexity and resource requirements of model development. To overcome these challenges, this work proposes a framework for optimal FS and classification of AD using a combination of filter and embedded techniques, enhanced with hyperparameter tuning. Firstly, gene expression data (GED) from the AD Neuroimaging Initiative (ADNI) is preprocessed. Then, Chi-square filter selection is applied to decrease correlated features. Next, Logistic Regression with ElasticNet penalty (LREN) is employed to further refine the feature set. Finally, Bayesian Optimization (BO) is introduced to

Keywords: Gene expression; microarray data; Filter selection; Embedded selection; Hyperparameter tuning; Alzheimer's disease

Feature Selection of Gene Expression Data Using a Modified Artificial Fish Swarm Algorithm With Population Variation

IEEE ACCESS, 2024, vol. 12, pp. 72688-72706

Authors: Li, Zong-Zheng; Wang, Fang-Ling; Qin, Feng; Yusoff, Yusliza Binti; Zain, Azlan Mohd. Affiliation: Univ Teknol Malaysia Fac Comp Skudai 81310 Johor Bahru Malaysia

Microarray data is of great significance for cancer identification at the gene level. In a microarray dataset, only a small number of characteristic genes have significant classification and identification value for cancer. Extracting a small number of characteristic genes from a large volume of microarray data is a classic NP-hard problem. This paper proposes a practical hybrid approach to feature selection of gene expression data from microarrays, combining the F-score algorithm with an improved artificial fish swarm algorithm with population variation (FSA-PV). Firstly, the F-score algorithm eliminates a large number of useless and redundant features in the dataset. Then, FSA-PV is used to escape local optima while retaining the best features of the subset as far as possible; the adaptive step and visual parameters adjust the search space and the movement range of the algorithm in different environments, improving both its local and global optimization abilities. In addition, a naive Bayesian classifier is used to test the classification accuracy of subsets. Eight classical datasets are used to verify the performance of the proposed mechanism in the experimental part. The results reveal that the classification accuracy of FSA-PV is significantly superior to that of other algorithms on the Breast dataset, and that the classification accuracy is more than 90% in 8 cases. This further indicates the robustness and feasibility of FSA-PV in the gene selection process.

Keywords: Classification algorithms; Feature extraction; Visualization; Support vector machines; Clustering algorithms; Wrapping; Gene expression; Fish schools; Feature selection; gene expression; microarray data; modified artificial fish swarm algorithm; population variation
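
The F-score pre-filter mentioned in the abstract can be illustrated with one common two-class definition of the F-score: the squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances. The abstract does not state the formula, so treating this as the paper's exact criterion is an assumption, and the expression values below are toy data.

```python
def f_score(pos, neg):
    """Two-class F-score of one gene.

    pos, neg: expression values of the gene in the positive and negative
    class (each needs at least two values, and the within-class variances
    must not both be zero, or the ratio is undefined).
    """
    all_values = pos + neg
    mean_all = sum(all_values) / len(all_values)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    # Unbiased sample variance within each class.
    var_pos = sum((x - mean_pos) ** 2 for x in pos) / (len(pos) - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg) / (len(neg) - 1)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    return between / (var_pos + var_neg)


# Toy genes: gene_a separates the classes, gene_b does not.
gene_a = f_score([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
gene_b = f_score([1.0, 5.0, 3.0], [2.0, 4.0, 3.0])
ranked = sorted({"gene_a": gene_a, "gene_b": gene_b}.items(),
                key=lambda item: item[1], reverse=True)
```

A filter step would keep the top-ranked genes and pass only those to the swarm search, which is what makes the later wrapper stage tractable.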

A novel grey wolf optimization algorithm based on geometric transformations for gene selection and cancer classification

JOURNAL OF SUPERCOMPUTING, 2024, vol. 80, no. 4, pp. 4808-4840

Authors: Dabba, Ali; Tari, Abdelkamel; Meftali, Samy. Affiliations: Mohamed Boudiaf Univ Fac Math & Comp Sci Comp Sci Dept Msila Algeria; Lab Informat & Its Applicat Msila LIAM Msila Algeria; Abderrahmane Mira Univ Comp Sci Dept Fac Sci Bejaia Algeria; Med Comp Lab LIMED Msila Algeria; Univ Lille Lille France; Res Ctr Comp Sci Signal & Automat Control Lille C Lille France

Cancer classification based on microarray data plays a very important role in cancer diagnosis and detection. Microarray data contains a huge number of genes and a small number of samples, and it is also nonlinear and noisy, which has led to the need to reduce the data dimensionality. Solving this problem requires an effective way to help biologists and medical research scientists. This paper proposes a new bio-inspired algorithm for gene selection in cancer classification called the Binary Grey Wolf Optimization Algorithm (BGWOA), which is based on hybridization between Minimum Redundancy-Maximum Relevance (MRMR) and a novel Binary Grey Wolf algorithm. The BGWOA is composed of two stages. The first stage consists of the MRMR pre-filter, which obtains a set of relevant genes and so reduces the dimensionality of the datasets. The second stage consists of a new Binary Grey Wolf algorithm, based on the direct similarity and centroid known from geometry, which updates the positions of grey wolves in order to exploit and explore the search space. We also used a fitness function that depends on the SVM classifier with LOOCV and on the rate of unselected genes to evaluate the candidate solutions. The primary goal of the last stage is to identify the best relevant subset of genes among those obtained in the first stage. This research used eight microarray datasets to evaluate the proposed method and compare it with other existing algorithms. The experimental results show a higher classification accuracy with fewer genes compared to many recently published algorithms. Specifically, the proposed method achieves 100% classification accuracy on five reference datasets with a number of genes ranging from 12 to 25. This indicates that the research is promising and significant.

Keywords: Genes selection; Grey wolf optimization; Geometric transformation; Direct similarity; Centroid; microarray data; Cancer classification; Bio-inspired algorithms; Molecular biology; Minimum redundancy-maximum relevance
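
A fitness function that depends on both classifier accuracy and the rate of unselected genes, as the abstract describes, is often written as a weighted sum. The sketch below assumes that form; the weight of 0.9, the function name, and the binary-mask representation are illustrative assumptions, and the LOOCV accuracy is passed in rather than computed with an SVM.

```python
def fitness(loocv_accuracy, mask, weight=0.9):
    """Weighted sum of accuracy and the fraction of genes left unselected.

    loocv_accuracy: accuracy in [0, 1] from some classifier (assumed given).
    mask: list of 0/1 flags, 1 meaning the gene is selected.
    weight: trade-off between accuracy and subset size (assumed value).
    """
    n_total = len(mask)
    n_selected = sum(mask)
    unselected_rate = (n_total - n_selected) / n_total
    return weight * loocv_accuracy + (1.0 - weight) * unselected_rate


# Two toy solutions with equal accuracy but different subset sizes.
small_subset = [1] * 12 + [0] * 88
large_subset = [1] * 25 + [0] * 75
score_small = fitness(1.0, small_subset)
score_large = fitness(1.0, large_subset)
```

With equal accuracy, the solution that selects fewer genes scores higher, which is the pressure toward compact gene subsets that the abstract reports.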
