Search Results - Inner Mongolia University Library

Gene Selection in Multi-class Imbalanced Microarray Datasets Using Dynamic Length Particle Swarm Optimization

CURRENT BIOINFORMATICS, 2021, Vol. 16, No. 5, pp. 734-748

Authors: Priya, R. Devi; Sivaraj, R. Affiliations: Kongu Engn Coll, Dept Informat Technol, Erode, Tamil Nadu, India; Nandha Engn Coll, Dept Comp Sci & Engn, Erode, Tamil Nadu, India

Background: Microarray gene expression datasets usually contain a large number of genes, which complicates downstream operations such as classification, clustering and other kinds of analysis. During classification, identifying salient genes is a challenging task that needs careful selection. Methods: The classification of multi-class datasets is more critical than binary classification. When there are multiple class labels, the datasets are more likely to be imbalanced. Large variations can be seen in the number of samples belonging to each class, and hence the classification process may become biased when incorrect samples are chosen for training. There is insufficient research addressing all three of these scenarios together in microarray datasets. Results and Discussion: The paper fills this gap with the following contributions: i) selects salient genes for classification using the multiSURF algorithm; ii) identifies the right instances from imbalanced datasets using the Retained Tomek Link algorithm; and iii) performs gene selection for multi-class classification using Dynamic Length Particle Swarm Optimization (DPSO). Conclusion: The proposed method is implemented on multi-class imbalanced microarray datasets, and the final classification performance is seen to be encouraging and better than that of the other compared methods.

Keywords: Feature weighing; retained Tomek link; dynamic PSO; apriori algorithm; microarray datasets
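
The abstract above names a Retained Tomek Link algorithm for choosing instances from imbalanced data. The record does not describe the retained variant, so the following is only a minimal sketch of the classic Tomek-link definition it presumably builds on: two samples from different classes that are each other's nearest neighbour. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
import math

def nearest_neighbor(points, i):
    """Index of the point closest to points[i] (Euclidean), excluding i itself."""
    best_j, best_d = None, math.inf
    for j, p in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], p)
        if d < best_d:
            best_j, best_d = j, d
    return best_j

def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form classic Tomek links."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        # A link needs mutual nearest neighbours with different class labels.
        if i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links

# Hypothetical toy data: samples 2 and 3 sit on the class boundary.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (5.0, 5.0)]
labels = ["A", "A", "A", "B", "B"]
links = tomek_links(points, labels)
```

In the usual undersampling use of Tomek links, the majority-class member of each pair is a candidate for removal; how the paper decides which instances to retain is not stated in this record.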

CCFS: A cooperating coevolution technique for large scale feature selection on microarray datasets

COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2018, Vol. 73, pp. 171-178

Authors: Ebrahimpour, Mohammad K.; Nezamabadi-Pour, Hossein; Eftekhari, Mandi. Affiliations: Shahid Bahonar Univ Kerman, Comp Engn Dept, Kerman, Iran; Shahid Bahonar Univ Kerman, Elect Engn Dept, Kerman, Iran

Recently, advances in bioinformatics have led to high-dimensional microarray datasets. These datasets remain challenging for machine learning researchers because they suffer from small sample sizes and an extremely large number of features. Therefore, feature selection is a problem of interest in the learning process in this area. In this paper, a novel feature selection method based on a global search (using the main concepts of the divide-and-conquer technique), called CCFS, is proposed. The proposed CCFS algorithm divides the dataset vertically (on features) at random and utilizes the fundamental concepts of cooperation coevolution, using a filter criterion in the fitness function, in order to search the solution space via the binary gravitational search algorithm. To determine the effectiveness of the proposed method, experiments are carried out on seven binary high-dimensional microarray datasets. The obtained results are compared with nine state-of-the-art feature selection algorithms, including Interact (INT) and Maximum Relevancy Minimum Redundancy (MRMR). The average results are analyzed by a non-parametric statistical test, which reveals that the proposed method differs meaningfully from the others in terms of accuracy, sensitivity, specificity and number of selected features. (C) 2018 Elsevier Ltd. All rights reserved.

Keywords: Meta-heuristics; Cooperating coevolving; feature selection; microarray datasets; Divide and conquered algorithms

Markov Blanket: Efficient Strategy for Feature Subset Selection Method for High Dimensionality Microarray Cancer Datasets

Author: Abdala Nour (Laurentian University)

Degree level: Master's

Currently, feature subset selection methods are very important, especially in application areas where datasets with tens or hundreds of thousands of variables (genes) are available. Feature subset selection methods help us select a small number of variables out of thousands of genes in microarray datasets for a more accurate and balanced classification. Efficient gene selection can ease the computational load of the subsequent classification task and can yield a subset of the gene set without loss of classification performance. In classifying microarray data, the main objective of gene selection is to search for genes that keep the maximum amount of relevant information about the class while minimizing classification errors. In this paper, we explain the importance of feature subset selection methods in the machine learning and data mining fields. Consequently, the analysis of microarray expression was used to check whether global biological differences underlie common pathological features in different types of cancer datasets and to identify genes that might anticipate the clinical behavior of this disease. Gene expression data contain large amounts of raw data that need to be analyzed with a feature subset selection model to obtain useful information for specific biological and medical applications. One way of finding relevant (and removing redundant) genes is by using the Bayesian network based on the Markov blanket [1]. We present and compare the performance of different approaches to feature (gene) subset selection based on Wrapper and Markov Blanket models for five microarray cancer datasets. The first approach depends on Memetic algorithms (MAs) used for feature selection. The second approach uses MRMR (Minimum Redundant Maximum Relevant) for feature subset selection hybridized with genetic search optimization techniques, and afterwards compares the Markov blanket model's performance with the most common classical classifica

Keywords: microarray datasets; feature selection methods; genetic algorithms; memetic algorithms; overfitting problem; fitness function; crossover; mutation; Markov Blanket; minimum redundancy-maximum relevant; support vector machine; naive Bayes; k-nearest-neighbor; ensemble classifier; Bayesian networks

Fusing Decision Trees Based on Genetic Programming for Classification of Microarray Datasets

10th International Conference on Intelligent Computing (ICIC)

Authors: Liu, KunHong; Tong, MuChenxuan; Xie, ShuTong; Zeng, ZhiHao. Affiliations: Xiamen Univ, Software Sch, Xiamen, Fujian, Peoples R China; Xiamen Univ, Dept Elect Engn, Xiamen, Fujian, Peoples R China; Jimei Univ, Sch Comp Engn, Xiamen, Fujian, Peoples R China

ISBN: (print) 9783319093390; 9783319093383

In this paper, a new ensemble system based on genetic programming (GP), named GPES, is proposed. Decision trees are used as base classifiers and are fused by GP with three voting methods: min, max and average. In this way, each GP individual acts as an ensemble system. When the GP evolution process ends, the final ensemble committee is selected from the last generation by a forward search algorithm. GPES is evaluated on microarray datasets, and the results show that this ensemble system is competitive with other ensemble systems.

Keywords: microarray datasets; genetic programming (GP); ensemble system; decision tree
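
The abstract above says the decision trees are fused with three voting methods: min, max and average. As a rough illustration of what those three combination rules do on per-class scores, here is a minimal sketch; it does not show the GP evolution or the forward search, and the function names and the score values are hypothetical.

```python
def fuse(prob_vectors, rule):
    """Combine per-class scores from several base classifiers with one rule."""
    rules = {
        "min": min,
        "max": max,
        "average": lambda scores: sum(scores) / len(scores),
    }
    combine = rules[rule]
    # zip(*...) groups the scores that every classifier gave to the same class.
    return [combine(list(class_scores)) for class_scores in zip(*prob_vectors)]

def predict(prob_vectors, rule):
    """Index of the class with the highest fused score."""
    fused = fuse(prob_vectors, rule)
    return max(range(len(fused)), key=fused.__getitem__)

# Hypothetical class-score vectors from three decision trees (three classes).
tree_outputs = [
    [0.6, 0.3, 0.1],
    [0.6, 0.3, 0.1],
    [0.0, 0.3, 0.7],
]
decisions = {rule: predict(tree_outputs, rule) for rule in ("min", "max", "average")}
```

On this toy input the three rules disagree, which is the reason an ensemble might want to search over them rather than fix one in advance.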

Optimizing medical data classification: integrating hybrid fuzzy joint mutual information with binary Cheetah optimizer algorithm

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2025, Vol. 28, No. 4, pp. 1-32

Authors: Hegazy, Ah. E.; Hafiz, B.; Makhlouf, M. A.; Salem, Omar A. M. Affiliations: Suez Canal Univ, Fac Comp & Informat, Ismailia, Egypt; Henan Univ, Sch Comp & Informat Engn, Kaifeng, Peoples R China

Traditional classification algorithms struggle with the high dimensionality of medical data, resulting in reduced performance in tasks like disease diagnosis. Feature selection (FS) has emerged as a crucial preprocessing step to mitigate these challenges by extracting relevant features and improving classification accuracy. This paper proposes a hybrid FS method, FJMIBCOA, which integrates Fuzzy Joint Mutual Information (FJMI) as a filter measure and Binary Cheetah Optimizer Algorithm (BCOA) as a wrapper method. Unlike existing hybrid FS methods, the proposed method employs FJMI to address uncertainty in feature relationships, providing several advantages such as handling both discrete and continuous features, accommodating linear and non-linear relationships, noise robustness and effectively utilizing intra- and inter-class information. It also employs BCOA as a wrapper method, requiring a few parameters, minimizing computational overhead and enhancing classification robustness, making it an efficient and adaptable solution for FS in complex medical datasets. The proposed method is validated on 23 medical datasets and 14 high-dimensional microarray datasets, demonstrating excellent performance in terms of fitness value, accuracy and feature size. FJMIBCOA surpasses existing methods in medical datasets by achieving higher accuracy in 78.26% of datasets while reducing the feature size by 84.79%. Similarly, in microarray datasets, it improves accuracy in 78.58% of datasets with an impressive 95.08% reduction in feature size. Furthermore, FJMIBCOA achieves superior accuracy in 60% of datasets while selecting fewer features in 78.57% of datasets as compared to previous studies. Statistical testing indicates that FJMIBCOA outperforms other methods significantly. The proposed method enhances diagnosis accuracy and minimizes medical testing requirements, making it suitable for real-world, high-dimensional datasets and decision-making in medical data analysis. The findings

Keywords: Classification; Feature selection; Cheetah optimizer algorithm; Fuzzy joint mutual information; Medical datasets; microarray datasets

A hybrid intelligent optimization algorithm to select discriminative genes from large-scale medical data

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, Vol. 15, No. 12, pp. 5921-5948

Authors: Wang, Tao; Jia, LiYun; Xu, JiaLing; Gad, Ahmed G.; Ren, Hai; Salem, Ahmed. Affiliations: Hebei Univ Architecture, Informat Engn Coll, Zhangjiakou 075000, Peoples R China; Hebei Univ Architecture, Dept Math & Phys, Zhangjiakou 075000, Peoples R China; Kafrelsheikh Univ, Fac Comp & Informat, Kafrelsheikh 33516, Egypt; Arab Acad Sci Technol & Maritime Transport AASTMT, Coll Comp & Informat Technol, Cairo 2033, Egypt

Identifying disease-related genes is an ongoing research issue in biomedical analysis. Many studies have recently presented various strategies for predicting disease-related genes. However, only a handful of them were capable of identifying or selecting relevant genes with a low computational burden. To tackle this issue, we introduce a new filter-wrapper-based gene selection (GS) method based on metaheuristic algorithms (MHAs) in conjunction with the k-nearest neighbors (k-NN) classifier. Specifically, we hybridize two MHAs, the bat algorithm (BA) and the JAYA algorithm (JA), embedded with perturbation as a new perturbation-based exploration strategy (PES), to obtain the JAYA-bat algorithm (JBA). The fact that JBA outperforms 10 state-of-the-art GS methods on 12 high-dimensional microarray datasets (ranging from 2000 to 22,283 features or genes) is impressive. It is also noteworthy that relevant genes are first selected via a filter-based method called mutual information (MI), and then further optimized by JBA to select the near-optimal genes in a timely fashion. Compared with 11 well-known original MHAs, including BA and JA, the proposed JBA achieves significantly better results, with improvement rates of 12.36%, 12.45%, 97.88%, 9.84%, 12.45%, and 12.17% in terms of fitness, accuracy, gene selection ratio, precision, recall, and F1-score, respectively. The results of Wilcoxon's signed-rank test at a significance level of alpha = 0.05 furth

Keywords: Bat algorithm; Bioinformatics; Biomedical analysis; Feature selection; Gene expression; Gene selection; High-dimensional data; JAYA algorithm; Metaheuristics; microarray datasets; Mutual information; Swarm intelligence
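
The abstract above describes a filter step in which genes are first ranked by mutual information (MI) before the metaheuristic wrapper runs. The sketch below illustrates only that filter idea, assuming expression values have already been discretized into levels; the discretization, the function names and the toy gene names are assumptions made for illustration, and the JBA wrapper is not shown.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """MI (in nats) between a discrete feature and the class labels."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts.
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi

def top_k_genes(expression, labels, k):
    """Names of the k genes with the highest MI against the labels."""
    scores = {gene: mutual_information(values, labels)
              for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical discretized expression levels for four samples.
labels = [0, 0, 1, 1]
expression = {
    "gene_a": ["lo", "lo", "hi", "hi"],  # tracks the class exactly
    "gene_b": ["lo", "hi", "lo", "hi"],  # independent of the class
}
selected = top_k_genes(expression, labels, k=1)
```

A wrapper stage would then search within the genes that survive this filter, which keeps the search space small.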

Deep learning-based microarray cancer classification and ensemble gene selection approach

IET SYSTEMS BIOLOGY, 2022, Vol. 16, No. 3-4, pp. 120-131

Authors: Rezaee, Khosro; Jeon, Gwanggil; Khosravi, Mohammad R.; Attar, Hani H.; Sabzevari, Alireza. Affiliations: Meybod Univ, Dept Biomed Engn, Meybod, Iran; Incheon Natl Univ, Coll Informat Technol, Dept Embedded Syst Engn, Incheon, South Korea; Persian Gulf Univ, Dept Comp Engn, Bushehr, Iran; Zarqa Univ, Dept Energy Engn, Zarqa, Jordan

Malignancies and diseases of various genetic origins can be diagnosed and classified with microarray data. There are many obstacles to overcome due to the large number of genes and the small number of samples in the microarray. A combination strategy for gene expression in a variety of diseases is described in this paper, consisting of two steps: identifying the most effective genes via soft ensembling and classifying them with a novel deep neural network. The feature selection approach combines three strategies to select wrapper genes and rank them according to the k-nearest neighbour algorithm, resulting in a very generalisable model with low error levels. Using soft ensembling, the most effective subsets of genes were identified from three microarray datasets of diffuse large cell lymphoma, leukaemia, and prostate cancer. A stacked deep neural network was used to classify all three datasets, achieving an average accuracy of 97.51%, 99.6%, and 96.34%, respectively. In addition, two previously unreported datasets from small, round blue cell tumors (SRBCTs) and multiple sclerosis-related brain tissue lesions were examined to show the generalisability of the model method.

Keywords: soft ensembling; lab-on-a-chip; ensemble gene selection approach; feature selection; medical computing; deep learning (artificial intelligence); k-nearest neighbour algorithm; diseases; genetic origins; microarray data; diffuse large cell lymphoma; deep learning-based microarray cancer classification; gene expression; stacked deep neural network; wrapper genes; data analysis; feature selection approach; microarray datasets; cancer; pattern classification; genetics; effective genes; prostate cancer; brain tumours; low error levels; combination strategy

P3H4 and PLOD1 expression associates with poor prognosis in bladder cancer

CLINICAL & TRANSLATIONAL ONCOLOGY, 2022, Vol. 24, No. 8, pp. 1524-1532

Authors: Zhang, Junjie; Dong, Yang; Shi, Zhenduo; He, Houguang; Chen, Jiangang; Zhang, Shaoqi; Wu, Wei; Zhang, Qianjin; Han, Conghui; Hao, Lin. Affiliations: Soochow Univ, Med Coll, Suzhou 215123, Jiangsu, Peoples R China; Xuzhou Cent Hosp, Dept Urol, 199 Jiefang South Rd, Xuzhou 221009, Jiangsu, Peoples R China

Purpose: The prolyl 3-hydroxylase family member 4 gene (P3H4) is involved in the development of human cancers. The association of P3H4 with bladder cancer (BC) prognosis is unclear. This study aimed to analyze the association of P3H4 with BC prognosis. Methods: RNA-Seq data were downloaded from The Cancer Genome Atlas project and BC microarray datasets (GSE13507, GSE31684, and GSE32548) were downloaded from the Gene Expression Omnibus database. We analyzed the differences in P3H4 expression levels between BC tumors and non-tumor tissues and between samples with different clinical information. The association of P3H4 and P3H4-related genes with BC prognosis and the possibility of using P3H4 expression as a prognostic biomarker in BC patients were also analyzed. RevMan was used to perform the meta-analysis. Results: P3H4 was upregulated in BC tissues compared with the adjacent non-tumor tissues (p = 4.06e-08). Univariate Cox regression analysis and meta-analysis showed that high P3H4 expression level contributed to a poor BC prognosis (Hazard ratio, HR = 1.348, 95% CI 1.140-1.594, p = 4.89e-04; meta-analysis: HR = 1.45, 95% CI 1.10-1.91; p = 9.00e-03). Among the genes related to P3H4, the PLOD1 gene was closely associated with P3H4 expression (r = 0.620, p = 2.49e-44). Also, a meta-analysis showed that PLOD1 expression was associated with a poor prognosis in BC patients (HR = 1.77, 95% CI 1.31-2.38; p = 2.00e-04). Conclusions: The P3H4 and PLOD1 genes might be used as reliable prognostic biomarkers for BC.

Keywords: Bladder cancer; Prognostic biomarker; Meta-analysis; Overall survival; microarray datasets
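
The abstract above reports hazard ratios together with 95% confidence intervals and p-values. As a worked arithmetic check on how those three numbers relate, the sketch below recovers the standard error of log(HR) from the interval and derives a two-sided p-value. It assumes a Wald-type interval that is symmetric on the log scale, which the abstract does not state; the function name is hypothetical.

```python
import math

def wald_from_ci(hr, ci_low, ci_high, z_crit=1.96):
    """Back out SE(log HR), the z statistic and a two-sided p from HR and its CI."""
    log_hr = math.log(hr)
    # On the log scale a Wald interval spans 2 * z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Univariate Cox figures quoted in the abstract: HR = 1.348, 95% CI 1.140-1.594.
se, z, p = wald_from_ci(1.348, 1.140, 1.594)
```

Under that assumption the derived p-value comes out at roughly 5e-04, in line with the p = 4.89e-04 quoted for the univariate Cox result; small differences are expected because the published HR and CI are rounded.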

The imbalance problem: A comparison of sampling approaches using different parameters and feature selection methods in the context of classification

EXPERT SYSTEMS, 2024, Vol. 41, No. 8, e13591

Authors: Morillo-Salas, Jose L.; Bolon-Canedo, Veronica; Alonso-Betanzos, Amparo. Affiliations: Univ A Coruna, Res Ctr Informat & Commun Technol CIT, La Coruna, Spain; Univ A Coruna, Res Ctr Informat & Commun Technol CIT, Campus Elvina, La Coruna 15071, Spain

A common situation in classification tasks is to deal with unbalanced datasets, an issue that appears when the majority class(es) has a large number of samples compared to the minority class(es). This problem is even more significant when the datasets have a large number of features but only a few samples, as is the case with microarray datasets. Traditionally, an approach to alleviate this problem has been the application of sampling methods to obtain more balanced classes, increasing the number of samples in the minority class (replicating samples or generating new synthetic samples), or decreasing the number of samples in the majority class. In this study, we have compared different balancing methods, including a novel method that applies sampling in both the minority and majority classes. The interest in applying feature selection in combination with balancing methods has also been explored. In view of the results, a recommendation of sampling method, feature selection, and classifier is proposed to improve the classification results according to the type of dataset.

Keywords: classification; feature selection; microarray datasets; oversampling; unbalanced datasets
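
The abstract above contrasts oversampling the minority class with undersampling the majority class and mentions a method that samples in both. The record does not specify that method, so the following is only a generic sketch of combined random resampling: every class is brought to a common target size, by replication if it is too small and by random removal if it is too large. The function name, the target size and the toy data are assumptions.

```python
import random
from collections import defaultdict

def resample_to_target(samples, labels, target, seed=0):
    """Randomly over- or undersample each class so it has exactly `target` items."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            # Majority side: keep a random subset without replacement.
            chosen = rng.sample(xs, target)
        else:
            # Minority side: keep everything and replicate random members.
            chosen = xs + rng.choices(xs, k=target - len(xs))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y

# Hypothetical imbalanced toy set: 10 majority samples, 2 minority samples.
samples = list(range(12))
labels = ["maj"] * 10 + ["min"] * 2
balanced_x, balanced_y = resample_to_target(samples, labels, target=5)
```

Replication adds no new information, which is one reason the abstract also mentions generating synthetic samples as an alternative on the minority side.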

Accurate analysis for univariate-based filter methods for microarray data classification

JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2024, Vol. 18

Authors: Rebbah, Fatima Ezzahra; Chamlal, Hasna; Ouaderhman, Tayeb. Affiliations: Hassan II Univ Casablanca, Fac Sci Ain Chock, Dept Math & Comp Sci, Fundamental & Appl Math Lab, Casablanca, Morocco; Hassan II Univ, Fac Sci Ain Chock, Dept Math & Comp Sci, Fundamental & Appl Math Lab, Casablanca 20100, Morocco

Microarray expression datasets contain a huge number of genes, but only a few genes provide information about cancer diseases. In this context, feature selection approaches have been developed to deal with this problem. Filter-based methods, in particular, select the relevant genes and remove the irrelevant ones using different evaluation metrics. In this study, we shed light on nine univariate filter methods. Three categories of filter methods were investigated using eight microarray datasets, including binary and multi-class samples. The support vector machine and Naive Bayes classifiers were used to assess classification accuracy. Different comparison methods were used to help researchers visualize the performance of each studied filter. Specifically, statistical tests were applied in terms of classification accuracy, and the feature ranking similarity of the filter methods was studied based on a rank correlation measure.

Keywords: Feature selection; univariate filter method; classification; microarray datasets
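
The abstract above compares filter methods by the similarity of their feature rankings, using a rank correlation measure that the record does not name. As one common choice, the sketch below computes Spearman's rho between two rankings of the same genes, assuming no ties; the function name and the gene names are hypothetical.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two best-first rankings (no ties)."""
    if len(set(ranking_a)) != len(ranking_a) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must order the same distinct features")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two features")
    position_b = {feature: rank for rank, feature in enumerate(ranking_b)}
    # Sum of squared rank differences over all features.
    d_squared = sum((rank - position_b[feature]) ** 2
                    for rank, feature in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical rankings produced by two different filter methods.
filter_1 = ["g1", "g2", "g3", "g4"]
filter_2 = ["g4", "g3", "g2", "g1"]
same = spearman_rho(filter_1, filter_1)
opposite = spearman_rho(filter_1, filter_2)
```

Values near 1 mean two filters order the genes almost identically, and values near -1 mean they order them in reverse.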
