Search results - Inner Mongolia University Library

Using Pre-existing Microarray Datasets to Increase Experimental Power: Application to Insulin Resistance

PLOS COMPUTATIONAL BIOLOGY, 2010, vol. 6, no. 3, e1000718

Authors: Daigle, Bernie J., Jr.; Deng, Alicia; McLaughlin, Tracey; Cushman, Samuel W.; Cam, Margaret C.; Reaven, Gerald; Tsao, Philip S.; Altman, Russ B.

Affiliations: Stanford Univ, Dept Genet, Sch Med, Stanford, CA 94305, USA; Stanford Univ, Div Cardiovasc Med, Sch Med, Stanford, CA 94305, USA; Stanford Univ, Div Endocrinol, Sch Med, Stanford, CA 94305, USA; NIDDK, NIH, Bethesda, MD, USA; Stanford Univ, Dept Bioengn, Sch Med, Stanford, CA 94305, USA

Although they have become a widely used experimental technique for identifying differentially expressed (DE) genes, DNA microarrays are notorious for generating noisy data. A common strategy for mitigating the effects of noise is to perform many experimental replicates. This approach is often costly and sometimes impossible given limited resources; thus, analytical methods are needed which increase accuracy at no additional cost. One inexpensive source of microarray replicates comes from prior work: to date, data from hundreds of thousands of microarray experiments are in the public domain. Although these data assay a wide range of conditions, they cannot be used directly to inform any particular experiment and are thus ignored by most DE gene methods. We present the SVD Augmented Gene expression Analysis Tool (SAGAT), a mathematically principled, data-driven approach for identifying DE genes. SAGAT increases the power of a microarray experiment by using observed coexpression relationships from publicly available microarray datasets to reduce uncertainty in individual genes' expression measurements. We tested the method on three well-replicated human microarray datasets and demonstrate that use of SAGAT increased effective sample sizes by as many as 2.72 arrays. We applied SAGAT to unpublished data from a microarray study investigating transcriptional responses to insulin resistance, resulting in a 50% increase in the number of significant genes detected. We evaluated 11 (58%) of these genes experimentally using qPCR, confirming the directions of expression change for all 11 and statistical significance for three. Use of SAGAT revealed coherent biological changes in three pathways: inflammation, differentiation, and fatty acid synthesis, furthering our molecular understanding of a type 2 diabetes risk factor. We envision SAGAT as a means to maximize the potential for biological discovery from subtle transcriptional responses, and we provide it as a freely available s

Keywords: microarrays Replicates Medicine Stanford Pre-existing DE genes microarray datasets Reaven SAGAT

Gene Selection in Multi-class Imbalanced Microarray Datasets Using Dynamic Length Particle Swarm Optimization

CURRENT BIOINFORMATICS, 2021, vol. 16, no. 5, pp. 734-748

Authors: Priya, R. Devi; Sivaraj, R.

Affiliations: Kongu Engn Coll, Dept Informat Technol, Erode, Tamil Nadu, India; Nandha Engn Coll, Dept Comp Sci & Engn, Erode, Tamil Nadu, India

Background: Microarray gene expression datasets usually contain a large number of genes, which complicates further operations such as classification, clustering and other kinds of analysis. During the classification process, identifying salient genes is a demanding task that needs careful selection. Methods: The classification of multi-class datasets is more critical than binary classification. When there are multiple class labels, the datasets are more likely to be imbalanced. Large variations can be seen in the number of samples belonging to each class, and hence the classification process may become biased when incorrect samples are chosen for training. Little research is available that addresses all three of these scenarios together in microarray datasets. Results and Discussion: The paper fills this gap with the following contributions: i) it selects salient genes for classification using the multiSURF algorithm; ii) it identifies the right instances from imbalanced datasets using the Retained Tomek Link algorithm; and iii) it performs gene selection for multi-class classification using Dynamic Length Particle Swarm Optimization (DPSO). Conclusion: The proposed method is implemented on multi-class imbalanced microarray datasets, and the final classification performance is seen to be encouraging and better than that of the other compared methods.

Keywords: Feature weighing retained tomek link dynamic PSO apriori algorithm microarray datasets

CCFS: A cooperating coevolution technique for large scale feature selection on microarray datasets

COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2018, vol. 73, pp. 171-178

Authors: Ebrahimpour, Mohammad K.; Nezamabadi-Pour, Hossein; Eftekhari, Mandi

Affiliations: Shahid Bahonar Univ Kerman, Comp Engn Dept, Kerman, Iran; Shahid Bahonar Univ Kerman, Elect Engn Dept, Kerman, Iran

Recently, advances in bioinformatics have led to high-dimensional microarray datasets. These kinds of datasets are still challenging for researchers in the area of machine learning, since they suffer from a small sample size and an extremely large number of features. Therefore, feature selection is the problem of interest in the learning process in this area. In this paper, a novel feature selection method based on a global search (using the main concepts of the divide-and-conquer technique), called CCFS, is proposed. The proposed CCFS algorithm divides the dataset vertically (on features) at random and utilizes the fundamental concepts of cooperative coevolution, using a filter criterion in the fitness function, in order to search the solution space via the binary gravitational search algorithm. To determine the effectiveness of the proposed method, experiments are carried out on seven binary high-dimensional microarray datasets. The obtained results are compared with nine state-of-the-art feature selection algorithms, including Interact (INT) and Maximum Relevancy Minimum Redundancy (MRMR). The average outcomes of the results are analyzed by a non-parametric statistical test, which reveals that the proposed method differs meaningfully from the others in terms of accuracy, sensitivity, specificity and number of selected features. (C) 2018 Elsevier Ltd. All rights reserved.

Keywords: Meta-heuristics Cooperating coevolving feature selection microarray datasets Divide and conquered algorithms

A study of the relationships between oligonucleotide properties and hybridization signal intensities from NimbleGen microarray datasets

NUCLEIC ACIDS RESEARCH, 2008, vol. 36, no. 9, pp. 2926-2938

Authors: Wei, Hairong; Kuan, Pei Fen; Tian, Shulan; Yang, Chuhu; Nie, Jeff; Sengupta, Srikumar; Ruotti, Victor; Jonsdottir, Gudrun A.; Keles, Sunduz; Thomson, James A.; Stewart, Ron

Affiliations: WiCell Res Inst, Madison, WI 53707, USA; Univ Wisconsin, Med Sci Ctr, Dept Stat, Madison, WI 53706, USA; Univ Wisconsin, Genome Ctr Wisconsin, Madison, WI 53706, USA; Univ Wisconsin, Wisconsin Natl Primate Res Ctr, Madison, WI 53715, USA; Univ Wisconsin, Sch Med & Publ Hlth, Dept Anat, Madison, WI 53706, USA; Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53706, USA

Well-defined relationships between oligonucleotide properties and hybridization signal intensities (HSI) can aid chip design, data normalization and true biological knowledge discovery. We clarify these relationships using the data from two microarray experiments containing over three million probes from 48 high-density chips. We find that melting temperature (T-m) has the most significant effect on HSI while length for the long oligonucleotides studied has very little effect. Analysis of positional effect using a linear model provides evidence that the protruding ends of probes contribute more than tethered ends to HSI, which is further validated by specifically designed match fragment sliding and extension experiments. The impact of sequence similarity (SeqS) on HSI is not significant in comparison with other oligonucleotide properties. Using regression and regression tree analysis, we prioritize these oligonucleotide properties based on their effects on HSI. The implications of our discoveries for the design of unbiased oligonucleotides are discussed. We propose that isothermal probes designed by varying the length is a viable strategy to reduce sequence bias, though imposing selection constraints on other oligonucleotide properties is also essential.

Keywords: melting intensities regression microarray datasets normalization high-density protruding hybridization signal long oligonucleotides SeqS

Fusing Decision Trees Based on Genetic Programming for Classification of Microarray Datasets

10th International Conference on Intelligent Computing (ICIC)

Authors: Liu, KunHong; Tong, MuChenxuan; Xie, ShuTong; Zeng, ZhiHao

Affiliations: Xiamen Univ, Software Sch, Xiamen, Fujian, Peoples R China; Xiamen Univ, Dept Elect Engn, Xiamen, Fujian, Peoples R China; Jimei Univ, Sch Comp Engn, Xiamen, Fujian, Peoples R China

ISBN: (print) 9783319093390; 9783319093383

In this paper, a new ensemble system based on genetic programming (GP), named GPES, is proposed. Decision trees are used as base classifiers and are fused by GP with three voting methods: min, max and average. In this way, each individual of GP acts as an ensemble system. When the evolution process of GP ends, the final ensemble committee is selected from the last generation by a forward search algorithm. GPES is evaluated on microarray datasets, and the results show that this ensemble system is competitive with other ensemble systems.
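
As a rough illustration of the three voting methods named in the abstract, the sketch below fuses the class-probability outputs of several base classifiers with a min, max or average rule. It is not the GPES implementation: the GP evolution and the forward search are omitted, and the names `FUSION_RULES` and `fuse` as well as the toy scores are assumptions made for this example.

```python
from statistics import mean

# Hypothetical fusion rules: each one collapses the scores that several
# base classifiers assign to a single class into one fused score.
FUSION_RULES = {"min": min, "max": max, "average": mean}


def fuse(prob_vectors, rule):
    """Fuse per-classifier class-probability vectors.

    Returns (predicted class index, list of fused scores per class).
    """
    combine = FUSION_RULES[rule]
    n_classes = len(prob_vectors[0])
    fused = [combine([p[c] for p in prob_vectors]) for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: fused[c])
    return label, fused


# Toy scores from three base classifiers for one sample and two classes.
outputs = [[0.8, 0.2], [0.3, 0.7], [0.3, 0.7]]
for rule in FUSION_RULES:
    print(rule, fuse(outputs, rule))
```

With these toy scores the min and max rules pick class 0 while the average rule picks class 1, which shows why the choice of fusion operator can matter when base classifiers disagree.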

Keywords: microarray datasets genetic programming (GP) ensemble system decision tree

Top-down Mining of Top-k Frequent Closed Patterns from Microarray Datasets

2011 International Conference on Intelligent Computing and Integrated Systems (ICISS 2011)

Authors: HaiPing HUANG (1,2); YuQing MIAO (2); JianJun SHI (2)

Affiliations: (1) Department of Teaching and Practice, Guilin University of Electronic Technology; (2) School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin, China

Mining frequent closed patterns from microarray datasets has attracted increasing attention. However, most previous studies required users to specify a minimum support threshold. In practice, it is not easy for users to set an appropriate minimum support threshold and to discover the interesting patterns among a huge number of frequent closed patterns. In this paper, we propose an alternative mining task that mines the top-k frequent closed patterns of length no less than min from microarray datasets, where k is the desired number of frequent closed patterns to be mined. An efficient algorithm, TBtop, is developed that adopts a top-down breadth-first search strategy. Our performance study showed that the strategy was effective in pruning the search space, and in most cases TBtop outperformed the CARPENTER algorithm.
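
To make the mining task concrete, here is a brute-force sketch of "top-k frequent closed patterns of length no less than min". It is not TBtop: it enumerates every subset of rows instead of using a pruned top-down breadth-first search, so it is only usable on tiny inputs. It relies on the fact that the intersection of any set of rows is a closed pattern. The function name, the tie-breaking order and the toy data (rows as samples, items as hypothetical gene labels) are assumptions for this example.

```python
from itertools import combinations


def top_k_closed_patterns(rows, k, min_len):
    """Return the k closed patterns with highest support and >= min_len items.

    rows: list of frozensets, one per sample (items are gene labels).
    Result: list of (pattern, support) pairs, highest support first.
    """
    closed = {}
    for size in range(1, len(rows) + 1):
        for subset in combinations(rows, size):
            # The items shared by a group of rows always form a closed pattern.
            pattern = frozenset.intersection(*subset)
            if len(pattern) >= min_len:
                # Support counts every row that contains the pattern.
                closed[pattern] = sum(1 for row in rows if pattern <= row)
    # Sort by descending support; break ties by the sorted item names.
    ranked = sorted(closed.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return ranked[:k]


rows = [
    frozenset({"g1", "g2", "g3"}),
    frozenset({"g1", "g2", "g4"}),
    frozenset({"g1", "g2", "g3", "g4"}),
    frozenset({"g3", "g4"}),
]
print(top_k_closed_patterns(rows, k=2, min_len=2))
```

Because the user supplies k rather than a minimum support threshold, the effective threshold here is simply the support of the k-th ranked pattern.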

Keywords: data mining frequent closed top-k microarray datasets top-down

Markov Blanket: Efficient Strategy for Feature Subset Selection Method for High Dimensionality Microarray Cancer Datasets

Author: Abdala Nour, Laurentian University

Degree level: Master's

Currently, feature subset selection methods are very important, especially in areas of application for which datasets with tens or hundreds of thousands of variables (genes) are available. Feature subset selection methods help us select a small number of variables out of thousands of genes in microarray datasets for a more accurate and balanced classification. Efficient gene selection can be considered as an easy computational hold of the subsequent classification task, and can give a subset of the gene set without loss of classification performance. In classifying microarray data, the main objective of gene selection is to search for the genes that keep the maximum amount of relevant information about the class while minimizing classification errors. In this paper, we explain the importance of feature subset selection methods in the machine learning and data mining fields. Consequently, the analysis of microarray expression was used to check whether global biological differences underlie common pathological features in different types of cancer datasets and to identify genes that might anticipate the clinical behavior of this disease. Gene expression data used with a feature subset selection model contain large amounts of raw data that need analyzing to obtain useful information for specific biological and medical applications. One way of finding relevant (and removing redundant) genes is by using the Bayesian network based on the Markov blanket [1]. We present and compare the performance of the different approaches to feature (gene) subset selection based on Wrapper and Markov Blanket models for five microarray cancer datasets. The first way depends on the Memetic algorithms (MAs) used for the feature selection method. The second way uses MRMR (Minimum Redundant Maximum Relevant) for feature subset selection hybridized by genetic search optimization techniques and afterwards compares the Markov blanket model's performance with the most common classical classifica

Keywords: microarray datasets feature selection methods genetic algorithms memetic algorithms overfitting problem fitness function crossover mutation Markov Blanket minimum redundancy-maximum relevant support vector machine naive Bayes k-nearest-neighbor ensemble classifier Bayesian networks

Optimizing medical data classification: integrating hybrid fuzzy joint mutual information with binary Cheetah optimizer algorithm

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2025, vol. 28, no. 4, pp. 1-32

Authors: Hegazy, Ah. E.; Hafiz, B.; Makhlouf, M. A.; Salem, Omar A. M.

Affiliations: Suez Canal Univ, Fac Comp & Informat, Ismailia, Egypt; Henan Univ, Sch Comp & Informat Engn, Kaifeng, Peoples R China

Traditional classification algorithms struggle with the high dimensionality of medical data, resulting in reduced performance in tasks like disease diagnosis. Feature selection (FS) has emerged as a crucial preprocessing step to mitigate these challenges by extracting relevant features and improving classification accuracy. This paper proposes a hybrid FS method, FJMIBCOA, which integrates Fuzzy Joint Mutual Information (FJMI) as a filter measure and Binary Cheetah Optimizer Algorithm (BCOA) as a wrapper method. Unlike existing hybrid FS methods, the proposed method employs FJMI to address uncertainty in feature relationships, providing several advantages such as handling both discrete and continuous features, accommodating linear and non-linear relationships, noise robustness and effectively utilizing intra- and inter-class information. It also employs BCOA as a wrapper method, requiring a few parameters, minimizing computational overhead and enhancing classification robustness, making it an efficient and adaptable solution for FS in complex medical datasets. The proposed method is validated on 23 medical datasets and 14 high-dimensional microarray datasets, demonstrating excellent performance in terms of fitness value, accuracy and feature size. FJMIBCOA surpasses existing methods in medical datasets by achieving higher accuracy in 78.26% of datasets while reducing the feature size by 84.79%. Similarly, in microarray datasets, it improves accuracy in 78.58% of datasets with an impressive 95.08% reduction in feature size. Furthermore, FJMIBCOA achieves superior accuracy in 60% of datasets while selecting fewer features in 78.57% of datasets as compared to previous studies. Statistical testing indicates that FJMIBCOA outperforms other methods significantly. The proposed method enhances diagnosis accuracy and minimizes medical testing requirements, making it suitable for real-world, high-dimensional datasets and decision-making in medical data analysis. The findings

Keywords: Classification Feature selection Cheetah optimizer algorithm Fuzzy joint mutual information Medical datasets microarray datasets

Ensemble of feature selection methods: A hesitant fuzzy sets approach

APPLIED SOFT COMPUTING, 2017, vol. 50, pp. 300-312

Authors: Ebrahimpour, Mohammad Kazem; Eftekhari, Mahdi

Affiliation: Shahid Bahonar Univ Kerman, Dept Comp Engn, Kerman, Iran

Recently, there has been great interest in developing feature selection methods for high-dimensional microarray datasets. In this paper, an innovative method based on the Maximum Relevancy and Minimum Redundancy (MRMR) approach using Hesitant Fuzzy Sets (HFSs) is proposed to deal with feature subset selection; the method is called MRMR-HFS. MRMR-HFS is a novel filter-based feature selection algorithm that selects features by an ensemble of ranking algorithms (as the measure of feature-class relevancy that must be maximized) and similarity measures (as the measure of feature-feature redundancy that must be minimized). The combination of ranking algorithms and similarity measures is done by using the fundamental concepts of information energies of HFSs. The proposed method is inspired by Correlation-based Feature Selection (CFS) within the sequential forward search, in order to present a robust feature selection tool for solving high-dimensional problems. To evaluate the effectiveness of MRMR-HFS, several experiments are carried out on nine well-known high-dimensional microarray datasets. The obtained results are compared with those of other similar state-of-the-art algorithms, including Correlation-based Feature Selection (CFS), Fast Correlation-based Filter (FCBF), Interact (INT), and Maximum Relevancy Minimum Redundancy (MRMR). The outcomes of the comparison, carried out via non-parametric statistical tests, confirm that MRMR-HFS is effective for feature subset selection in high-dimensional datasets in terms of accuracy, sensitivity, specificity, G-mean, and number of selected features. (C) 2016 Elsevier B.V. All rights reserved.

Keywords: Hesitant fuzzy sets Feature selection High dimensional datasets Big data Imbalanced datasets microarray datasets

A hybrid intelligent optimization algorithm to select discriminative genes from large-scale medical data

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, vol. 15, no. 12, pp. 5921-5948

Authors: Wang, Tao; Jia, LiYun; Xu, JiaLing; Gad, Ahmed G.; Ren, Hai; Salem, Ahmed

Affiliations: Hebei Univ Architecture, Informat Engn Coll, Zhangjiakou 075000, Peoples R China; Hebei Univ Architecture, Dept Math & Phys, Zhangjiakou 075000, Peoples R China; Kafrelsheikh Univ, Fac Comp & Informat, Kafrelsheikh 33516, Egypt; Arab Acad Sci Technol & Maritime Transport AASTMT, Coll Comp & Informat Technol, Cairo 2033, Egypt

Identifying disease-related genes is an ongoing research issue in biomedical analysis. Many studies have recently presented various strategies for predicting disease-related genes. However, only a handful of them were capable of identifying or selecting relevant genes with a low computational burden. In order to tackle this issue, we introduce a new filter-wrapper-based gene selection (GS) method based on metaheuristic algorithms (MHAs) in conjunction with the k-nearest neighbors (k-NN) classifier. Specifically, we hybridize two MHAs, bat algorithm (BA) and JAYA algorithm (JA), embedded with perturbation as a new perturbation-based exploration strategy (PES), to obtain JAYA-bat algorithm (JBA). The fact that JBA outperforms 10 state-of-the-art GS methods on 12 high-dimensional microarray datasets (ranging from 2000 to 22,283 features or genes) is impressive. It is also noteworthy that relevant genes are first selected via a filter-based method called mutual information (MI), and then further optimized by JBA to select the near-optimal genes in a timely fashion. Compared with the performance of 11 well-known original MHAs, including BA and JA, the proposed JBA achieves significantly better results with improvement rates of 12.36%, 12.45%, 97.88%, 9.84%, 12.45%, and 12.17% in terms of fitness, accuracy, gene selection ratio, precision, recall, and F1-score, respectively. The results of Wilcoxon's signed-rank test at a significance level of alpha = 0.05 furth

Keywords: Bat algorithm Bioinformatics Biomedical analysis Feature selection Gene expression Gene selection High-dimensional data JAYA algorithm Metaheuristics microarray datasets Mutual information Swarm intelligence
