ISBN: (print) 9783319327037; 9783319327013
Microarray data are primarily used by gene regulatory network inference algorithms. We used dynamic artificial gene regulatory networks to evaluate whether time-course microarray data are adequate to support the inference process. We evaluated how three factors affect the performance of an inference algorithm: the different ways in which genes can be triggered, the sparseness of the network, and noise in the microarray data.
Feature selection is often required as a preliminary step for many pattern recognition problems. However, most existing algorithms only work in a centralized fashion, i.e. using the whole dataset at once. In this research a new method for distributing the feature selection process is proposed. It distributes the data by features, i.e. according to a vertical distribution, and then performs a merging procedure which updates the feature subset according to improvements in classification accuracy. The effectiveness of the proposal is tested on microarray data, which pose a difficult challenge for researchers because of the large number of gene expression values and the small sample size. The results on eight microarray datasets show that the execution time is considerably shortened, while the performance is maintained or even improved compared to the standard algorithms applied to the non-partitioned datasets. (C) 2015 Elsevier B.V. All rights reserved.
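The vertical distribution and accuracy-driven merge described in this abstract might be sketched roughly as follows. The abstract does not specify the per-partition selector, the classifier, or the exact merge rule, so everything here is an assumption made for illustration: `class_gap` is a toy filter score, `centroid_loo_accuracy` is a nearest-centroid classifier scored by leave-one-out, and `distributed_select` keeps a partition's candidates only when accuracy strictly improves.

```python
from statistics import mean

def centroid_loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `feats`."""
    if not feats:
        return 0.0
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == c]
            if rows:
                centroids[c] = [mean(r[f] for r in rows) for f in feats]
        point = [X[i][f] for f in feats]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        hits += pred == y[i]
    return hits / len(X)

def class_gap(X, y, f):
    """Toy filter score (assumption): absolute gap between the class means (labels 0/1)."""
    a = [r[f] for r, c in zip(X, y) if c == 0]
    b = [r[f] for r, c in zip(X, y) if c == 1]
    return abs(mean(a) - mean(b))

def distributed_select(X, y, n_parts, per_part):
    """Split features into n_parts vertical partitions, select per partition, then merge."""
    n_feats = len(X[0])
    parts = [list(range(p, n_feats, n_parts)) for p in range(n_parts)]
    selected, best = [], 0.0
    for part in parts:
        # Local selection: in a real deployment each partition could run on its own node.
        local = sorted(part, key=lambda f: class_gap(X, y, f), reverse=True)[:per_part]
        # Merge step: keep the partition's candidates only if accuracy improves.
        acc = centroid_loo_accuracy(X, y, selected + local)
        if acc > best:
            selected, best = selected + local, acc
    return selected, best
```

Each partition only ever sees its own columns during local selection, which is where the shorter execution time would come from; the merge step is the only place where features from different partitions are evaluated together.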
This paper introduces a novel method for gene selection based on a modification of the analytic hierarchy process (AHP). The modified AHP (MAHP) is able to deal with quantitative factors that are statistics of five individual gene ranking methods: two-sample t-test, entropy test, receiver operating characteristic curve, Wilcoxon test, and signal-to-noise ratio. The most prominent discriminant genes serve as inputs to a range of classifiers including linear discriminant analysis, k-nearest neighbors, probabilistic neural network, support vector machine, and multilayer perceptron. Gene subsets selected by MAHP are compared with those of four competing approaches: information gain, symmetrical uncertainty, Bhattacharyya distance, and ReliefF. Four benchmark microarray datasets (diffuse large B-cell lymphoma, leukemia, prostate, and colon) are used in the experiments. Because the number of samples in microarray datasets is limited, leave-one-out cross-validation is applied rather than traditional cross-validation. Experimental results show that the proposed MAHP clearly outperforms the competing methods in terms of both accuracy and stability. Because of its low computational cost, MAHP is useful for cancer diagnosis using DNA gene expression profiles in real clinical practice. (C) 2015 Elsevier B.V. All rights reserved.
In this research an efficient gene selection method called the Discriminant Mutual Information (DMI) algorithm is proposed. The DMI algorithm sequentially induces discrimination and relevance to identify the most significant genes for tumor classification. In the first step, the entire gene population is decorrelated by forming gene-sets such that genes with similar characteristics fall into a single gene-set. The mutual information criterion is then employed to identify the most representative gene of each gene-set. Extensive experiments have been conducted on six publicly available databases, where the proposed DMI algorithm has shown good results compared to a number of state-of-the-art approaches. Computational analysis also reflects the efficiency of the proposed approach: it typically requires only a few seconds per experiment on standard microarray datasets.
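The second step, picking one representative per gene-set by mutual information, can be illustrated with a small sketch. This is not the DMI algorithm itself: the abstract does not say how gene-sets are formed or how expression values are discretized, so the gene-sets are taken as given and a median split (`discretize`) is assumed; `representatives` is a hypothetical helper name.

```python
from collections import Counter
from math import log2
from statistics import median

def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def discretize(values):
    """Assumed discretization: 1 if above the gene's median expression, else 0."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]

def representatives(expr, gene_sets, labels):
    """For each gene-set, return the gene sharing the most information with the class labels.

    expr maps gene name -> list of expression values, one per sample.
    """
    return [
        max(gs, key=lambda g: mutual_information(discretize(expr[g]), labels))
        for gs in gene_sets
    ]
```

Because only one gene survives per gene-set, the output size equals the number of gene-sets, which is what keeps the selected subset both small and decorrelated.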
Developments in microarray technology create opportunities in bioinformatics and make it possible to diagnose cancer at the level of gene expression. Many adverse factors, such as a small number of samples with high-dimensional characteristics and class imbalance in the data, pose challenges to traditional machine learning methods. Numerous researchers have worked on these problems and obtained significant achievements. This paper describes the data sets used in such studies, summarizes the approaches for cancer diagnosis based on microarray data, and provides an outlook on future research directions.
Microarray data have small sample sizes and high dimensionality, and contain a significant number of irrelevant and redundant genes. This paper proposes a hybrid ensemble method based on double disturbance to improve classification performance. First, the original genes are ranked with the ReliefF algorithm and a subset of them is selected; a new training set is then generated from the original training set using only the selected genes. Second, D bootstrap training subsets are drawn from this new training set. Third, an attribute reduction method based on neighborhood mutual information with a different radius is used to reduce the genes in each bootstrap training subset, producing new training subsets. Each new training subset is used to train a base classifier. Finally, a subset of the base classifiers is selected using teaching-learning-based optimization and combined into an ensemble by weighted voting. Experimental results on six benchmark cancer microarray datasets show that the proposed method reduces ensemble size and achieves higher classification performance than Bagging, AdaBoost, and Random Forest.
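Two of the generic building blocks in this pipeline, drawing D bootstrap subsets and combining base classifiers by weighted voting, can be sketched as below. The ReliefF ranking, the neighborhood mutual information reduction, and the teaching-learning-based optimization are not reproduced; the function names and the idea of using per-classifier weights such as validation accuracies are illustrative assumptions.

```python
import random
from collections import defaultdict

def bootstrap_indices(n_samples, d, seed=0):
    """Draw d bootstrap index lists, each sampling n_samples indices with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)] for _ in range(d)]

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across base classifiers."""
    tally = defaultdict(float)
    for label, w in zip(predictions, weights):
        tally[label] += w
    return max(tally, key=tally.get)
```

For example, `weighted_vote(["tumor", "normal", "normal"], [0.9, 0.3, 0.4])` favors `"tumor"`, because one strongly weighted classifier outweighs two weakly weighted ones (0.9 against 0.7).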
ISBN: (print) 9788132222088; 9788132222071
Classification of microarray data with high dimension and small sample size is a complex task. This work explores the optimal search space appropriate for classification. Here the curse of dimensionality is handled with a three-stage dimension reduction technique. In the first stage, statistical measures are used to remove genes that do not contribute to classification. In the second stage, more noisy genes are removed by considering the signal-to-noise ratio (SNR). In the third stage, principal component analysis (PCA) is used to further reduce the dimension. How much to reduce at each stage is crucial for developing an efficient classifier. Different proportions of reduction at each stage are combined in this study to find, for each dataset, the combination that maximizes classifier performance. A naive Bayes classifier is used to find the appropriate combination of reductions.
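As an illustration of the second stage, here is a minimal sketch of an SNR-based gene filter. The abstract does not give its SNR formula, so the widely used two-class form |mean_0 - mean_1| / (sd_0 + sd_1) is assumed; `snr_score` and `top_genes_by_snr` are hypothetical names, and the `keep` parameter stands in for the per-stage reduction proportion the study tunes.

```python
from statistics import mean, stdev

def snr_score(values, labels):
    """Assumed SNR for one gene: |mean_0 - mean_1| / (sd_0 + sd_1), labels being 0 or 1."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    gap = abs(mean(a) - mean(b))
    spread = stdev(a) + stdev(b)
    if spread == 0:
        # No within-class variation: perfectly separating if the means differ.
        return float("inf") if gap > 0 else 0.0
    return gap / spread

def top_genes_by_snr(expr, labels, keep):
    """Keep the `keep` genes with the highest SNR.

    expr maps gene name -> list of expression values, one per sample.
    """
    ranked = sorted(expr, key=lambda g: snr_score(expr[g], labels), reverse=True)
    return ranked[:keep]
```

Each class needs at least two samples, since the sample standard deviation is undefined otherwise.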
ISBN: (print) 9781467375818
An attractive way to perform biclustering of genes and conditions is to adopt the notion of fuzzy sets, which is useful for discovering overlapping biclusters. Fuzzy clustering is well known as a robust and efficient way to reduce computational cost while obtaining better results. However, this approach has not been explored very well. In this paper, we propose a new algorithm called RefineBicluster for biclustering of microarray data using the fuzzy approach. This algorithm adopts the strategy of one bicluster at a time, assigning to each data matrix element, i.e. to each gene and each condition, a membership in the bicluster. The biclustering problem, in which one would maximize the size of the bicluster and minimize the residual, is treated as the optimization of a proper functional. Applied to continuous synthetic datasets, our algorithm outperforms other biclustering algorithms for microarray data.
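To make the "residual" concrete, the sketch below computes the mean squared residue commonly used in the biclustering literature (the Cheng and Church formulation) for a crisp bicluster. The abstract does not define its own residual or the fuzzy functional, so treating it as this quantity is an assumption, and the fuzzy memberships are not modeled here.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by row and column index lists.

    Each element's residue is a_ij - rowmean_i - colmean_j + overall_mean,
    so a perfectly additive (coherent) bicluster scores 0.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            total += (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
    return total / (n_r * n_c)
```

The tension named in the abstract follows directly: a single cell always has residue 0, so minimizing the residual alone would shrink the bicluster, which is why size has to be maximized at the same time.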
ISBN: (print) 9781479944453
Feature selection from microarray data has become an ever-evolving area of research. Numerous techniques have been widely applied to extract genes that are differentially expressed in microarray data. These include the fold-change approach, classical t-statistics, and modified t-statistics. It has been found that the gene lists returned by these methods are dissimilar. In this work we compare the outputs of two different feature selection methods using three classifiers based on different algorithms, namely the Random Forest ensemble method, the support vector machine (SVM), and k-nearest neighbors (KNN), using the prediction accuracy on the test datasets.
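The observation that fold-change and t-statistic rankings disagree can be reproduced with a toy example. The sketch below uses made-up expression values and assumes Welch's unequal-variance t-statistic as the "classical" one; the abstract does not say which two selection methods were actually compared.

```python
from math import log2, sqrt
from statistics import mean, variance

def log2_fold_change(a, b):
    """log2 ratio of mean expression in condition b over condition a (positive values only)."""
    return log2(mean(b) / mean(a))

def welch_t(a, b):
    """Welch's t-statistic: mean difference scaled by its standard error."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se

def rank_genes(genes, score):
    """Rank gene names by the absolute value of a score computed on (a, b)."""
    return sorted(genes, key=lambda g: abs(score(*genes[g])), reverse=True)

# Made-up data: geneA has a large but noisy change, geneB a small but consistent one.
genes = {
    "geneA": ([1.0, 9.0, 2.0, 8.0], [10.0, 30.0, 5.0, 35.0]),
    "geneB": ([5.0, 5.1, 4.9, 5.0], [6.0, 6.1, 5.9, 6.0]),
}
by_fold_change = rank_genes(genes, log2_fold_change)
by_t_statistic = rank_genes(genes, welch_t)
```

Fold change ignores within-group variance, so it ranks the noisy `geneA` first, while the t-statistic rewards the consistent `geneB`. This is one mechanism behind the dissimilar gene lists.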
ISBN: (print) 9781467396691
The appearance of microarray technology has attracted the scientific community and industry with its ability to simultaneously measure the activity and interactions of thousands of genes. This technology has been applied to a wide range of problems such as drug discovery, gene discovery, disease diagnosis and prognosis, and toxicological research. Although microarrays are now used in many biological studies, handling and analyzing the resulting data are not trivial tasks. For these reasons, this paper focuses on principal component analysis (PCA) and neural networks for microarray data, with the aim of reducing the data volume and producing informative results. The proposed methodology uses a multilayer perceptron (MLP) neural network to address lung cancer classification from microarray data. It consists of data reduction with PCA, followed by classification with an MLP, a feed-forward neural network known for its stable learning. The effectiveness of the implemented method was evaluated by measuring the correct classification rate on a lung cancer gene expression dataset and comparing it with the results obtained by other methods on the same data.