A biclustering algorithm, based on a greedy technique and enriched with a local search strategy to escape poor local minima, is proposed. The algorithm starts with an initial random solution and searches for a locally optimal solution by successive transformations that improve a gain function. The gain function combines the mean squared residue, the row variance, and the size of the bicluster. Different strategies to escape local minima are introduced and compared. Experimental results on several microarray data sets show that the method is able to find significant biclusters, also from a biological point of view. (c) 2007 Elsevier Inc. All rights reserved.
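The three quantities that the gain function is said to combine can be illustrated with a minimal sketch. It assumes the standard definition of the mean squared residue (residue of a cell = value - row mean - column mean + overall mean, all taken within the bicluster). The abstract does not say how the terms are weighted, so the hypothetical helper `bicluster_scores` returns them separately rather than guessing a combined gain.

```python
def bicluster_scores(matrix, rows, cols):
    """Return (mean squared residue, row variance, size) of a bicluster.

    `matrix` is a list of lists; `rows` and `cols` are the index lists
    that define the bicluster. Illustrative helper, not the paper's code.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    size = n_rows * n_cols

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Mean squared residue: zero for a perfectly additive pattern.
    msr = sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
              for i in range(n_rows) for j in range(n_cols)) / size

    # Row variance: high values indicate rows that actually fluctuate,
    # which rules out trivial constant biclusters.
    row_var = sum((sub[i][j] - row_means[i]) ** 2
                  for i in range(n_rows) for j in range(n_cols)) / size

    return msr, row_var, size


# Rows differ only by an additive shift, so the residue vanishes.
data = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
msr, row_var, size = bicluster_scores(data, [0, 1, 2], [0, 1, 2])
```

A greedy search of the kind described would add or remove a row or column whenever doing so improves the chosen combination of these three values.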
Gene expression data are a key factor in the success of medical diagnosis, and two-stage classification methods have therefore been developed for processing microarray data. The first stage of such methods selects a pre-specified number of genes that are likely to be the most relevant to the occurrence of a disease and passes these genes to the second stage for classification. In this paper, we combine four gene selection mechanisms and two classification tools into eight two-stage classification methods and test them on eight microarray data sets to analyze their performance. The first interesting finding is that the genes chosen by different categories of gene selection mechanisms have less than half in common, yet the resulting classification accuracies do not differ significantly. A subset-gene-ranking mechanism can improve classification accuracy, but its computational cost is much higher. Whether the classification tool employed at the second stage should be accompanied by a dimension reduction technique depends on the characteristics of the data set. (c) 2006 Elsevier Ltd. All rights reserved.
Missing values in microarray data can significantly affect subsequent analysis, so it is important to estimate them accurately. In this paper, a sequential local least squares imputation (SLLSimpute) method is proposed to solve this problem. It estimates missing values sequentially, starting from the gene with the fewest missing values, and partially reuses the estimated values in later steps. In addition, an automatic parameter selection algorithm, which can generate an appropriate number of neighboring genes for each target gene, is presented for parameter estimation. Experimental results confirmed that the SLLSimpute method exhibited better estimation ability than other currently used imputation methods. (C) 2008 Elsevier Ltd. All rights reserved.
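The general idea of sequential local least squares imputation can be sketched as follows. This is a simplified illustration, not the published SLLSimpute algorithm: the number of neighbors `k` is fixed by hand (the paper selects it automatically), every imputed gene is reused in full (the paper reuses estimates only partially), neighbors are chosen by Euclidean distance, and at least one gene is assumed to be complete. The function name `sequential_lls_impute` is hypothetical.

```python
import numpy as np


def sequential_lls_impute(data, k=2):
    """Impute NaNs row by row (rows = genes), fewest missing values first."""
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    n_genes = x.shape[0]

    # Genes without missing values form the initial neighbor pool.
    complete = [i for i in range(n_genes) if not missing[i].any()]
    targets = sorted((i for i in range(n_genes) if missing[i].any()),
                     key=lambda i: missing[i].sum())

    for t in targets:
        obs = ~missing[t]
        pool = np.array(complete)

        # Pick the k nearest genes using only the observed columns.
        dists = np.linalg.norm(x[pool][:, obs] - x[t, obs], axis=1)
        nearest = pool[np.argsort(dists)[:k]]

        # Express the target as a linear combination of its neighbors.
        a = x[nearest][:, obs].T                      # (n_observed, k)
        coef, *_ = np.linalg.lstsq(a, x[t, obs], rcond=None)

        # Apply the same combination to the missing columns.
        x[t, ~obs] = x[nearest][:, ~obs].T @ coef

        # The imputed gene becomes available as a neighbor for later genes.
        complete.append(t)

    return x


genes = [[1.0, 2.0, 3.0, 4.0],
         [2.0, 1.0, 0.0, 1.0],
         [3.0, 3.0, 3.0, float("nan")]]   # third gene = first + second
imputed = sequential_lls_impute(genes, k=2)
```

In the toy input the third gene is the exact sum of the first two, so the least squares fit recovers coefficients close to (1, 1) and fills the gap with a value close to 5.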
Gene Expression Profile Analysis Suite (GEPAS) is one of the most complete and extensively used web-based packages for microarray data analysis. During more than 5 years of activity, it has been continuously updated to keep pace with the state of the art in the changing microarray data analysis arena. GEPAS offers diverse analysis options that include well-established as well as novel algorithms for normalization, gene selection, class prediction, clustering and functional profiling of the experiment. New options for time-course (or dose-response) experiments, microarray-based class prediction, new clustering methods and new tests for differential expression have been included. The new pipeliner module automates the execution of sequential analysis steps by means of a simple but powerful graphic interface. GEPAS has been extensively re-engineered, with the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. GEPAS is currently the most cited web tool in its field and is extensively used by researchers in many countries; its records indicate an average usage rate of 500 experiments per day. GEPAS is available at http://***.
In microarray data analysis, each gene expression sample has thousands of genes, and reducing such high dimensionality is useful for both visualization and further clustering of samples. Traditional principal component analysis (PCA) is commonly used for this purpose but has problems. Nonnegative Matrix Factorization (NMF) is a new dimension reduction method. In this paper we compare NMF and PCA for dimension reduction. The reduced data are used for visualization and for clustering analysis via k-means on 11 real gene expression datasets. Before the clustering analysis, we apply NMF and PCA to reduce the data for visualization. The results on one leukemia dataset show that NMF can discover natural clusters and clearly detect one mislabeled sample while PCA cannot. For clustering analysis via k-means, NMF typically outperforms PCA. Our results demonstrate the superiority of NMF over PCA in reducing microarray data. (C) 2007 Elsevier Inc. All rights reserved.
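The two reductions being compared can be sketched side by side. This is a minimal illustration under stated assumptions, not the paper's implementation: NMF is computed with the classic multiplicative update rules for the Frobenius objective, PCA with a singular value decomposition of the centered data, and the input is assumed to be a nonnegative samples-by-genes matrix. The function names `nmf` and `pca_scores` and the toy matrix are hypothetical.

```python
import numpy as np


def nmf(v, rank, n_iter=200, seed=0):
    """Factor nonnegative v (samples x genes) as w @ h with w, h >= 0."""
    rng = np.random.default_rng(seed)
    w = rng.random((v.shape[0], rank)) + 1e-3
    h = rng.random((rank, v.shape[1])) + 1e-3
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h


def pca_scores(v, n_components):
    """Project samples onto the leading principal components."""
    centered = v - v.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: two groups of samples that express different genes.
v = np.array([[5.0, 4.0, 0.0, 0.0],
              [4.0, 5.0, 0.0, 1.0],
              [0.0, 0.0, 5.0, 4.0],
              [0.0, 1.0, 4.0, 5.0]])
w, h = nmf(v, rank=2)
scores = pca_scores(v, n_components=2)
```

The rows of `w` and of `scores` are the reduced sample representations that a clustering step such as k-means would then receive. The NMF coordinates are nonnegative by construction, whereas PCA scores are centered and can take either sign.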
ISBN: 9781424417391 (print)
Feature selection has been used widely for a variety of data, yielding higher speeds and reduced computational cost for the classification process. However, it is in microarray datasets that its advantages become most evident and are most needed. In this paper we present a novel approach to feature selection based on the concept of discernibility, which we introduce to describe how well separated the classes of a dataset are. We develop and test two independent feature selection methods that follow this approach. The results of our experiments on four microarray datasets show that discernibility-based feature selection reduces the dimensionality of the datasets involved without compromising the performance of the classifiers.
For cancer prediction using large-scale gene expression data, it often helps to incorporate gene interactions in the model. However, it is not straightforward to select important genes and model gene interactions simultaneously. Some heuristic approaches have been proposed in the literature. In this paper, we study a unified modeling approach based on l(1) penalized likelihood estimation that can simultaneously select important genes and model gene interactions. We illustrate its competitive performance through simulation studies and applications to public microarray data. (c) 2012 Elsevier Ltd. All rights reserved.
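How an l(1) penalty can select genes and interaction terms in one fit can be illustrated with a generic sketch. It is not the paper's estimator: it assumes a logistic likelihood, represents interactions as pairwise products of gene features, omits the intercept, and solves the problem by proximal gradient descent with a fixed step size that suits small, bounded inputs. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def add_pairwise_interactions(x):
    """Append the product of every pair of columns to x."""
    pairs = [x[:, i] * x[:, j]
             for i, j in combinations(range(x.shape[1]), 2)]
    return np.column_stack([x] + pairs) if pairs else x


def soft_threshold(z, t):
    """Proximal operator of the l1 norm: shrink toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def l1_logistic(x, y, lam, step=0.1, n_iter=500):
    """Minimize mean logistic loss + lam * ||beta||_1 (no intercept)."""
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(x @ beta)))
        grad = x.T @ (p - y) / len(y)
        # Gradient step on the loss, then shrinkage from the penalty.
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


x = np.array([[1.0, -1.0, 0.5],
              [-1.0, 1.0, 0.5],
              [0.5, 0.5, -1.0],
              [-0.5, -1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
z = add_pairwise_interactions(x)   # 3 main effects + 3 interactions
beta = l1_logistic(z, y, lam=0.05)
```

Coefficients that the shrinkage step sets exactly to zero correspond to genes or gene pairs dropped from the model, which is why a single penalized fit handles selection and interaction modeling together. Larger values of `lam` remove more terms.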
Several biclustering algorithms have been proposed in different fields of microarray data analysis. We present a new approach that improves their performance by using ensemble methods. Ensemble biclustering is considered and formalized as a binary triclustering problem. We propose a simple and efficient algorithm to solve it. To illustrate the value of our ensemble approach, numerical experiments are performed on both artificial and real datasets with two biclustering algorithms commonly used in bioinformatics. (C) 2012 Elsevier Ltd. All rights reserved.
ISBN: 9783540785675 (print)
In microarray data, clustering is the fundamental task for separating genes into biologically functional groups or for classifying tissues and phenotypes. Recently, with innovative gene expression microarray technologies, the expression levels of thousands of genes (features) can be measured simultaneously in a single experiment. The large number of genes, many of them noisy, makes cluster analysis highly complex. This challenge has raised the demand for feature selection, an effective dimensionality reduction technique that removes noisy features. In this paper we propose a novel filter method for feature selection. The suggested method, called ClosestFS, is based on a distance measure. For each feature, the distance is evaluated by computing its impact on the histogram for the whole data. Our experimental results show that the quality of the clustering results (evaluated by several widely used measures) of the K-means algorithm using ClosestFS as the pre-processing step is significantly better than that of pure K-means.
Lymph node metastasis is an important prognostic factor in oral squamous cell carcinoma. However, the lack of significant biomarkers for lymph node metastasis can cause patients to be treated inappropriately and lead to a poor prognosis. Therefore, there is a need to identify gene sets that are associated with lymph node metastasis. In this study, we used three expression datasets obtained from a public database and selected candidate gene sets related to lymph node metastasis from two datasets and a combined dataset. We evaluated the selected gene sets using out-of-bag (OOB) error rates in a validation dataset. The gene set detected from the combined dataset classified the lymph node status more accurately in the validation dataset, and clear expression patterns classifying the lymph node status based on chromosomal location were observed. The combined dataset holds promise as a source of a more accurate candidate gene set for the diagnosis of lymph node metastasis, and the selected gene set could be used for biological validation in further studies. (C) 2011 Elsevier Ltd. All rights reserved.